Selectivity Estimation of Range Queries in Data Streams using Micro-Clustering

Sudhanshu Gupta and Deepak Garg

Computer Science and Engineering Department, Thapar University, India

 

Abstract: Selectivity estimation is an important task in query optimization. Common data mining techniques are not applicable to data streams, which are large, fast and continuous and therefore require one-pass processing of the data. These requirements make selectivity estimation of range queries a challenging task. We propose a technique that performs range query estimation using micro-clustering. The technique maintains cluster statistics in the form of micro-clusters, each of which also records the distribution of its values using cosine coefficients. These coefficients are used to estimate range queries, including queries whose range spans several clusters. The technique is compared with the cosine series technique for selectivity estimation. Experiments on both synthetic and real datasets of varying sizes confirm that our technique offers substantial improvements in accuracy over other methods.
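The idea of keeping per-cluster cosine coefficients and combining them to answer a range query can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how micro-clusters are formed or updated, so the sketch assumes fixed, equal-width clusters over a known value domain, and the names `MicroCluster`, `StreamSummary`, `num_coeffs` and `num_clusters` are hypothetical. Each cluster is updated in a single pass and stores only a count and running sums of cos(kπu), where u is the value rescaled to [0, 1] within the cluster; the estimate integrates the resulting cosine series over the queried part of each cluster.

```python
import math


class MicroCluster:
    """Hypothetical micro-cluster covering the fixed interval [lo, hi]."""

    def __init__(self, lo, hi, num_coeffs=8):
        self.lo, self.hi = lo, hi
        self.count = 0
        # Running sums of cos(k*pi*u) for k = 1..num_coeffs (one-pass update).
        self.cos_sums = [0.0] * num_coeffs

    def insert(self, x):
        u = (x - self.lo) / (self.hi - self.lo)  # rescale to [0, 1]
        self.count += 1
        for k in range(1, len(self.cos_sums) + 1):
            self.cos_sums[k - 1] += math.cos(k * math.pi * u)

    def estimate(self, a, b):
        """Estimated number of values of this cluster that fall in [a, b]."""
        a, b = max(a, self.lo), min(b, self.hi)
        if self.count == 0 or a >= b:
            return 0.0
        ua = (a - self.lo) / (self.hi - self.lo)
        ub = (b - self.lo) / (self.hi - self.lo)
        # Integral of the density 1 + 2*sum_k a_k*cos(k*pi*u) over [ua, ub].
        frac = ub - ua
        for k, s in enumerate(self.cos_sums, start=1):
            a_k = s / self.count
            frac += 2.0 * a_k * (math.sin(k * math.pi * ub)
                                 - math.sin(k * math.pi * ua)) / (k * math.pi)
        return self.count * min(1.0, max(0.0, frac))


class StreamSummary:
    """Assumed layout: equal-width clusters over a known domain [lo, hi]."""

    def __init__(self, lo, hi, num_clusters=4, num_coeffs=8):
        self.lo = lo
        self.width = (hi - lo) / num_clusters
        self.total = 0
        self.clusters = [
            MicroCluster(lo + i * self.width, lo + (i + 1) * self.width, num_coeffs)
            for i in range(num_clusters)
        ]

    def insert(self, x):
        # Route the value to its cluster; the domain maximum goes to the last one.
        i = min(int((x - self.lo) / self.width), len(self.clusters) - 1)
        self.clusters[i].insert(x)
        self.total += 1

    def selectivity(self, a, b):
        """Estimated fraction of all values seen so far that lie in [a, b]."""
        if self.total == 0:
            return 0.0
        return sum(c.estimate(a, b) for c in self.clusters) / self.total
```

A query range that spans several clusters is answered by summing the per-cluster estimates, so only the clusters at the two ends of the range contribute partial counts. In this sketch the number of coefficients per cluster trades memory for accuracy.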

Keywords: Selectivity estimation, range query, data streams, micro-clustering.

Received September 22, 2012; accepted December 24, 2013

 
