Efficiently Processing of Top-K Typicality Query for Structured Data

Authors

Jaehui Park¹ and Sang-goo Lee², ¹Electronics and Telecommunications Research Institute, Korea and ²Seoul National University, Korea

Abstract

This work presents a novel ranking scheme for structured data. We show how to apply the notion of typicality analysis from cognitive science and how to use it to formulate the problem of ranking data with categorical attributes. First, we formalize the typicality query model for relational databases. We adopt the Pearson correlation coefficient to quantify the typicality of an object. The correlation coefficient estimates the strength of the statistical relationship between two variables based on the patterns of occurrences and absences of their values. Second, we develop a top-k query processing method for efficient computation. Our method, TPFilter, prunes unpromising objects based on tight upper bounds and selectively joins the tuples with the highest typicality scores. Experimental results on real data show that our approach is promising.
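As an illustration of the correlation measure mentioned above, the following minimal sketch computes the Pearson correlation coefficient between two categorical attribute values, each encoded as a 0/1 vector marking its occurrence or absence in every tuple. The function name `pearson_binary`, the sample table, and the choice to return 0.0 for a constant vector are assumptions made for this example; it does not reproduce the typicality score or the TPFilter algorithm of the paper.

```python
from math import sqrt


def pearson_binary(x, y):
    """Pearson correlation of two equal-length 0/1 occurrence vectors.

    x[i] is 1 if the first attribute value occurs in tuple i, else 0;
    y[i] is the same for the second attribute value.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")

    n_x = sum(x)                                # tuples where the first value occurs
    n_y = sum(y)                                # tuples where the second value occurs
    n_xy = sum(a * b for a, b in zip(x, y))     # tuples where both occur

    numerator = n * n_xy - n_x * n_y
    denominator = sqrt(n_x * (n - n_x) * n_y * (n - n_y))

    # Assumption for this sketch: a value that occurs in every tuple or in
    # none has zero variance, so the correlation is reported as 0.0.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical relation with two categorical attributes (color, body type).
rows = [("red", "sedan"), ("red", "sedan"), ("blue", "truck"), ("red", "truck")]
is_red = [1 if color == "red" else 0 for color, _ in rows]
is_sedan = [1 if body == "sedan" else 0 for _, body in rows]

r = pearson_binary(is_red, is_sedan)
print(f"correlation(red, sedan) = {r:.4f}")
```

A positive value indicates that the two attribute values tend to occur together, and a negative value indicates that one tends to be absent when the other is present.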

Keywords

Typicality, Top-k query processing, Correlation, Lazy join, Upper bound

CS&IT Conference Proceedings
