| Plan of attack Frequent Items / Heavy Hitters Counting Distinct Elements Clustering items in Streams (2008) | |||||||||||||||
Abstract | |||||||||||||||
| What is clustering? We have a bunch of items... we want to discover the clusters... Types of clustering What is the quantity we are trying to optimize? Two types of clustering K-centers Pick k points in the space, call these centers Assign each data point to its closest center Measure the diameter of each cluster: maximum distance between two points in the same cluster K-medians Pick k points in the space, call these centers Assign each data point to its closest center Measure the average distance from each point to its closest center (or sum of distances) The First? Application of Clustering A London physician plotted the location of cholera cases on a map during an outbreak in the 1850s. The locations indicated that cases were clustered around certain intersections where there were polluted wells-- thus exposing both the problem and the solution. Not all clustering is this easy (2 dimensional, small number of points.) Slide due to Nina Mishra HP labs Clustering is hard For both k-centers and k-medians, both versions are NP-complete We only know exponential algorithms to solve them So we look for approximate answers with guaranteed approximation ratios. Metric Spaces The algorithms will work for any metric space Metric space: a set of points and a distance measure d on pairs of points satisfying Identity: d(x,y) =0 ⇒ x=y Symmetry: d(x,y) = d(y,x) | |||||||||||||||
Details der Publikation | |||||||||||||||
| |||||||||||||||