Finding Frequent Items in Data Streams (2009)
Graham Cormode, Marios Hadjieleftheriou
The frequent items problem is to process a stream of items and find all items occurring more than a given fraction of the time. It is one of the most heavily studied problems in data stream mining,...
Anonymizing Bipartite Graph Data using Safe Groupings ABSTRACT (2009)
Graham Cormode, Divesh Srivastava
Private data often comes in the form of associations between entities, such as customers and products bought from a pharmacy, which are naturally represented in the form of a large, sparse bipartite...
Anonymizing Bipartite Graph Data using Safe Groupings ABSTRACT (2009)
Graham Cormode, Divesh Srivastava
Private data often comes in the form of associations between entities, such as customers and products bought from a pharmacy, which are naturally represented in the form of a large, sparse bipartite...
Via Embeddings, Graham Cormode, S. Muthukrishnan
Abstract. If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal rearrangements in the form of deletions, block moves, inversions etc. to...
Graham Cormode, S. Muthukrishnan
The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes and changes needed to convert R to S. Given a text string t of length n, and a pattern...
Today’s converged networks bring many new challenges for monitoring � Massive scale of data and connections � No centralized control, inability to police what is connected � Attacks,...
Graham Cormode, S. Muthukrishnan
Many network flows between (source, dest) pairs Want a snapshot at time t of the flows This defines a (massive) vector, and we ask: Summarise the current state How does state at time t compare with...
How to Increase the Acceptance Ratios of Top Conferences? (2008)
Graham Cormode, Artur Czumaj, S. Muthukrishnan
In the beginning was the pub. This work was triggered by a pub conversation where the authors observed that many resumés list acceptance ratios of conferences where their papers appear 4, boasting...
Graham Cormode, Stable Sketches
for L p, Experiments on tabular data – Too many data points to store: Doubling Algorithm for k-center clustering, Hierarchical Algorithm for k-median, Grid algorithms for k-median
ABSTRACT What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically (2008)
Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the “hot items ” in the relation: those that appear many times (most...
Approximate Continuous Querying over Distributed Streams (2008)
Graham Cormode, Minos Garofalakis
While traditional database systems optimize for performance on one-shot query processing, emerging largescale monitoring applications require continuous tracking of complex data-analysis queries over...
Skew is prevalent in data streams, and should be taken into account by algorithms that analyze the data. The problem of finding “biased quantiles” — that is, approximate quantiles which must be...
ABSTRACT What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically (2008)
Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the “hot items ” in the relation: those that appear many times (most...
Time-Decaying Sketches for Sensor Data Aggregation ∗ (2008)
Graham Cormode, Srikanta Tirthapura, Bojian Xu
We present a new sketch for summarizing network data. The sketch has the following properties which make it useful in communication-efficient aggregation in distributed streaming scenarios, such as...
Permutation Editing and Matching (2008)
Via Embeddings, Graham Cormode, S. Muthukrishnan
3 EECS, Case Western Reserve University, Cleveland, OH; cenk@cwru.edu Abstract. If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal...
Graham Cormode, S. Muthukrishnan
What is clustering? We have a bunch of items... we want to discover the clusters... Types of clustering What is the quantity we are trying to optimize? Two types of clustering K-centers Pick k points...
Amit Chakrabarti, Graham Cormode, Andrew Mcgregor
We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε −2 log(δ −1) log m) words of space. Our algorithm is based upon a...
ABSTRACT Space Efficient Mining of Multigraph Streams ∗ (2008)
The challenge of monitoring massive amounts of data generated by communication networks has led to the interest in data stream processing. We study streams of edges in massive communication...
Abstract Algorithms for Distributed Functional Monitoring (2008)
We study what we call functional monitoring problems. We have k players each tracking their inputs, say player i tracking a multiset Ai(t) up until time t, and communicating with a central...
ABSTRACT What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically (2008)
Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the “hot items ” in the relation: those that appear many times (most...
Abstract Algorithms for Distributed Functional Monitoring (2008)
We study what we call functional monitoring problems. We have k players each tracking their inputs, say player i tracking a multiset Ai(t) up until time t, and communicating with a central...
ABSTRACT Sketching Probabilistic Data Streams (2008)
The management of uncertain, probabilistic data has recently emerged as a useful paradigm for dealing with the inherent unreliabilities of several real-world application domains, including data...
Key differences between Web 1.0 and Web 2.0 (2008)
Graham Cormode; AT&T Labs-Research, Balachander Krishnamurthy
Web 2.0 is a buzzword introduced in 2003-04 which is commonly used to encompass various novel phenomena on the World Wide Web. Although largely a marketing term, some of the key attributes associated...
Histograms and Wavelets on Probabilistic Data (2008)
Cormode, Graham, Garofalakis, Minos
There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in...
Continuous Distributed Stream Querying using Sketches 1 (2008)
Graham Cormode, Minos Garofalakis
While traditional database systems optimize for performance on one-shot query processing, emerging largescale monitoring applications require continuous tracking of complex data-analysis queries over...
Permutation Editing and Matching via (2008)
Graham Cormode, S. Muthukrishnan
Abstract. If the genetic maps of two species are modelled as permutations of (homologous) genes, the number of chromosomal rearrangements in the form of deletions, block moves, inversions etc. to...
Graham Cormode, S. Muthukrishnan, Lower Bounds
– New approaches for this problem and its extensions. Some questions We see a large number of individual transactions (such as Amazon book sales) – What are the top sellers today? We are...
Skew is prevalent in data streams, and should be taken into account by algorithms that analyze the data. The problem of finding “biased quantiles” — that is, approximate quantiles which must be...
Fundamentals of Analyzing and Mining Data Streams (2008)
Abstract. Many scenarios, such as network analysis, utility monitoring, and financial applications, generate massive streams of data. These streams consist of millions or billions of simple updates...
Amit Chakrabarti, Graham Cormode, Andrew Mcgregor
We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε −2 log(δ −1) log m) words of space. Our algorithm is based upon a...
Some Key Concepts in Data Mining – Clustering (2008)
The notion of ‘clusters ’ is a very natural one, and occurs frequently in discussions of epidemiology. We hear about ‘cancer clusters’, areas where the number of reported cancer cases within...
Time-Decaying Sketches for Sensor Data Aggregation (2008)
Graham Cormode, Srikanta Tirthapura, Bojian Xu
We present a new sketch for summarizing network data. The sketch has the following properties which make it useful in communication-efficient aggregation in distributed streaming scenarios, such as...
ABSTRACT Sketching Probabilistic Data Streams (2008)
The management of uncertain, probabilistic data has recently emerged as a useful paradigm for dealing with the inherent unreliabilities of several real-world application domains, including data...
Key differences between Web 1.0 and Web 2.0 (2008)
Graham Cormode, Balachander Krishnamurthy
Web 2.0 is a buzzword introduced in 2003?04 which is commonly used to encompass various novel phenomena on the World Wide Web. Although largely a marketing term, some of the key attributes associated...
Key Differences between Web1.0 and Web2.0 (2008)
Graham Cormode, Er Krishnamurthy
Web 2.0 is a buzzword introduced in 2003/04 which is commonly used to encompass various novel phenomena on the World Wide Web. Although largely a marketing term, some of the key attributes associated...
Graham Cormode, S. Muthukrishnan
Data streams often consist of multiple signals. Consider a stream of multiple signals (i; a i;j) where i's correspond to the domain, j's index the dierent signals and a i;j 0 to the value...
How to Increase the Acceptance Ratios of Top Conferences? (2007)
Graham Cormode, Artur Czumaj, S. Muthukrishnan
In the beginning was the pub. This work was triggered by a pub conversation where the authors observed that many resumes list acceptance ratios of conferences where their papers appear, boasting the...
On estimating frequency moments of data streams (2007)
Abstract. Space-economical estimation of the pth frequency moments, defined as Fp = P n i=1 |fi|p, for p> 0, are of interest in estimating all-pairs distances in a large data matrix [14], machine...
Conquering the divide: Continuous clustering of distributed data streams (2007)
Data is often collected over a distributed network, but in many cases, is so voluminous that it is impractical and undesirable to collect it in a central location. Instead, we must perform...
Streaming in a Connected World: Querying and Tracking Distributed Data Streams (2007)
Graham Cormode, Minos Garofalakis
Today, a majority of data is fundamentally distributed in nature. Data for almost any task is collected over a broad area, and streams in at a much greater rate than ever before. In particular,...
A near-optimal algorithm for computing the entropy of a stream (2007)
Amit Chakrabarti, Graham Cormode
We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε −2 log(δ −1) log m) words of space. Our algorithm is based upon a...
On estimating frequency moments of data streams (2007)
Abstract. Space-economical estimation of the pth frequency moments, defined as Fp = �n i=1 |fi|p, for p> 0, are of interest in estimating all-pairs distances in a large data matrix [14], machine...
Conquering the divide: Continuous clustering of distributed data streams (2007)
Data is often collected over a distributed network, but in many cases, is so voluminous that it is impractical and undesirable to collect it in a central location. Instead, we must perform...
Fast approximate wavelet tracking on streams (2006)
Graham Cormode, Minos Garofalakis, Dimitris Sacharidis
Abstract. Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for...
Combinatorial Algorithms for Compressed Sensing (2006)
Abstract — In sparse approximation theory, the fundamental problem is to reconstruct a signal A ∈ R n from linear measurements 〈A, ψi 〉 with respect to a dictionary of ψi’s. Recently,...
Combinatorial Algorithms for Compressed Sensing (2006)
Graham Cormode, S. Muthukrishnan
Abstract. In sparse approximation theory, the fundamental problem is to reconstruct a signal A ∈ R n from linear measurements 〈A, ψi 〉 with respect to a dictionary of ψi’s. Recently, there...
Fast approximate wavelet tracking on streams (2006)
Graham Cormode, Minos Garofalakis, Dimitris Sacharidis
Abstract. Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for...
Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events. They have severe...
Fast approximate wavelet tracking on streams (2006)
Graham Cormode, Minos Garofalakis, Dimitris Sacharidis
Abstract. Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for...
Effective computation of biased quantiles over data streams (2005)
Graham Cormode, S. Muthukrishnan
Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with...
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling (2005)
Graham Cormode, S. Muthukrishnan
Database management systems face the challenge of dealing with massive data distributions which arrive at high speeds while there is only small storage available for managing and mining them....
Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles (2005)
Graham Cormode, Minos Garofalakis
While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution...
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling (2005)
Graham Cormode, S. Muthukrishnan
Database management systems face the challenge of dealing with massive data distributions which arrive at high speeds while there is only small storage available for managing and mining them....
Sketching streams through the net: Distributed approximate query tracking (2005)
While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex dataanalysis queries over...
Efficient strategies for continuous distributed tracking tasks (2005)
Graham Cormode, Minos Garofalakis
While traditional databases have focused on single query evaluation in a centralized setting, emerging applications require continuous tracking of queries on data that is widely distributed and...
Sketching streams through the net: Distributed approximate query tracking (2005)
Emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically-distributed streams. Effective solutions have to be...
Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling (2005)
Graham Cormode, S. Muthukrishnan, Irina Rozenbaum
Emerging data stream management systems approach the challenge of massive data distributions which arrive at high speeds while there is only small storage by summarizing and mining the distributions...
Sketching Streams through the Net: Distributed Approximate Query Tracking (2005)
Graham Cormode, Minos Garofalakis
Emerging large-scale monitoring applications require continuous tracking of complex dataanalysis queries over collections of physicallydistributed streams. Effective solutions have to be...
Effective computation of biased quantiles over data streams (2005)
Graham Cormode, S. Muthukrishnan
Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with...
Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles (2005)
Graham Cormode, Minos Garofalakis
While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution...
Efficient strategies for continuous distributed tracking tasks (2005)
Graham Cormode, Minos Garofalakis
While traditional databases have focused on single query evaluation in a centralized setting, emerging applications require continuous tracking of queries on data that is widely distributed and...
Effective computation of biased quantiles over data streams (2005)
Graham Cormode, S. Muthukrishnan
Skew is prevalentin many data sourcessuchas IP traffic streams. To continually summarize the distribution of such data, a highbiased set of quantiles (e.g., 50th, 90th and 99th percentiles) with...
What’s new: Finding significant differences in network data streams (2004)
Abstract — Monitoring and analyzing network traffic usage patterns is vital for managing IP Networks. An important problem is to provide network managers with information about changes in traffic,...
Holistic UDAFs at streaming speeds (2004)
Graham Cormode, S. Muthukrishnan
Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are...
An improved data stream summary: The Count-Min sketch and its applications (2004)
Graham Cormode, S. Muthukrishnan
Abstract. We introduce a new sublinear space data structure—the Count-Min Sketch — for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point,...
What’s new: Finding significant differences in network data streams (2004)
Abstract — Monitoring and analyzing network traffic usage patterns is vital for managing IP Networks. An important problem is to provide network managers with information about changes in traffic,...
Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-Dimensional Data (2004)
Graham Cormode, S. Muthukrishnan
Data items archived in data warehouses or those that arrive online as streams typically have attributes which take values from multiple hierarchies (e.g., time and geographic location; source and...
Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-Dimensional Data (2004)
Graham Cormode, S. Muthukrishnan
Data items archived in data warehouses or those that arrive online as streams typically have attributes which take values from multiple hierarchies (e.g., time and geographic location; source and...
What's New: Finding Significant Differences in Network Data Streams (2004)
Graham Cormode, S. Muthukrishnan
Monitoring and analyzing network traffic usage patterns is vital for managing IP Networks. An important problem is to provide network managers with information about changes in traffic, informing...
Holistic UDAFs at Streaming Speeds (2004)
Graham Cormode, Theodore Johnson, Flip Korn, S. Muthukrishnan
Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are...
Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-Dimensional Data (2004)
Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava
Data items archived in data warehouses or those that arrive online as streams typically have attributes which take values from multiple hierarchies (e.g., time and geographic location; source and...
On Automated Lesson Construction from Electronic Textbooks (2004)
Gultekin Ozsoyoglu, Nevzat H. Balkir, Z. Meral Ozsoyoglu, Senior Member, Senior Member, Ieee Computer Society, ...
Abstract—An electronic book may be viewed as an application with a multimedia database. We define an electronic textbook as an electronic book that is used in conjunction with instructional...
The hardness of the lemmings game, or oh no, more npcompleteness proofs (2004)
In the computer game ‘Lemmings’, the player must guide a tribe of green-haired lemming creatures to safety, and save them from an untimely demise. We formulate the decision problem which is,...
Holistic UDAFs at streaming speeds (2004)
Graham Cormode, S. Muthukrishnan
Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are...
Sequence distance embeddings / (2003)
Thesis (Ph.D.)--University of Warwick, Jan. 2003.
Stable distributions for stream computations: It’s as easy as 0,1,2 (2003)
-1. Introduction A surprising number of data stream problems are solved bymethods involving computations with stable distributions. This
What’s hot and what’s not: Tracking most frequent items dynamically (2003)
Graham Cormode, S. Muthukrishnan
Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the “hot items ” in the relation: those that appear many times (most...
Estimating dominance norms of multiple data streams (2003)
Graham Cormode, S. Muthukrishnan
Abstract. There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of...
What's Hot and What's Not: Tracking Most Frequent Items Dynamically (2003)
Graham Cormode, S. Muthukrishnan
Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the "hot items" in the relation: those that appear many times...
Finding hierarchical heavy hitters in data streams (2003)
Graham Cormode, S. Muthukrishnan, Divesh Srivastava
Aggregation along hierarchies is a critical summary technique in a large variety of online applications including decision support, and network management (e.g., IP clustering, denial-of-service...
Topic Dependencies for Electronic Books (2002)
Electronics books consist of a number of topics, and information about dependencies between topics. We examine some theoretical problems related to handling these topic dependencies. In particular we...
Comparing data streams using hamming norms (how to zero in (2002)
Graham Cormode, Mayur Datar, Piotr Indyk, S. Muthukrishnan
Abstract—Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in...
Comparing data streams using hamming norms (how to zero in (2002)
Graham Cormode, Mayur Datar, S. Muthukrishnan, Piotr Indyk
Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in traditional...
A Santos [1994], "Compilation by transformation in the Glasgow Haskell Compiler (2000)
Gultekin Ozsoyoglu, N. Hurkan Balkir, Graham Cormode, Z. Meral Ozsoyoglu
Abstract 1 Electronic book is an application with a multimedia database of instructional resources, which include hyperlinked text, instructor’s audio/video clips, slides, animation, still images,...
Electronic Books in Digital Libraries (2000)
Gultekin Ozsoyoglu, N. Hurkan Balkir, Graham Cormode, Z. Meral Ozsoyoglu
Electronic book is an application with a multimedia database of instructional resources, which include hyperlinked text, instructor's audio/video clips, slides, animation, still images, etc. as...
Obliviously Approximating Sequence Distances (2000)
There are several applications for schemes which approximately find the distance between two sequences in a way that is `oblivious' of one of the sequences up until a final sublinear number of...
Communication Complexity of Document Exchange (2000)
Graham Cormode, Mike Paterson, Suleyman Cenk Sahinalp, Uzi Vishkin, Suleyman Cenk S
We address the problem of minimizing the communication involved in the exchange of similar documents. We consider two users, A and B, who hold documents x and y respectively. Neither of the users has...
Topic Dependencies for Electronic Books (1999)
Graham Cormode December, Graham Cormode
Electronics books consist of a number of topics, and information about dependencies between topics. We examine some theoretical problems related to handling these topic dependencies. In particular we...