Automatic Optimization of Parallel Dataflow Programs (2009)
Christopher Olston, Benjamin Reed, Adam Silberstein, Utkarsh Srivastava
Large-scale parallel dataflow systems, e.g., Dryad and Map-Reduce, have attracted significant attention recently. High-level dataflow languages such as Pig Latin and Sawzall are being layered on top...
Parallel Evaluation of Composite Aggregate Queries (2009)
Lei Chen, Christopher Olston, Raghu Ramakrishnan
Aggregate measures summarizing subsets of data are valuable in exploratory analysis and decision support, especially when dependent aggregations can be easily specified and computed. A...
Relaxation in Text Search using Taxonomies (2009)
Marcus Fontoura, Vanja Josifovski, Ravi Kumar, Christopher Olston, Andrew Tomkins, Sergei Vassilvitskii
In this paper we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a cost to result...
Interactive Analysis of Web-Scale Data (2009)
Christopher Olston, Edward Bortnikov, Khaled Elmeleegy, Flavio Junqueira, Benjamin Reed
We consider how to support interactive querying over web-scale data. The basic approach is to view querying as a two-phase activity: first supply a query template, and later supply specific...
Scheduling Shared Scans of Large Data Files (2009)
Parag Agrawal, Daniel Kifer, Christopher Olston
We study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these...
Scalable Query Result Caching for Web Applications (2009)
Anastasia Ailamaki, Charles Garrod, Christopher Olston, Bruce Maggs, Amit Manjhi, Anthony Tomasic, ...
The backend database system is often the performance bottleneck when running web applications. A common approach to scale the database component is query result caching, but it faces the challenge of...
Pig Latin: A Not-So-Foreign Language for Data Processing (2008)
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected...
WIC: A General-Purpose Algorithm for Monitoring Web Information Sources (2008)
Sandeep Pandey, Kedar Dhamdhere, Christopher Olston
The Web is becoming a universal information dissemination medium, due to a number of factors including its support for content dynamicity. A growing number of Web information providers post near...
The Discoverability of the Web, WWW 2007, Track: Search, Session: Crawlers (2008)
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christopher Olston, Sandeep Pandey, Andrew Tomkins
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our study has two parts....
Recrawl Scheduling Based on Information Longevity (2008)
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the time it reaches the...
Scalable query result caching for web applications (2008)
Garrod, Charles, Manjhi, Amit, Ailamaki, Anastasia, Maggs, Bruce M., Mowry, Todd C., Olston, Christopher, ...
WIC: A General-Purpose Algorithm for Monitoring Web Information Sources (2007)
Sandeep Pandey, Kedar Dhamdhere, Christopher Olston
The Web is becoming a universal information dissemination medium, due to a number of factors including its support for content dynamicity. A growing number of Web information providers post near...
Invalidation Clues for Database Scalability Services (2007)
Manjhi, Amit, Gibbons, Phillip B., Ailamaki, Anastassia, Garrod, Charles, Maggs, Bruce M., Mowry, Todd C., ...
The Discoverability of the Web (2007)
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christopher Olston, Sandeep Pandey, Andrew Tomkins
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our study has two parts....
Simultaneous scalability and security for data-intensive web applications (2006)
Manjhi, Amit, Ailamaki, Anastassia, Maggs, Bruce M., Mowry, Todd C., Olston, Christopher, Tomasic, Anthony
Simultaneous scalability and security for data-intensive Web applications (2006)
Amit Manjhi, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry, Christopher Olston, Anthony Tomasic
For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and...
Invalidation clues for database scalability services (2006)
Amit Manjhi, Phillip B. Gibbons, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, ...
For their scalability needs, data-intensive Web applications can use a Database Scalability Service (DBSS), which caches applications' query results and answers queries on their behalf. One way...
Handling advertisements of unknown quality in search advertising (2006)
Sandeep Pandey, Christopher Olston
We consider how a search engine should select advertisements to display with search results, in order to maximize its revenue. Under the standard “pay-per-click” arrangement, revenue depends on...
Invalidation clues for database scalability services (2006)
Amit Manjhi, Phillip B. Gibbons, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, ...
For their scalability needs, data-intensive Web applications can use a Database Scalability Service (DBSS), which caches applications' query results and answers queries on their behalf. To address...
Advanced Tools for Video and Multimedia Mining (2006)
Jia-yu Pan, Christopher Olston
Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results (2005)
Pandey, Sandeep, Roy, Sourashis, Olston, Christopher, Cho, Junghoo, Chakrabarti, Soumen
In-degree, PageRank, number of visits and other measures of Web page popularity significantly influence the ranking of search results by modern search engines. The assumption is that popularity is...
A Scalability Service for Dynamic Web Applications (2005)
Olston, Christopher, Manjhi, Amit, Garrod, Charles, Ailamaki, Anastassia, Maggs, Bruce M., Mowry, Todd C.
A Scalability Service for Dynamic Web Applications (2005)
Christopher Olston, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Amit Manjhi, Todd C. Mowry
Providers of dynamic Web applications are currently unable to accommodate heavy usage without significant investment in infrastructure and in-house management capability. Our goal is to develop...
Finding (Recently) Frequent Items in Distributed Data Streams (2005)
Amit Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, Christopher Olston
We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naïve methods of combining approximate frequency counts from...
User-Centric Web Crawling (2005)
Sandeep Pandey, Christopher Olston
Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages, used to answer...
Efficient Monitoring and Querying of Distributed, Dynamic Data via Approximate Replication (2005)
Christopher Olston, Jennifer Widom
It is increasingly common for an application's data to reside at multiple disparate locations, while the application requires centralized access to its data. A simple solution is to replicate...
Tools for Large Graph Mining (2005)
Deepayan Chakrabarti, Guy Blelloch, Christopher Olston, Jon Kleinberg
Graphs show up in a surprisingly diverse set of disciplines, ranging from computer networks to sociology, biology, ecology and many more. What do such “normal” graphs look like? How can we spot...
What’s New on the Web? The Evolution of the Web from a Search Engine Perspective (2004)
Ntoulas, Alexandros, Cho, Junghoo, Olston, Christopher
We seek to gain improved insight into how Web search engines should cope with the evolving Web, in an attempt to provide users with the most up-to-date results possible. For this purpose we collected...
What's New on the Web? The Evolution of the Web from a Search Engine Perspective (2004)
Alexandros Ntoulas, Junghoo Cho, Christopher Olston
In this paper, we seek to get a better insight on how the search engines should cope with the evolving Web, in an attempt to provide users with up-to-date results. In this respect, we have crawled...
Finding (Recently) Frequent Items in Distributed Data Streams (2004)
Amit Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, Christopher Olston
We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naïve methods of combining approximate frequency counts from...
ScentTrails: Integrating Browsing and Searching on the Web (2003)
searching. While they exhibit complementary advantages, neither paradigm alone is adequate for complex information goals that lend themselves partially to browsing and partially to searching. To...