Brief Announcement: Parallel Depth First vs. Work Stealing Schedulers on CMP Architectures (2009)
Vasileios Liaskovitis, Shimin Chen, Phillip B. Gibbons, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, ...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in...
Generalizing Metamodules to Simplify Planning in Modular Robotic Systems (2009)
Daniel J. Dewey, Michael De Rosa, Seth Copen Goldstein, Todd C. Mowry
In this paper we develop a theory of metamodules and an associated distributed asynchronous planner which generalizes previous work on metamodules for lattice-based modular robotic...
Holistic Query Transformations for Dynamic Web Applications (2009)
Amit Manjhi, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, Anthony Tomasic
A promising approach to scaling Web applications is to distribute the server infrastructure on which they run. This approach, unfortunately, can introduce latency between the application and database...
Parallelizing Dynamic Information Flow Tracking (2009)
Olatunji Ruwase, Phillip B. Gibbons, Todd C. Mowry, Vijaya Ramach, Shimin Chen, Michael Kozuch, ...
Dynamic information flow tracking (DIFT) is an important tool for detecting common security attacks and memory bugs. A DIFT tool tracks the flow of information through a monitored program’s...
Integrated Debugging of Large Modular Robot Ensembles (2008)
Benjamin D. Rister, Jason Campbell, Padmanabhan Pillai, Todd C. Mowry
Creatively misquoting Thomas Hobbes, the process of software debugging is nasty, brutish, and all too long. This holds all the more true in robotics, which frequently involves...
Antonia Zhai, Christopher B. Colohan, Todd C. Mowry
Efficient inter-thread value communication is essential for improving
Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan, Todd C. Mowry
Efficient inter-thread value communication is essential for improving
CMP Support for Large and Dependent Speculative Threads (2008)
Christopher B. Colohan, Anastassia Ailamaki, J. Gregory Steffan, Todd C. Mowry
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size...
Brief Announcement: Parallel Depth First vs. Work Stealing Schedulers on CMP Architectures (2008)
Vasileios Liaskovitis, Shimin Chen, Phillip B. Gibbons, Anastassia Ailamaki, Guy E. Blelloch, Babak Falsafi, ...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in...
Statement of Research Interests (2008)
Shimin Chen, Todd C. Mowry
this paper was Runner-Up for the Best Paper Award.) To accelerate searches, pB+-Trees use prefetching to effectively create wider nodes than the natural data transfer size: e.g., eight vs. one cache...
Flexible hardware acceleration for instruction-grain program monitoring (2008)
Chen, Shimin, Kozuch, Michael, Strigkos, Theodoros, Falsafi, Babak, Gibbons, Phillip B., Mowry, Todd C., ...
Instruction-grain program monitoring tools, which check and analyze executing programs at the granularity of individual instructions, are invaluable for quickly detecting bugs and security attacks...
Incrementally parallelizing database transactions with thread-level speculation (2008)
Colohan, Christopher B., Ailamaki, Anastassia, Steffan, J. Gregory, Mowry, Todd C.
Scalable query result caching for web applications (2008)
Garrod, Charles, Manjhi, Amit, Ailamaki, Anastasia, Maggs, Bruce M., Mowry, Todd C., Olston, Christopher, ...
J. Gregory Steffan, Christopher B. Colohan, Todd C. Mowry
Thread-Level Data Speculation (TLDS) is a technique which enables the optimistic parallelization of applications despite ambiguous data dependences between the resulting threads. Although TLDS is...
Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications (2007)
Todd C. Mowry, Angela K. Demke, Orran Krieger
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve "out-of-core"...
Angela Demke Brown, Todd C. Mowry, Orran Krieger
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve...
Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications (2007)
Todd C. Mowry, Angela K. Demke, Orran Krieger
Current operating systems offer poor performance when a numeric application’s working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems...
Scalar Optimization and Code Generation Development at the University of Toronto (2007)
Todd C. Mowry, Antonia Zhai, Steve Tjiang
One of the most attractive features of the SUIF compiler system is its suitability for both high-level and low-level analysis and optimizations within a common intermediate format. For example, the...
for Out-of-Core Applications (2007)
Todd C. Mowry, Angela K. Demke, Orran Krieger
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve "out-of-core"...
Feedback and Simulation Tools for Investigating Do-Across Parallelism (2007)
Evaluating new architectural ideas and the compiler support necessary to exploit them presents some difficult methodological challenges. Once a new technique for enhancing performance has been...
info Copies of handouts. (2007)
Todd C. Mowry, Rosemary Battenfelder
In the following subdirectories you will find: doc Copies of many of the readings. asst Subdirectories with supporting material for each assignment.
Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications (2007)
Todd C. Mowry, Angela K. Demke, Orran Krieger
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve "out-of-core" problems e...
Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors (2007)
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, Michael D. Smith
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem...
Scheduling threads for constructive cache sharing on CMPs (2007)
Chen, Shimin, Gibbons, Phillip B., Kozuch, Michael, Liaskovitis, Vasileios, Ailamaki, Anastassia, Blelloch, Guy E., ...
Invalidation Clues for Database Scalability Services (2007)
Manjhi, Amit, Gibbons, Phillip B., Ailamaki, Anastassia, Garrod, Charles, Maggs, Bruce M., Mowry, Todd C., ...
A modular robotic system using magnetic force effectors (2007)
Brian T. Kirby, Burak Aksak, Jason D. Campbell, James F. Hoburg, Todd C. Mowry, Padmanabhan Pillai, ...
One of the primary impediments to building ensembles of modular robots is the complexity and number of mechanical mechanisms used to construct the individual modules. As part of the...
Meld: A declarative approach to programming ensembles (2007)
Seth Copen Goldstein, Peter Lee, Todd C. Mowry, Padmanabhan Pillai
This paper presents Meld, a programming language for modular robots, i.e., for independently executing robots where inter-robot communication is limited to immediate neighbors. Meld is a...
Scheduling threads for constructive cache sharing on CMPs (2007)
Shimin Chen, Phillip B. Gibbons, Michael Kozuch, Vasileios Liaskovitis, Anastassia Ailamaki, Guy E. Blelloch, ...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which...
Scheduling threads for constructive cache sharing on CMPs (2007)
Chen, Shimin, Gibbons, Phillip B., Kozuch, Michael, Liaskovitis, Vasileios, Ailamaki, Anastassia, Blelloch, Guy E., ...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which...
Improving hash join performance through prefetching (2007)
Chen, Shimin, Ailamaki, Anastassia, Gibbons, Phillip B., Mowry, Todd C.
CMP Support for Large and Dependent Speculative Threads (2007)
Colohan, Christopher B., Ailamaki, Anastassia, Steffan, J. Gregory, Mowry, Todd C.
Log-based architectures for general-purpose monitoring of deployed code (2006)
Chen, Shimin, Falsafi, Babak, Gibbons, Phillip B., Kozuch, Michael, Mowry, Todd C., Teodorescu, Radu, ...
Runtime monitoring tools are invaluable for detecting various types of bugs, in both sequential and multi-threaded programs. However, these tools often slow down the monitored program by an order of...
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads (2006)
Colohan, Christopher B., Ailamaki, Anastassia, Steffan, J. Gregory, Mowry, Todd C.
Parallel depth first vs. work stealing schedulers on CMP architectures (2006)
Liaskovitis, Vasileios, Chen, Shimin, Gibbons, Phillip B., Ailamaki, Anastassia, Blelloch, Guy E., Falsafi, Babak, ...
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in...
Simultaneous scalability and security for data-intensive web applications (2006)
Manjhi, Amit, Ailamaki, Anastassia, Maggs, Bruce M., Mowry, Todd C., Olston, Christopher, Tomasic, Anthony
Tolerating dependences between large speculative threads via sub-threads (2006)
Christopher B. Colohan, Anastassia Ailamaki, J. Gregory Steffan, Todd C. Mowry
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size from hundreds...
Simultaneous scalability and security for data-intensive Web applications (2006)
Amit Manjhi, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry, Christopher Olston, Anthony Tomasic
For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and...
Invalidation clues for database scalability services (2006)
Amit Manjhi, Phillip B. Gibbons, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, ...
For their scalability needs, data-intensive Web applications can use a Database Scalability Service (DBSS), which caches applications' query results and answers queries on their behalf. One way...
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads (2006)
Christopher B. Colohan, Anastassia Ailamaki, J. Gregory Steffan, Todd C. Mowry
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size from hundreds...
Simultaneous Scalability and Security (2006)
Amit Manjhi, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry, Christopher Olston, ...
For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and...
Simultaneous Scalability and Security for Data-Intensive Web Applications (2006)
Amit Manjhi, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry, Christopher Olston, Anthony Tomasic
For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and...
the previous technical report. (2006)
Amit Manjhi, Phillip B. Gibbons, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, ...
For their scalability needs, data-intensive Web applications can use a Database Scalability Service (DBSS), which caches applications' query results and answers queries on their behalf. One way...
Invalidation clues for database scalability services (2006)
Amit Manjhi, Phillip B. Gibbons, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Todd C. Mowry, ...
For their scalability needs, data-intensive Web applications can use a Database Scalability Service (DBSS), which caches applications' query results and answers queries on their behalf. To address...
Optimistic Intra-Transaction Parallelism on Chip Multiprocessors (2005)
Colohan, Christopher B., Ailamaki, Anastassia, Steffan, J. Gregory, Mowry, Todd C.
A Scalability Service for Dynamic Web Applications (2005)
Olston, Christopher, Manjhi, Amit, Garrod, Charles, Ailamaki, Anastassia, Maggs, Bruce M., Mowry, Todd C.
A Scalability Service for Dynamic Web Applications (2005)
Christopher Olston, Anastassia Ailamaki, Charles Garrod, Bruce M. Maggs, Amit Manjhi, Todd C. Mowry
Providers of dynamic Web applications are currently unable to accommodate heavy usage without significant investment in infrastructure and in-house management capability. Our goal is to develop...
Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
The key idea behind Inspector Joins is that during the I/O partitioning phase of a hash-based join, we have the opportunity to look at the actual data itself and then use this knowledge in two ways:...
Optimistic Intra-Transaction Parallelism on Chip Multiprocessors (2005)
Christopher B. Colohan, Todd C. Mowry
With the advent of chip multiprocessors, exploiting intra-transaction parallelism is an attractive way of improving transaction performance. However, exploiting intra-transaction parallelism in...
A Scalability Service for Dynamic Web Applications (2005)
Christopher Olston, Amit Manjhi, Charles Garrod, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry
Providers of dynamic Web applications are currently unable to accommodate heavy usage without significant investment in infrastructure and in-house management capability. Our goal is to develop...
Optimistic Intra-Transaction Parallelism on Chip Multiprocessors (2005)
Christopher B. Colohan, Anastassia Ailamaki, J. Gregory Steffan, Todd C. Mowry
With the advent of chip multiprocessors, exploiting intra-transaction parallelism is an attractive way of improving transaction performance. However, exploiting intra-transaction parallelism in...
Redesigning Database Systems in Light of CPU Cache Prefetching (2005)
Shimin Chen, Todd C. Mowry, Christos Faloutsos, Phillip B. Gibbons
through a generous fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or...
Optimistic Intra-Transaction Parallelism on Chip Multiprocessors (2005)
Christopher B. Colohan, Todd C. Mowry
With the advent of chip multiprocessors, exploiting intra-transaction parallelism is an attractive way of improving transaction performance. However, exploiting intra-transaction parallelism in...
Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
The key idea behind Inspector Joins is that during the I/O partitioning phase of a hash-based join, we have the opportunity to look at the actual data itself and then use this knowledge in two ways:...
The stampede approach to thread-level speculation (2005)
J. Gregory Steffan, Christopher Colohan, Antonia Zhai, Todd C. Mowry
Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is...
Improving Hash Join Performance through Prefetching (2004)
Chen, Shimin, Ailamaki, Anastassia, Gibbons, Phillip B., Mowry, Todd C.
Improving Hash Join Performance through (2004)
Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
This is a preliminary release of an article accepted by ACM Transactions on Database Systems. The definitive version is currently in production at ACM and, when released, will supersede this version.
Compiler optimization of memory-resident value communication between speculative threads (2004)
Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan, Todd C. Mowry
Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication using hardware...
Improving Hash Join Performance through Prefetching (2004)
Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
Hash join algorithms suffer from extensive CPU cache stalls. This paper shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 73% of its user time stalled...
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads (2004)
Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan, Todd C. Mowry
Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication using hardware...
Improving Hash Join Performance through Prefetching (2003)
Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
Hash join algorithms suffer from extensive CPU cache stalls. This paper shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 73% of its user time stalled...
A Case for Staged Database Systems (2003)
Stavros Harizopoulos, Panos K. Chrysanthis, Christos Faloutsos, Todd C. Mowry
not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.
Improving value communication for thread-level speculation (2002)
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, Todd C. Mowry
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this paper, we...
Fractal prefetching B+-trees: optimizing both cache and disk performance (2002)
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin
B+-Trees have been traditionally optimized for I/O performance with disk pages as tree nodes. Recently, researchers have proposed new types of B+-Trees optimized for CPU cache performance in main...
Compiler optimization of scalar value communication between speculative threads (2002)
Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan, Todd C. Mowry
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential...
Fractal Prefetching B+-Trees: Optimizing Both Cache and Disk Performance (2002)
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin
B+-Trees have been traditionally optimized for I/O performance with disk pages as tree nodes. Recently, researchers have proposed new types of B+-Trees optimized for CPU cache performance in main memory...
Compiler Optimization of Scalar Value Communication between Speculative Threads (2002)
Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan, Todd C. Mowry
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential...
Fractal prefetching B+-trees: optimizing both cache and disk performance (2002)
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin
Using speculative execution to automatically hide I/O latency (2001)
from the Office of Naval Research and Intel, and by generous contributions from the member companies of
Compiler-based I/O prefetching for out-of-core applications (2001)
Angela Demke Brown, Todd C. Mowry, Orran Krieger
Current operating systems offer poor performance when a numeric application’s working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems...
Spiros Papadimitriou, Todd C. Mowry
Speculative execution is often the only way to overcome dataflow-imposed limitations and exploit parallelism when dependences can be discovered only at run-time. It also facilitates automatic...
Improving Index Performance through Prefetching (2001)
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry
This paper proposes and evaluates Prefetching B
Exploring Thread-Level Speculation in Software: The Effects of Memory... (2001)
Spiros Papadimitriou, Todd C. Mowry
Speculative execution is often the only way to overcome dataflow-imposed limitations and exploit parallelism when dependences can be discovered only at run-time. It also facilitates automatic...
School of Computer Science (2001)
Fay W. Chang, Gregory R. Ganger, Todd C. Mowry, James R. Larus
This research was sponsored by DARPA/ITO through DARPA Order D306, and issued by Indian Head Division, NSWC under contract N00174-96-0002. Additional support was provided by graduate fellowships from the Office...
Improving Index Performance through Prefetching (2001)
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry
Todd C. Mowry is partially supported by an Alfred P. Sloan Research Fellowship and by a Faculty Development Award from IBM. In recognition of the crucial role that cache hierarchies play in database...
Angela Demke Brown, Todd C. Mowry
Out-of-core applications consume physical resources at a rapid rate, causing interactive applications sharing the same machine to exhibit poor response times. This...
Software-controlled multithreading using informing memory operations (2000)
Todd C. Mowry, Sherwyn R. Ramkissoon
Memory latency is becoming an increasingly important performance bottleneck, especially in multiprocessors. One technique for tolerating memory latency is multithreading, whereby we switch between...
Angela Demke Brown, Todd C. Mowry
Out-of-core applications consume physical resources at a rapid rate, causing interactive applications sharing the same machine to exhibit poor response times. This behavior is the result of default...
A Scalable Approach to Thread-Level Speculation (2000)
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, Todd C. Mowry
While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how...
Software-controlled multithreading using informing memory operations (2000)
Todd C. Mowry, Sherwyn R. Ramkissoon
To help tolerate the latency of accessing remote data in a shared-memory multiprocessor, we explore a novel approach to switch-on-miss multithreading that is software-controlled rather than...
A Scalable Approach to Thread-Level Speculation (2000)
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, Todd C. Mowry
While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how...
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache conflicts and false...
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications (1999)
As the disparity between processor and memory speeds continues to grow, memory latency is becoming an increasingly important performance bottleneck. While software-controlled prefetching is an...
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling (1998)
Todd C. Mowry, Chi-Keung Luk
Software-based latency tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully...
Compiler and Hardware Support for Automatic Instruction Prefetching: A Cooperative Approach (1998)
Todd C. Mowry, Chi-Keung Luk
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for...
Software-Controlled Multithreading Using Informing Memory Operations (1998)
Todd C. Mowry, Sherwyn R. Ramkissoon
Memory latency is becoming an increasingly important performance bottleneck, especially in multiprocessors. One technique for tolerating memory latency is multithreading, whereby we switch between...
J. Gregory Steffan, Christopher B. Colohan, Todd C. Mowry
Thread Level Data Speculation (TLDS) is a technique which enables the optimistic parallelization of applications despite ambiguous data dependences between the resulting threads. Although TLDS is...
Improving Index Performance through Prefetching (1998)
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry
In recognition of the crucial role that cache hierarchies play in database performance, recent studies have revisited core database algorithms and data structures in an effort to reduce the number of...
Informing Memory Operations: Memory Performance Feedback Mechanisms and Their Applications (1998)
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, Michael D. Smith
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem...
J. Gregory Steffan, Christopher B. Colohan, Todd C. Mowry
Thread-Level Data Speculation (TLDS) is a technique which enables the optimistic parallelization of applications despite ambiguous data dependences between the resulting threads. Although TLDS is...
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization (1998)
J. Gregory Steffan, Todd C. Mowry
As we look to the future, and the prospect of a billion transistors on a chip, it seems inevitable that microprocessors will exploit having multiple parallel threads. To achieve the full potential of...
Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory (1998)
Todd C. Mowry
A key challenge in achieving high performance on software DSMs is overcoming their relatively large communication latencies. In this paper, we consider two techniques which address this problem:...
Recent studies have shown that caches are not as effective as expected due to the insufficient reuse of cached data. One promising approach to increasing data reuse is to lay out data in a more...
The potential for thread-level data speculation in tightly-coupled multiprocessors (1997)
J. Gregory Steffan, Todd C. Mowry
To fully exploit the potential of single-chip multiprocessors, we must find a way to parallelize non-numeric applications. However, compilers have had little success in parallelizing non-numeric codes...
Architectural Support for Thread-Level Data Speculation (1997)
J. Gregory Steffan, Christopher B. Colohan, Todd C. Mowry
Thread-Level Data Speculation (TLDS) is a technique which enables the optimistic parallelization of applications despite uncertainty as to whether data dependences exist between the resulting threads...
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling (1997)
Software-based latency tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully...
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling (1997)
To fully exploit the benefit of software-based latency tolerance techniques, one must be careful to apply them only to the dynamic references that are likely to miss - otherwise the runtime overheads...
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling (1997)
To maximize the benefit and minimize the overhead of software-based latency tolerance techniques, we would like to apply them precisely to the set of dynamic references that suffer cache misses....
Compiler-Based Prefetching for Recursive Data Structures (1996)
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has...
Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors (1996)
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, Michael D. Smith
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem...
Log-Based Architectures for General-Purpose Monitoring of Deployed Code (1996)
Shimin Chen, Babak Falsafi, Phillip B. Gibbons, Michael Kozuch, Todd C. Mowry, Radu Teodorescu, ...
Keywords: Log-Based Architectures, general-purpose task
Automatic compiler-inserted I/O prefetching for out-of-core applications (1996)
Todd C. Mowry, Angela K. Demke, Orran Krieger
Current operating systems offer poor performance when a numeric application’s working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems...
Compiler-directed page coloring for multiprocessors (1996)
Edouard Bugnion, Jennifer M. Anderson, Todd C. Mowry, Mendel Rosenblum, Monica S. Lam
This paper presents a new technique, compiler-directed page coloring, that eliminates conflict misses in multiprocessor applications. It enables applications to make better use of the increased...
Informing memory operations: Providing memory performance feedback in modern processors (1996)
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, Michael D. Smith
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem...
Automatic Compiler-Inserted I/O Prefetching for Out-of-Core Applications (1996)
Todd C. Mowry, Angela K. Demke, Orran Krieger
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve "out-of-core" problems...
Informing Loads: Enabling Software To Observe And React To Memory Behavior (1995)
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, ...
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem...
Tolerating latency through software-controlled data prefetching (1994)
Todd C. Mowry
Submitted to the Department of Electrical Engineering.
Design and evaluation of a compiler algorithm for prefetching (1992)
Todd C. Mowry, Monica S. Lam, Anoop Gupta
Software-controlled data prefetching is a promising technique for improving the performance of the memory subsystem to match today's high-performance processors. While prefetching is useful in...