A Unified Operating System for Clouds and Manycore: fos (2009)
Modzelewski, Kevin, Miller, Jason, Belay, Adam, Beckmann, Nathan, Wentzlaff, David, ...
Single chip processors with thousands of cores will be available in the next ten years and clouds of multicore processors afford the operating system designer thousands of cores today. Constructing...
Smartlocks: Self-Aware Synchronization through Lock Acquisition Scheduling (2009)
Agarwal, Anant, Santambrogio, Marco D., Wingate, David, Eastep, Jonathan
As multicore processors become increasingly prevalent, system complexity is skyrocketing. The advent of the asymmetric multicore compounds this -- it is no longer practical for an average programmer...
Graphite: A Distributed Parallel Simulator for Multicores (2009)
Beckmann, Nathan, Eastep, Jonathan, Kurian, George, Kasture, Harshad, Miller, Jason E., ...
This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors...
Agarwal, Anant, Rinard, Martin, Sidiroglou, Stelios, Misailovic, Sasa, Hoffmann, Henry
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date,...
Virtual Wires: Overcoming Pin Limitations in FPGA-based Logic Emulators (2009)
Jonathan Babb, Russell Tessier, Anant Agarwal
Existing FPGA-based logic emulators suffer from limited inter-chip communication bandwidth, resulting in low gate utilization (10 to 20 percent). This resource imbalance increases the number of chips...
Application Heartbeats for Software Performance and Health (2009)
Miller, Jason, Agarwal, Anant, Santambrogio, Marco, Eastep, Jonathan, Hoffmann, Henry
Adaptive, or self-aware, computing has been proposed as one method to help application programmers confront the growing complexity of multicore software development. However, existing approaches to...
Partitioning Strategies for Concurrent Programming (2009)
Devadas, Srinivas, Agarwal, Anant, Hoffmann, Henry
This work presents four partitioning strategies, or patterns, useful for decomposing a serial application into multiple concurrently executing parts. These partitioning strategies augment the...
Remote Store Programming: Mechanisms and Performance (2009)
Wentzlaff, David, Agarwal, Anant, Hoffmann, Henry
This paper presents remote store programming (RSP). This paradigm combines usability and efficiency through the exploitation of a simple hardware mechanism, the remote store, which can easily be...
ATAC: A Manycore Processor with On-Chip Optical Network (2009)
Liu, Jifeng, Psota, James, Beckmann, Nathan, Miller, Jason, Michel, Jurgen, Eastep, Jonathan, ...
Ever since industry has turned to parallelism instead of frequency scaling to improve processor performance, multicore processors have continued to scale to larger and larger numbers of cores. Some...
LoPC: Modeling Contention in Parallel Algorithms (2008)
Matthew I. Frank, Anant Agarwal
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is...
An Evaluation of Directory Schemes for Cache Coherence (2008)
Anant Agarwal, Richard Simoni, John Hennessy, Mark Horowitz
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less...
Energy Scalability of On-Chip Interconnection Networks in Multicore Architectures (2008)
Konstantakopoulos, Theodoros, Eastep, Jonathan, Psota, James, Agarwal, Anant
On-chip interconnection networks (OCNs) such as point-to-point networks and buses form the communication backbone in systems-on-a-chip, multicore processors, and tiled processors. OCNs can consume...
The Case for a Factored Operating System (fos) (2008)
Wentzlaff, David, Agarwal, Anant
The next decade will afford us computer chips with 1,000 - 10,000 cores on a single piece of silicon. Contemporary operating systems have been designed to operate on a single core or small number of...
Bayesian Inference Algorithm on Raw (2008)
Alda Luong, Anant Agarwal
This work explores the performance of Raw, a parallel hardware platform developed at MIT, running a Bayesian inference algorithm. Motivation for examining this parallel system is a growing interest...
Versatile Tiled-Processor Architectures: The Raw Approach (2008)
Rodric M. Rabbah, Ian Bratt, Krste Asanovic, Anant Agarwal
Advances in VLSI technology have spurred an increasing interest within the computer architecture community to build a new kind of “all-purpose” processor that is able to run a broad class of...
Scalar Operand Networks: Design, Implementation, and Analysis (2008)
Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, Anant Agarwal
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar...
The Case for the Analytical Cache Model (2008)
Anant Agarwal, Mark Horowitz, John Hennessy
Trace-driven simulation and hardware measurement are the techniques most often used to obtain accurate performance figures for caches. The former requires a large amount of simulation time to...
Hank Hoffmann, Volker Strumpen, Anant Agarwal
is designed to address the problem of leveraging the growing number of on-chip resources
A Retrospective on The MIT Alewife Machine: Architecture and Performance (2008)
Stanford on directory schemes for cache coherence [1] (also included in this issue). Using data from small bus-based multiprocessors, this early work demonstrated that directory schemes were as...
The MIT Alewife Machine: Architecture and Performance (2008)
Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, ...
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a...
How to Choose the Grain Size of a Parallel Computer (2008)
Donald Yeung, William J. Dally, Anant Agarwal
Designers of parallel computers have to decide how to apportion a machine's resources between processing, memory, and communication. How these resources are apportioned determine the grain and...
Maps: A Compiler-Managed Memory System for Raw Machines (2008)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper describes Maps, a compiler managed memory system for Raw architectures. Traditional processors for sequential programs maintain the abstraction of a unified memory by using a single...
The MIT Alewife Machine (2008)
Anant Agarwal, Ricardo Bianchini, David Chaiken, Frederic T. Chong, Kirk L. Johnson, Kenneth Mackenzie, ...
A variety of models for parallel architectures such as shared memory, message passing, and dataflow, have converged in the recent past to a hybrid architecture form called distributed shared memory...
Dan Nussbaum, Anant Agarwal
Although scalability is generally considered an important issue in the design of parallel computer architectures, a commonly accepted, precise definition for scalability does not exist. This paper...
Modeling Multiprogrammed Caches (2008)
This paper presents a simple, yet accurate, model for multiprogrammed caches and validates it against trace-driven simulation. The model takes into account nonstationary behavior of processes and...
Donald Yeung, John Kubiatowicz, Anant Agarwal
Parallel workstations, each comprising tens of processors based on shared memory, promise cost-effective scalable multiprocessing. This paper explores the coupling of such small- to medium-scale...
Memory Assignment for Multiprocessor Caches through Grey Coloring (2007)
Anant Agarwal, John V. Guttag, Christoforos N. Hadjicostis, Marios C. Papaefthymiou
The achieved performance of multiprocessors is heavily dependent on the performance of their caches. Cache performance is severely degraded when data tiles used by a program conflict in the caches....
Eugene Weinstein, Purdy Ho, Bernd Heisele, Tomaso Poggio, Ken Steele, Anant Agarwal
We describe initial efforts at implementing a flexible, modular face identification framework in the context of a pervasive computing environment including handheld computers. Currently, the...
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
Technological trends require that future scalable microprocessors be decentralized. Applying these trends toward memory systems shows that size of cache accessible in a single cycle will decrease in...
Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, ...
load, store, integer, or floating-point instructions every cycle; controls 25 Gbytes/s of input/output (I/O) bandwidth; and has 2 Mbytes of on-chip distributed L1 static RAM providing on-chip memory...
Multiprocessor Address Tracing and Performance Analysis (2007)
David Kranz, David Chaiken, Anant Agarwal
The design of multiprocessors requires using data from real parallel programs to study the cost-performance tradeoffs in memory systems and interconnection networks. Practical methods of...
The Alewife Transaction Store (2007)
Anant Agarwal, Campbell L. Searle, John David Kubiatowicz
Distributed Shared Memory (2007)
Kirk L. Johnson, Kirk Lauritz Johnson, Anant Agarwal, Frederic R. Morgenthaler
in the FUGU Scalable Workstation (2007)
Anant Agarwal, Arthur C. Smith, Kenneth Martin Mackenzie
A scalable workstation is one vision of a mainstream parallel computer: a machine that combines scalable, fine-grain communication facilities for parallel applications with virtual memory and...
Virtual Wires: Overcoming Pin Limitations in FPGA-based Logic Emulators (2007)
Jonathan Babb, Russell Tessier, Anant Agarwal
Existing FPGA-based logic emulators suffer from limited inter-chip communication bandwidth, resulting in low gate utilization (10 to 20 percent). This resource imbalance increases the number of chips...
The Virtual Wires Emulation System: A Gate-Efficient ASIC Prototyping Environment (2007)
Russell Tessier, Jonathan Babb, Matthew Dahl, Silvina Hanono, Anant Agarwal
FPGA-based ASIC development systems have become important tools in contemporary ASIC design. Existing systems exhibit low per-FPGA gate utilization (10 to 20 percent) due to limited inter-chip...
References [1] Scalable Coherent Interface (SCI). ANSI/IEEE Std 1596- (2007)
Anant Agarwal, Richard Simoni, Mark Horowitz, David Bailey, John Barton, Thomas Lasinski, ...
Mary Vernon, who taught the computational science class out of which this work grew. We also thank Profs. David Wood and Jim Goodman for supporting this work, and finally Babak Falsafi and Alain...
LoPC: Modeling Contention in Parallel Algorithms (2007)
Matthew I. Frank, Anant Agarwal
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is...
Integrating Data Caching into the SUDS Runtime (2007)
Anant Agarwal, Kevin W. Wilson
Frederic T. Chong, Beng-hong Lim, Ricardo Bianchini, John Kubiatowicz, Anant Agarwal
This study reports on the performance of several applications on the Alewife machine, focusing on emerging applications and evolving architectural mechanisms. It shows that low-latency cache miss...
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared Memory (2007)
Anant Agarwal, David Kranz, Venkat Natarajan
This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers...
David Wentzlaff, Anant Agarwal
General purpose computing architectures are being called on to work on a more diverse application mix every day. This has been fueled by the need for reduced time to market and economies of scale...
Scalar Operand Networks: Design, Implementation, and Analysis (2007)
Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, Anant Agarwal
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar...
Energy Scalability of On-Chip Interconnection Network (2007)
Theodoros Konstantakopoulos, Jonathan Eastep, James Psota, Anant Agarwal
On-chip interconnection networks (OCNs) such as point-to-point networks and buses form the communication backbone in systems-on-a-chip, multicore processors, and tiled processors. OCNs can consume...
Introspective Computing (2007)
Jonathan M. Eastep, Anant Agarwal
Constructing Virtual Architectures on a Tiled Processor (2006)
David Wentzlaff, Anant Agarwal
As the amount of available silicon resources on one chip increases, we have seen the advent of ever increasing parallel resources integrated on-chip. Many architectures use these resources as...
Gigabit Routing on a Software-exposed Tiled-Microprocessor (2005)
Umar Saif, James Anderson, Anthony Digangi, Anant Agarwal
This paper investigates the suitability of emerging tiled-architectures, equipped with low-latency on-chip networks, for high-performance network routing. In this paper, we present the design,...
Scalar Operand Networks (2005)
Michael Bedford Taylor, Walter Lee, Saman P. Amarasinghe, Anant Agarwal
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous...
Versatility and VersaBench: A New Metric and a Benchmark Suite for Flexible Architectures (2004)
Rabbah, Rodric M., Bratt, Ian, Asanovic, Krste, Agarwal, Anant
For the last several decades, computer architecture research has largely benefited from, and continues to be driven by ad-hoc benchmarking. Often the benchmarks are selected to represent workloads...
Scalar Operand Networks: Design, Implementation, and Analysis (2004)
Taylor, Michael Bedford, Lee, Walter, Amarasinghe, Saman, Agarwal, Anant
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar designs...
A 1020-Node Modular Microphone Array and Beamformer for Intelligent Computing Spaces (2004)
Weinstein, Eugene, Steele, Kenneth, Agarwal, Anant, Glass, James
Ubiquitous computing environments are characterized by an unbounded amount of noise and crosstalk. In these environments, traditional methods of sound capture are insufficient, and array microphones...
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams (2004)
Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, ...
This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications...
LOUD: A 1020-node modular microphone array and beamformer for intelligent computing spaces (2004)
Eugene Weinstein, Kenneth Steele, Anant Agarwal, James Glass
Ubiquitous computing environments are characterized by an unbounded amount of noise and crosstalk. In these environments, traditional methods of sound capture are insufficient, and array...
A Stream Algorithm for the SVD (2003)
Strumpen, Volker, Hoffmann, Henry, Agarwal, Anant
We present a stream algorithm for the Singular-Value Decomposition (SVD) of an M × N matrix A. Our algorithm trades speed of numerical convergence for parallelism, and derives from a one-sided,...
Stream Algorithms and Architecture (2003)
Volker Strumpen, Anant Agarwal
Wire-exposed, programmable microarchitectures including Trips [11], Smart Memories [8], and Raw [13] offer an opportunity to schedule instruction execution and data movement explicitly. This paper...
Stream Algorithms and Architecture (2003)
Henry Hoffmann, Volker Strumpen, Anant Agarwal
Wire-exposed, programmable microarchitectures including Trips [11], Smart Memories [8], and Raw [13] offer an opportunity to schedule instruction execution and data movement explicitly. This paper...
Scalar operand networks: On-chip interconnect for ILP in partitioned architectures (2003)
Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, Anant Agarwal
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values between pipeline stages and multiple ALUs. Previous superscalar...
A Stream Algorithm for the SVD (2003)
Volker Strumpen, Henry Hoffmann, Anant Agarwal
We present a stream algorithm for the Singular-Value Decomposition (SVD) of an M × N matrix A. Our algorithm trades speed of numerical convergence for parallelism, and derives from a one-sided,...
Stream Algorithms and Architecture (2003)
Henry Hoffmann, Volker Strumpen, Anant Agarwal, ...
Stream Algorithms and Architecture (2003)
Volker Strumpen, Henry Hoffmann, Anant Agarwal
Wire delay and power consumption are primary obstacles to the continued scaling of microprocessor performance. Fundamentally, both issues are addressed by the emerging breed of singlechip, tiled...
A Stream Algorithm for the SVD (2003)
Volker Strumpen, Henry Hoffmann, Anant Agarwal
We present a stream algorithm for the Singular-Value Decomposition (SVD) of an M × N matrix A. Our algorithm trades speed of numerical convergence for parallelism, and derives from a one-sided,...
On-Chip Instruction Caches for High Performance Processors (2002)
Agarwal, Anant, Chow, Paul, Horowitz, Mark, Acken, John, Salz, Arturo
Continued increases in clock rates of VLSI processors demand a reduction in the frequency of expensive off-chip memory references. Without such a reduction, the chip crossing time and the constraints...
ATUM: A New Technique for Capturing Address Traces Using Microcode (2002)
Agarwal, Anant, Sites, Richard L., Horowitz, Mark
Trace-driven simulation is often used in the design of computer systems, especially caches and translation lookaside buffers. Capturing address traces to drive such simulations has been problematic,...
Handheld Face Identification Technology in a Pervasive Computing Environment (2002)
Eugene Weinstein, Purdy Ho, Bernd Heisele, Tomaso Poggio, Ken Steele, Anant Agarwal
We describe initial efforts at implementing a flexible, modular face identification framework in the context of a pervasive computing environment including handheld computers. Currently, the handheld...
SimpleFit: A Framework for Analyzing Design Tradeoffs in Raw Architectures (2001)
Csaba Andras Moritz, Donald Yeung, Anant Agarwal
The semiconductor industry roadmap projects that advances in VLSI technology will permit more than one billion transistors on a chip by the year 2010. The MIT Raw microprocessor is a...
The Raw processor: A composeable 32-bit fabric for embedded and general purpose computing (2001)
Michael Taylor, Jason Kim, Jason Miller, Fae Ghodrat, Ben Greenwald, Paul Johnson, ...
The Raw project is attempting to create a scalable processor architecture that is suitable for both general purpose and embedded computations. Current general purpose processors differ from embedded...
Compiler Support for Scalable and Efficient Memory Systems (2001)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
Technological trends require that future scalable microprocessors be decentralized. Applying these trends toward memory systems shows that the size of the cache accessible in a single cycle...
6.002 Circuits and Electronics, Fall 2000 (2000)
Agarwal, Anant, Lang, Jeffrey H. (Jeffrey Hastings)
Fundamentals of the lumped circuit abstraction. Resistive elements and networks; independent and dependent sources; switches and MOS devices; digital abstraction; amplifiers; and energy storage...
Multigrain Shared Memory (2000)
Donald Yeung, John Kubiatowicz, Anant Agarwal
Parallel workstations, each comprising tens of processors based on shared memory, promise cost-effective scalable multiprocessing. This paper explores the coupling of such small- to medium-scale...
Maps: A Compiler-Managed Memory System for Raw Machines (1999)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper describes Maps, a compiler managed memory system for Raw architectures. Traditional processors for sequential programs maintain the abstraction of a unified memory by using a single...
Maps: A Compiler-Managed Memory System for Raw Machines (1999)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
Microprocessors of the next decade and beyond will be built using VLSI chips employing billions of transistors. In this generation of microprocessors, achieving a high level of parallelism at a...
Maps: A Compiler-Managed Memory System for Raw Machines (1999)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper describes Maps, the compiler managed memory system for a Raw architecture. Traditional processors for sequential programs maintain the abstraction of a unified memory by using a single...
The MIT Alewife machine (1999)
Anant Agarwal, Ricardo Bianchini, David Chaiken, Frederic T. Chong, Kirk L. Johnson, David Kranz, ...
SUDS: Primitive Mechanisms for Memory Dependence Speculation (1999)
Matthew Frank, C. Andras Moritz, Benjamin Greenwald, Saman Amarasinghe, Anant Agarwal
As VLSI chip sizes and densities increase, it becomes possible to put many processing elements on a single chip and connect them together with a low latency communication network. In this paper we...
SUDS: Primitive Mechanisms for Memory Dependence Speculation (1999)
Matthew Frank, C. Andras Moritz, Benjamin Greenwald, Saman Amarasinghe, Anant Agarwal
As VLSI chip sizes and densities increase, it becomes possible to put many processing elements on a single chip and connect them together with a low latency communication network. In this paper we...
Raw Architecture Workstation (1999)
Michael Bedford Taylor, Anant Agarwal, Arthur C. Smith
On-Chip Instruction Caches for High Performance Processors (1998)
Agarwal, Anant, Chow, Paul, Horowitz, Mark, Acken, John, Salz, Arturo
Continued increases in clock rates of VLSI processors demand a reduction in the frequency of expensive off-chip memory references. Without such a reduction, the chip crossing time and the constraints...
An Analytical Cache Model. (1998)
Agarwal, Anant, Horowitz, Mark, Hennessy, John
Trace driven simulation and hardware measurement are the techniques most often used to obtain accurate performance figures for caches. The former requires a large amount of simulation time to...
Evaluating the Performance of Software Cache Coherence. (1998)
In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept consistent. This is called cache coherence. Both hardware and software coherence schemes have been...
Critical Problems in Very Large Scale Computer Systems. (1998)
Agarwal, Anant, Dally, William J., Devadas, Srinivas
The research vehicle for this contract is the largest possible computer that could be conceived for the mid to late 1990s. The technical challenges of such a machine serve as the guiding stimulus for...
Critical Problems in Very Large Scale Computer Systems. (1998)
Agarwal, Anant, Dally, William J., Glasser, Lance A.
This is the first semiannual report on this contract. The purpose of the present contract is to investigate limiting technologies for a very large computer system, one which, if built during the mid...
Memory-Reference Characteristics of Multiprocessor Applications under MACH (1998)
Shared-memory multiprocessors have received wide attention in recent times as a means of achieving high-performance cost-effectively. Their viability requires a thorough understanding of the memory...
Critical Problems in Very Large Scale Computer Systems. (1998)
Agarwal, Anant, Dally, William J., Devadas, Srinivas
The research vehicle for this contract is the largest possible computer that could be conceived for the mid to late 1990s. The technical challenges of such a machine serve as the guiding stimulus...
Temporal, Processor, and Spatial Locality in Multiprocessor Memory References. (1998)
The performance of cache-coherent multiprocessors is strongly influenced by locality in the memory reference behavior of parallel applications. While the notions of temporal and spatial locality in...
Evaluating the Performance of Software Cache Coherence. (1998)
In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept consistent. This is called cache coherence. Both hardware and software coherence schemes have been...
Adaptive Backoff Synchronization Techniques (1998)
Agarwal, Anant, Cherian, Mathews
Shared-memory multiprocessors commonly use shared variables for synchronization. Our simulations of real parallel applications show that large-scale cache-coherent multiprocessors suffer significant...
Temporal, Processor, and Spatial Locality in Multiprocessor Memory References. (1998)
The performance of cache-coherent multiprocessors is strongly influenced by locality in the memory reference behavior of parallel applications. While the notions of temporal and spatial locality in...
Adaptive Backoff Synchronization Techniques. (1998)
Agarwal, Anant, Cherian, Mathews
Shared-memory multiprocessors commonly use shared variables for synchronization. Our simulations of real parallel applications show that large-scale cache-coherent multiprocessors suffer significant...
Critical Problems in Very Large Scale Computer Systems. (1998)
Agarwal, Anant, Dally, William J., Devadas, Srinivas, Leighton, F. T.
The research vehicle for this contract is the largest possible computer that could be conceived for the mid to late 1990's. The technical challenges of such a machine serve as our guiding stimulus...
Modeling A Circuit Switched Multiprocessor Interconnect (1998)
Nussbaum, Daniel, Vuong-Adlerberg, Ingmar, Agarwal, Anant
This study began as an attempt to understand discrepancies between Patel's classic model of a circuit-switched interconnection network and simulations as part of the MIT ALEWIFE Multiprocessor...
Performance Tradeoffs in Multithreaded Processors (1998)
High network and memory latencies in large-scale multiprocessors can cause a significant drop in processor utilization. Overlapping computation from alternate processes with memory accesses in...
A Locality-Based Multiprocessor Cache Interference Model (1998)
Keeping data consistent in cache-coherent multiprocessors often requires the invalidation of cached blocks and results in higher miss rates. The increase in the cache miss rate due to invalidations...
Critical Problems in Very Large Scale Computer Systems. (1998)
Agarwal, Anant, Dally, William J., Devadas, Srinivas, Leighton, F. T.
The research vehicle for this contract is the largest possible computer that can be conceived for the mid to late 1990's. We call this machine an American Resource Computer or ARC. We imagine this...
Critical Problems in Very Large Scale Computer Systems. (1998)
Agarwal, Anant, Dally, William J., Devadas, Srinivas, Leighton, F. T.
The research vehicle for this contract is the largest possible computer that can be conceived for the mid to late 1990's. We call this machine an 'American Resource Computer' or 'ARC'. We imagine...
APRIL: A Processor Architecture for Multiprocessing. (1998)
Agarwal, Anant, Lim, Beng-Hong, Kranz, David, Kubiatowicz, John
Processors in large scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching...
LimitLESS Directories: A Scalable Cache Coherence Scheme. (1998)
Chaiken, David, Kubiatowicz, John, Agarwal, Anant
Caches enhance the performance of multiprocessors by reducing network traffic and average memory access latency. However, cache-based systems must address the problem of cache coherence. We propose...
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor. (1998)
Agarwal, Anant, Chaiken, David, Johnson, Kirk, Kranz, David, Kubiatowicz, John
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable...
The Raw Fabric: A Technology for Rapid Embedded System Customization (1998)
Agarwal, Anant, Amarasinghe, Saman
This report is the final technical report for the project: The Raw Fabric: A Technology for Rapid Embedded System Customization. The Raw fabric is a universal computational substrate suitable for...
Versatile Tiled-Processor Architectures: The Raw Approach (1998)
Rabbah, Rodric M., Bratt, Ian, Agarwal, Anant, Asanovic, Krste
This presentation will describe the Raw architecture, its implementation, and performance. We will focus on Raw's ability to support a diverse set of applications (ranging from desktop to embedded...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
The Raw approach of replicated processor tiles interconnected with a fast static mesh network provides a simple, scalable design that maximizes the resources available in next generation processor...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper presents modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh...
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine (1998)
Walter Lee, Rajeev Barua, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, Saman Amarasinghe, ...
Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocessors are ill-suited to...
Integrated Shared-Memory and Message-Passing Communication in the Alewife Multiprocessor (1998)
Anant Agarwal, John David Kubiatowicz
Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine (1998)
Walter Lee, Rajeev Barua, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, Saman Amarasinghe, ...
Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocessors are ill-suited to...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper presents modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw...
Integrated Shared-Memory and Message-Passing Communication in the Alewife Multiprocessor (1998)
Anant Agarwal, John David Kubiatowicz
To date, MIMD multiprocessors have been divided into two classes based on hardware communication models: those supporting shared memory and those supporting message passing. Breaking with tradition,...
The Sensitivity of Communication Mechanisms to Bandwidth and Latency (1998)
Frederic T. Chong, Rajeev Barua, Fredrik Dahlgren, John D. Kubiatowicz, Anant Agarwal
The goal of this paper is to gain insight into the relative performance of communication mechanisms as bisection bandwidth and network latency vary. We compare shared memory with and without...
The Sensitivity of Communication Mechanisms to Bandwidth and Latency (1998)
Frederic Chong, Rajeev Barua, Fredrik Dahlgren, John D. Kubiatowicz, Anant Agarwal
The goal of this paper is to gain insight into the relative performance of communication mechanisms as bisection bandwidth and network latency vary. We compare shared memory with and without...
Maps: A Compiler-Managed Memory System for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper describes Maps, a compiler managed memory system for Raw architectures. Traditional processors for sequential programs maintain the abstraction of a unified memory by using a single...
The Sensitivity of Communication Mechanisms to Bandwidth and Latency (1998)
Frederic Chong, Rajeev Barua, Fredrik Dahlgren, John D. Kubiatowicz, Anant Agarwal
The goal of this paper is to gain insight into the relative performance of communication mechanisms as bisection bandwidth and network latency vary. We compare shared memory with and without...
Integrated Shared-Memory and Message-Passing Communication in the Alewife Multiprocessor (1998)
Anant Agarwal, John David Kubiatowicz
To date, MIMD multiprocessors have been divided into two classes based on hardware communication models: those supporting shared memory and those supporting message passing. Breaking with tradition,...
Exploiting Two-Case Delivery for Fast Protected Messaging (1998)
Kenneth Mackenzie, John Kubiatowicz, Matthew Frank, Walter Lee, Victor Lee, Anant Agarwal, ...
We propose and evaluate two complementary techniques to protect and virtualize a tightly-coupled network interface in a multicomputer. The techniques allow efficient, direct application access to...
The Sensitivity of Communication Mechanisms to Bandwidth and Latency (1998)
Frederic Chong, Rajeev Barua, Fredrik Dahlgren, John D. Kubiatowicz, Anant Agarwal
The goal of this paper is to gain insight into the relative performance of communication mechanisms as bisection bandwidth and network latency vary. We compare shared memory with and without...
Exploring Optimal Cost-Performance Designs for Raw Microprocessors (1998)
Csaba Andras Moritz, Donald Yeung, Anant Agarwal
The semiconductor industry roadmap projects that advances in VLSI technology will permit more than one billion transistors on a chip by the year 2010. The MIT Raw microprocessor is a proposed...
Maps: A Compiler-Managed Memory System for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper describes Maps, a compiler managed memory system for Raw architectures. Traditional processors for sequential programs maintain the abstraction of a unified memory by using a single...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
The Raw approach of replicated processor tiles interconnected with a fast static mesh network provides a simple, scalable design that maximizes the resources available in next generation processor...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper presents modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
The Raw approach of replicated processor tiles interconnected with a fast static mesh network provides a simple, scalable design that maximizes the resources available in next generation processor...
Exploiting Two-Case Delivery for Fast Protected Messaging (1998)
Kenneth Mackenzie, John Kubiatowicz, Matthew Frank, Walter Lee, Victor Lee, Anant Agarwal, ...
We propose and evaluate two complementary techniques to protect and virtualize a tightly-coupled network interface in a multicomputer. The techniques allow efficient, direct application access to...
Memory Bank Disambiguation using Modulo Unrolling for Raw Machines (1998)
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal
This paper presents modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh...
Exploiting Two-Case Delivery for Fast Protected Messaging (1998)
Kenneth Mackenzie, John Kubiatowicz, Matthew Frank, Walter Lee, Victor Lee, Anant Agarwal, ...
We propose and evaluate two complementary techniques to protect and virtualize a tightly-coupled network interface in a multicomputer. The techniques allow efficient, direct application access to...
Baring it all to software: Raw machines (1997)
Elliot Waingold, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, ...
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled...
Baring it all to software: Raw machines (1997)
Elliot Waingold, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, ...
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled...
Baring it all to software: Raw machines (1997)
Elliot Waingold, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, ...
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled...
The raw benchmark suite: Computation structures for general purpose computing (1997)
Jonathan Babb, Matthew Frank, Victor Lee, Elliot Waingold, Rajeev Barua, Michael Taylor, ...
The RAW benchmark suite consists of twelve programs designed to facilitate comparing, validating, and improving reconfigurable computing systems. These benchmarks run the gamut of algorithms found in...
Baring it all to software: Raw machines (1997)
Elliot Waingold, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, ...
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled...
Logic Emulation with Virtual Wires (1997)
Jonathan Babb, Russell Tessier, Matthew Dahl, Silvina Hanono, David Hoki, Anant Agarwal
Logic emulation enables designers to functionally verify complex integrated circuits prior to chip fabrication. However, traditional FPGA-based logic emulators have poor inter-chip communication...
LoPC: Modeling Contention in Parallel Algorithms (1997)
Matthew I. Frank, Anant Agarwal, Mary K. Vernon
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is...
LoPC: Modeling Contention in Parallel Algorithms (1997)
Matthew I. Frank, Anant Agarwal, Mary K. Vernon
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is...
The RAW Benchmark Suite: Computation Structures for General Purpose Computing (1997)
Jonathan Babb, Matthew Frank, Victor Lee, Elliot Waingold, Rajeev Barua, Michael Taylor, ...
The RAW benchmark suite consists of twelve programs designed to facilitate comparing, validating, and improving reconfigurable computing systems. These benchmarks run the gamut of algorithms found in...
Baring it all to software: Raw machines (1997)
Elliot Waingold, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, ...
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled...
Rajeev Barua, David Kranz, Anant Agarwal
Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage...
Shared memory versus message passing for iterative solution of sparse, irregular problems (1996)
Frederic T. Chong, Anant Agarwal
The benefits of hardware support for shared memory versus those for message passing are difficult to evaluate without an in-depth study of real applications on a common platform. We evaluate the...
UDM: User Direct Messaging for General-Purpose Multiprocessing (1996)
Kenneth Mackenzie, John Kubiatowicz, Matthew Frank, Walter Lee, Victor Lee, Anant Agarwal, ...
User Direct Messaging (UDM) allows user-level, processor-to-processor messaging to coexist with general multiprogramming and virtual memory. Direct messaging, where processors launch and receive...
Rajeev Barua, David Kranz, Anant Agarwal
Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage...
Shared memory versus message passing for iterative solution of sparse, irregular problems (1996)
Frederic T. Chong, Anant Agarwal
The benefits of hardware support for shared memory versus those for message passing are difficult to evaluate without an in-depth study of real applications on a common platform. We evaluate the...
UDM: User Direct Messaging for General-Purpose Multiprocessing (1996)
Kenneth Mackenzie, John Kubiatowicz, Matthew Frank, Walter Lee, Victor Lee, Anant Agarwal, ...
User Direct Messaging (UDM) allows user-level, processor-to-processor messaging to coexist with general multiprogramming and virtual memory. Direct messaging, where processors launch and receive...
Rajeev Barua, David Kranz, Anant Agarwal
Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage...
Rajeev Barua, David Kranz, Anant Agarwal
Harnessing the full performance potential of cache-coherent distributed shared memory multiprocessors without inordinate user effort requires a compilation technology that can automatically manage...
Software Technologies for Reconfigurable Systems (1996)
FPGA-based systems are a significant area of computing, providing a high-performance implementation substrate for many different applications. However, the key to harnessing their power for most...
Application Performance on the MIT Alewife Multiprocessor (1996)
Frederic T. Chong, Beng-Hong Lim, Ricardo Bianchini, John Kubiatowicz, Anant Agarwal
This study reports on the performance of several applications on the Alewife machine, focusing on emerging applications and evolving architectural mechanisms. It shows that low-latency cache miss...
Application Performance on the MIT Alewife Multiprocessor (1996)
Frederic Chong, Beng-hong Lim, Ricardo Bianchini, John Kubiatowicz, Anant Agarwal
This study reports on the performance of several applications on the Alewife machine, focusing on emerging applications and evolving architectural mechanisms. It shows that low-latency cache miss...
MGS: A Multigrain Shared Memory System (1996)
Donald Yeung, John Kubiatowicz, Anant Agarwal
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory...
Solving Graph Problems With Dynamic Computation Structures (1996)
Jonathan Babb, Matthew Frank, Anant Agarwal
We introduce dynamic computation structures (DCS), a compilation technique to produce dynamic code for reconfigurable computing. DCS specializes directed graph instances into user-level hardware for...
UDM: User Direct Messaging for General-Purpose Multiprocessing (1996)
Kenneth Mackenzie, John Kubiatowicz, Matthew Frank, Walter Lee, Anant Agarwal, M. Frans Kaashoek
User Direct Messaging (UDM) allows user-level, processor-to-processor messaging to coexist with general multiprogramming and virtual memory. Direct messaging, where processors launch and receive...
Shared Memory versus Message Passing for Iterative Solution of Sparse, Irregular Problems (1996)
Frederic T. Chong, Anant Agarwal
The benefits of hardware support for shared memory versus those for message passing are difficult to evaluate without an in-depth study of real applications on a common platform. We evaluate the...
Application Performance on the MIT Alewife Multiprocessor (1996)
Frederic T. Chong, Beng-hong Lim, Ricardo Bianchini, John Kubiatowicz, Anant Agarwal
This study reports on the performance of several applications on the Alewife machine, focusing on emerging applications and evolving architectural mechanisms. It shows that low-latency cache miss...
Software technologies for reconfigurable systems (1996)
FPGA-based systems are a significant area of computing, providing a high-performance implementation substrate for many different applications. However, the key to harnessing their power for most...
The MIT Alewife machine: Architecture and performance (1995)
Anant Agarwal, Ricardo Bianchini, David Chaiken, David Kranz, John Kubiatowicz, Beng-hong Lim, ...
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a...
The MIT Alewife Machine: Architecture and Performance (1995)
Anant Agarwal, Ricardo Bianchini, David Chaiken, David Kranz, John Kubiatowicz, Beng-hong Lim, ...
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a...
The MIT Alewife Machine: Architecture and Performance (1995)
Anant Agarwal, Ricardo Bianchini, David Chaiken, David Kranz, John Kubiatowicz, Beng-hong Lim, ...
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife Machine, a...
Anant Agarwal, David Kranz, Venkat Natarajan
This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have...
FUGU: Implementing translation and protection in a multiuser, multimodel multiprocessor (1994)
Kenneth Mackenzie, John Kubiatowicz, Anant Agarwal, Frans Kaashoek
Multimodel multiprocessors provide both shared memory and message passing primitives to the user for efficient communication. In a multiuser machine, translation permits machine resources to be...
The Virtual Wires Emulation System: A gate-efficient ASIC prototyping environment (1994)
Russell Tessier, Jonathan Babb, Matthew Dahl, Silvina Hanono, Anant Agarwal
FPGA-based ASIC development systems have become important tools in contemporary ASIC design. Existing systems exhibit low per-FPGA gate utilization (10 to 20 percent) due to limited inter-chip...
FUGU: Implementing translation and protection in a multiuser, multimodel multiprocessor (1994)
Kenneth Mackenzie, John Kubiatowicz, Anant Agarwal, Frans Kaashoek
Multimodel multiprocessors provide both shared memory and message passing primitives to the user for efficient communication. In a multiuser machine, translation permits machine resources to be...
The Alewife CMMU: Addressing the Multiprocessor Communications Gap (1994)
John Kubiatowicz, David Chaiken, Anant Agarwal, Arthur Altman, Jonathan Babb, David Kranz, ...
Communications functionality of the A-1000 CMMU includes: support for distributed, cache-coherent shared memory via the LimitLESS cache-coherence protocol [3]: the A-1000 supports up to five...
Emulation of a Sparc Microprocessor with the MIT Virtual Wires Emulation System (1994)
Matthew Dahl, Jonathan Babb, Russell Tessier, Silvina Hanono, David Hoki, Anant Agarwal
We describe a complete FPGA-based emulation software system using Virtual Wires technology and present the results of emulating an 18K-gate ASIC implementation of a Sparc microprocessor. Virtual...
Software-Extended Coherent Shared Memory: Performance and Cost (1994)
This paper evaluates the tradeoffs involved in the design of the software-extended memory system of Alewife, a multiprocessor architecture that implements coherent shared memory through a combination...
The Alewife CMMU: Addressing the Multiprocessor Communications Gap (1994)
John Kubiatowicz, David Chaiken, Anant Agarwal, Arthur Altman, Jonathan Babb, David Kranz, ...
Communications functionality of the A-1000 CMMU includes: support for distributed, cache-coherent shared memory via the LimitLESS cache-coherence protocol [3]: the A-1000 supports up to five hardware...
The Virtual Wires Emulation System: A Gate-Efficient ASIC Prototyping Environment (1994)
Russell Tessier, Jonathan Babb, Matthew Dahl, Silvina Hanono, Anant Agarwal
FPGA-based ASIC development systems have become important tools in contemporary ASIC design. Existing systems exhibit low per-FPGA gate utilization (10 to 20 percent) due to limited inter-chip...
Reactive Synchronization Algorithms for Multiprocessors (1994)
Synchronization algorithms that are efficient across a wide range of applications and operating conditions are hard to design because their performance depends on unpredictable run-time factors. The...
Reactive Synchronization Algorithms for Multiprocessors (1994)
Synchronization algorithms that are efficient across a wide range of applications and operating conditions are hard to design because their performance depends on unpredictable run-time factors. The...
Software-extended coherent shared memory: Performance and cost (1994)
Anant Agarwal, David L. Chaiken
Rajeev Barua, David Kranz, Anant Agarwal
Partitioning distributed arrays to ensure locality of reference is widely recognized as being critical in obtaining good performance on distributed memory multiprocessors. Data partitioning is the...
Analyzing Multiprocessor Cache Behavior Through Data Reference Modeling (1993)
Jory Tsai, Anant Agarwal
This paper develops a data reference modeling technique to estimate with high accuracy the cache miss ratio in cache-coherent multiprocessors. The technique involves analyzing the dynamic data...
Rajeev Barua, David Kranz, Anant Agarwal
Partitioning distributed arrays to ensure locality of reference is widely recognized as being critical in obtaining good performance on distributed memory multiprocessors. Data partitioning is the...
Waiting algorithms for synchronization in large-scale multiprocessors (1993)
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a...
Rajeev Barua, David Kranz, Anant Agarwal
Partitioning distributed arrays to ensure locality of reference is widely recognized as being critical in obtaining good performance on distributed memory multiprocessors. Data partitioning is the...
This paper discusses our experience with fine-grain synchronization for a variant of the preconditioned conjugate gradient method. This algorithm represents a large class of algorithms that have been...
Rajeev Barua, David Kranz, Anant Agarwal
Partitioning distributed arrays to ensure locality of reference is widely recognized as being critical in obtaining good performance on distributed memory multiprocessors. Data partitioning is the...
Waiting algorithms for synchronization in large-scale multiprocessors (1993)
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a...
Waiting algorithms for synchronization in large-scale multiprocessors (1993)
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase...
Analyzing Multiprocessor Cache Behavior Through Data Reference Modeling (1993)
Jory Tsai, Anant Agarwal
This paper develops a data reference modeling technique to estimate with high accuracy the cache miss ratio in cache-coherent multiprocessors. The technique involves analyzing the dynamic data...
This paper discusses our experience with fine-grain synchronization for a variant of the preconditioned conjugate gradient method. This algorithm represents a large class of algorithms that have been...
Integrating Message-Passing and Shared-Memory: Early Experience (1993)
David Kranz, Kirk Johnson, Anant Agarwal, John Kubiatowicz, Beng-hong Lim
This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. While such a programming model can be...
Anant Agarwal, Campbell L. Searle, John David Kubiatowicz
Multiprocessor architects have begun to explore several mechanisms such as prefetching, context-switching and software-assisted dynamic cache-coherence, which transform single-phase memory...
Virtual Wires: Overcoming Pin Limitations in FPGA-based Logic Emulators (1993)
Jonathan Babb, Russell Tessier, Anant Agarwal
Existing FPGA-based logic emulators suffer from limited inter-chip communication bandwidth, resulting in low gate utilization (10 to 20 percent). This resource imbalance increases the number of chips...
Anatomy of a Message in the Alewife Multiprocessor (1993)
John Kubiatowicz, Anant Agarwal
Shared-memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-passing communications...
Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors (1993)
Anant Agarwal, John Kubiatowicz, David Kranz, Beng-hong Lim, Donald Yeung, Godfrey D'Souza, ...
Sparcle is a processor chip developed jointly by MIT, LSI Logic, and SUN Microsystems, by evolving an existing RISC architecture towards a processor suited for large-scale multiprocessors. Sparcle...
Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors (1993)
Anant Agarwal, John Kubiatowicz, David Kranz, Beng-hong Lim, Donald Yeung, Godfrey D'Souza, ...
Sparcle is a processor chip developed jointly by MIT, LSI Logic, and SUN Microsystems, by evolving an existing RISC architecture towards a processor suited for large-scale multiprocessors. Sparcle...
Run-Time Thread Management for Large-Scale Distributed-Memory Multiprocessors (1993)
Anant Agarwal, Frederic R. Morgenthaler, Daniel Nussbaum
in partial fulfillment of the requirements for the degree of
Sparcle: An Evolutionary Processor Design for Large-scale Multiprocessors (1993)
Anant Agarwal, John Kubiatowicz, David Kranz, Beng-hong Lim, Donald Yeung, Mike Parkin
Sparcle is a processor chip developed jointly by MIT, LSI Logic, and SUN Microsystems, by evolving an existing RISC architecture towards a processor suited for large-scale multiprocessors. Sparcle...
Vijayaraghavan Soundararajan, Anant Agarwal
As parallel machines grow in scale and complexity, latency tolerance of synchronization faults and remote memory accesses becomes increasingly important. One method for tolerating latency is...
Performance tradeoffs in multithreaded processors (1992)
High network latencies in large-scale multiprocessors can cause a significant drop in processor utilization. By maintaining multiple process contexts in hardware and switching among them in a few...
David Kranz, Beng-hong Lim, Donald Yeung, Anant Agarwal
As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of...
David Kranz, Donald Yeung, Anant Agarwal
As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of...
David Kranz, Beng-hong Lim, Donald Yeung, Anant Agarwal
As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of...
Vijayaraghavan Soundararajan, Anant Agarwal
As parallel machines grow in scale and complexity, latency tolerance of synchronization faults and remote memory accesses becomes increasingly important. One method for tolerating latency is...
The Impact of Communication Locality on Large-Scale Multiprocessor Performance (1992)
As multiprocessor sizes scale and computer architects turn to interconnection networks with non-uniform communication latencies, the lure of exploiting communication locality to increase performance...
Closing the Window of Vulnerability in Multiphase Memory Transactions (1992)
John Kubiatowicz, David Chaiken, Anant Agarwal
Multiprocessor architects have begun to explore several mechanisms such as prefetching, context-switching and software-assisted dynamic cache-coherence, which transform single-phase memory...
Closing the Window of Vulnerability in Multiphase Memory Transactions (1992)
John Kubiatowicz, David Chaiken, Anant Agarwal
Multiprocessor architects have begun to explore several mechanisms such as prefetching, context-switching and software-assisted dynamic cache-coherence, which transform single-phase memory...
Closing the Window of Vulnerability in Multiphase Memory Transactions (1992)
John Kubiatowicz, David Chaiken, Anant Agarwal
Multiprocessor architects have begun to explore several mechanisms such as prefetching, context-switching and software-assisted dynamic cache-coherence, which transform single-phase memory...
David Kranz, Beng-hong Lim, Donald Yeung, Anant Agarwal
As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of...
Latency Tolerance through Multithreading in Large-Scale Multiprocessors (1991)
Kiyoshi Kurihara, David Chaiken, Anant Agarwal
In large-scale distributed-memory multiprocessors, remote memory accesses suffer significant latencies. Caches help alleviate the memory latency problem by maintaining local copies of frequently used...
The MIT Alewife Machine (1991)
Anant Agarwal, Ricardo Bianchini, David Chaiken, Frederic T. Chong, Kirk L. Johnson, ...
This paper appeared in ISCA'95.
Latency Tolerance through Multithreading in Large-Scale Multiprocessors (1991)
Kiyoshi Kurihara, David Chaiken, Anant Agarwal
In large-scale distributed-memory multiprocessors, remote memory accesses suffer significant latencies. Caches help alleviate the memory latency problem by maintaining local copies of frequently used...
LimitLESS Directories: A Scalable Cache Coherence Scheme (1991)
David Chaiken, John Kubiatowicz, Anant Agarwal
Caches enhance the performance of multiprocessors by reducing network traffic and average memory access latency. However, cache-based systems must address the problem of cache coherence. We propose...
Limits on Interconnection Network Performance (1991)
As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be...
LimitLESS Directories: A Scalable Cache Coherence Scheme (1991)
David Chaiken, John Kubiatowicz, Anant Agarwal
Caches enhance the performance of multiprocessors by reducing network traffic and average memory access latency. However, cache-based systems must address the problem of cache coherence. We propose...
Waiting Algorithms for Synchronization in Large-Scale Multiprocessors (1991)
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a...
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor (1991)
Anant Agarwal, David Chaiken, David Kranz, John Kubiatowicz, Kiyoshi Kurihara, Gino Maa, ...
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low-dimensional direct interconnection network to provide scalable...
Waiting Algorithms for Synchronization in Large-Scale Multiprocessors (1991)
Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a...
LimitLESS Directories: A Scalable Cache Coherence Scheme (1991)
David Chaiken, John Kubiatowicz, Anant Agarwal
Caches enhance the performance of multiprocessors by reducing network traffic and average memory access latency. However, cache-based systems must address the problem of cache coherence. We propose...
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor (1991)
Anant Agarwal, David Chaiken, Kirk Johnson, David Kranz, John Kubiatowicz, Kiyoshi Kurihara, ...
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low dimension direct interconnection network to provide scalable...
Limits on interconnection network performance (1991)
As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be...
APRIL: a processor architecture for multiprocessing (1990)
Anant Agarwal, Beng-hong Lim, David Kranz, John Kubiatowicz
Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching...
APRIL: A Processor Architecture for Multiprocessing (1990)
Anant Agarwal, Beng-Hong Lim, David Kranz, John Kubiatowicz
Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching...
APRIL: A Processor Architecture for Multiprocessing (1990)
Anant Agarwal, Beng-hong Lim, David Kranz, John Kubiatowicz
Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching...
Overview of the Alewife Project (1990)
Introduction The goal of the Alewife experiment is to demonstrate that a parallel computer system can be made both scalable and easily programmable. Scalability will be achieved through an...
Cache Coherence Protocols for Large-Scale Multiprocessors (1990)
Anant Agarwal, Arthur C. Smith, David Lars Chaiken
in partial fulfillment of the requirements for the degree of
An Evaluation of Directory Schemes for Cache Coherence (1988)
Anant Agarwal, Richard Simoni, John Hennessy, Mark Horowitz
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less...
Analysis of cache performance for operating systems and multiprogramming (1987)
Anant Agarwal
Thesis (Ph. D.)--Stanford University, 1987.