Guiding Architectural SRAM Models (2008)
Banit Agrawal, Timothy Sherwood
Caches, block memories, predictors, state tables, and other forms of on-chip memory are continuing to consume a greater portion of processor designs with each passing year. Making good...
Understanding and Visualizing Full Systems with Data Flow Tomography (2008)
Shashidhar Mysore, Bita Mazloom, Banit Agrawal, Timothy Sherwood
It is not uncommon for modern systems to be composed of a variety of interacting services, running across multiple machines in such a way that most developers do not really understand the whole...
Formulating and Implementing Profiling over Adaptive Ranges (2008)
Shashidhar Mysore, Banit Agrawal, Rodolfo Neuber, Timothy Sherwood, Nisheeth Shrivastava, Subhash Suri
Modern computer systems are called on to deal with billions of events every second, whether they are instructions executed, memory locations accessed, or packets forwarded. This presents a serious...
Nano-enhanced Architectures: Using Carbon Nanotube Interconnects in Cache Design (2008)
Banit Agrawal, Navin Srivastava, Frederic T. Chong, Kaustav Banerjee, Timothy Sherwood
In current deep submicron (DSM) technologies, circuit designers have to deal with some serious concerns including signal noise, leakage effect, electromigration, scattering effects, variability, and...
Introspective 3D Chips (2008)
Shashidhar Mysore, Banit Agrawal, Navin Srivastava, Sheng-Chih Lin, Kaustav Banerjee, Timothy Sherwood
While the number of transistors on a chip increases exponentially over time, the productivity that can be realized from these systems has not kept pace. To deal with the complexity of modern systems,...
ToolBlocks: An Infrastructure for the Construction of Memory Hierarchy Analysis Tools. (2008)
In an era dominated by the rapid development of faster and cheaper processors it is difficult both for the application developer and system architect to make intelligent decisions about...
SimPoint: Picking Representative Samples to Guide Simulation (2008)
Brad Calder, Timothy Sherwood, Greg Hamerly, Erez Perelman
Understanding the cycle level behavior of a processor during the execution of an application is crucial to modern computer architecture research. To gain this understanding, researchers typically...
Whiteboards that Compute: Goals and Challenges for System Designers (2008)
Even after more than thirty years of desktop computing, many (if not most) engineers facing complex problems instinctively run right to the whiteboard. A whiteboard offers an incredibly natural way...
EFFICIENT THAN THE CURRENTLY BEST KNOWN APPROACHES. (2008)
Whether tethered to an Ethernet cable or connected through wireless technology, computer systems now operate in an environment of near ubiquitous connectivity. The availability of always-on...
CONSUMER SYSTEMS TO KEEP THEM ECONOMICALLY COMPETITIVE. (2008)
Banit Agrawal, Navin Srivastava, Sheng-chih Lin, Kaustav Banerjee, Timothy Sherwood
Developing high-quality software for a modern computer system is no easy task. Performance-critical applications are likely to execute for quadrillions of instructions and operate in a complex...
Leakage Power Reduction of Embedded Memories on FPGAs Through Location Assignment (2008)
Yan Meng, Timothy Sherwood, Ryan Kastner
Transistor leakage is poised to become the dominant source of power dissipation in digital systems, and reconfigurable devices are not immune to this problem. Modern FPGAs already have a significant...
Efficient remote profiling for resource-constrained devices (2008)
Priya Nagpurkar, Hussam Mousa, Chandra Krintz, Timothy Sherwood
The widespread use of ubiquitous, mobile, and continuously connected computing agents has inspired software developers to change the way they test, debug, and optimize software. Users now play an...
Exploring the Processor and ISA Design for Wireless Sensor Network Applications (2008)
Shashidhar Mysore, Banit Agrawal, Frederic T. Chong, Timothy Sherwood
Power consumption, physical size, and architecture design of sensor node processors have been the focus of sensor network research in the architecture community. What lies at the foundation for these...
Bit-split string-matching engines for intrusion detection and prevention (2008)
Lin Tan, Brett Brotherton, Timothy Sherwood
Network Intrusion Detection and Prevention Systems have emerged as one of the most effective ways of providing security to those connected to the network and at the heart of almost every modern...
Efficient remote profiling for resource-constrained devices (2008)
Priya Nagpurkar, Hussam Mousa, Chandra Krintz, Timothy Sherwood
The widespread use of ubiquitous, mobile, and continuously-connected computing agents has inspired software developers to change the way they test, debug, and optimize software. Users now play an...
Low-Power Design of Page-Based Intelligent Memory (2007)
Mark Oskin, Frederic T. Chong, Aamir Farooqui, Timothy Sherwood, Justin Hensley
Advances in DRAM technology have led many researchers to integrate computational logic on DRAM chips to improve performance and reduce power dissipated across chip boundaries. The density, packaging,...
IP Caching For Terabit Speed Routers (2007)
Bryan Talbot, Timothy Sherwood, Bill Lin
As network speeds continue to grow, current methods of translating destination IP addresses to output port numbers during routing become inadequately slow. Even though this lookup is often performed...
Low-Power Design of Page-Based Intelligent Memory (2007)
Justin Hensley, Aneet Chopra, Mark Oskin, Timothy Sherwood, Aamir Farooqui, Frederic T. Chong
Advances in DRAM technology have led many researchers to integrate computational logic on DRAM chips to improve performance and reduce power dissipated across chip boundaries. The density, packaging,...
Care and Feeding of High-Performance Processors with Reconfigurable Memory Systems (2007)
Frederic Chong, Mark Oskin, Timothy Sherwood, Justin Hensley
partitioning on a range of uniprocessor and multiprocessor applications, including: a computationally-intensive, "super-optimizing" compiler, filters for image and video editing, internet...
Springer-Verlag. Loop Termination Prediction (2007)
Deeply pipelined high performance processors require highly accurate branch prediction to drive their instruction fetch. However there remains a class of events which are not easily predictable by...
ToolBlocks: An Infrastructure for the Construction of Memory Hierarchy Analysis Tools. (2007)
In an era dominated by the rapid development of faster and cheaper processors it is difficult both for the application developer and system architect to make intelligent decisions about application...
Moats and drawbridges: An isolation primitive for reconfigurable hardware based systems (2007)
Ted Huffmire, Brett Brotherton, Gang Wang, Timothy Sherwood, Ryan Kastner, Timothy Levin, ...
Blurring the line between software and hardware, reconfigurable devices strike a balance between the raw high speed of custom silicon and the post-fabrication flexibility of general-purpose...
Greg Hamerly, Erez Perelman, Jeremy Lau, Brad Calder, Timothy Sherwood, Haym Hirsh
An essential step in designing a new computer architecture is the careful examination of different design options. It is critical that computer architects have efficient means by which they may...
Gian Luca Loi, Banit Agrawal, Navin Srivastava, Sheng-chih Lin, Timothy Sherwood, Kaustav Banerjee
Three-dimensional (3-D) integrated circuits have emerged as promising candidates to overcome the interconnect bottlenecks of nanometer scale designs. While they offer several other advantages, it is...
On the Limits of Leakage Power Reduction in Caches (2005)
Yan Meng, Timothy Sherwood, Ryan Kastner
If current technology scaling trends hold, leakage power dissipation will soon become the dominant source of power consumption. Caches, due to the fact that they account for the largest fraction of...
Yan Meng, Andrew P. Brown, Ronald A. Iltis, Timothy Sherwood, Hua Lee, Ryan Kastner
Channel estimation and multiuser detection are enabling technologies for future generations of wireless applications. However, sophisticated algorithms are required for accurate channel estimation...
Phase-Aware Remote Profiling (2005)
Priya Nagpurkar, Chandra Krintz, Timothy Sherwood
Recent advances in networking and embedded device technology have made the vision of ubiquitous computing a reality; users can access the Internet's vast offerings anytime and anywhere....
Motivation for variable length intervals and hierarchical phase behavior (2005)
Jeremy Lau, Erez Perelman, Greg Hamerly, Timothy Sherwood, Brad Calder
Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed algorithms automatically group similar portions of a program’s execution into phases, where...
Low Power Electronics, Yan Meng, Wenrui Gong, Ryan Kastner, Timothy Sherwood
Wireless networks are making the vision of ubiquitous computing a reality: users will be able to connect anytime and anywhere from anything. To achieve this vision, the next generation of wireless...
Exploring the limits of leakage power reduction in caches (2005)
Yan Meng, Timothy Sherwood, Ryan Kastner
If current technology scaling trends hold, leakage power dissipation will soon become the dominant source of power consumption. Caches, due to the fact that they account for the largest fraction of...
Balancing design options with sherpa (2004)
Timothy Sherwood, Mark Oskin, Brad Calder
Application specific processors offer the potential of rapidly designed logic specifically constructed to meet the performance and area demands of the task at hand. Recently, there have been several...
A Co-Phase Matrix to Guide Simultaneous Multithreading Simulation (2004)
Michael Van Biesbrouck, Timothy Sherwood, Brad Calder
Several commercial processors have architectures that include support for Simultaneous Multithreading (SMT), yet there is still not a validated methodology for estimating the performance of an SMT...
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection (2004)
Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese
Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion...
Brad Calder, Todd Austin, Don Yang, Timothy Sherwood, Suleyman Sair, David Newquist, ...
A wide range of ARM developers from architects, to compiler writers, to software developers, need tools to understand, analyze, and simulate program behavior. For developers to achieve high levels of...
Using simpoint for accurate and efficient simulation (2003)
Erez Perelman, Greg Hamerly, Michael Van Biesbrouck, Timothy Sherwood, Brad Calder
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of a single industry standard benchmark at this level of detail takes on the order of months...
A pipelined memory architecture for high throughput network processors (2003)
Timothy Sherwood, George Varghese, Brad Calder
Designing ASICs for each new generation of backbone routers is a time intensive and fiscally draining process. In this paper we focus on the design of a programmable architecture for backbone...
Catching accurate profiles in hardware (2003)
Satish Narayanasamy, Timothy Sherwood, Suleyman Sair, Brad Calder, George Varghese
Run-time optimization is one of the most important ways of getting performance out of modern processors. Techniques such as prefetching, trace caching, memory disambiguation etc., are all based upon...
Pamela Cosman, Andrew Kahng, Dean Tullsen, Timothy Sherwood
All rights reserved. The dissertation of Timothy Sherwood is approved, and it is acceptable in quality and form for publication on microfilm:
Discovering and Exploiting Program Phases (2003)
Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder
...the way a program traverses the code during execution. We can find this phase behavior and classify it by examining only the ratios in which different regions of code are being executed over time. We...
A decoupled predictor-directed stream prefetching architecture (2003)
Suleyman Sair, Timothy Sherwood, Brad Calder
An effective method for reducing the effect of load latency in modern processors is data prefetching. One form of hardware-based data prefetching, stream buffers, has been shown to be particularly...
Reducing code size with echo instructions (2003)
Jeremy Lau, Stefan Schoenmackers, Timothy Sherwood, Brad Calder
In an embedded system, the cost of storing a program on-chip can be as high as the cost of a microprocessor. Compressing an application’s code to reduce the amount of memory required is an...
Automatically characterizing large scale program behavior (2002)
Timothy Sherwood, Erez Perelman, Greg Hamerly
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the...
Quantifying load stream behavior (2002)
Suleyman Sair, Timothy Sherwood, Brad Calder
The increasing performance gap between processors and memory will force future architectures to devote significant resources towards removing and hiding memory latency. The two major architectural...
Automatically characterizing large scale program behavior (2002)
Timothy Sherwood, Erez Perelman, Greg Hamerly, Brad Calder
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the...
Projecting Flat Panel Display Growth Today Systems Challenges (2002)
General purpose recognition framework Programming language-level support Efficient hypothesize-model-measure loop 1 Board management 1. Alvarado Dynamically constructed bayes nets Why is This Hard?...
Bitwidth sensitive code generation in a custom embedded accelerator design system (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
An ever larger variety of embedded ASICs is being designed and deployed to satisfy an explosively growing demand for new
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
application-specific design, architecture synthesis, bitwidth, clustering, embedded system, hardware accelerator, operation scheduling, resource allocation PICO is a system for automatically...
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
application-specific design, architecture synthesis, bitwidth, clustering, embedded system, hardware accelerator, operation scheduling, resource allocation PICO is a system for automatically...
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
PICO is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when designing such accelerators is the...
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
Program-in chip-out (PICO) is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when...
Automated design of finite state machine predictors for customized processors (2001)
Customized processors use compiler analysis and design automation techniques to take a generalized architectural model and create a specific instance of it which is optimized to a given application...
Automated design of finite state machine predictors for customized processors (2001)
Finite State Machines (FSM) are a fundamental building block in computer architecture, and are used to control and optimize all types of prediction and speculation, now even in the embedded space....
Predictor-directed stream buffers (2000)
Timothy Sherwood, Suleyman Sair, Brad Calder
An effective method for reducing the effect of load latency in modern processors is data prefetching. One form of data prefetching, stream buffers, has been shown to be particularly effective due to...
ActiveOS: Virtualizing intelligent memory (1999)
Mark Oskin, Frederic T. Chong, Timothy Sherwood
Current trends in DRAM memory chip fabrication have led many researchers to propose "intelligent memory" architectures that integrate microprocessors or logic with memory. Such...
Time Varying Behavior of Programs (1999)
Modern architecture research relies heavily on detailed pipeline simulation. Furthermore, programs often times exhibit interesting and important time varying behavior on an extremely large scale....
Reducing cache misses using hardware and software page placement (1999)
Timothy Sherwood, Brad Calder, Joel Emer
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve...
ActiveOS: Virtualizing Intelligent Memory (1999)
Mark Oskin, Frederic T. Chong, Timothy Sherwood
Current trends in DRAM memory chip fabrication have led many researchers to propose "intelligent memory" architectures that integrate microprocessors or logic with memory. Such...
Reducing Cache Misses Using Hardware and Software Page Placement (1999)
Timothy Sherwood, Brad Calder, Joel Emer
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve...
Active Pages: A Computation Model for Intelligent Memory (1998)
Mark Oskin, Frederic T. Chong, Timothy Sherwood
Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the...
Active Pages: A Computation Model for Intelligent Memory (1998)
Mark Oskin, Frederic T. Chong, Timothy Sherwood
Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the...
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese
Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion...