A System Solution for High-Performance, Low Power SDR (2009)
Yuan Lin, Hyunseok Lee, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, ...
One central challenge in the realization of Software Defined Radio (SDR) is to provide a programmable solution that meets the challenging high-performance, low-power requirements, while providing an...
MUKHERJEE DEBATE WHETHER RELIABILITY IS A LEGITIMATE CONCERN FOR THE (2008)
Antonio González, Scott Mahlke, Shubu Mukherjee, Resit Sendag, Derek Chiou, Joshua J. Yi, ...
MICROARCHITECT. TOPICS INCLUDE THE COSTS OF ADDING RELIABILITY VERSUS THOSE OF
Hierarchical Coarse-grained Stream Compilation for Software Defined Radio (2008)
Yuan Lin, Manjunath Kudlur, Scott Mahlke, Trevor Mudge
Software Defined Radio (SDR) is an emerging embedded domain where the physical layer of wireless protocols is implemented in software rather than the traditional application specific hardware. The...
Modulo Scheduling for Highly Customized Datapaths to Increase Hardware Reusability (2008)
Kevin Fan, Hyunchul Park, Manjunath Kudlur, Scott Mahlke
In the embedded domain, custom hardware in the form of ASICs is often used to implement critical parts of applications when performance and energy efficiency goals cannot be met with software...
VEAL: Virtualized Execution Accelerator for Loops (2008)
Nathan Clark, Amir Hormati, Scott Mahlke
Performance improvement solely through transistor scaling is becoming more and more difficult; thus, it is increasingly common to see domain-specific accelerators used in conjunction with general...
Control CPR: A Branch Height Reduction Optimization for EPIC Architectures (2008)
Michael Schlansker, Scott Mahlke, Richard Johnson
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism....
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping (2008)
Amir Hormati, Nathan Clark, Scott Mahlke
The demand for high performance has driven acyclic computation accelerators into extensive use in modern embedded and desktop architectures. Accelerators that are ideal from a software perspective,...
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Sangwon Seo, Rob Mullenix, ...
Software Defined Radio (SDR): Use of software routines instead of ASICs for physical layer operations of wireless 802.11...
SPEX: A programming language for software defined radio (2008)
Yuan Lin, Robert Mullenix, Mark Woh, Scott Mahlke, Trevor Mudge, Alastair Reid, ...
High-throughput, low-power Software Defined Radio (SDR) solutions require multi-core SIMD DSP processors to meet real-time performance requirements. Given the difficulty in programming traditional...
StageNet: A Reconfigurable CMP Fabric for Resilient Systems (2008)
Shantanu Gupta, Shuguang Feng, Jason Blome, Scott Mahlke
Though CMOS feature size scaling has been the source of dramatic performance gains, this scaling has led to mounting reliability concerns due to increasing power densities and on-chip temperatures....
Valeria Bertacco, Scott Mahlke, Todd Austin, Bin Zhang, Michael Orshansky
As silicon technologies move into the nanometer regime, transistor reliability is expected to wane as devices become subject to extreme process variation, particle-induced transient errors, and...
Probabilistic Predicate-Aware Modulo Scheduling (2008)
Mikhail Smelyanskiy, Scott Mahlke, Edward S. Davidson
Predicated execution enables the removal of branches by converting segments of branching code into sequences of conditional operations. An important side effect of this transformation is that the...
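The branch-to-conditional-operation conversion described in this abstract can be illustrated with a toy interpreter. This is a sketch under stated assumptions, not the paper's scheduler: the op tuple format, the register names `a`, `b`, `c`, `p`, `q`, and the helpers `run_predicated` and `branchy_abs_diff` are hypothetical, and two separate ops define the complementary predicates (real predicated ISAs typically define both in one compare). Every op is walked in order with no branch; an op whose guard predicate is 0 simply does not write its result.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An op is (guard predicate register or None, destination register, computation).
Op = Tuple[Optional[str], str, Callable[[Dict[str, int]], int]]


def run_predicated(ops: List[Op], regs: Dict[str, int]) -> Dict[str, int]:
    """Walk every op in order (no branches); an op writes its destination only if its guard is 1."""
    state = dict(regs)
    for guard, dest, compute in ops:
        if guard is None or state[guard]:
            state[dest] = compute(state)
    return state


def branchy_abs_diff(a: int, b: int) -> int:
    """Reference version with a branch: if (a > b) c = a - b; else c = b - a;"""
    if a > b:
        return a - b
    return b - a


# If-converted version of the same code: no branch, two guarded assignments to c.
IF_CONVERTED: List[Op] = [
    (None, "p", lambda r: int(r["a"] > r["b"])),  # p = (a > b)
    (None, "q", lambda r: 1 - r["p"]),            # q = !p
    ("p", "c", lambda r: r["a"] - r["b"]),        # c = a - b, only if p
    ("q", "c", lambda r: r["b"] - r["a"]),        # c = b - a, only if q
]

for a, b in [(7, 3), (3, 7), (5, 5)]:
    c = run_predicated(IF_CONVERTED, {"a": a, "b": b})["c"]
    print(a, b, c, c == branchy_abs_diff(a, b))
```

In this model both guarded assignments occupy a slot on every execution, even though only one of them writes `c`.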
The Next Generation Challenge for Software Defined Radio (2008)
Mark Woh, Sangwon Seo, Hyunseok Lee, Yuan Lin, Scott Mahlke, Chaitali Chakrabarti, ...
Wireless communication for mobile terminals has been a high-performance computing challenge. It requires almost supercomputer performance while consuming very little power. This...
BY SEPARATING CONTROL AND DATA PROCESSING AND BY EMPLOYING ULTRAWIDE SIMD (2008)
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, ...
Communication has become one of the central uses of computing technology, and applications that facilitate interpersonal communication, such as desktop publishing, graphic design, e-mail, and...
Compiler-directed Synthesis of Multifunction Loop Accelerators (2008)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke
Complex algorithms and increased functionality are expanding the computation demands of embedded systems. Hardware accelerators are commonly used to meet these demands by executing critical...
A System Solution for High-Performance, Low Power SDR (2008)
Yuan Lin, Hyunseok Lee, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, ...
One central challenge in the realization of Software Defined Radio (SDR) is to provide a programmable solution that meets the challenging high-performance, low-power requirements, while providing an...
Uncovering hidden loop level parallelism in sequential applications (2008)
Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke
As multicore systems become the dominant mainstream computing technology, one of the most difficult challenges the industry faces is the software. Applications with large amounts of explicit...
Self-calibrating online wearout detection (2007)
Jason Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke
Technology scaling, characterized by decreasing feature size, thinning gate oxide, and non-ideal voltage scaling, will become a major hindrance to microprocessor reliability in future technology...
Data access partitioning for fine-grain parallelism on multicore architectures (2007)
Michael Chu, Rajiv Ravindran, Scott Mahlke
The recent design shift towards multicore processors has spawned a significant amount of research in the area of program parallelization. The future abundance of cores on a single chip requires...
Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping (2007)
Nathan Clark, Amir Hormati, Sami Yehia, Scott Mahlke, Krisztián Flautner
Microprocessor designers commonly utilize SIMD accelerators and their associated instruction set extensions to provide substantial performance gains at a relatively low cost for media applications....
BulletProof: A Defect-Tolerant CMP Switch Architecture (2006)
Kypros Constantinides, Stephen Plaza, Jason Blome, Bin Zhang, Valeria Bertacco, Scott Mahlke, ...
As silicon technologies move into the nanometer regime, transistor reliability is expected to wane as devices become subject to extreme process variation, particle-induced transient errors, and...
SODA: A Low-power Architecture For Software Radio (2006)
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, ...
The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These...
Design and Implementation of Turbo Decoders for Software Defined Radio (2006)
Yuan Lin, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti
Software Defined Radio (SDR) is an emerging paradigm for wireless terminals, in which the physical layer of communication protocols is implemented in software rather than by ASICs. Many of the current...
Modulo Graph Embedding: Mapping Applications onto Coarse-Grained Reconfigurable Architectures (2006)
Hyunchul Park, Kevin Fan, Manjunath Kudlur, Scott Mahlke
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost and energy efficiency....
Scalable Subgraph Mapping for Acyclic Computation Accelerators (2006)
Nathan Clark, Amir Hormati, Scott Mahlke, Sami Yehia
Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common to see acyclic...
Streamroller: Automatic Synthesis of Prescribed Throughput Accelerator Pipelines (2006)
Manjunath Kudlur, Kevin Fan, Scott Mahlke
In this paper, we present a methodology for designing a pipeline of accelerators for an application. The application is modeled using sequential C language with simple stylizations. The synthesis of...
Increasing Hardware Efficiency with Multifunction Loop Accelerators (2006)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke
To meet the conflicting goals of high-performance low-cost embedded systems, critical application loop nests are commonly executed on specialized hardware accelerators. These loop accelerators are...
Online timing analysis for wearout detection (2006)
Jason A. Blome, Shuguang Feng, Shantanu Gupta, Scott Mahlke
CMOS feature size scaling has long been the source of dramatic performance gains. However, because voltage levels have not scaled in step, feature size scaling has come at the cost of increased...
Jason Blome, Scott Mahlke, Daryl Bradley, Krisztián Flautner
Current trends in device scaling continue to cause an increasing risk of transient faults in microprocessors due to high energy strikes from radiated particles. In this work, we present a thorough...
Assessing SEU vulnerability via circuit-level timing analysis (2005)
Kypros Constantinides, Stephen Plaza, Jason Blome, Bin Zhang, Valeria Bertacco, Scott Mahlke, ...
Recently, there has been a growing concern that, in relation to process technology scaling, the soft-error rate will become a major challenge in designing reliable systems. In this work, we introduce...
Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles, Krisztian Flautner
Instruction set customization is an effective way to improve processor performance. Critical portions of application dataflow graphs are collapsed for accelerated execution on specialized hardware....
Exploring the Design Space of LUT-based Transparent Accelerators (2005)
Sami Yehia, Nathan Clark, Scott Mahlke, Krisztian Flautner
Instruction set customization accelerates the performance of applications by compressing the length of critical dependence paths and reducing the demands on processor resources. With instruction set...
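The phrase "compressing the length of critical dependence paths" can be made concrete with a small calculation. The sketch below is an illustration, not the paper's method: it computes the latency-weighted dependence height of a toy dataflow graph before and after two dependent operations are collapsed into one custom instruction. The operation names are hypothetical, and the single-cycle latency of the collapsed instruction is an assumption.

```python
from typing import Dict, List


def dependence_height(preds: Dict[str, List[str]], latency: Dict[str, int]) -> int:
    """Longest latency-weighted path through an acyclic dataflow graph.

    preds maps each operation to the operations whose results it consumes.
    """
    finish: Dict[str, int] = {}

    def finish_time(op: str) -> int:
        # An op finishes after its slowest producer finishes, plus its own latency.
        if op not in finish:
            finish[op] = latency[op] + max((finish_time(p) for p in preds[op]), default=0)
        return finish[op]

    return max(finish_time(op) for op in preds)


# Hypothetical chain: t1 = x + y; t2 = t1 << 2; t3 = t2 & m; t4 = t3 + z
baseline = {"add1": [], "shl": ["add1"], "and": ["shl"], "add2": ["and"]}
baseline_lat = {op: 1 for op in baseline}

# Same computation after collapsing shl + and into one custom instruction,
# ASSUMING the custom functional unit also completes in a single cycle.
collapsed = {"add1": [], "shl_and": ["add1"], "add2": ["shl_and"]}
collapsed_lat = {op: 1 for op in collapsed}

print(dependence_height(baseline, baseline_lat))    # 4
print(dependence_height(collapsed, collapsed_lat))  # 3
```

If the collapsed instruction needed two cycles instead, the height would return to 4, so the benefit depends on how fast the custom hardware evaluates the subgraph.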
A Distributed Control Path Architecture for VLIW Processors (2005)
Hongtao Zhong, Kevin Fan, Scott Mahlke, Michael Schlansker
VLIW architectures are popular in embedded systems because they offer high-performance processing at low cost and energy. The major problem with traditional VLIW designs is that they do not scale...
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System (2005)
Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke
Scheduling algorithms used in compilers traditionally focus on goals such as reducing schedule length and register pressure or producing compact code. In the context of a hardware synthesis system...
Software Defined Radio - A High Performance Embedded Challenge (2005)
Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, ...
Wireless communication is one of the most computationally demanding workloads. It is performed by mobile terminals ("cell phones") and must be accomplished by a small battery powered system.
Manjunath Kudlur, Kevin Fan, Michael Chu, Rajiv Ravindran, Nathan Clark, Scott Mahlke
Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing the hardware to suit...
Nathan Clark, Manjunath Kudlur, Hyunchul Park, Scott Mahlke, Krisztian Flautner
Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new...
A Programmable Vector Coprocessor Architecture for Wireless Applications (2004)
Yuan Lin, Nadev Baron, Hyunseok Lee, Scott Mahlke, Trevor Mudge
The physical layers of most wireless protocols are traditionally implemented in ASICs due to the heavy computation requirements. These solutions are costly to design and hardwired solutions that offer...
Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators (2004)
Manjunath Kudlur, Kevin Fan, Michael Chu, Scott Mahlke
Distributed local memories, or scratchpads, have been shown to effectively reduce cost and power consumption of application-specific accelerators while maintaining performance. The design of the local...
Memory System Design Space Exploration for Low-Power, Real-Time Speech Recognition (2004)
Rajeev Krishna, Scott Mahlke, Todd Austin
The recent proliferation of computing technology has brought added interest to natural I/O interface technologies such as speech recognition. Unfortunately, the computational and memory demands of...
Nathan Clark, Hongtao Zhong, Wilkin Tang, Scott Mahlke
General-purpose processors are often incapable of achieving the challenging cost, performance, and power demands of high-performance applications. To meet these demands, most systems employ a number...
Processor acceleration through automated instruction set customization (2003)
Nathan Clark, Hongtao Zhong, Scott Mahlke
Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware,...
Architectural Optimizations for Low-Power, Real-Time Speech Recognition (2003)
Rajeev Krishna, Scott Mahlke, Todd Austin
The proliferation of computing technology to low-power domains such as hand-held devices has led to increased interest in portable interface technologies, with particular interest in speech...
Systematic register bypass customization for application-specific processors (2003)
Kevin Fan, Nathan Clark, Michael Chu, K. V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, ...
Register bypass provides additional datapaths to eliminate data hazards in processor pipelines. The difficulty with register bypass is that the cost of the bypass network is substantial and grows...
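To make the hazard-versus-cost trade-off concrete, the sketch below models a single operand read in a simplified in-order pipeline. It is a hypothetical illustration, not the paper's customization algorithm: the stage names `EX` and `MEM`, the `InFlight` record, and `read_operand` are invented for the example. A full bypass network forwards from every stage; removing a path makes the network cheaper but turns some hazards back into stalls.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class InFlight:
    stage: str  # pipeline stage currently holding the result, e.g. "EX" or "MEM"
    dest: int   # destination register number
    value: int  # result value available in that stage


def read_operand(reg: int,
                 in_flight: List[InFlight],
                 regfile: Dict[int, int],
                 bypass_from: FrozenSet[str]) -> Tuple[Optional[int], bool]:
    """Return (value, stall) for a source register read.

    in_flight lists producers not yet written back, youngest first, so the
    first match is the most recent definition of reg.
    """
    for producer in in_flight:
        if producer.dest == reg:
            if producer.stage in bypass_from:
                return producer.value, False  # forwarded over a bypass datapath
            return None, True                 # hazard with no bypass path: stall
    return regfile[reg], False                # no pending write: read the register file


pipeline = [InFlight("EX", 3, 42), InFlight("MEM", 5, 7)]
regs = {3: 0, 5: 0, 6: 9}

full = frozenset({"EX", "MEM"})
partial = frozenset({"MEM"})  # hypothetical customized network without the EX path

print(read_operand(3, pipeline, regs, full))     # (42, False)
print(read_operand(3, pipeline, regs, partial))  # (None, True)
print(read_operand(5, pipeline, regs, partial))  # (7, False)
print(read_operand(6, pipeline, regs, partial))  # (9, False)
```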
Region-based hierarchical operation partitioning for multicluster processors (2003)
Michael Chu, Kevin Fan, Scott Mahlke
Clustered architectures are a solution to the bottleneck of centralized register files in superscalar and VLIW processors. The main challenge associated with clustered architectures is compiler...
Automatically Generating Custom Instruction Set Extensions (2002)
Nathan Clark, Wilkin Tang, Scott Mahlke
General-purpose processors that are utilized as cores are often incapable of achieving the challenging cost, performance, and power demands of high-performance audio, video, and networking...
Insights into the memory demands of speech recognition algorithms (2002)
Rajeev Krishna, Scott Mahlke, Todd Austin
The vision of pervasive computing is one of invisible computers interacting with humans in all aspects of their lives. These invisible computers can be embedded in anything from specialized portable...
Advisors: Prof. Scott Mahlke, Prof. Rajat Moona
To work in a challenging environment involved in the design and development of architecture and compilation techniques for high performance processors.
Bitwidth sensitive code generation in a custom embedded accelerator design system (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
An ever larger variety of embedded ASICs is being designed and deployed to satisfy an explosively growing demand for new
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
application-specific design, architecture synthesis, bitwidth, clustering, embedded system, hardware accelerator, operation scheduling, resource allocation. PICO is a system for automatically...
PICO-NPA: High-level synthesis of nonprogrammable hardware accelerators (2001)
B. Ramakrishna Rau, Vinod Kathail, Robert Schreiber, Shail Aditya, ...
synthesis
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
PICO is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when designing such accelerators is the...
Bitwidth Cognizant Architecture Synthesis of Custom Hardware Accelerators (2001)
Scott Mahlke, Rajiv Ravindran, Michael Schlansker, Robert Schreiber, Timothy Sherwood
Program-in chip-out (PICO) is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when...
High-Level Synthesis of Nonprogrammable Hardware Accelerators (2000)
Robert Schreiber, Shail Aditya, B. Ramakrishna Rau, Vinod Kathail, Scott Mahlke, ...
The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as co-processors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines...
High-Level Synthesis of Nonprogrammable Hardware Accelerators (2000)
Robert Schreiber, B. Ramakrishna Rau, Darren Cronquist, Mukund Sivaraman, Shail Aditya, ...
high-level synthesis, ASIC, systolic array. The PICO-NPA system automatically synthesizes nonprogrammable accelerators (NPAs) to be used as co-processors for functions expressed as loop nests in C....
Control CPR: A Branch Height Reduction Optimization for EPIC Architectures (1999)
Michael Schlansker, Scott Mahlke, Richard Johnson
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism....
Automatic and efficient evaluation of memory hierarchies for embedded systems (1999)
Santosh G. Abraham, Scott A. Mahlke
hierarchical evaluation, automatic design, embedded system, cache simulation, cache modeling. © Copyright Hewlett-Packard Company 1999. Automation is the key to the design of future embedded systems...
Achieving High Levels of Instruction-Level Parallelism With Reduced Hardware Complexity (1997)
Michael S. Schlansker, B. Ramakrishna Rau, Scott Mahlke, Vinod Kathail, Richard Johnson, Sadun Anik, ...
instruction-level parallelism, VLIW processors, superscalar processors, overlapped execution, out-of-order execution, speculative execution, branch prediction, instruction scheduling, compile-time...
Register connection: A new approach to adding registers into instruction set architectures (1993)
Tokuzo Kiyohara, Scott Mahlke, William Chen, Roger Bringmann, Richard Hank, Sadun Anik, ...
Code optimization and scheduling for superscalar and superpipelined processors often increase the register requirement of programs. For existing instruction sets with a small to moderate number of...
Sentinel Scheduling for VLIW and Superscalar Processors (1992)
Scott Mahlke, William Y. Chen, B. Ramakrishna Rau, Michael S. Schlansker
Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to accurately detect and report...
Scott Mahlke, William Y. Chen, Pohua P. Chang
In this paper the performance of multiple-instruction-issue processors with variable register file sizes is examined for a set of scalar programs. We make several important observations. First,...
The Effect Of Compiler Optimizations On Available Parallelism In Scalar Programs (1991)
Scott Mahlke, Nancy J. Warter, William Y. Chen, Pohua P. Chang
In this paper we analyze the effect of compiler optimizations on fine grain parallelism in scalar programs. We characterize three levels of optimization: classical, superscalar, and multiprocessor....