Hewlett-Packard Laboratories (2007)
Shail Aditya, B. Ramakrishna Rau, Vinod Kathail
This paper describes a mechanism for automatic design and synthesis of very long instruction word (VLIW), and its generalization, explicitly parallel instruction computing specification of their...
The Cydra™ 5 Stride-Insensitive Memory System (2007)
B. Ramakrishna Rau, Michael S. Schlansker
Cydrome's Cydra™ 5 Departmental Supercomputer™ is designed for compute-intensive engineering, scientific, and financial applications. One of the major sub-systems is the Main Memory,...
Alain Darte, Robert Schreiber, B. Ramakrishna Rau, Frédéric Vivien
Constructing and exploiting linear schedules with prescribed parallelism
Alain Darte, Robert Schreiber, ...
A constructive solution to the juggling problem in systolic array synthesis
Constructing and exploiting linear schedules with prescribed parallelism (2002)
Frédéric Vivien, Alain Darte, Robert Schreiber, B. Ramakrishna Rau, ...
systolic array, multicluster VLIW, linear schedule. We present two new results of importance in code generation for and synthesis of synchronously scheduled parallel processor arrays and multicluster...
PICO-NPA: High-level synthesis of nonprogrammable hardware accelerators (2001)
B. Ramakrishna Rau, Vinod Kathail, Robert Schreiber, Shail Aditya, ...
synthesis
Code Size Minimization and Retargetable Assembly for Custom EPIC and VLIW Instruction Formats (2000)
Shail Aditya, Scott A. Mahlke, B. Ramakrishna Rau
PICO is a fully automated system for designing the architecture and the micro-architecture of VLIW and EPIC processors. A serious concern with this class of processors, due to their very long...
HPL-PD architecture specification: Version 1.1 (2000)
Vinod Kathail, Michael S. Schlansker, B. Ramakrishna Rau
instruction-level parallelism, parametric architecture, EPIC, VLIW, superscalar, speculative execution, predicated execution, programmatic cache control, run-time memory disambiguation, branch...
Embedded Computing: New Directions in Architecture and Automation (2000)
B. Ramakrishna Rau, Michael S. Schlansker
this report, we elaborate on these claims and provide, as an example, an overview of PICO, the architecture synthesis system that the authors and their colleagues have been developing over the past...
Fast Design Space Exploration through Validity and Quality Filtering of Subsystem Designs (2000)
Santosh G. Abraham, B. Ramakrishna Rau, Robert Schreiber
Automated design tools help to capture the benefits of customization in embedded system design while not exceeding design budgets. Such design tools must understand and exploit the hierarchical...
A constructive solution to the juggling problem in systolic array synthesis (2000)
B. Ramakrishna Rau, Frédéric Vivien, Alain Darte, Robert Schreiber, ...
systolic array synthesis, affine scheduling. We describe a new, practical, constructive method for solving the well-known conflict-free scheduling systolic...
Fast Design Space Exploration Through Validity and Quality Filtering of Subsystem Designs (2000)
Santosh G. Abraham, B. Ramakrishna Rau, Robert Schreiber
automated design, multi-objective optimization, multiple criteria optimization, design space decomposition, hierarchical evaluation, performance evaluation, trace-driven simulation. Automated design...
Embedded computing: New directions in architecture and automation (2000)
B. Ramakrishna Rau, Michael S. Schlansker
embedded computing, special-purpose architectures, customization, custom architectures, off-the-shelf customizable systems, FPGA, automation, architecture synthesis, hardware-software co-design,...
EPIC: An architecture for instruction-level parallel processors (2000)
Michael S. Schlansker, B. Ramakrishna Rau
VLIW architecture, instruction-level parallelism, MultiOp, nonunit assumed latencies, NUAL, rotating register files, unbundled branches, control speculation, speculative opcodes, exception tag,...
High-Level Synthesis of Nonprogrammable Hardware Accelerators (2000)
Robert Schreiber, B. Ramakrishna Rau, Darren Cronquist, Mukund Sivaraman, Shail Aditya, ...
high-level synthesis, ASIC, systolic array. The PICO-NPA system automatically synthesizes nonprogrammable accelerators (NPAs) to be used as co-processors for functions expressed as loop nests in C....
Code Size Minimization and Retargetable Assembly for Custom EPIC and VLIW Instruction Formats (2000)
Shail Aditya, Scott A. Mahlke, B. Ramakrishna Rau
this paper is to describe a series of code size minimization techniques used within PICO, some of which are applied during the automatic design of the instruction format, while others are applied...
A constructive solution to the juggling problem in systolic array synthesis (1999)
Alain Darte, Robert Schreiber, B. Ramakrishna Rau, Frédéric Vivien
We describe a new, practical, constructive method for solving the well-known conflict-free scheduling problem for the locally sequential, globally parallel (LSGP) case of systolic array...
B. Ramakrishna Rau, Vinod Kathail, Shail Aditya
retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, EPIC compilers, VLIW compilers, code...
Elcor’s Machine Description System: Version 3.0 (1998)
Shail Aditya, Vinod Kathail, B. Ramakrishna Rau
retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors,
Achieving High Levels of Instruction-Level Parallelism With Reduced Hardware Complexity (1997)
Michael S. Schlansker, B. Ramakrishna Rau, Scott Mahlke, Vinod Kathail, Richard Johnson, Sadun Anik, ...
instruction-level parallelism, VLIW processors, superscalar processors, overlapped execution, out-of-order execution, speculative execution, branch prediction, instruction scheduling, compile-time...
Optimization of Machine Descriptions for Efficient Use (1996)
John Gyllenhaal, B. Ramakrishna Rau
A machine description facility allows compiler writers to specify machine execution constraints to the optimization and scheduling phases of an instruction-level parallelism (ILP) optimizing...
Predicting Load Latencies Using Cache Profiling (1996)
Santosh G. Abraham, B. Ramakrishna Rau
Processor performance has been increasing at 50% per year, but memory access times have been improving at only 5-10% per year. As a result, the latency of cache misses in processor cycles is...
Iterative modulo scheduling: An algorithm for software pipelining loops (1994)
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative...
Dynamically scheduled VLIW processors (1993)
VLIW processors are viewed as an attractive way of achieving instruction-level parallelism because of their ability to issue multiple operations per cycle with relatively simple control logic. They...
Nancy J. Warter, Scott A. Mahlke, B. Ramakrishna Rau
In this paper we present a set of isomorphic control transformations that allow the compiler to apply local scheduling techniques to acyclic subgraphs of the control flow graph. Thus, the code motion...
B. Ramakrishna Rau, Joseph A. Fisher
instruction-level parallelism, VLIW processors, superscalar processors, pipelining, multiple operation issue, speculative execution, scheduling, register allocation. Instruction-level Parallelism...
Pseudo-randomly interleaved memory (1991)
Interleaved memories are often used to provide the high bandwidth needed by multi-processors and high performance uniprocessors. The manner in which memory locations are distributed across the memory...