Scheduling Strategies for Mapping Application Workflows onto the Grid (2009)
Anirban M, Ken Kennedy, Charles Koelbel, Gabriel Marin, John Mellor-Crummey
In this work, we describe new strategies for scheduling and executing workflow applications on Grid resources using the GrADS [18] infrastructure. Workflow scheduling is based on heuristic scheduling...
Honors • Phi Beta Kappa (2008)
Anshuman Dasgupta, Advisors Keith D. Cooper, Ken Kennedy, ...
OWU for exceptional ability and outstanding performance (1997, 1998)
Abstract. Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an...
Ken Kennedy, Charles Koelbel, Sanjay Ranka, Chau-Wen Tseng
We present a unified approach to compiling Fortran 77D and Fortran 90D programs for efficient execution on MIMD distributed-memory machines. The integrated Fortran D compiler relies on two key...
Operated by the Universities Space Research Association (2008)
Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz, ...
FIRST DISTRIBUTED MASSIVELY PARALLEL (2008)
Ken Kennedy, Charles F. Bender, John L. Hennessy, Mary K, Larry Smarr
The National Computational Science Alliance (NCSA) hopes to accelerate the creation of a nationwide parallel computing environment for the US by developing hardware and software that will ease...
RETROSPECTIVE: Coloring Heuristics for Register Allocation (2008)
Preston Briggs, Keith D. Cooper, Ken Kennedy, Linda Torczon
From the earliest compilers, register allocation was recognized as an important optimization. Indeed, the original Fortran compiler spent two of its six passes on the problem [1]. (That compiler used...
Improving Register Allocation for Subscripted Variables (2008)
David Callahan, Steve Carr, Ken Kennedy
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of...
Compiler Blockability of Numerical Algorithms (2008)
Over the past decade, microprocessor design strategies have focused on increasing the computational power on a single chip. Unfortunately, memory speeds have not kept pace. The result is an imbalance...
The Impact of Multicore on Computational Science Software (2008)
Jack Dongarra, Dennis Gannon, Geoffrey Fox, Ken Kennedy
The idea that computational modeling and simulation represents a new branch of scientific methodology, alongside theory and experimentation, was introduced about two decades ago. It has since come to...
Abstract. Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and...
Shamik D. Sharma, Joel Saltz, Raja Das, Yuan-Shin Hwang, ...
Procedures are presented that are designed to help users efficiently program irregular problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equation solvers)...
Keith D. Cooper, Mary W. Hall, Ken Kennedy
Procedure cloning is an interprocedural transformation where the compiler creates specialized copies of procedure bodies. The compiler divides incoming calls between the original procedure and its...
Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors (2007)
Nathaniel McIntosh, Katherine Fletcher, Keith Cooper, ...
In this paper we present a comprehensive compiler framework for improving the efficiency of compiler-directed software prefetching on cache-coherent distributed shared-memory multiprocessors. The key...
Seema Hiranandani, Ken Kennedy, Chau-Wen Tseng, Scott Warren
This paper describes the design and implementation of the D editor, an intelligent interactive editor for data-parallel Fortran programs. It forms the centerpiece of the D system, a collection of...
Management of the NHSE -- a Virtual Distributed Digital Library (2007)
Shirley Browne, Jack Dongarra, Tom Rowan, Ken Kennedy
The National HPCC Software Exchange (NHSE) is a distributed collection of software, documents, and data of interest to the high performance computing community. Our experiences with the design and...
Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors (2007)
Keith Cooper, Ken Kennedy, Nathaniel McIntosh
This document describes a set of new techniques for improving the efficiency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared...
A Simple, Fast Dominance Algorithm (2007)
Keith D. Cooper, Timothy J. Harvey, Ken Kennedy
The problem of finding the dominators in a control-flow graph has a long history in the literature. The original algorithms suffered from a large asymptotic complexity but were easy to understand....
Mathematical Models for the Speedup of Parallel Material Dynamics Codes (2007)
Darrell L. Hicks, Ken Kennedy, Lorie M. Liebrock
There have been some previous predictions of possible parallel speedups which seem to be overly pessimistic in their prognostications for the advantages to be gained by parallel processing in...
Ulrich Kremer, Performance Prediction, Christoph W, John Mellor-Crummey, ...
for Distributed-Memory Machines in the D Programming Environment
Optimizing Fortran 90 Shift Operations on Distributed-Memory Multicomputers (2007)
Ken Kennedy, John Mellor-Crummey, Gerald Roth
Abstract. When executing Fortran 90 style data-parallel array operations on distributed-memory multiprocessors, intraprocessor data movement due to shift operations can account for a significant...
Distributed-Memory Machines (2007)
Alok Choudhary, Seema Hiranandani, Geoffrey Fox, ...
We present an integrated approach to compiling Fortran 77D and Fortran 90D programs for efficient execution on MIMD distributed-memory machines. The integrated Fortran D compiler relies on two key...
Compiler Support for Machine-Independent (2007)
Seema Hiranandani, Ken Kennedy, Chau-Wen Tseng
Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific...
Integrated Support for Task and Data Parallelism (2007)
Mani Chandy, Ian Foster, Ken Kennedy, ...
We present an overview of research at the CRPC designed to provide an efficient, portable programming model for scientific applications possessing both task and data parallelism. Fortran M programs...
Seema Hiranandani, Ken Kennedy, Chau-Wen Tseng, Scott Warren
Fortran D and High Performance Fortran are languages designed to support efficient data-parallel programming on a variety of parallel architectures. The goal of the D Editor is to provide a tool that...
A General Stencil Compilation Strategy for Distributed-Memory Machines (2007)
Gerald Roth, Steve Carr, John Mellor-Crummey, ...
For many Fortran 90 programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. This paper describes a strategy for...
an Interactive Parallel Programming Tool (2007)
Nathaniel McIntosh, Kathryn S. McKinley, Jeffery D. Oldham, Michael Paleczny, Mary W. Hall, ...
The ParaScope Editor is an interactive parallel programming tool that assists knowledgeable users in developing scientific Fortran programs. It displays the results of sophisticated program analyses,...
Alok Choudhary, Geoffrey Fox, Seema Hiranandani, Ken Kennedy, Charles Koelbel, Sanjay Ranka, ...
We present a unified approach to compiling Fortran 77D and Fortran 90D programs for efficient execution on MIMD distributed-memory machines. The integrated Fortran D compiler relies on two key...
Programming Environments (2007)
Vikram Adve, Alan Carle, Elana Granston, Seema Hiran, Ken Kennedy, Charles Koelbel, ...
Support for Data-Parallel Programming.
Categories and Subject Descriptors General Terms (2007)
Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, Jack Dongarra
Daniel Nurmi, Anirban M, John Brevik, Chuck Koelbel, Rich Wolski, Ken Kennedy
Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, would...
Scalable Grid Application Scheduling via Decoupled Resource Selection and Scheduling (2006)
Yang Zhang, Anirban M, Henri Casanova, Andrew A. Chien, Yang-Suk Kee, Ken Kennedy, ...
Abstract. Over the past years grid infrastructures have been deployed at larger and larger scales, with envisioned deployments incorporating tens of thousands of resources. Therefore, application...
Dependence-based code generation for a cell processor (2006)
Abstract. Obtaining high performance on the STI CELL processor requires substantial programming effort because its architectural features must be explicitly managed, with separate codes required for...
Profitable loop fusion and tiling using model-driven empirical search (2006)
Loop fusion and tiling are both recognized as effective transformations for improving memory performance of scientific applications. However, because of their sensitivity to the underlying cache...
Dependence-based code generation for a cell processor (2006)
Abstract. The CELL processor has attracted significant interest from the research community due to the performance it is capable of achieving through its multiple heterogeneous cores (scalar and...
Improving Compilation of Java Scientific Applications (2006)
Zoran Budimlić, Mackale Joyner, Ken Kennedy
Java is a high productivity object-oriented programming language that is rapidly gaining popularity in high-performance application development. One major obstacle to its broad acceptance is its...
Strategy for Compiling Parallel Matlab for General Distributions (2006)
Mary Fletcher, Cheryl McCosh, Ken Kennedy, Guohua Jin
Executing applications in parallel can produce significant performance gains, yet the time and expertise needed for the low-level details of parallelism is often prohibitive. Additionally, many...
Xiaobing and his colleagues at the Chinese Academy of Sciences (who translated the (2005)
Y Allen, Ken Kennedy, Adam Bordelon, Remko Van Beusekom, John Callan, Arun Chauhan, ...
Errata highlighted in color are corrections that are not fixed in the second printing, either because they were received too late or were missed for some other reason. About the Authors P. line 11:...
Parameterizing loop fusion for automated empirical tuning (2005)
Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, Richard Vuduc
Abstract. Traditional compilers are limited in their ability to optimize applications for different architectures because statically modeling the effect of specific optimizations on different...
A cache-conscious profitability model for empirical tuning of loop fusion (2005)
Abstract. Loop fusion is recognized as an effective program transformation for improving memory hierarchy performance. However, unconstrained loop fusion can lead to poor performance because of...
Evaluating a model for cache conflict miss prediction (2005)
Cache conflict misses can cause severe degradation in application performance. Previous research has shown that for many scientific applications a majority of cache misses are due to conflicts in...
Iterative data-flow analysis, revisited (2004)
Keith D. Cooper, Timothy J. Harvey, Ken Kennedy
Abstract. The iterative algorithm is widely used to solve instances of data-flow analysis problems. The algorithm is attractive because it is easy to implement and robust in its behavior. The theory...
Defining and measuring the productivity of programming languages (2004)
Ken Kennedy, Charles Koelbel, Robert Schreiber
The goal of programming support systems is to make it possible for application developers to produce software faster, without any degradation in software quality. However, it is essential that this...
Apan Qasem, Ken Kennedy, John Mellor-Crummey
Abstract. In many cases, simple analytical models used by traditional compilers are no longer able to yield effectively optimized code for complex programs because of the enormous complexity of...
Defining and measuring the productivity of programming languages (2004)
Ken Kennedy, Charles Koelbel, Robert Schreiber, Hewlett Packard Company
The overall objective of programming support systems is to make it possible to produce software faster with the same workforce, with no degradation, and possibly an improvement, in software quality....
Iterative data-flow analysis, revisited (2004)
Keith D. Cooper, Timothy J. Harvey, Ken Kennedy
The iterative algorithm is widely used to solve instances of data-flow analysis problems. The algorithm is attractive because it is easy to implement and robust in its behavior. The theory behind the...
Automatic type-driven library generation for telescoping languages (2003)
Arun Chauhan, Cheryl McCosh, Ken Kennedy, Richard Hanson
Telescoping languages is a strategy to automatically generate highly-optimized domain-specific libraries. The key idea is to create specialized variants of library procedures through extensive...
Slice-hoisting for array-size inference in MATLAB (2003)
Abstract. Inferring variable types precisely is very important to be able to compile MATLAB libraries effectively in the context of the telescoping languages framework being developed at Rice. Past...
Because of the increasing gap between the speeds of processors and main memories, compilers must enhance the locality of applications to achieve high performance. Loop fusion enhances locality by...
Because of the increasing gap between the speeds of processors and main memories, it is critical that compiler-generated code makes effective use of the cache memory hierarchy on modern processors. To...
Fast copy coalescing and live-range identification (2002)
Zoran Budimlić, Keith D. Cooper, Timothy J. Harvey, Ken Kennedy, Timothy S. Oberg, Steven W. Reeves
This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use this...
Almost-whole-program compilation (2002)
Abstract. This paper presents the motivation and an implementation strategy for a new almost-whole-program compilation framework for Java programs. This framework improves the flexibility of the...
Fast copy coalescing and live-range identification (2002)
Keith D. Cooper, Timothy J. Harvey, Ken Kennedy, Timothy S. Oberg, Steven W. Reeves
Abstract. This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use...
Qing Yi, Ken Kennedy
Because of the increasing gap between the speeds of processors and main memories, compilers must enhance the locality of applications to achieve high performance. Loop fusion enhances locality by...
Transforming Health Care Through Information Technology (2001)
James N. Gray, W. Daniel Hillis, Robert E. Kahn, Ken Kennedy, John P. Miller, David C. Nagel, ...
In Transforming Health Care Through Information Technology the PITAC offers six key recommendations that could significantly expand access to health care, improve its quality, reduce its costs, and...
Using Information Technology to Transform the Way We Learn (2001)
James N. Gray, W. Daniel Hillis, Robert E. Kahn, Ken Kennedy, John P. Miller, David C. Nagel, ...
Using Information Technology to Transform the Way We Learn highlights PITAC’s findings and recommendations on how the Federal government can provide the leadership needed to solve key information...
At Rice University, we have undertaken a project to construct a framework for generating high-level problem solving languages that can achieve high performance on a variety of platforms. The...
The GrADS project: Software support for high-level grid application development (2001)
Francine Berman, Andrew Chien, Keith Cooper, Jack Dongarra, Ian Foster, Dennis Gannon, ...
Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way—as a computational resource as well as an information...
The GrADS project: Software support for high-level grid application development (2001)
Francine Berman, Andrew Chien, Keith Cooper, Jack Dongarra, Ian Foster, Dennis Gannon, ...
Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way—as a computational as well as an information resource....
JaMake: A java compiler environment (2001)
Abstract. A language of choice for general-purpose programming, Java is quickly becoming popular in more specialized areas, such as scientific computing. However, even though the compilation...
The GrADS Project: Software Support for High-Level Grid Application Development (2001)
Francine Berman, Andrew Chien, Keith Cooper, Jack Dongarra, Ian Foster, Dennis Gannon, ...
Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way: as a computational as well as an information resource. As...
Ken Kennedy, Bradley Broom, Keith Cooper, Jack Dongarra, Rob Fowler, Dennis Gannon, ...
As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more...
Scalarizing Fortran 90 array syntax (2001)
Array syntax, which exists in many languages, adds expressive power by allowing operations on and assignments to array sections. When compiling to a uniprocessor machine, the array statement must be...
Transforming Access to Government Information (2000)
James N. Gray, W. Daniel Hillis, Robert E. Kahn, Ken Kennedy, John P. Miller, David C. Nagel, ...
In Transforming Health Care Through Information Technology the PITAC offers six key recommendations that could significantly expand access to health care, improve its quality, reduce its costs, and...
Developing Open Source Software for High End Computing (2000)
James N. Gray, W. Daniel Hillis, Robert E. Kahn, Ken Kennedy, John P. Miller, David C. Nagel, ...
In the attached report, we focus exclusively on software development for high end computing (sometimes referred to as high-performance computing or supercomputing) because of its critical importance...
Jhychun Wang, Vikram S. Adve, John Mellor-Crummey, Mark Anderson, ...
Supporting source-level performance analysis of programs written in data-parallel languages requires a unique degree of integration between compilers and performance analysis tools. Compilers for...
Overpartitioning with the Rice dHPF compiler (2000)
Bradley Broom, Daniel Chavarria-Miranda, Guohua Jin, Rob Fowler, Ken Kennedy, John Mellor-Crummey
Strategies for partitioning an application's data play a fundamental role in determining the range of possible parallelizations that can be performed and ultimately their potential efficiency....
Memory bandwidth bottleneck and its amelioration by a compiler (2000)
Chen Ding, Ken Kennedy
As the speed gap between CPU and memory widens, memory hierarchy has become the primary factor limiting program performance. Until now, the principal focus of hardware and software innovations has...
Transforming loops to recursion for multi-level memory hierarchies (2000)
Qing Yi, Vikram Adve, Ken Kennedy
Recently, there have been several experimental and theoretical results showing significant performance benefits of recursive algorithms on both multi-level memory hierarchies and on shared-memory...
Overpartitioning with the Rice dHPF Compiler (2000)
Bradley Broom, Daniel Chavarria-Miranda, Guohua Jin, Rob Fowler, Ken Kennedy, John Mellor-Crummey
This paper surveys HPF extensions in the Rice dHPF compiler that implement a general mechanism for overpartitioning data. In this approach, each physical node is allocated multiple tiles for which...
Improving effective bandwidth through compiler enhancement of global cache reuse (2000)
Reusing data in cache is critical to achieving high performance on modern machines because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most...
Improving memory hierarchy performance for irregular applications (1999)
John Mellor-Crummey, David Whalley, Ken Kennedy
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically underutilize multi-level memory...
Prospects for Scientific Computing in Polymorphic, Object-Oriented Style (1999)
Since the introduction of the Java programming language, there has been widespread interest in the use of Java for high performance scientific computing. One major impediment to such use is the...
Chen Ding, Ken Kennedy
With the rapid improvement of processor speed, performance of the memory hierarchy has become the principal bottleneck for most applications. A number of compiler transformations have been developed...
Bandwidth-Based Performance Tuning and Prediction (1999)
Chen Ding, Ken Kennedy
As the speed gap widens between CPU and memory, memory hierarchy performance has become the bottleneck for most applications. This is due in part to the difficulty of fully utilizing the deep and...
Inter-array Data Regrouping (1999)
As the speed gap between CPU and memory widens, memory hierarchy has become the performance bottleneck for most applications because of both the high latency and low bandwidth of direct memory...
Improving memory hierarchy performance for irregular applications (1999)
John Mellor-Crummey, David Whalley, Ken Kennedy
The gap between CPU speed and memory speed in modern computer systems is widening as new generations of hardware are introduced. Loop blocking and prefetching transformations help bridge this gap for...
Inter-array data regrouping (1999)
Abstract. As the speed gap between CPU and memory widens, memory hierarchy has become the performance bottleneck for most applications because of both the high latency and low bandwidth of direct...
Compiler Analysis for Irregular Problems in Fortran D (1998)
Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs,...
Compiling Scientific Programs for Scalable Parallel Systems (1998)
Ken Kennedy, John Mellor-Crummey, Guohua Jin, Vikram Adve, Robert J. Fowler
This report details research into compiler technology to support machine-independent data parallel programming for scientific applications. The investigation focused on design and...
Static interprocedural optimizations in Java (1998)
Zoran Budimlić, Ken Kennedy
Interprocedural optimizations are important in Java because the object-oriented programming style encourages the use of many small methods. Unfortunately, such optimizations are difficult because of...
Memory-Bandwidth Based Performance Tuning and Prediction (1998)
Chen Ding, Ken Kennedy
It is the contention of this paper that memory bandwidth has become the single most important determinant of performance on modern computer systems built from commodity processors. This contention is...
Loop Fusion in High Performance Fortran (1998)
Gerald Roth, Ken Kennedy
In this paper we investigate a unique problem associated with fusing loops within a High Performance Fortran (HPF) program. In particular, we discuss the issue of performing loop fusion in an HPF...
Automatic data layout for distributed-memory machines (1998)
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. After the algorithm selection, the data...
Compiling stencils in high performance fortran (1997)
John Mellor-Crummey, Gerald Roth, R. Gregg Brickner, ...
For many Fortran90 and HPF programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are...
Optimizing Java - Theory and Practice (1997)
The enormous popularity of the Internet has made an instant star of the Java programming language. Java's portability, reusability, security and clean design have made it the language of choice...
Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors (1997)
Nathaniel McIntosh, Katherine Fletcher, Keith Cooper, Ken Kennedy
In this paper we present a comprehensive compiler framework for improving the efficiency of compiler-directed software prefetching on cache-coherent distributed shared-memory multiprocessors. The key...
Compiling Stencils in High Performance Fortran (1997)
Gerald Roth, John Mellor-Crummey, Ken Kennedy, R. Gregg Brickner
For many Fortran90 and HPF programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are...
Cross-loop reuse analysis and its application to cache optimizations (1996)
Keith Cooper, Ken Kennedy, Nathaniel McIntosh
Abstract. In this paper we describe the design of a data-flow framework for detecting cross-loop reuse. Cross-loop reuse takes place when a set of data items or cache lines is accessed in a given...
Interprocedural Compilation of Fortran D (1996)
Mary W. Hall, Seema Hiranandani, Ken Kennedy, ...
Fortran D is a version of Fortran extended with data decomposition specifications. It is designed to provide a machine-independent programming model for data-parallel applications and has heavily...
A Communication Placement Framework with Unified Dependence and Data-flow Analysis (1996)
Communication placement analysis is an important step in the compilation of data-parallel programs for multiprocessor systems. This paper presents a communication placement framework that minimizes...
Value-Driven Redundancy Elimination (1996)
Ken Kennedy, Loren Taylor Simpson
Value-driven redundancy elimination is a combination of value numbering and code motion. Value numbering is an optimization that assigns numbers to values in such a way that two values are assigned...
Optimal Register Assignment to Loops for Embedded Code Generation (1996)
David J. Kolson, Alexandru Nicolau, Nikil Dutt, Ken Kennedy
In this article, we present a technique for optimal (i.e., spill minimizing) register assignment to loops. First we present a technique for register assignment to architecture styles that are...
Resource-Based Communication Placement Analysis (1996)
Communication placement analysis is an important step in the compilation of data-parallel programs. However, to simplify the placement analysis, previous techniques ignored most machine-dependent...
Dependence Analysis of Fortran90 Array Syntax (1996)
Gerald Roth, Ken Kennedy
Dependence analysis and dependence information are critical components of many optimizing and parallelizing compilers. And there exist many fast and precise dependence tests that work on...
A General Stencil Compilation Strategy for Distributed-Memory Machines (1996)
Gerald Roth, Steve Carr, John Mellor-Crummey, ...
For many Fortran 90 programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. This paper describes a strategy for...
A Method for Register Allocation to Loops in Multiple Register File Architectures (1996)
David Kolson, Alexandru Nicolau, Nikil Dutt, Ken Kennedy
Multiple instruction issue processors place high demands on register file bandwidth. One solution to reduce this bottleneck is the use of multiple register files. Register allocation for these...
A model and compilation strategy for out-of-core data parallel programs (1995)
Rajesh Bordawekar, Alok Choudhary, Ken Kennedy, Charles Koelbel, Michael Paleczny
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage...
Automatic data layout for High Performance Fortran (1995)
Ken Kennedy, Ulrich Kremer
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine independent parallel programming model....
Automatic data layout for High Performance Fortran (1995)
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine independent parallel programming...
Experiences in data-parallel programming (1995)
Terry W. Clark, Reinhard V. Hanxleden, Ken Kennedy
An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs (1995)
Vikram Adve, John Mellor-Crummey, Mark Anderson, Ken Kennedy, Jhy-chun Wang, Daniel A. Reed
Supporting source-level performance analysis of programs written in data-parallel languages requires a unique degree of integration between compilers and performance analysis tools. Compilers for...
National HPCC Software Exchange (1995)
Shirley Browne, Jack Dongarra, Stan Green, Keith Moore, Tom Rowan, Reed Wade, ...
This report describes an effort to construct a National HPCC Software Exchange (NHSE). This system shows how the evolving National Information Infrastructure (NII) can be used to facilitate sharing...
Integer Programming for Array Subscript Analysis (1995)
We present a new method to determine whether a convex region contains any integer points. The method is designed for array subscript analysis in parallel programs. The general problem is whether a...
A Model and Compilation Strategy for Out-of-Core Data Parallel Programs (1995)
Rajesh Bordawekar, Alok Choudhary, Ken Kennedy, ...
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage...
A Model and Compilation Strategy for Out-of-Core Data Parallel Programs (1995)
Rajesh Bordawekar, Alok Choudhary, Ken Kennedy, Charles Koelbel, Mike Paleczny
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage...
National HPCC Software Exchange (1995)
Shirley Browne, Jack Dongarra, Stan Green, Keith Moore, Tom Rowan, Reed Wade, ...
This report describes an effort to construct a National HPCC Software Exchange (NHSE). This system shows how the evolving National Information Infrastructure (NII) can be used to facilitate sharing...
Means of Achieving Cross-Program Focus, Coordination, and Technology Transfer (1995)
Shirley Browne, Jack Dongarra, Geoffrey Fox, Ken Kennedy
This paper is in response to the National Science and Technology Council (NSTC) Committee on Information and Communications R&D (CIC) call for white papers on the CIC's Strategic...
A Manual for the CHAOS Runtime Library (1995)
Shamik D. Sharma, Joel Saltz, Raja Das, Yuan-shin Hwang, ...
Procedures are presented that are designed to help users efficiently program irregular problems (e.g. unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations...
Management of the NHSE - a Virtual Distributed Digital Library (1995)
Shirley Browne, Jack Dongarra, Tom Rowan, Ken Kennedy
The National HPCC Software Exchange (NHSE) is a distributed collection of software, documents, and data of interest to the high performance computing community. Our experiences with the design and...
Combining Dependence and Data-Flow Analyses to Optimize Communication (1995)
Reducing communication overhead is crucial for improving the performance of programs on distributed-memory machines. Compilers for data-parallel languages must perform communication optimizations in...
Compiler Support for Out-of-Core Arrays on Parallel Machines (1995)
Michael Paleczny, Ken Kennedy, Charles Koelbel
Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application. We are...
An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs (1995)
Jhy-chun Wang, Vikram S. Adve, John Mellor-Crummey, Mark Anderson, ...
To support the transition from programming languages in which parallelism and communication are explicit to high-level languages that rely on compilers to infer such details from data decomposition...
A Model and Compilation Strategy for Out-of-Core Data Parallel Programs (1995)
Rajesh Bordawekar, Alok Choudhary, Ken Kennedy, Charles Koelbel, Michael Paleczny
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage...
Compiler Support for Out-of-Core Arrays on Parallel Machines (1995)
Michael Paleczny, Ken Kennedy, Charles Koelbel
Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application. We are...
Optimizing Fortran 90 Shift Operations on Distributed-Memory Multicomputers (1995)
Ken Kennedy, John Mellor-Crummey, Gerald Roth, ...
When executing Fortran 90 style data-parallel array operations on distributed-memory multiprocessors, intraprocessor data movement due to shift operations can account for a significant fraction of...
Efficient Address Generation for Block-Cyclic Distributions (1995)
Ken Kennedy, Ajay Sethi
Data-parallel languages, such as High Performance Fortran, are designed to make programming of distributed-memory machines easier, and resulting programs more portable and efficient. Advanced...
Automatic Data Layout for High Performance Fortran (1995)
Ken Kennedy, Ulrich Kremer
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine independent parallel programming...
An Empirical Study of Cross-loop Reuse in the NAS benchmarks (1995)
Keith Cooper, Ken Kennedy, Nathaniel McIntosh
This paper describes an empirical study designed to quantify the level of cross-loop reuse occurring in a set of scientific Fortran programs, the NAS Benchmarks. Cross-loop reuse takes place when a...
A Linear-Time Algorithm for Computing the Memory Access Sequence in Data-Parallel Programs (1995)
Ken Kennedy, Nenad Nedeljkovic, Ajay Sethi
Data-parallel languages, such as High Performance Fortran, are designed to facilitate writing of portable programs for distributed-memory machines. Novel features of these languages call for...
A Model and Compilation Strategy for Out-of-Core Data Parallel Programs (1995)
Charles Koelbel, Mike Paleczny, Rajesh Bordawekar, Alok Choudhary, ...
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage...
Shirley Browne, Jack Dongarra, Stan Green, Keith Moore, Tom Rowan, Reed Wade, ...
This report describes an effort to construct a National HPCC Software Exchange (NHSE). This system shows how the evolving National Information Infrastructure (NII) can be used to facilitate sharing of...
Management of the NHSE - a Virtual Distributed Digital Library (1995)
Shirley Browne, Jack Dongarra, Tom Rowan, Ken Kennedy
The National HPCC Software Exchange (NHSE) is a distributed collection of software, documents, and data of interest to the high performance computing community. Our experiences with the design and...
Index array flattening through program transformation (1995)
Raja Das, Paul Havlak, Joel Saltz, Ken Kennedy
This paper presents techniques for compiling loops with complex, indirect array accesses into loops whose array references have at most one level of indirection. The transformation allows prefetching...
Scalar replacement in the presence of conditional control flow (1994)
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of...
Scalar replacement in the presence of conditional control flow (1994)
Most conventional compilers fail to allocate array elements to registers because standard dataflow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of...
Give-N-Take --- A balanced code placement framework (1994)
Reinhard von Hanxleden, Ken Kennedy
GIVE-N-TAKE is a code placement framework which uses a general producer-consumer concept. An advantage of GIVE-N-TAKE over existing partial redundancy elimination techniques is its concept of...
Context optimization for SIMD execution (1994)
One issue that SIMD compilers must address is generating code to change the machine context; i.e., disabling processors not involved in the current computation. We present two compiler optimizations...
Evaluating Compiler Optimizations For Fortran D (1994)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a...
Improving the Ratio of Memory Operations to Floating-Point Operations in Loops (1994)
In this paper we attempt to answer that question. To do so, we develop and evaluate techniques that automatically restructure program loops to achieve high performance on specific target architectures....
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution (1994)
Kathryn McKinley, Ken Kennedy
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper...
Automatic Data Layout for High Performance Fortran (1994)
Ken Kennedy, Ulrich Kremer
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine independent parallel programming...
Automatic Data Layout Using 0-1 Integer Programming (1994)
Robert Bixby, Ken Kennedy, Ulrich Kremer
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. By shifting much of the burden of...
Modeling Parallel Computation (1994)
Lorie M. Liebrock, Ken Kennedy
We attempt to model parallel computations that are composed of one or more irregularly coupled regular meshes (ICRMs). The model will be used to determine whether a distribution of an...
Context Optimization for SIMD Execution (1994)
Ken Kennedy, Gerald Roth
One issue that SIMD compilers must address is generating code to change the machine context; i.e., disabling processors not involved in the current computation. We present two compiler optimizations...
Integrated Support for Task and Data Parallelism (1994)
Mani Chandy, Ian Foster, Ken Kennedy, Charles Koelbel, Chau-wen Tseng
We present an overview of research at the CRPC designed to provide an efficient, portable programming model for scientific applications possessing both task and data parallelism. Fortran M programs...
Automatic Data Layout for High Performance Fortran (1994)
Ken Kennedy, Ulrich Kremer
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine independent parallel programming...
Context Optimization for SIMD Execution (1994)
Ken Kennedy, Gerald Roth
SIMD architectures offer an alternative to MIMD architectures for obtaining high performance computation through parallelism. However, to obtain the best performance on such architectures, aggressive...
Value-Based Distributions and Alignments in Fortran D (1994)
Reinhard von Hanxleden, Ken Kennedy, Joel Saltz
Compiling irregular applications written in a data-parallel language, like Fortran D or High Performance Fortran (HPF), presents a challenge of growing importance. A major component of data-parallel...
Automatic Data Layout Using 0-1 Integer Programming (1994)
Robert Bixby, Ken Kennedy, Ulrich Kremer
The goal of languages like Fortran D or HPF is to provide a simple yet efficient machine-independent parallel programming model. By shifting much of the burden of machine-dependent optimization to...
Compilation Techniques for Block-Cyclic Distributions (1994)
Seema Hiranandani, Ken Kennedy, John Mellor-Crummey, ...
Compilers for data-parallel languages such as Fortran D and High-Performance Fortran use data alignment and distribution specifications as the basis for translating programs for execution on MIMD...
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution (1994)
Ken Kennedy, Kathryn S. McKinley
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper...
Experiences on Data-Parallel Programming (1994)
Reinhard von Hanxleden, Terry W. Clark, Ken Kennedy
To parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, or conversely, the absence of others. We have encountered a number...
Give-N-Take - A Balanced Code Placement Framework (1994)
Reinhard von Hanxleden, Ken Kennedy
Give-N-Take is a code placement framework which uses a general producer-consumer concept. An advantage of Give-N-Take over existing partial redundancy elimination techniques is its concept of...
Automatic Data Layout for High Performance Fortran (1994)
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine independent parallel programming...
The D Editor: A New Interactive Parallel Programming Tool (1994)
Scott Warren, Seema Hiranandani, Ken Kennedy, Chau-wen Tseng, ...
Fortran D and High Performance Fortran are languages designed to support efficient data-parallel programming on a variety of parallel architectures. The goal of the D Editor is to provide a tool that...
Modeling Parallel Computation (1994)
We attempt to model parallel computations that are composed of one or more irregularly coupled regular meshes (ICRMs). The model will be used to determine whether a distribution of an...
Improving the ratio of memory operations to floating-point operations in loops (1994)
Over the past decade, microprocessor design strategies have focused on increasing the computational power on a single chip. Because computations often require more data from cache per floating-point...
Compiler Technology for Machine-Independent Parallel Programming (1994)
Historically, the principal achievement of compiler technology has been to make it possible to program in a high-level, machine-independent style. The...
Typed fusion with applications to parallel and sequential code generation (1993)
Ken Kennedy, Kathryn S. McKinley
Loop fusion is a program transformation that merges multiple loops into one and is an effective optimization both for increasing the granularity of parallel loops and for improving data locality....
Preliminary experiences with the Fortran D compiler (1993)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra...
Preliminary experiences with the Fortran D compiler (1993)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra...
The ParaScope Editor is a new kind of interactive parallel programming tool for developing scientific Fortran programs. It assists the knowledgeable user by displaying the results of sophisticated...
Maximizing loop parallelism and improving data locality via loop fusion and distribution (1993)
Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality....
Integrated Support for Task and Data Parallelism (1993)
K. Mani Chandy, Ian Foster, Ken Kennedy, Charles Koelbel, Chau-wen Tseng
We present an overview of research at the CRPC designed to provide an efficient, portable programming model for scientific applications possessing both task and data parallelism. Fortran M programs...
Parallelization of Linearized Applications in Fortran D (1993)
Lorie M. Liebrock, Ken Kennedy
The Fortran D language extends Fortran by permitting the user to specify the distribution of array variables across the processors of a parallel computer system. This information can then be used by...
Automatic data layout for distributed-memory machines in the D programming environment (1993)
Ulrich Kremer, John Mellor-Crummey, Ken Kennedy, Alan Carle
Although distributed-memory message-passing parallel computers are among the most cost-effective high performance machines available, scientists find them extremely difficult to program. Most...
Experiences Using the ParaScope Editor: an Interactive Parallel Programming Tool (1993)
Nathaniel McIntosh, Kathryn S. McKinley, Jeffrey D. Oldham, Michael Paleczny, Mary W. Hall, ...
The ParaScope Editor is an interactive parallel programming tool that assists knowledgeable users in developing scientific Fortran programs. It displays the results of sophisticated program analyses,...
Cache Coherence Using Local Knowledge (1993)
Ervan Darnell, Ken Kennedy
Typically, commercially available shared memory machines have addressed the cache coherence problem with hardware strategies based on global inter-cache communication. However, global communication...
Value-Based Distributions in Fortran D: A Preliminary Report (1993)
Reinhard von Hanxleden, Ken Kennedy, Joel Saltz
Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular...
The ParaScope Parallel Programming Environment (1993)
Keith Cooper, Mary W. Hall, Robert T. Hood, Ken Kennedy, Kathryn S. McKinley, ...
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help...
Compiler Technology for Machine-Independent Parallel Programming (1993)
Historically, the principal achievement of compiler technology has been to make it possible to program in a high-level, machine-independent style. The absence of compiler technology to provide such a...
Experiences Using the ParaScope Editor: an Interactive Parallel Programming Tool (1993)
Mary W. Hall, Timothy J. Harvey, Ken Kennedy, Nathaniel McIntosh, Kathryn S. McKinley, Jeffrey D. Oldham, ...
The ParaScope Editor is an interactive parallel programming tool that assists knowledgeable users in developing scientific Fortran programs. It displays the results of sophisticated program analyses,...
A Code Placement Framework and its Application to Communication Generation (1993)
Reinhard von Hanxleden, Ken Kennedy
We present a dataflow framework that extends classical partial redundancy elimination techniques and uses a general producer-consumer concept. Consumers express a demand for certain computations to...
Unified Compilation of Fortran 77D and 90D (1993)
Alok Choudhary, Geoffrey Fox, Seema Hiranandani, Ken Kennedy, Charles Koelbel, Sanjay Ranka, ...
In this paper, we describe a unified strategy for compiling both Fortran 77D and Fortran 90D into efficient...
Preliminary Experiences with the Fortran D Compiler (1993)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra...
Analysis and Transformation in an Interactive Parallel Programming Tool (1993)
Ken Kennedy, Kathryn S. McKinley, Chau-wen Tseng
The ParaScope Editor is a new kind of interactive parallel programming tool for developing scientific Fortran programs. It assists the knowledgeable user by displaying the results of sophisticated...
Scalable I/O for Out-of-Core Structures (1993)
Mike Paleczny, Ken Kennedy, Charles Koelbel
Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application....
Typed fusion with applications to parallel and sequential code generation (1993)
Ken Kennedy, Kathryn S. McKinley
Loop fusion is a program transformation that merges multiple loops into one and is an effective optimization both for increasing the granularity of parallel loops and for improving data locality....
Automatic Software Cache Coherence through Vectorization (1992)
Ervan Darnell, Ken Kennedy
Access latency in large-scale shared-memory multiprocessors is a concern since most (if not all) memory is one or more hops away through an interconnection network. Providing processors with one or...
Compiling Fortran D for MIMD distributed-memory machines (1992)
Mary W. Hall, Seema Hiranandani, Ken Kennedy, ...
Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis,...
Compiler analysis for irregular problems in Fortran D (1992)
Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs,...
Compiler Analysis for Irregular Problems in Fortran D (1992)
Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs,...
Keith Cooper, Mary W. Hall, Ken Kennedy
Procedure cloning is an interprocedural optimization where the compiler creates specialized copies of procedure bodies. To clone a procedure, the compiler replicates it and then divides the incoming...
Compiling Fortran D for MIMD Distributed-Memory Machines (1992)
Seema Hiranandani, Ken Kennedy, Chau-Wen Tseng
Fortran D, a version of Fortran extended with data decomposition specifications, is designed to provide a machine-independent data-parallel programming model. This paper describes analysis,...
Evaluating Parallel Languages for Molecular Dynamics Computations (1992)
Terry W. Clark, Reinhard Von Hanxleden, Ken Kennedy, Charles Koelbel, L. Ridgway Scott
Computational molecular dynamics is an important application requiring large amounts of computing time. Parallel processing offers the possibility of much better performance on scientific...
Optimizing for Parallelism and Data Locality (1992)
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work...
Scalar Replacement in the Presence of Conditional Control Flow (1992)
Steve Carr, Ken Kennedy
This paper presents a source-to-source transformation, called scalar replacement, that finds opportunities for reuse of subscripted variables and replaces the references involved by references to...
Evaluation of Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines (1992)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a...
Compiling Fortran D for MIMD Distributed-Memory Machines (1992)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Fortran D, a version of Fortran extended with data decomposition specifications, is designed to provide a machine-independent data-parallel programming model. This paper describes analysis,...
Compiling Fortran 77D and 90D for MIMD Distributed-Memory Machines (1992)
Alok Choudhary, Seema Hiranandani, Geoffrey Fox, ...
We present an integrated approach to compiling Fortran 77D and Fortran 90D programs for efficient execution on MIMD distributed-memory machines. The integrated Fortran D compiler relies on two key...
Optimizing for Parallelism and Data Locality (1992)
Kathryn McKinley, Ken Kennedy
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independently. This work...
Software Support for Irregular and Loosely Synchronous Problems (1992)
Alok Choudhary, Geoffrey Fox, Sanjay Ranka, ...
A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of...
Efficient Call Graph Analysis (1992)
Mary W. Hall, Ken Kennedy
We present an efficient algorithm for computing the procedure call graph, the program representation underlying most interprocedural optimization techniques. The algorithm computes the possible...
Automatic Software Cache Coherence through Vectorization (1992)
Ervan Darnell, Ken Kennedy
Access latency in large-scale shared-memory multiprocessors is a concern since most (if not all) memory is one or more hops away through an interconnection network. Providing processors with one or...
Relaxing SIMD Control Flow Constraints using Loop Transformations (1992)
Reinhard von Hanxleden, Ken Kennedy
Many loop nests in scientific codes contain a parallelizable outer loop but have an inner loop for which the number of iterations varies between different iterations of the outer loop. When running...
Scalar Replacement in the Presence of Conditional Control Flow (1992)
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of...
Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines (1992)
Mary W. Hall, Seema Hiranandani, Ken Kennedy, ...
Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis,...
Evaluating Parallel Languages for Molecular Dynamics Computations (1992)
Reinhard von Hanxleden, Terry W. Clark, Ken Kennedy, ...
Computational molecular dynamics is an important application requiring large amounts of computing time. Parallel processing offers very high performance potential, but irregular problems like...
Compiler Analysis for Irregular Problems in Fortran D (1992)
Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, ...
Many parallel programs require run-time support to implement the communication caused by indirect data references. In previous work, we have developed the inspector-executor paradigm to handle these...
Compiler Blockability of Numerical Algorithms (1992)
Steve Carr, Ken Kennedy
Over the past decade, microprocessor design strategies have focused on increasing the computational power on a single chip. Unfortunately, memory speeds have not kept pace. The result is an imbalance...
Practical dependence testing (1991)
Gina Goff, Ken Kennedy, Chau-wen Tseng
Precise and efficient dependence tests are essential to the effectiveness of a parallelizing compiler. This paper proposes a dependence testing scheme based on classifying pairs of subscripted...
Compiling scientific code for complex memory hierarchies (1991)
The trend in high-performance microprocessor design is toward increasing computational power on the chip. At the same time, memory size is increasing but memory speed is not. The result is an...
An implementation of interprocedural bounded regular section analysis (1991)
Paul Havlak, Ken Kennedy
Optimizing compilers should produce efficient code even in the presence of high-level language constructs. However, current programming support systems are significantly lacking in their ability to...
Compiler optimizations for Fortran D on MIMD distributed-memory machines (1991)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a...
Interactive parallel programming using the ParaScope editor (1991)
Kathryn S. McKinley, Ken Kennedy, Chau-wen Tseng
The ParaScope project is developing an integrated collection of tools to help scientific programmers implement correct and efficient parallel programs. The centerpiece of this collection is the...
Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines (1991)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Massively parallel MIMD distributed-memory machines can provide enormous computation power. However, the difficulty of developing parallel programs for these machines has limited their accessibility....
An Overview of the Fortran D Programming System (1991)
Seema Hiranandani, Ken Kennedy, Charles Koelbel, ...
The success of large-scale parallel architectures is limited by the difficulty of developing machine-independent parallel programs. We have developed Fortran D, a version of Fortran extended with data...
Analysis and Transformation in the ParaScope Editor (1991)
Ken Kennedy, Kathryn S. McKinley, Chau-wen Tseng
The ParaScope Editor is a new kind of interactive parallel programming tool for developing scientific Fortran programs. It assists the knowledgeable user by displaying the results of sophisticated...
An Implementation of Interprocedural Bounded Regular Section Analysis (1991)
Optimizing compilers should produce efficient code even in the presence of high-level language constructs. However, current programming support systems are significantly lacking in their ability to...
Practical Dependence Testing (1991)
Gina Goff, Ken Kennedy, Chau-wen Tseng
Precise and efficient dependence tests are essential to the effectiveness of a parallelizing compiler. This paper proposes a dependence testing scheme based on classifying pairs of subscripted...
Analysis and Transformation in the ParaScope Editor (1991)
Ken Kennedy, Kathryn S. McKinley, Chau-wen Tseng
The ParaScope Editor is a new kind of interactive parallel programming tool for developing scientific Fortran programs. It assists the knowledgeable user by displaying the results of sophisticated...
Interactive Parallel Programming Using the ParaScope Editor (1991)
Kathryn S. McKinley, Ken Kennedy, Chau-wen Tseng
The ParaScope project is developing an integrated collection of tools to help scientific programmers implement correct and efficient parallel programs. The centerpiece of this collection is the...
Compiler Support for Machine-Independent Parallel Programming in Fortran D (1991)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Because of the complexity and variety of parallel architectures, an efficient machine-independent parallel programming model is needed to make parallel computing truly usable for scientific...
An Implementation of Interprocedural Bounded Regular Section Analysis (1991)
Paul Havlak, Ken Kennedy
Optimizing compilers should produce efficient code even in the presence of high-level language constructs. However, current programming support systems are significantly lacking in their ability to...
Interactive Parallel Programming Using the ParaScope Editor (1991)
Ken Kennedy, Kathryn McKinley, Chau-wen Tseng
The ParaScope project is developing an integrated collection of tools to help scientific programmers implement correct and efficient parallel programs. The centerpiece of this collection is the...
Interprocedural Transformations for Parallel Code Generation (1991)
We present a new approach that enables compiler optimization of procedure calls and loop nests containing procedure calls. We introduce two interprocedural transformations that move loops across...
Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines (1991)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Massively parallel MIMD distributed-memory machines can provide enormous computation power. However, the difficulty of developing parallel programs for these machines has limited their accessibility....
Interprocedural Transformations for Parallel Code Generation (1991)
We present a new approach that enables compiler optimization of procedure calls and loop nests containing procedure calls. We introduce two interprocedural transformations that move loops across...
An Overview of the Fortran D Programming System (1991)
Seema Hiranandani, Ken Kennedy, Charles Koelbel, ...
The success of large-scale parallel architectures is limited by the difficulty of developing machine-independent parallel programs. We have developed Fortran D, a version of Fortran extended with data...
Interprocedural Transformations for Parallel Code Generation (1991)
Mary W. Hall, Kathryn McKinley, Ken Kennedy
We present a new approach that enables compiler optimization of procedure calls and loop nests containing procedure calls. We introduce two interprocedural transformations that move loops across...
Fortran D language specification (1990)
Charles Koelbel, Ulrich Kremer, Chau-wen Tseng, Min-you Wu, Geoffrey Fox, ...
Improving register allocation for subscripted variables (1990)
David Callahan, Steve Carr, Ken Kennedy
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of...
Improving Register Allocation for Subscripted Variables (1990)
David Callahan, Steve Carr, Ken Kennedy
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of...
Analysis of Event Synchronization in A Parallel Programming Tool (1990)
David Callahan, Ken Kennedy, Jaspal Subhlok
Understanding synchronization is important for a parallel programming tool that uses dependence analysis as the basis for advising programmers on the correctness of parallel constructs. This paper...
Loop Distribution with Arbitrary Control Flow (1990)
Ken Kennedy, Kathryn S. McKinley
Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization, vectorization, and memory management. For loops with control...
Loop Distribution with Arbitrary Control Flow (1990)
Kathryn S. McKinley, Ken Kennedy
Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization, vectorization, and memory management. For loops with control...
Blocking linear algebra codes for memory hierarchies (1989)
Because computation speed and memory size are both increasing, the latency of memory, in basic machine cycles, is also increasing. As a result, recent compiler research has focused on...
Estimating interlock and improving balance for pipelined machines (1988)
David Callahan, John Cocke, Ken Kennedy
Pipelining is now a standard technique for increasing the speed of computers, particularly for floating-point arithmetic. Single-chip, pipelined floating-point functional units are available as "off the...
Automatic translation of Fortran programs to vector form (1987)
The recent success of vector computers such as the Cray-1 and array processors such as those manufactured by Floating Point Systems has increased interest in making vector operations available to the...
Why Performance Models Matter for Grid Computing (1970)
Global heterogeneous computing, often referred to as the Grid [5, 6], is a popular emerging computing model in which high performance computers linked by high-speed networks are used to solve...
A Rice University perspective on software engineering licensing
This article focuses on the perspective of professors at Rice University, Texas, on software engineering licensing. Professors including the author were worried that licensing might have exactly the...
A Nationwide Parallel Computing Environment.
States that the move to large-scale parallelism has been the greatest gain in computing performance since 1989 by the National Science Foundation (NSF). Information on the use of shared memory Parallel...
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
Reusing data in cache is critical to achieving high performance on modern machines because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most...
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
Chen Ding, Ken Kennedy
Reusing data in cache is critical to achieving high performance on modern machines, because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most...