CATWALK: A Quick Development Path for Performance Models

Project Start: January 2013

Motivation

Today, performance tuning occurs rather unsystematically. Application developers measure the performance of their codes in experiments and spend a certain amount of time improving the most resource-intensive parts, typically aiming at the low-hanging fruit, either until they are satisfied or until they believe that further improvements would exceed their capacity or time budget. However, in view of the rising costs of procuring and operating large-scale systems, it is important to set performance expectations against which the achieved performance can be compared, for example, to reliably identify performance bugs either in the application or in the system. Such expectations are defined in the form of analytical performance models, which specify execution time or other resource requirements (e.g., the number of floating-point operations or memory accesses) as a function of input parameters such as the number of cores, the problem size, and system properties. In spite of its power, performance modeling is unfortunately not yet widely adopted. First, it requires significant expertise, and second, even when the necessary expertise is available, it is usually a very time-consuming process. The goal of this project is therefore to incorporate some of this expertise into a performance tool and to automate key modeling activities.
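For illustration, such a model for the execution time t might take the (purely hypothetical) form

    t(p, n) = c1 + c2 * (n / p) + c3 * log2(p)

where p is the number of cores, n the problem size, and c1, c2, c3 coefficients reflecting system properties: a constant overhead, the perfectly parallelizable share of the work, and, for instance, the cost of a tree-based reduction that grows with scale.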

Approach

We plan to extend the performance analysis tool Scalasca and add capabilities for the creation, application, and evolution of performance models. We will favor simplicity over accuracy in absolute terms. The most important goal will be to give a good estimate of the relative performance of different parts of a program when scaled to larger processor configurations. In this way, scalability bottlenecks can be anticipated at a very early stage, long before they become manifest in actual measurements on future platforms.
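To illustrate the point, consider two code regions with invented runtime models that scale differently; even rough models suffice to predict which region will dominate at larger scale. The models below are hypothetical placeholders, not output of the tool:

    import math

    # Invented runtime models t(p) for two code regions; the envisioned
    # tool would derive such functions from measurements, not hard-code them.
    models = {
        "solver":   lambda p: 120.0 / p + 0.5,           # scales well
        "exchange": lambda p: 0.001 * p * math.log2(p),  # grows with p
    }

    for p in (64, 1024, 16384):
        worst = max(models, key=lambda r: models[r](p))
        print(f"p = {p:>5}: dominant region is '{worst}' "
              f"({models[worst](p):.2f} s)")

Here the ranking flips between p = 64 and p = 1024: the exchange region, negligible at small scale, becomes the dominant cost. This early warning is exactly what the extrapolated models are meant to provide.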

Most of the new functionality will be embedded in the Cube framework, which defines a data model and a set of operations to represent and manipulate profiling data. Cube describes a performance experiment as a mapping of call paths and processes onto a set of metrics. The name Cube is derived from the three-dimensional array used to store a single data set. Among the users of Scalasca, Cube is mainly known as the browser employed to visualize and interactively explore individual data sets. The operations are closed in the sense that each operator yields a virtual experiment, which can be further manipulated or displayed using the same set of tools available for original experiments. This is why we refer to them in the literature as a performance algebra. An example is the difference operator, which “subtracts” one experiment from another to highlight performance improvements or degradations.
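The following sketch illustrates the data model and the difference operator in simplified form; the dictionary-based representation and the names are our own shorthand for illustration, not the actual Cube API:

    # An experiment maps (metric, call path, process) onto a value.
    Experiment = dict  # keys: (metric, callpath, process) -> float

    def difference(a: Experiment, b: Experiment) -> Experiment:
        """'Subtract' b from a; negative entries mark improvements,
        positive entries mark degradations."""
        keys = set(a) | set(b)
        return {k: a.get(k, 0.0) - b.get(k, 0.0) for k in keys}

    before = {("time", "main/solve", 0): 12.4, ("time", "main/io", 0): 3.1}
    after  = {("time", "main/solve", 0): 12.5, ("time", "main/io", 0): 1.0}

    delta = difference(after, before)  # again an Experiment

Note that the result is itself an experiment, so further operators and the same display tools apply to it; this is the closure property described above.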

In this project, we plan to define a new algebra data type to represent performance models. Essentially, a model is like an experiment, except that individual metric values are replaced by functions. We plan to support both semi-empirical and application-requirements models. Additional operations will allow models to be derived from experiments and predictions to be derived from models, covering, but not limited to, the following functionality (a sketch of the central fitting step follows the list):

• Guessing model base functions by trying typical candidates and exploiting knowledge of communication patterns

• Model parameter identification via curve fitting

• Model reduction through elimination of low-impact components

• Model validation and quality assessment

• Prediction of performance

• Identification of scalability bottlenecks

• Identification of performance bugs
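A minimal sketch of the first items, assuming runtimes measured at a handful of process counts; the data, the candidate base functions, and the model form t(p) = c0 + c1 * f(p) are invented for illustration and do not reflect the project's actual search space:

    import numpy as np

    p = np.array([16.0, 32, 64, 128, 256])        # process counts
    t = np.array([1.10, 1.30, 1.55, 1.90, 2.30])  # measured runtimes (s)

    # Typical candidate base functions, reflecting common communication
    # patterns (e.g., tree-based collectives suggest log p terms).
    candidates = {
        "log2(p)":   np.log2,
        "sqrt(p)":   np.sqrt,
        "p":         lambda x: x,
        "p*log2(p)": lambda x: x * np.log2(x),
    }

    best = None
    for name, f in candidates.items():
        A = np.column_stack([np.ones_like(p), f(p)])   # t = c0 + c1*f(p)
        c, res, *_ = np.linalg.lstsq(A, t, rcond=None)
        rss = res[0] if res.size else float(np.sum((A @ c - t) ** 2))
        if best is None or rss < best[3]:              # keep best fit
            best = (name, f, c, rss)

    name, f, (c0, c1), rss = best
    print(f"model: t(p) = {c0:.3f} + {c1:.4f} * {name}   (RSS = {rss:.4f})")
    print(f"prediction at p = 65536: {c0 + c1 * f(65536.0):.2f} s")
    # Model reduction would drop terms whose fitted contribution
    # stays negligible over the parameter range of interest.

Because the fitted model is an ordinary function, prediction and bottleneck ranking reduce to evaluating and comparing such functions at larger process counts, as in the comparison sketch shown earlier.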

In addition to expressing common modeling activities as operators that map experiments onto models and vice versa, we also plan to codify the modeling process in the form of a modeling wizard that will guide the user through the different steps, offering appropriate choices where helpful.

The major advantage of our modeling algebra over existing modeling or prediction tools will be twofold: First, combining a limited set of data types with a range of closed operations offers an unprecedented degree of flexibility in customizing performance models for a particular purpose. Second, our framework will be based on profiling data instead of trace data to extrapolate performance. Any disadvantages in terms of accuracy will be outweighed by the ease with which profiles can be obtained – especially when the application runs for a longer period of time – and by the absence of overly restrictive assumptions. Nevertheless, since Scalasca provides the results of trace analyses such as wait-state detection in the form of profiles, our extrapolations can still take advantage of the performance detail captured in traces, if available.

Further details, progress reports, and more information can be found on the CATWALK project web page: www.vi-hps.org/projects/catwalk

Partners

  • Prof. Dr. Felix Wolf (Spokesman), Laboratory for Parallel Programming; German Research School for Simulation Sciences GmbH
  • Prof. Dr. Torsten Hoefler, Performance-Oriented Parallel Programming Group; Swiss Federal Institute of Technology
  • Dr.-Ing. Bernd Mohr, Jülich Supercomputing Centre; Forschungszentrum Jülich GmbH
  • Prof. Dr. Gabriel Wittum, Goethe Center for Scientific Computing; Goethe-Universität Frankfurt
  • Prof. Dr. Christian Bischof, Scientific Computing; Technische Universität Darmstadt