This is an archived page of the 2004 conference

Tutorials

Application Performance Analysis Tools for Linux Clusters; LCI Applications Tutorial: IA32, IA64 and Opteron Architecture and Programming Techniques for the Performance Programmer; Clustermatic: An Innovative Approach to Simplified Cluster Computing

Last updated: 23 April 2004


A-I

Title

 

Application Performance Analysis Tools for Linux Clusters

Presenters

 

Shirley Moore, Felix Wolf, Rick Kufrin, Philip J. Mucci

Overview

 

Many factors contribute to overall application performance in today's high-performance cluster computing environments. These factors include the memory sub-system, network hardware and software stack, compilers and libraries, and I/O sub-system. Tools are needed that enable application developers to easily collect and analyze performance data and to use the data to optimize performance on clusters. Most application scientists have neither the time nor the inclination to make extensive changes to their source code in order to collect performance data. Furthermore, analyzing large amounts of performance data can be a daunting task, and pinpointing the specific performance problems that will benefit most from hand tuning can be like looking for a needle in a haystack. Determining the cause of a performance problem and how to fix it often requires specialized knowledge of the architecture and its interaction with the compiler and runtime system. Automated analysis of performance data can help reduce the dimensionality of the performance metric space, identify points in the space that indicate performance problems, and map those points onto locations in the source code.
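
As a rough illustration of the last point, the sketch below collapses per-process measurements into one value per source location and metric, then keeps only the locations that account for a noticeable share of run time. It is a minimal sketch under stated assumptions, not the analysis performed by any of the tools covered in the tutorial: the sample data, the metric names, the `rank_hotspots` function, and the 4% threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical measurements: (source location, process rank, metric, seconds).
SAMPLES = [
    ("solver.c:120", 0, "mpi_wait", 4.0),
    ("solver.c:120", 1, "mpi_wait", 6.0),
    ("io.c:45", 0, "io_time", 1.0),
    ("io.c:45", 1, "io_time", 1.0),
    ("init.c:10", 0, "mpi_wait", 0.5),
    ("init.c:10", 1, "mpi_wait", 0.5),
]


def rank_hotspots(samples, total_runtime, threshold=0.04):
    """Sum each (location, metric) pair over all processes and keep the
    pairs whose share of total_runtime is at least threshold (assumed value)."""
    totals = defaultdict(float)
    for location, _rank, metric, seconds in samples:
        # Collapsing the process dimension reduces the size of the metric space.
        totals[(location, metric)] += seconds
    hotspots = [
        (location, metric, seconds / total_runtime)
        for (location, metric), seconds in totals.items()
        if seconds / total_runtime >= threshold
    ]
    # Largest share first, so the most promising tuning targets lead the list.
    return sorted(hotspots, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for location, metric, share in rank_hotspots(SAMPLES, total_runtime=40.0):
        print(f"{location}  {metric}  {share:.1%}")
```

Real tools work with far richer data (call paths, hardware counters, process topology), but the basic shape is the same: aggregate, filter by severity, and report source locations.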

Outline

 

This tutorial will introduce the following tools that address the above issues:

  • The PerfSuite collection of easy-to-use tools, utilities, and libraries for performance analysis on Linux clusters
  • The Dynaprof tool for inserting performance measurement instrumentation directly into a running application's address space at run time
  • The CUBE display tool (in combination with the KOJAK trace analyzer) for interactive exploration of a multidimensional performance space based on a processor-node-cluster hierarchy

Schedule


8:30-9:00   Overview of cluster performance issues and types of
                 performance data
9:00-10:00   PerfSuite
                 -Downloading and installing
                 -Collection of hardware counter data using psrun
                 -Postprocessing of performance data using psprocess
                 -Statistical profiling experiments
                 -Application example
10:00-10:30   Break
10:30-11:00   Dynaprof
                 -Downloading and installing
                 -Dynamic instrumentation
                 -Instrumenting multi-threaded and multi-process programs
                 -Dynaprof probes
                 -Application example
11:00-11:45   CUBE / KOJAK
                 -Downloading and installing
                 -Data collection for CUBE with KOJAK
                 -Interactive exploration of a multidimensional performance
                  space using the CUBE display
                 -Application example
11:45-12:00   Discussion and wrap up

 

 

A-II

Title

 

LCI Applications Tutorial: IA32, IA64 and Opteron Architecture and Programming Techniques for the Performance Programmer

Presenters

 

John Towns and David Klepacki

Overview

 

 

Outline

 

 

Schedule


 

 

 

S-I

Title

 

Clustermatic: An Innovative Approach to Simplified Cluster Computing

Presenters

 

Gregory R. Watson, Ronald Minnich, Erik A. Hendriks, and Matthew J. Sottile

Overview

 

Clustermatic is an innovative software architecture that redefines cluster computing at all levels, from the BIOS to the parallel environment. Other cluster systems typically rely on a complicated software suite layered on top of a conventional operating system that must be installed on a local disk in every node. The complexity and size of these systems tend to limit their deployment to small-to-mid-size machines, reduce reliability, and require significant management overhead for normal administrative activities.

In contrast, the Clustermatic design maximizes performance and availability by achieving significant improvements in booting and application startup times, minimizing points of failure, and vastly simplifying management and administration activities. It is suitable for use on a wide range of architectures, and has been successfully deployed on clusters ranging from tiny systems containing only 2 diskless nodes all the way up to an 1108-node, 11-TFlop cluster at Los Alamos National Laboratory.

Key components of Clustermatic include LinuxBIOS, BProc, BJS, MPICH and Linux.

This tutorial aims to introduce participants to the Clustermatic architecture, while providing hands-on experience in installing, managing and using a real cluster. The tutorial will combine detailed technical information about the design and operation of Clustermatic software with practical examples of how to deploy Clustermatic on a typical cluster system. Our tutorial format is designed to maximize the hands-on time for participants by giving each attendee the ability to undertake the activities using a real cluster system.


Outline

 

  1. Introduction to the Clustermatic Architecture
    • Short overview of Clustermatic
    • Key benefits of Clustermatic over traditional architectures
    • Key deliverables of the tutorial
  2. LinuxBIOS
    • Introduction to LinuxBIOS and why it is important
    • Supported chipsets and architectures
    • Who is using it?
    • HANDS-ON: Configuring and compiling LinuxBIOS for your system; Simulated installation of LinuxBIOS in flash
  3. BProc
    • Introduction to BProc
    • Key components of BProc
    • Prerequisites for installing BProc
    • HANDS-ON: Installing BProc software on a cluster; Configuring BProc; Troubleshooting BProc issues
  4. Filesystems
    • Introduction to Clustermatic filesystem options
    • Parallel filesystems with Clustermatic
    • HANDS-ON: Configuring and using NFS and V9FS with Clustermatic
  5. BJS
    • Introduction to job scheduling and BJS
    • BJS scheduling policy support
    • Implementing your own policies
    • HANDS-ON: Configuring default policies with BJS; Configuring Maui policies with BJS
  6. System Management and Supermon
    • Introduction to system management and monitoring with Clustermatic
    • HANDS-ON: Techniques for managing Clustermatic clusters; Advanced management techniques using Supermon
  7. MPI Application Support
    • Introduction to the use of MPI on a Clustermatic System
    • HANDS-ON: Porting MPI applications and running them under Clustermatic

Schedule


8:30-8:35    Tutorial Introduction
8:35-9:00    Clustermatic Overview
9:00-10:00    BProc & Beoboot
10:00-10:30    Coffee Break
10:30-11:00    BProc & Beoboot (continued)
11:00-12:00    LinuxBIOS
12:00-1:00    Lunch
1:00-2:00    Filesystems
2:00-2:30    BJS
2:30-3:00    Supermon
3:00-3:30    Coffee Break
3:30-4:55    Programming & debugging with MPI
4:55-5:00    Feedback