This is an archived page of the 2004 conference

Abstracts

Plenary Presentations, Cluster Health, Data Rates, Applications Track, Systems Track, and Vendor Presentations Abstracts

Last updated: 23 April 2004. See also: Tutorials.


Plenary Presentations

Plenary Session I

Title

Available soon.

Author(s)

Brian Ropers-Huilman

Author Inst

Louisiana State University, USA

Presenter

Brian Ropers-Huilman

Abstract

Available soon.

Plenary Session II

Title

Available soon.

Author(s)

David Jursik

Author Inst

IBM Worldwide Deep Computing Sales

Presenter

David Jursik

Abstract

Available soon.

Plenary Session III

Title

Available soon.

Author(s)

Dr. Reza Rooholamini

Author Inst

Dell Product Group

Presenter

Dr. Reza Rooholamini

Abstract

Available soon.

Cluster Health

Title


A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster

Author(s)

Chokchai Leangsuksun1, Tong Liu1, Tirumala Rao1, Stephen L. Scott2, and Richard Libby3

Author Inst.

1Louisiana Tech University, 2Oak Ridge National Laboratory, 3Intel Corporation

Presenter

Chokchai Leangsuksun

Abstract

The Open Source Cluster Application Resources (OSCAR) toolkit is a fully integrated cluster software stack designed for building and maintaining a Linux Beowulf cluster. As OSCAR has become a popular tool for building cost-effective HPC clusters, High Availability (HA) has become an equally important requirement, since an unavailable cluster delivers no performance at all. To combine HA and HPC features, we created the HA-OSCAR solution, which eliminates the numerous single points of failure in HPC systems and alleviates unplanned downtime through sophisticated self-healing mechanisms and hardware-level failure detection and prediction based on the Service Availability Forum's Hardware Platform Interface (OpenHPI). Service monitoring and policy-based head-node recovery are also discussed in detail. We further investigate a network file-system issue during server failure and its resolution via the High Reliable Network File System (HR-NFS), without the need for an expensive hardware-based, shared-storage solution. Our solution also enables graceful recovery through deliberate job checkpointing and migration when a head-node failure is predicted. Finally, we introduce our Web-based management module, which provides a customizable service-monitoring and recovery/failover management mechanism with effective cluster-monitoring capabilities.
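To make the failover idea concrete, here is a minimal sketch of the heartbeat-and-promote pattern that self-healing head-node recovery builds on. This is an editorial illustration, not HA-OSCAR code: the probe target, the miss threshold, and the take_over() action are all hypothetical placeholders.

```python
# Minimal heartbeat/failover sketch (illustrative only -- not HA-OSCAR code).
# The host name, port, threshold, and takeover action are all hypothetical.
import socket
import time

HEAD_NODE = ("head-node", 22)   # hypothetical: probe the head node's sshd
MISSES_ALLOWED = 3              # hypothetical policy: fail over after 3 misses
PROBE_INTERVAL = 5              # seconds between probes

def head_node_alive(addr, timeout=2.0):
    """Return True if a TCP connection to the head node succeeds."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def take_over():
    """Placeholder for the real recovery policy: adopt the service IP,
    restart the scheduler, remount shared storage, and so on."""
    print("standby promoting itself to head node")

def monitor():
    misses = 0
    while True:
        if head_node_alive(HEAD_NODE):
            misses = 0
        else:
            misses += 1
            if misses >= MISSES_ALLOWED:
                take_over()
                break
        time.sleep(PROBE_INTERVAL)

if __name__ == "__main__":
    monitor()
```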



Title

Listening to Your Cluster with LoGS

Author(s)

James E. Prewett

Author Inst.

HPC@UNM

Presenter

Jim Prewett

Abstract

Introduction
Large systems are now being built from smaller systems. GNU/Linux clusters have gone from a fad at a couple of academic institutions to being some of the largest and fastest computers in the world. Similarly designed systems have become the bread and butter of more-traditional supercomputing vendors. Cluster computing is now big business. Many of the computers listed in the TOP500 List for November 2003 are clusters! Further, smaller clusters are now being purchased (and often administered) by individual research groups; clusters are accessible computing power.

One way to make the administration of these machines bearable is to carefully analyze system logs with a real-time analysis tool. Unfortunately, the available free-software tools are lacking in many ways when events must be correlated across a potentially very large number of machines. Another issue with log analysis for larger clusters is that the volume of log data can be quite large. Free tools such as Logsurfer and SWATCH can be very inefficient at finding interesting messages due to the organization of their rulesets.

LoGS is a log analysis engine that attempts to address many of the issues with maintaining cluster machines. LoGS has a dynamic ruleset, is able to look for one or more messages before triggering an action, and has a powerful programming language used for configuration and extension. With proper rule-set construction, LoGS is a very efficient analysis engine.
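As an illustration of what a dynamic ruleset buys you, the sketch below shows a matched rule installing a follow-up rule at runtime, which is how a multi-message correlation can be expressed. This is a hypothetical Python analogue only; LoGS itself is configured and extended in its own programming language, and the patterns and actions here are invented for the example.

```python
# Illustrative rule-engine sketch in the spirit of LoGS; the log lines,
# patterns, and actions here are hypothetical.
import re

class Rule:
    def __init__(self, pattern, action):
        self.pattern = re.compile(pattern)
        self.action = action   # called with (match, ruleset)

class RuleSet:
    """A dynamic ruleset: actions may add or remove rules at runtime,
    which is how multi-message correlations are expressed."""
    def __init__(self, rules=None):
        self.rules = list(rules or [])

    def feed(self, line):
        for rule in list(self.rules):
            m = rule.pattern.search(line)
            if m:
                rule.action(m, self)
                break   # first matching rule wins; keeps scanning cheap

def on_ecc_error(match, ruleset):
    node = match.group(1)
    print(f"ECC error on {node}; watching for a repeat")
    # Correlation: install a follow-up rule for a second error on the same node.
    ruleset.rules.insert(0, Rule(
        rf"{re.escape(node)}.*ECC",
        lambda m, rs: print(f"second ECC error on {node}: flag for replacement")))

ruleset = RuleSet([Rule(r"(\S+) kernel: .*ECC", on_ecc_error)])
for line in ["node12 kernel: EDAC ECC error detected",
             "node12 kernel: EDAC ECC error detected"]:
    ruleset.feed(line)
```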


Data Rates

Title

A NIC-Offload Implementation of Portals for Quadrics QsNet

Author(s)

Kevin Pedretti, Ron Brightwell

Author Inst

Sandia National Laboratories, USA

Presenter

Kevin Pedretti

Abstract

The Portals data movement layer was specifically designed to support intelligent and/or programmable network interface cards, such as Quadrics QsNet. Portals provides elementary building blocks that can be combined to implement a variety of upper-layer protocols. As such, it is general enough to support many different types of services that require data movement, such as MPI and parallel file systems. While the QsNet interface and its associated software stack were also designed to support a variety of upper-layer protocols, there are significant differences in the approach taken to achieve generality. In this paper, we analyze the different capabilities offered by Portals and the QsNet network stack. We discuss the design and implementation of Portals for QsNet and present a performance comparison using micro-benchmarks. We analyze how the different approaches have impacted performance and discuss how future intelligent network interfaces may be able to overcome some of the current limitations.
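The notion of "elementary building blocks" can be pictured with a toy model: incoming one-sided puts are matched, where the data arrives, against a list of match entries tied to exposed memory regions, which is what lets matching be offloaded to an intelligent NIC. The classes below are a conceptual sketch only; they deliberately do not follow the real Portals API, and every name in them is hypothetical.

```python
# Toy model of the "building blocks" idea: incoming one-sided puts are
# matched against entries attached to memory regions. Conceptual sketch
# only -- not the real Portals API.
class MatchEntry:
    def __init__(self, match_bits, buffer):
        self.match_bits = match_bits
        self.buffer = buffer      # region exposed for remote writes

class Portal:
    """A portal table entry: an ordered match list, searched on arrival."""
    def __init__(self):
        self.match_list = []

    def attach(self, entry):
        self.match_list.append(entry)

    def put(self, match_bits, data):
        # Matching happens where the data lands (conceptually, on the NIC),
        # so no CPU-side rendezvous is needed -- the basis for offload.
        for entry in self.match_list:
            if entry.match_bits == match_bits:
                entry.buffer[:len(data)] = data
                return True
        return False   # a real implementation would buffer unexpected messages

recv_buf = bytearray(16)
portal = Portal()
portal.attach(MatchEntry(match_bits=0x42, buffer=recv_buf))
portal.put(0x42, b"hello")          # delivered directly into recv_buf
print(recv_buf[:5])                 # bytearray(b'hello')
```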



Title

Benchmarking Parallel I/O Performance for Computational Fluid Dynamics Applications

Author

Thomas Hauser

Author Inst

Utah State University, USA

Presenter

Thomas Hauser

Abstract

Available soon.



Applications Track Abstracts:

Applications Papers I

Title

Dynamic Load-Balancing Algorithm Porting on MIMD Machines

Author(s)

Francisco Muniz

Author Inst

CDTN/CNEN, Brazil

Presenter

Francisco Muniz

Abstract

This paper describes the porting strategies and the implementation of a dynamic load-balancing mechanism over the PVM library. Such a load-balancing mechanism, the Extended Gradient approach, is found in the open literature. The implementation was done using the C programming language, running over Linux/x86 compute nodes. Some results that validate the usefulness of the load-balancing system are presented. The conclusions are general and not restricted to any particular architecture of distributed-memory MIMD (Multiple Instruction, Multiple Data) machines.
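For readers unfamiliar with gradient-model load balancing, the sketch below illustrates the basic idea the Extended Gradient approach builds on: each node tracks its distance ("proximity") to the nearest lightly loaded node, and overloaded nodes push work toward lower proximity. The ring topology, the threshold, and the single-task moves are hypothetical simplifications, not the paper's algorithm.

```python
# Sketch of the basic gradient-model idea: overloaded nodes push tasks
# toward the nearest lightly loaded node. Threshold and ring topology
# are hypothetical choices for illustration.
LIGHT = 2   # a node with <= LIGHT tasks counts as lightly loaded

def proximities(loads):
    """Proximity of node i = hops to the nearest lightly loaded node (ring)."""
    n = len(loads)
    prox = [0 if load <= LIGHT else n for load in loads]
    changed = True
    while changed:              # relax, like repeated neighbor exchanges
        changed = False
        for i in range(n):
            if loads[i] <= LIGHT:
                continue        # lightly loaded nodes keep proximity 0
            best = min(prox[(i - 1) % n], prox[(i + 1) % n]) + 1
            if best < prox[i]:
                prox[i] = best
                changed = True
    return prox

def balance_step(loads):
    """One step: each overloaded node sends a task toward the neighbor
    on the shortest path to a lightly loaded node."""
    prox = proximities(loads)
    moves = []
    for i, load in enumerate(loads):
        if load > LIGHT and prox[i] > 0:
            left, right = (i - 1) % len(loads), (i + 1) % len(loads)
            target = left if prox[left] <= prox[right] else right
            moves.append((i, target))
    for src, dst in moves:
        loads[src] -= 1
        loads[dst] += 1
    return loads

print(balance_step([9, 1, 4, 0, 7]))   # tasks drift toward nodes 1 and 3
```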



Title

Optimizing Linux Cluster Performance by Exploring the Correlation between Application Characteristics and Gigabit Ethernet Device Parameters

Author(s)

Onur Celebioglu, Tau Leng, Victor Mashayekhi

Author Inst.

Dell Inc., USA

Presenter

Onur Celebioglu

Abstract

Cluster interconnect performance is typically characterized by latency and throughput. However, an interconnect's CPU utilization, and not only its latency and throughput, is an important attribute that affects overall system performance. In our studies, we have run cluster benchmarks with two device drivers that have different throughput and latency characteristics. We have observed that point-to-point performance tests such as throughput and latency measurements cannot be translated directly into application performance. We also tried to further tune the performance of the system by changing the interrupt-coalescing parameters of one of the drivers. Finally, we used this data to understand the correlation between an application's characteristics and interconnect performance attributes.
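The following is a minimal example of the kind of point-to-point test the paper argues cannot be translated directly into application performance: a TCP ping-pong that reports half the average round-trip time as one-way latency. The port, repetition count, and message size are arbitrary choices for illustration.

```python
# Minimal TCP ping-pong latency microbenchmark (illustrative only).
# Run one side as "python bench.py server", the other as
# "python bench.py client <host>"; port and sizes are arbitrary.
import socket, sys, time

PORT, REPS, SIZE = 5000, 1000, 1

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            for _ in range(REPS + 1):           # +1 for the warm-up exchange
                conn.sendall(conn.recv(SIZE))   # echo each ping straight back

def client(host):
    with socket.create_connection((host, PORT)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no batching
        s.sendall(b"x"); s.recv(SIZE)           # warm-up exchange
        t0 = time.perf_counter()
        for _ in range(REPS):
            s.sendall(b"x")
            s.recv(SIZE)
        rtt = (time.perf_counter() - t0) / REPS
        print(f"one-way latency ~ {rtt / 2 * 1e6:.1f} us")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
```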



Title

Performance Analysis of a Hybrid Parallel Linear Algebra Kernel

Author(s)

Sue Goudy, Lorie Liebrock, and Steve Schaffer

Author Inst.

New Mexico Institute of Mining and Technology, USA

Presenter

Sue Goudy

Abstract

The focus of this paper is the performance of a kernel from a two-dimensional iterative solver. Complexity models for hybrid parallelization of block Gauss-Seidel relaxation are derived. We examine system parameters that can affect the performance of hybrid code. Complexity estimates are tested for a variety of decomposition strategies and problem sizes. Results from the Intel Teraflops supercomputer and from the Vplant visualization cluster at Sandia National Laboratories are presented. We show that the benefits of hybrid programming for this iterative solver are limited on both the massively parallel system and the Linux cluster.
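For concreteness, here is a serial version of the kind of kernel being modeled: one red-black Gauss-Seidel relaxation sweep on a 2D grid. Each color's points can be updated in parallel, and the hybrid question the paper studies is how to split such sweeps between MPI domain decomposition and threading within a node. The grid size and boundary values below are arbitrary illustrations.

```python
# One red-black Gauss-Seidel sweep for a 2D Poisson problem (serial
# sketch of the kernel class the paper models; sizes are arbitrary).
import numpy as np

def gauss_seidel_sweep(u, f, h):
    """Update red points, then black points; within a color, every
    point's stencil neighbors belong to the other color, so each color
    can be updated in parallel (the opening for hybrid parallelism)."""
    for color in (0, 1):
        for i in range(1, u.shape[0] - 1):
            start = 1 + (i + color) % 2        # checkerboard offset per row
            u[i, start:-1:2] = 0.25 * (u[i - 1, start:-1:2] +
                                       u[i + 1, start:-1:2] +
                                       u[i, start - 1:-2:2] +
                                       u[i, start + 1::2] +
                                       h * h * f[i, start:-1:2])
    return u

n = 64
u = np.zeros((n, n)); u[0, :] = 1.0            # fixed boundary on one edge
f = np.zeros((n, n))
for _ in range(100):
    gauss_seidel_sweep(u, f, 1.0 / (n - 1))
print(u[n // 2, n // 2])                       # interior value after 100 sweeps
```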



Applications Papers II: Education and Training

Title

In Search of Clusters for High-Performance Computing Education

Author(s)

Paul Gray

Author Inst

University of Northern Iowa, USA

Presenter

Paul Gray

Abstract

Available soon.



Title

Classroom Exercises for Grid Services

Author(s)

Amy Apon1, Jens Mache2, Yuriko Yara1, and Kurt Landrus1

Author Inst

1University of Arkansas and 2Lewis & Clark College, USA

Presenter

Amy Apon

Abstract

Grid protocols and technologies are being adopted in a wide variety of academic, government, and industry research laboratories, and there is a growing body of research-oriented literature in Grid computing. However, there is a need for educational material that is suitable for classroom use. The goal of this paper is to develop and evaluate a suite of classroom exercises for use in a graduate or advanced undergraduate course. The exercises build on basic knowledge of operating systems concepts at the undergraduate level. This paper presents our design of the exercises. We evaluate the effectiveness of one exercise extensively and provide suggestions to educators about how to effectively use the Globus Toolkit 3 in a classroom setting.



Title

Automating the Large-Scale Collection and Analysis of Performance Data on Linux Clusters

Author(s)

Philip Mucci1, Jack Dongarra1, Shirley Moore1, Fengguang Song1, Felix Wolf1, and Rick Kufrin2

Author Inst

1University of Tennessee and 2NCSA/University of Illinois, USA

Presenter

Rick Kufrin

Abstract

Introduction
Many factors contribute to overall application performance in today's high-performance cluster computing environments. These factors include the memory subsystem, network hardware and software stack, compilers and libraries, and I/O subsystem. The large variability in hardware and software configuration present in clusters can cause application performance to also exhibit large variability on different platforms, or on the same platform over time. Compute-intensive applications may perform well on an architecture with efficient utilization of the CPU and single-processor memory, such as the Intel Xeon, while memory-intensive applications may perform well on an architecture with good scalability of the memory subsystem, such as an AMD Opteron node. Even with a fixed hardware configuration, software factors can cause large variations in performance. Compilers that produce acceptable code on some platform configurations may produce sub-optimal code on other platform variants. Some math libraries require hand tuning of various compiled-in parameters for each variant of the same platform. Some libraries (e.g., BLAS, LAPACK) have standardized APIs that are shared across different implementations, which can vary considerably in performance. It can be difficult to predict which library variant will perform best on a particular platform without testing each variant on that platform. If an application is updated and/or ported to a platform not originally supported, the optimization flags in the application Makefile may be anachronistic or otherwise inappropriate and may need to be altered to achieve acceptable performance on new target platforms and platform variants.
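A toy version of the measurement problem the paper automates at scale: timing the same BLAS-backed kernel and comparing results across installations. In the sketch below, numpy's matrix product stands in for a tunable library routine; which BLAS it is linked against varies from system to system, which is exactly the variability the authors describe. The sizes and repetition count are arbitrary.

```python
# Tiny timing harness of the kind the paper automates at much larger scale:
# time a BLAS-backed kernel (numpy's dgemm via "@") and report GFLOP/s.
import time
import numpy as np

def _timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def time_dgemm(n, reps=5):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = min(  # best-of-reps suppresses one-off system noise
        _timed(lambda: a @ b) for _ in range(reps))
    flops = 2.0 * n ** 3        # multiply-adds in an n x n matrix product
    return flops / best / 1e9   # GFLOP/s

for n in (256, 512, 1024):
    print(f"n={n:5d}: {time_dgemm(n):6.2f} GFLOP/s")
```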



Systems Track Abstracts:

Systems Papers I

Title

Unified Heterogeneous HPCC Hardware Management Framework

Author

Yung-Chin Fang, Jeffrey Mayerson, Rizwan Ali, Monica Kashyap, Jenwei Hsieh, Tau Leng, and Victor Mashayekhi

Author Inst

Dell Inc., USA

Presenter

Yung-Chin Fang

Abstract

The remote, hardware-level management of heterogeneous clusters (such as the remote power cycling of a hung node) is a necessary task for a computer center. This task requires knowledge across multiple specifications, fabrics (hardware, firmware, software, management), and implementations. For a heterogeneous cluster environment, there is little in common across hardware-level management interface implementations. In a heterogeneous HPCC, grid, or cyber-infrastructure environment, there is a need for a common hardware-management interface across distinct architecture, platform, firmware, software, and management-fabric implementations. This paper presents the framework of a unified interface across heterogeneous clusters to overcome these differences, and also discusses findings from the prototyping process.
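The unified-interface idea can be sketched as a classic adapter pattern: one management API in front of dissimilar hardware-management implementations. The backend classes, their actions, and the node names below are hypothetical stand-ins, not the paper's framework.

```python
# Adapter-pattern sketch of a unified hardware-management interface.
# Both backends and their behaviors are hypothetical stand-ins.
from abc import ABC, abstractmethod

class NodeManager(ABC):
    """The unified interface exposed for every node type."""
    @abstractmethod
    def power_cycle(self, node: str) -> None: ...

class IpmiManager(NodeManager):
    def power_cycle(self, node):
        print(f"[ipmi] chassis power cycle on {node}")    # placeholder action

class VendorXManager(NodeManager):
    def power_cycle(self, node):
        print(f"[vendorX] proprietary reset of {node}")   # placeholder action

# The framework maps each node to whichever backend its hardware needs,
# so "power cycle node07" looks the same to the operator everywhere.
backends = {"node07": IpmiManager(), "node12": VendorXManager()}
for node, mgr in backends.items():
    mgr.power_cycle(node)
```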



Title

Cluster Security as a Unique Problem with Emergent Properties: Issues and Techniques

Author

William Yurcik, Gregory A. Koenig, Xin Meng, and Joseph Greenseid

Author Inst

NCSA/University of Illinois, USA

Presenter

Joseph Greenseid

Abstract

Large-scale commodity cluster systems are finding increasing deployment in academic, research, and commercial settings. Coupled with this increasing popularity are concerns regarding the security of these clusters. While an individual commodity machine may have prescribed best practices for security, a cluster of commodity machines has emergent security properties that are distinct from the sum of its parts. This concept has not yet been addressed in either cluster administration techniques or the research literature. We highlight the emergent properties of cluster security that distinguish it as a unique problem space and then outline a unified framework for protection techniques. We conclude with a description of preliminary progress on a monitoring project focused specifically on cluster security that we have started at the National Center for Supercomputing Applications.



Title

Batch System Deployment on a Production Terascale Cluster

Author

Karl W. Schulz, Kent Milfeld, Chona S. Guiang, Avijit Purkayastha, Tommy Minyard, John R. Boisseau, and John Casu

Author Inst

TACC/University of Texas-Austin, USA

Presenter

Karl Schulz

Abstract

On multi-user HPC clusters, the batch system is a key component for aggregating compute nodes into a single, sharable computing resource. The batch system becomes the "nerve center" for coordinating the use of resources and controlling the state of the system in a way that must be "fair" to its users. Large, multi-user clusters need batch utilities that are robust, reliable, flexible, and easy to use and administer. In this paper we present our experiences with the configuration and deployment of a terascale cluster of 600 processors, with particular attention given to the integration of the LSF HPC batch system software by Platform Computing. To begin, we review the cluster design and present our requirements for a production batch environment supporting a community of hundreds of users. Next, we outline the configuration and extensions made to the LSF batch system and operating environment to meet our design criteria, including the development of job-monitoring and job-filtering applications, authentication modifications to manage compute-node access, and integration of the system with internal accounting applications. Initial scalability results using LSF for MPI applications are presented and compared against modified versions of the LSF application suite. The modified version incurred substantially lower overhead and provided good scalability on MPI applications up to 600 processors. Implementation of software updates as RPM packages, the use of modules for environment management, and the development of tools for monitoring compute-node software states have helped to ensure a consistent, system-wide environment for user jobs across node failures and system reboots.
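As a rough illustration of the job-filtering idea, the sketch below validates a submission against site policy before it ever reaches the scheduler. It is a generic, hypothetical filter: the real work described in the paper hooks into LSF's own submission path, and the limits and job fields here are invented.

```python
# Generic sketch of a submission-time job filter (hypothetical policy;
# not Platform LSF's actual filtering interface).
MAX_PROCS = 600        # size of the machine described in the paper
MAX_HOURS = 48         # hypothetical site policy

def filter_job(job):
    """Return (accepted, reason). Reject jobs that could never run or
    that violate site policy, before they reach the scheduler."""
    if job["procs"] < 1 or job["procs"] > MAX_PROCS:
        return False, f"requested {job['procs']} processors (max {MAX_PROCS})"
    if job["hours"] > MAX_HOURS:
        return False, f"requested {job['hours']}h (max {MAX_HOURS}h)"
    if not job.get("project"):
        return False, "no accounting project given"
    return True, "ok"

print(filter_job({"procs": 512, "hours": 24, "project": "A-1234"}))
print(filter_job({"procs": 1024, "hours": 24, "project": "A-1234"}))
```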



Systems Papers II: Processor and File System Performance

Title

Performance Characteristics of Dual-Processor HPC Cluster Nodes Based on 64-bit Commodity Processors

Author

A. Purkayastha, C.S. Guiang, K. Schulz, T. Minyard, K. Milfeld, W. Barth, P. Hurley, and J.R. Boisseau

Author Inst

TACC/University of Texas, USA

Presenter

Chona S. Guiang

Abstract

Dual-processor nodes are the preferred building blocks in HPC clusters because of the greater performance-to-price ratio of such configurations relative to clusters comprising single-processor nodes. The arrival of 64-bit commodity clusters for HPC is advantageous for applications that require large amounts of memory and I/O because of the larger memory addressability of these processors. Some of these 64-bit processors also use more advanced memory subsystems, which provide increased performance for some applications. This paper examines the overall performance characteristics of three dual-processor systems based on commodity 64-bit processors: the Intel Itanium2, the AMD Opteron, and the IBM PowerPC 970, also known as the Apple PowerPC G5. First, a low-level characterization of each system is obtained using a variety of computational kernels and micro-benchmarks to measure the speeds of the functional units and memory subsystems. Performance measurements and analysis of several scientific applications that span a wide range of computational requirements are presented next. Finally, we offer some general observations and insights on performance for applications developers and discuss 32- to 64-bit migration and interoperability issues.
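A minimal example of the micro-benchmark style used for the memory-subsystem characterization: a STREAM-like "triad" that reports sustained bandwidth. The array size is a hypothetical choice; it should simply be much larger than the last-level cache.

```python
# STREAM-style triad sketch: the sort of micro-benchmark used to
# characterize a node's memory subsystem. Sizes are illustrative.
import time
import numpy as np

N = 20_000_000                       # ~160 MB per float64 array
a = np.zeros(N); b = np.random.rand(N); c = np.random.rand(N)

best = float("inf")
for _ in range(5):                   # best-of-5 suppresses system noise
    t0 = time.perf_counter()
    a[:] = b + 2.0 * c               # triad: a = b + scalar * c
    best = min(best, time.perf_counter() - t0)

bytes_moved = 3 * N * 8              # read b, read c, write a
print(f"triad bandwidth ~ {bytes_moved / best / 1e9:.1f} GB/s")
```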



Title

An Analysis of State-of-the-Art Parallel File Systems for Linux

Author

Martin W. Margo, Patricia A. Kovatch, Phil Andrews, and Bryan Banister

Author Inst

SDSC/University of California - San Diego, USA

Presenter

Martin W. Margo

Abstract

Parallel file systems are a critical piece of any input/output (I/O)-intensive high-performance computing system. A parallel file system enables each process on every node to perform I/O to and from a common storage target. With more and more sites adopting Linux clusters for high-performance computing, the need for high-performing I/O on Linux is increasing. New options are available for Linux: IBM's GPFS (General Parallel File System) and Cluster File Systems, Inc.'s Lustre. The Parallel Virtual File System (PVFS) from Clemson University and Argonne National Laboratory continues to be available. Using our IA-64 Linux cluster testbed, we evaluated each parallel file system on its ease of installation and administration, redundancy, performance, scalability, and special features. We analyzed the results of our experiences and conclude with comparison information.
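The definition above ("each process on every node performs I/O to and from a common storage target") is exactly what MPI-IO expresses. Below is a minimal mpi4py sketch in which every rank writes a disjoint block of one shared file; the file path is a hypothetical placeholder for a parallel file system mount, and the block size is arbitrary.

```python
# Minimal MPI-IO sketch of what a parallel file system enables: every rank
# writes its own block of one shared file. Requires mpi4py and an MPI
# launcher, e.g. "mpiexec -n 4 python this_script.py". The path below is
# a hypothetical placeholder for a parallel file system mount.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

BLOCK = 1 << 20                                  # 1 MiB of data per rank
data = np.full(BLOCK, rank, dtype=np.uint8)

fh = MPI.File.Open(comm, "/pfs/testfile",        # hypothetical PFS path
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(rank * BLOCK, data)              # collective, disjoint offsets
fh.Close()

if rank == 0:
    print(f"{comm.Get_size()} ranks wrote one {comm.Get_size() * BLOCK}-byte file")
```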



Title

Comparing Linux Clusters for the Community Climate System Model

Author

Matthew Woitaszek, Michael Oberg, and Henry M. Tufo

Author Inst

University of Colorado - Boulder, USA

Presenter

Matthew Woitaszek

Abstract

In this paper, we examine the performance of two components of the NCAR Community Climate System Model (CCSM), executing on clusters with a variety of microprocessor architectures and interconnects. Specifically, we examine the execution time and scalability of the Community Atmospheric Model (CAM) and the Parallel Ocean Program (POP) on Linux clusters with Intel Xeon and AMD Opteron processors, using Dolphin, Myrinet, and InfiniBand interconnects, and compare the performance of the cluster systems to an SGI Altix and an IBM p690 supercomputer. Of the architectures examined, clusters constructed using AMD Opteron processors generally demonstrate the best performance, outperforming Xeon clusters and occasionally the IBM p690 supercomputer in simulated years per day.



Vendor Presentations

Vendor Session I

Title

The File System Challenge in HPC

Author(s)

Ben Rosen

Author Inst

Dell Inc., USA

Presenter

Ben Rosen

Abstract

Available soon.



Title

Smart Interconnect: Recent Developments in Myricom Hardware and Software

Author(s)

Patrick Geoffray

Author Inst

Myricom, USA

Presenter

Patrick Geoffray

Abstract

Available soon.



Vendor Session II

Title

Clustering Solutions from IBM

Author(s)

Rebecca Austen and Jay Urbanski

Author Inst

IBM, USA

Presenter

Rebecca Austen and Jay Urbanski

Abstract

Available soon.



Title

Experiences with Large Production Clusters at Sandia

Author(s)

Robert Ballance

Author Inst

Sandia National Laboratories, USA

Presenter

Robert Ballance

Abstract

Available soon.