This is an archived page of the 2005 conference

Abstracts

2005 Abstracts: Plenary Presentations, Applications Track, Systems Track, and Vendor Presentations

Last updated: 20 April 2005



Plenary Presentations

Plenary I

Title

Opportunities and Challenges in High-End Computing for Science and Engineering

Author(s)

Thom H. Dunning, Jr.

Author Inst

NCSA/University of Illinois Urbana-Champaign

Presenter

Thom H. Dunning, Jr.

Abstract


Computational modeling and simulation were among the most significant developments in the practice of scientific inquiry in the 20th century. They have made significant contributions to scientific and engineering research programs and are finding increasing use in a broad range of industrial applications. We are presently in the midst of a revolution in computing technologies, with an order-of-magnitude increase in computing capability every three to five years. The most pressing question now is: what must scientists and engineers do to harness the power of high-end computing and information technologies to solve the most critical problems in science and engineering? Realizing the scientific and engineering advances promised by the revolution in computing technologies will require a holistic approach to computational science and engineering. It will require advances in the theoretical and mathematical sciences leading to computational models of ever-increasing predictive power and fidelity. It will require close collaboration among computational scientists, computer scientists, and applied mathematicians to translate these advances into scientific and engineering applications that can realize the full potential of high-end computers. It will require educating a new generation of scientists and engineers to use computational modeling and simulation to address the challenging problems that they will face in the 21st century. In this presentation, we will briefly discuss all of these issues.

 

 

Plenary II

Title

Transforming the Sensing and Prediction of Intense Local Weather Through Dynamic Adaptation: People and Technologies Interacting with the Atmosphere

Author(s)

Kelvin Droegemeier

Author Inst

University of Oklahoma

Presenter

Kelvin Droegemeier

Abstract


Each year across the United States, floods, tornadoes, hail, strong winds, lightning, and winter storms—so-called mesoscale weather events—cause hundreds of deaths, routinely disrupt transportation and commerce, and result in annual economic losses greater than $13B. Although mitigating the impacts of such events would yield enormous economic and societal benefits, the ability to do so is stifled by rigid IT frameworks that cannot accommodate the real time, on-demand, and dynamically adaptive needs of mesoscale weather research; its disparate, high-volume data sets and streams; and the tremendous computational demands of its numerical models and data assimilation systems.

This presentation describes a major paradigm shift now under way in the field of meteorology—away from today's environment in which remote sensing systems, atmospheric prediction models, and hazardous weather detection systems operate in fixed configurations, and on fixed schedules largely independent of weather—to one in which they can change their configuration dynamically in response to the evolving weather. This transformation involves the creation of Grid-enabled systems that can operate on demand and obtain the needed computing, networking, and storage resources with little prior planning, as weather and end-user needs dictate. In addition to describing the research and technology development being performed to establish this capability, I discuss the associated economic and societal implications of dynamically adaptive weather systems and the manner in which this new paradigm can serve as an underpinning for future cyberinfrastructure development.

 

 

Plenary III

Title

Life, Liberty and the Pursuit of Larger Clusters

Author(s)

Mark Seager

Author Inst

Lawrence Livermore National Laboratory

Presenter

Mark Seager

Abstract


The state of the art in Linux clusters is O(1K) nodes, and many institutions are now wondering whether even larger clusters can be built. In addition, many cluster interconnects can practically scale to O(4K) to O(6K) nodes. In this talk, we offer some thoughts on the current limitations of Linux clusters and their impact on scaling up, and on the topics Linux cluster developers should focus on in order to enable much larger clusters.

 

 

Applications Track Abstracts:

Applications Papers I:

Title


Performance Metrics for Ocean and Air Quality Models on Commodity Linux Platforms

Author(s)

George Delic

Author Inst.

HiPERiSM Consulting

Presenter

George Delic

Abstract

Introduction
This is a report on a project to evaluate industry-standard Fortran 90/95 compilers for IA-32 Linux commodity platforms when applied to Air Quality Models (AQMs). The goal is to determine the optimal performance and workload throughput achievable with commodity hardware. Only a few AQMs have been successfully converted to OpenMP (CAMx) or MPI (CMAQ), and considerable work remains to be done on others. In exploring the potential for parallelism it has been interesting to discover the problems with serial performance of several AQM codes. For this reason we have searched for more precise metrics of performance as an aid to measuring progress in performance enhancement. The historical analogy is the programming environment on Cray architectures, which enabled the development of performance attributes for either individual codes or workloads using hardware performance counters. Since commodity processors also have performance counters, software interfaces such as PAPI may be used to read them.

This study applied the PAPI library to understand the delivered performance of two AQMs, ISCST3 and AERMOD, and what the optimal achievable performance can be. For the latter, as a baseline, two ocean models with good vector character have been included. These are used to measure the optimal performance to be expected on commodity hardware with available compiler technology.

In addition to performance metrics (as derived from hardware performance counter values), some comments on I/O storage performance are also included because of the special character of I/O requirements in AQMs.
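
To make the counter-based metrics concrete, the sketch below instruments a stand-in compute loop with the PAPI low-level API, counting total cycles and floating-point operations. The event choices and the loop are illustrative assumptions; the actual instrumentation points in the AQM and ocean codes are not described here.

    /* Illustrative PAPI instrumentation of a compute kernel; the events and
     * the loop body are stand-ins, not the instrumentation used in the paper. */
    #include <stdio.h>
    #include <papi.h>

    #define N 1000000
    static double a[N], b[N];

    int main(void)
    {
        int events[2] = { PAPI_TOT_CYC, PAPI_FP_OPS };  /* cycles, FP operations (if supported) */
        long long counts[2];
        int eventset = PAPI_NULL;
        double s = 0.0;
        int i;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI initialization failed\n");
            return 1;
        }
        PAPI_create_eventset(&eventset);
        PAPI_add_events(eventset, events, 2);

        for (i = 0; i < N; i++) { a[i] = i * 0.5; b[i] = i * 0.25; }

        PAPI_start(eventset);                /* count only around the kernel of interest */
        for (i = 0; i < N; i++)
            s += a[i] * b[i];                /* stand-in for an AQM compute loop */
        PAPI_stop(eventset, counts);

        printf("sum=%g cycles=%lld fp_ops=%lld fp_ops/cycle=%g\n",
               s, counts[0], counts[1], (double)counts[1] / (double)counts[0]);
        return 0;
    }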



Title

Cluster Computing Through an Application-Oriented Computational Chemistry Grid

Author(s)

Kent Milfeld

Author Inst.

TACC/University of Texas at Austin

Presenter

Kent Milfeld

Abstract

Over the last 20 years, personal computers and networking infrastructures have greatly enhanced the working environment and communication of researchers. Linux clusters now flourish, providing significant resources for executing parallel applications. But there is still a gap between the desktop environments of individuals and the wide assortment of Unix-flavored HPC system environments. Grid technologies are delivering tools to bridge this gap, especially between HPC systems; but the difficulty of implementing the infrastructure software (installation, configuration, etc.) has discouraged adoption of grid software at the desktop level. Hence, users who employ long-running parallel applications in their research still log into a grid-enabled machine to submit batch jobs and manipulate data within a grid. An infrastructure model adopted by the Computational Chemistry Grid (CCG) eliminates dependence on grid software at the desktop, is based on the need to run chemistry applications on HPC systems, and uses a “client” interface for job submission. A middleware server with grid software components handles the deployment and scheduling of jobs and resource management transparently. This same infrastructure can be used to implement other client/server paradigms requiring pre- and post-processing of application data on the desktop, and application execution on large high-performance computing (HPC) systems as well as small departmental (Linux) clusters. This paper describes the structure and implementation of the CCG infrastructure and discusses its adaptation to other client/server application needs.



Title

A Resource Management System for Adaptive Parallel Applications in Cluster Environments

Author(s)

Sheikh Ghafoor

Author Inst.

Mississippi State University

Presenter

Sheikh Ghafoor

Abstract

Adaptive parallel applications that can change resources during execution promise better system utilization and increased application performance; furthermore, they open the opportunity for developing a new class of parallel applications driven by unpredictable data and events, capable of amassing huge resources on demand. This paper discusses requirements for a resource management system (RMS) to support such applications, including communication and negotiation of resources. To schedule adaptive applications, interaction between the applications and the resource management system is necessary. While managing adaptive applications is a multidimensional, complex research problem, this paper focuses only on the support an RMS requires to accommodate adaptive applications. An early prototype implementation shows that scheduling of adaptive applications is possible in a cluster environment and that the overhead of managing applications is low compared to the long running times of typical parallel applications. The prototype implementation supports a variety of adaptive parallel applications in addition to rigid parallel applications.
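
The abstract does not define the negotiation interface, so the following is only a minimal sketch, under assumed names and fields, of the kind of request/offer exchange an RMS and an adaptive application might use.

    #include <stdio.h>

    /* Hypothetical types sketching the application/RMS negotiation the paper
     * calls for; the names and fields are illustrative assumptions, not the
     * authors' API. */
    typedef struct {
        int min_procs;        /* smallest allocation the application can run with */
        int max_procs;        /* largest allocation it can exploit */
        int preferred_procs;  /* what it would like right now */
    } resource_request;

    typedef struct {
        int granted_procs;    /* what the RMS is offering */
        int deadline_sec;     /* how long the application has to adapt */
    } resource_offer;

    /* Callback an adaptive application would register so the RMS can ask it to
     * grow or shrink at run time; returning 0 accepts the offer. */
    typedef int (*adapt_callback)(const resource_offer *offer, void *app_state);

    int main(void)
    {
        resource_request req = { 8, 128, 32 };   /* example request */
        printf("requesting between %d and %d processors (preferred %d)\n",
               req.min_procs, req.max_procs, req.preferred_procs);
        return 0;
    }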

 

Applications Papers II:

Title

Large Scale Simulations in Nanostructures with NEMO3-D on Linux Clusters

Author(s)

Marek Korkusinski, Faisal Saied, Haiying Xu, Seungwon Lee, Mohamed Sayeed, Sebastien Goasguen, and Gerhard Klimeck

Author Inst

Purdue University, USA

Presenter

Faisal Saied

Abstract

NEMO3D is a quantum-mechanics-based simulation tool created to provide quantitative predictions for nanometer-scale semiconductor devices. NEMO3D computes the strain field using an atomistic valence force field method and electronic quantum states using an atomistic tight-binding Hamiltonian. Target applications for NEMO3D include semiconductor quantum dots and semiconductor quantum wires. The atomistic nature of the model, and the need to go to 100 million atoms and beyond, make this code computationally very demanding. High-performance computing platforms, including large Linux clusters, are indispensable for this research.

The key features of NEMO3D, including the underlying physics model, the application domains, the algorithms, and the parallelization, have been described in detail previously [1-3]. NEMO3D has been developed with Linux clusters in mind and has been ported to a number of other HPC platforms. In addition, a sophisticated graphical user interface is under development for NEMO3D. This work is part of a wider project, the NSF Network for Computational Nanotechnology (NCN), and the full paper will include more details on that project (http://nanohub.org/).

The main goal of this paper is to present new capabilities that have been added to NEMO3D to make it one of the premier simulation tools for design and analysis of realistically sized nanoelectronic devices. These recent advances include algorithmic refinements, performance analysis to identify the best computational strategies, and memory-saving measures. The combined effect of these enhancements is the ability to increase the strain problem size from about 20 to 64 million atoms and the electronic state calculation from 0.5 to 21 million atoms. These two computational domains correspond to physical device domains of around 15x298x298 nm³ and 15x178x178 nm³, large enough to consider realistic components of a nano-structured array with imperfections and irregularities. The key challenges are the reduction of the memory footprint to allow the initialization of large systems and, numerically, the extraction of interior, degenerate eigenvectors.
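
On the numerical point above, interior eigenstates near a target energy are commonly extracted with a shift-and-invert spectral transformation. The paper does not state which solver NEMO3D uses; the sketch below only illustrates the idea with SLEPc/PETSc on a small stand-in Hermitian matrix.

    /* Illustrative shift-and-invert interior eigensolve with SLEPc; the
     * tridiagonal test matrix is a stand-in, not the NEMO3D Hamiltonian. */
    #include <slepceps.h>

    int main(int argc, char **argv)
    {
        Mat         H;
        EPS         eps;
        ST          st;
        PetscInt    n = 1000, i, Istart, Iend, nconv;
        PetscScalar target = 2.0;   /* middle of this test matrix's spectrum */

        SlepcInitialize(&argc, &argv, NULL, NULL);

        MatCreate(PETSC_COMM_WORLD, &H);
        MatSetSizes(H, PETSC_DECIDE, PETSC_DECIDE, n, n);
        MatSetFromOptions(H);
        MatSetUp(H);
        MatGetOwnershipRange(H, &Istart, &Iend);
        for (i = Istart; i < Iend; i++) {        /* simple Hermitian tridiagonal */
            if (i > 0)     MatSetValue(H, i, i - 1, -1.0, INSERT_VALUES);
            if (i < n - 1) MatSetValue(H, i, i + 1, -1.0, INSERT_VALUES);
            MatSetValue(H, i, i, 2.0, INSERT_VALUES);
        }
        MatAssemblyBegin(H, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(H, MAT_FINAL_ASSEMBLY);

        EPSCreate(PETSC_COMM_WORLD, &eps);
        EPSSetOperators(eps, H, NULL);
        EPSSetProblemType(eps, EPS_HEP);                   /* Hermitian problem */
        EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);  /* interior, near the target */
        EPSSetTarget(eps, target);
        EPSGetST(eps, &st);
        STSetType(st, STSINVERT);                          /* shift-and-invert transform */
        EPSSetDimensions(eps, 10, PETSC_DEFAULT, PETSC_DEFAULT);
        EPSSetFromOptions(eps);
        EPSSolve(eps);
        EPSGetConverged(eps, &nconv);
        PetscPrintf(PETSC_COMM_WORLD, "converged eigenpairs near target: %d\n", (int)nconv);

        EPSDestroy(&eps);
        MatDestroy(&H);
        SlepcFinalize();
        return 0;
    }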



Title

High Performance Algorithms for Scalable Spin-Qubit Circuits with Quantum Dots

Author

John Fettig

Author Inst

NCSA/University of Illinois at Urbana-Champaign, USA

Presenter

John Fettig

Abstract

This report details improvements made to a code used for computer-aided design (CAD) of scalable spin-qubit circuits based on multiple quantum dots. It provides a brief scientific framework as well as an overview of the physical and numerical model. Modifications and improvements to the code based on the use of PETSc are then listed, and the new and old codes are benchmarked on three NCSA computer clusters. The speed-up of the code is considerable: about 10 times for the eigenvalue solver and 2 times for the Poisson equation solver. An example of applying the code to quantum dot modeling is also given. Finally, conclusions and recommendations for future work are provided.
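
As a rough illustration of what building on PETSc can look like for the Poisson step, the sketch below assembles a five-point Laplacian and solves it with PETSc's KSP object. The grid, stencil, and solver choices are illustrative assumptions, not those of the spin-qubit code.

    /* Minimal PETSc KSP solve of a 2-D Poisson problem; grid size, stencil,
     * and solver choices are illustrative, not the spin-qubit code's setup. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
        PetscInt  m = 64, n = 64, N, i, j, row, Istart, Iend;
        Mat       A;
        Vec       x, b;
        KSP       ksp;

        PetscInitialize(&argc, &argv, NULL, NULL);
        N = m * n;

        MatCreate(PETSC_COMM_WORLD, &A);
        MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
        MatSetFromOptions(A);
        MatSetUp(A);
        MatGetOwnershipRange(A, &Istart, &Iend);
        for (row = Istart; row < Iend; row++) {        /* 5-point Laplacian stencil */
            i = row / n; j = row % n;
            if (i > 0)     MatSetValue(A, row, row - n, -1.0, INSERT_VALUES);
            if (i < m - 1) MatSetValue(A, row, row + n, -1.0, INSERT_VALUES);
            if (j > 0)     MatSetValue(A, row, row - 1, -1.0, INSERT_VALUES);
            if (j < n - 1) MatSetValue(A, row, row + 1, -1.0, INSERT_VALUES);
            MatSetValue(A, row, row, 4.0, INSERT_VALUES);
        }
        MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

        MatCreateVecs(A, &x, &b);
        VecSet(b, 1.0);                                /* uniform source term as a stand-in */

        KSPCreate(PETSC_COMM_WORLD, &ksp);
        KSPSetOperators(ksp, A, A);
        KSPSetType(ksp, KSPCG);                        /* conjugate gradients */
        KSPSetFromOptions(ksp);                        /* allow -ksp_type/-pc_type overrides */
        KSPSolve(ksp, b, x);

        KSPDestroy(&ksp); VecDestroy(&x); VecDestroy(&b); MatDestroy(&A);
        PetscFinalize();
        return 0;
    }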



Title

Parallel Multi-Zone Methods for Large-Scale Multidisciplinary Computational Physics Simulations

Author

Ding Li, Guoping Xia, and Charles L. Merkle

Author Inst

Purdue University, USA

Presenter

Ding Li

Abstract

A parallel multi-zone method for the simulation of large-scale multidisciplinary applications involving field equations from multiple branches of physics is outlined. The equations of mathematical physics are expressed in a unified form that enables a single algorithm and computational code to describe problems involving diverse, but closely coupled, physics. Specific sub-disciplines include fluid and plasma dynamics, electromagnetic fields, radiative energy transfer, thermal/mechanical stress and strain distributions, and conjugate heat transfer in solids. Efficient parallel implementation of these coupled physics must take into account the different numbers of governing field equations in the various physical zones and the close coupling inside and between regions. This is accomplished by implementing the unified computational algorithm in terms of an arbitrary grid and a flexible data structure that allows load balancing by sub-clusters. Capabilities are demonstrated on a trapped-vortex liquid spray combustor, an MHD power generator, combustor cooling in a rocket engine, and a pulsed detonation engine-based combustion system for a gas turbine. The results show a variety of interesting physical phenomena and the efficacy of the computational implementation.
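
One common way to realize load balancing by sub-clusters for closely coupled but distinct physical zones is to give each zone its own MPI sub-communicator. The sketch below shows that mechanism only; the zone-assignment rule is an illustrative assumption, not the authors' decomposition.

    /* Sketch: give each physics zone (fluid, electromagnetic, solid, ...) its
     * own sub-communicator; the assignment rule here is purely illustrative. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int world_rank, world_size, zone, zone_rank, zone_size;
        MPI_Comm zone_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        /* Assume three zones; a real code would size each group from the
         * number of field equations and cells assigned to that zone. */
        zone = (world_rank * 3) / world_size;          /* 0 = fluid, 1 = EM, 2 = solid */

        MPI_Comm_split(MPI_COMM_WORLD, zone, world_rank, &zone_comm);
        MPI_Comm_rank(zone_comm, &zone_rank);
        MPI_Comm_size(zone_comm, &zone_size);

        printf("world rank %d -> zone %d (rank %d of %d)\n",
               world_rank, zone, zone_rank, zone_size);

        /* Each zone now advances its own equations on zone_comm; coupling data
         * between zones would be exchanged over MPI_COMM_WORLD or an
         * inter-communicator. */

        MPI_Comm_free(&zone_comm);
        MPI_Finalize();
        return 0;
    }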


 

Applications Papers III: Performance Measurement

Title

PerfSuite: An Accessible, Open Source Performance Analysis Environment for Linux

Author(s)

Rick Kufrin

Author Inst

NCSA/University of Illinois at Urbana-Champaign, USA

Presenter

Rick Kufrin

Abstract

The motivation, design, implementation, and current status of a new set of software tools called PerfSuite, targeted at performance analysis of user applications on Linux-based systems, are described. The primary emphasis of these tools is ease of use/deployment and portability/reuse, both in implementation details as well as in data representation and format. After a year of public beta availability and production deployment on Linux clusters that rank among the largest-scale in the country, PerfSuite is gaining acceptance as a user-oriented and flexible software tool set that is as valuable on the desktop as it is on leading-edge terascale clusters.



Title

Development and Performance Analysis of a Simulation-Optimization Framework on TeraGrid Linux Clusters

Author(s)

Baha Y. Mirghani, Derek A. Baessler, Ranji S. Ranjthan, Michael E. Tryby, Nicholas Karonis, Kumar G. Mahinthakumar

Author Inst.

North Carolina State, USA

Presenter

Baha Y. Mirghani

Abstract

A Large Scale Simulation Optimization (LASSO) framework is being developed by the authors. Linux clusters are the target platform for the framework, specifically cluster resources on the NSF TeraGrid. The framework is designed in a modular fashion that simplifies coupling with simulation model executables, allowing application of simulation-optimization approaches across problem domains. In this paper the LASSO framework is coupled with a parallel groundwater transport simulation model and its performance is evaluated. Performance is measured using a source history reconstruction problem and benchmarked against an existing MPI-based implementation developed previously. Performance results indicate that communication overhead in the LASSO framework contributes significantly to wall times. The authors propose, and will conduct, several performance optimizations designed to ameliorate the problem.



Title

Optimizing Performance on Linux Clusters Using Advanced Communication Protocols: Achieving Over 10 Teraflops on an 8.6 Teraflops Linpack-Rated Linux Cluster

Author(s)

Manojkumar Krishnan

Author Inst.

Pacific Northwest National Laboratory, USA

Presenter

Manojkumar Krishnan

Abstract

Advancements in high-performance networks (Quadrics, InfiniBand, or Myrinet) continue to improve the efficiency of modern clusters. However, the average application efficiency remains a small fraction of the peak compared to the system's efficiency. This paper describes techniques for optimizing application performance on Linux clusters using Remote Memory Access communication protocols. The effectiveness of these optimizations is presented in the context of an application kernel, dense matrix multiplication. The result is over 10 teraflops achieved on an HP Linux cluster whose Linpack performance is measured at 8.6 teraflops.
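
The remote-memory-access style referred to above can be pictured with standard MPI-2 one-sided operations: a process fetches a block of a remote matrix directly, with no matching receive posted by the owner. The sketch below is only a stand-in for that communication style; it is not the library or the matrix-multiplication algorithm used to reach the quoted performance.

    /* Sketch of remote-memory-access communication with MPI-2 one-sided
     * operations: each rank exposes a block in a window, and any rank can
     * fetch a remote block directly.  Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK 256   /* elements per rank in this toy example */

    int main(int argc, char **argv)
    {
        int rank, size, peer, i;
        double *local, *remote;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local  = malloc(BLOCK * sizeof(double));
        remote = malloc(BLOCK * sizeof(double));
        for (i = 0; i < BLOCK; i++) local[i] = rank + i * 1e-3;

        /* Expose the local block to all other ranks. */
        MPI_Win_create(local, BLOCK * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        peer = (rank + 1) % size;           /* fetch the next rank's block */
        MPI_Win_fence(0, win);
        MPI_Get(remote, BLOCK, MPI_DOUBLE, peer, 0, BLOCK, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);              /* the get is complete after the closing fence */

        printf("rank %d read %g from rank %d\n", rank, remote[0], peer);

        MPI_Win_free(&win);
        free(local); free(remote);
        MPI_Finalize();
        return 0;
    }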

   

Applications Papers IV: Applications, Visualization

Title

Scalable Visualization Clusters Using Sepia Technology

Author(s)

Jim Kapadia, Glenn Lupton, Steve Briggs

Author Inst

Hewlett Packard Corporation, USA

Presenter

Jim Kapadia

Abstract

The advent of low-cost commodity components and open-source software has made it possible to build Linux computing clusters with scalability and performance that put them in the class of supercomputers. As Linux computing clusters become more prevalent, they are increasingly being used to solve challenging scientific and technical problems. Users are now inundated with vast amounts of data that need visualization for better understanding and, thus, insight. Visualization systems need to keep up with advances in Linux cluster-based computing systems.

In this presentation, we will describe the SEPIA visualization system architecture, which leverages Linux clusters and industry-standard commodity components such as PC-class systems, graphics cards, and InfiniBand interconnect. The original technology was developed in conjunction with the US DOE ASCI program. Early versions of SEPIA systems have been deployed at several sites worldwide.



Title

The Case for an MPI ABI

Author(s)

Greg Lindahl

Author Inst

PathScale

Presenter

Greg Lindahl

Abstract

MPI is a successful API (application programming interface) for parallel programming. As an API, it gives library implementors maximum freedom, but recompilation is needed to move from one implementation to another. In a world where most users compile their own codes, the fact that you usually need to recompile to run on a different machine is not a problem.

Now that MPI has become very popular, two situations don't fit this model. The first is open-source codes whose users don't typically compile the application themselves. The second is commercial codes. The first situation makes codes less usable if a domain expert (not a computer scientist) has to figure out how to build the code. The second situation means that portability is limited. ISVs (independent software vendors) in particular typically choose to test and support only one MPI implementation, which means that only a limited number of today's high-speed cluster interconnects and cluster environments are supported. Large, free applications such as MM5 (a mesoscale weather model) and NWChem (a computational chemistry package), which are often not modified by their users, cannot be distributed in binary form because a large number of different executables would be needed. This is annoying to most MM5 and NWChem users.

Several vendors have offered to solve this problem by selling widely portable MPI implementations that support a wide variety of systems without requiring recompilation or relinking. Such vendors include Scyld, Scali, Verari, HP, and Intel. However, none of these implementations seems likely to become universal, and each supports only a limited number of cluster interconnects. Not only does this make the recompile issue worse, but it inhibits the success of new interconnect hardware.

An alternative approach is to create an ABI -- an application BINARY interface -- for MPI. An ABI would allow applications to run on the widest variety of interconnects and MPI implementations without relinking or recompiling.

An ABI would need to standardize items which are not standardized in the MPI API. This would actually increase application and test portability, and would also improve the quality of MPI implementations.
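
As a concrete example of an item the API leaves open: the standard says nothing about the binary representation of an MPI_Comm handle, and existing implementations have made incompatible choices (integer handles versus pointer handles). The snippet below paraphrases the two styles; the type names are invented for illustration and are not taken from any particular mpi.h.

    #include <stdio.h>

    /* Style A (integer handles, e.g. the MPICH family): */
    typedef int MPI_Comm_styleA;

    /* Style B (pointer handles, e.g. Open MPI): */
    struct communicator_object;                      /* opaque implementation type */
    typedef struct communicator_object *MPI_Comm_styleB;

    int main(void)
    {
        /* The API names are identical, but the binary representations differ,
         * so object code compiled against one mpi.h cannot link or run against
         * the other.  An ABI would fix one representation (plus constants,
         * calling conventions, and the layout of things like MPI_Status). */
        printf("integer-handle size: %zu bytes, pointer-handle size: %zu bytes\n",
               sizeof(MPI_Comm_styleA), sizeof(MPI_Comm_styleB));
        return 0;
    }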

I suspect (hope?) that the main barrier to writing an ABI is social. The investment required for an MPI implementation to support the ABI is modest compared to the cost of implementing all of MPI. However, projects which don't feel an ABI is important are unlikely to spend the effort.

An ABI is not a complete solution to the ISV/precompiled software issue. Testing issues would likely limit ISV enthusiasm for supporting their applications on untested interconnects and MPI implementations. However, such testing would be much more convenient than it is today, and could be reasonably automated. Testing could also reasonably be done by customers.

   

Systems Track Abstracts:

Systems Papers I: Reliability

Title

Towards More Reliable Commodity Clusters: A Software-Based Approach at Run Time

Author

Chung-Hsing Hsu

Author Inst

Los Alamos National Laboratory, USA

Presenter

Chung-Hsing Hsu

Abstract

Though the high-performance computing community continues to provide better and better support for Linux-based commodity clusters, cluster end-users and administrators have become more cognizant of the fact that large-scale commodity clusters fail quite frequently. The main source of these failures is hardware (e.g., disk storage, processors, and memory) with the primary cause being heat. This situation is expected to worsen as we venture forth into a new millennium with even larger-scale clusters powered by faster (and/or multi-core) processors.

In general, a faster processor consumes more energy and dissipates more heat. Having thousands of such processors complicates the air-flow pattern of the heat they dissipate. Consequently, cluster builders must resort to exotic cooling and fault-tolerance technologies and facilities to ensure that the cluster stays cool enough that it is not perpetually failing. We consider this approach to cluster reliability to be a reactive one. In contrast, we propose a complementary approach that addresses reliability more proactively by dealing intelligently with power and cooling before they become a problem. Our preliminary experimental work demonstrates that our approach can easily be applied to commodity processors and can reduce heat generation by 30% on average with minimal effect on performance when running the SPEC benchmarks.
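
On Linux, the kind of run-time power control this approach depends on is exposed through the cpufreq interface. The sketch below lowers one core's frequency by writing to sysfs; the paths and the userspace governor are standard cpufreq features, but the frequency value is only an example, root privileges are required, and the paper's own mechanism and policy are not detailed here.

    /* Sketch: lower CPU 0's clock via the Linux cpufreq sysfs interface.
     * Requires root and the "userspace" governor; the frequency (kHz) must be
     * one listed in scaling_available_frequencies. */
    #include <stdio.h>

    static int write_sysfs(const char *path, const char *value)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fprintf(f, "%s\n", value);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        /* Hand control of CPU 0's frequency to user space... */
        if (write_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
                        "userspace") != 0)
            return 1;
        /* ...then request a lower operating point, trading speed for heat. */
        return write_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed",
                           "1400000");
    }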



Title

Towards Cluster Serviceability

Author

Box Leangsuksun1, Anand Tikotekar1, Makan Pourzandi2, and Ibrahim Haddad2

Author Inst

1Louisiana Tech University and 2Ericsson Research Canada

Presenter

Box Leangsuksun

Abstract

This paper presents an investigation, a feasibility study, and performance benchmarking of vital management elements for critical enterprise and HPC infrastructure. We conduct a proof of concept integrating a high-availability cluster mechanism with a secure cluster infrastructure. Our proposed architecture incorporates the Distributed Security Infrastructure (DSI) framework, an open-source project providing a secure infrastructure for carrier-grade clusters, and HA-OSCAR, an open-source Linux cluster framework that meets Reliability, Availability, Serviceability (RAS) needs. The result is a cluster infrastructure that is compliant with Reliability, Analyzability, Serviceability, and Security (RASS) principles. We conducted an initial feasibility study and experiment to gauge issues and the degree of success in the implementation of our proposed RASS framework. We verified the integration of HA-OSCAR release 1.0 and DSI release 0.3. Although there was a minimal performance overhead, having "RASS" in mission-critical settings by far outweighs the performance impact. We plan to extend our proof-of-concept architecture to suit the needs of production environments.



Title

Defining and Measuring Supercomputer Reliability, Availability, and Serviceability (RAS)

Author

Jon Stearley

Author Inst

Sandia National Laboratories, USA

Presenter

Jon Stearley

Abstract

The absence of agreed definitions and metrics for supercomputer RAS obscures meaningful discussion of the issues involved and hinders their solution. This paper provides a survey of existing practices, and proposes standardized definitions and measurements. These are modeled after the SEMI-E10 specification which is widely used in the semiconductor manufacturing industry.
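
As one example of the kind of standardized measurement being argued for, availability is commonly expressed in terms of uptime and downtime, or equivalently mean time between failures (MTBF) and mean time to repair (MTTR); the formulation below is a common one shown only for illustration, not quoted from SEMI-E10 or from the paper:

    \text{Availability} = \frac{\text{uptime}}{\text{uptime} + \text{downtime}}
                        = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}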

   

Systems Papers II: File Systems

Title

Active Storage Processing in a Parallel File System

Author

Evan J. Felix, Kevin Fox, Kevin Regimbal, Jarek Nieplocha

Author Inst

Pacific Northwest National Laboratory, USA

Presenter

Jarek Nieplocha

Abstract

This paper proposes an extension of the traditional active disk concept by applying it to parallel file systems deployed in modern clusters. Utilizing the processing power of the disk controller CPU to process data stored on the disk was proposed in the previous decade. We have extended and deployed this idea in the context of the storage servers of a parallel file system, where substantial performance benefits can be realized by eliminating the overhead of data movement across the network. In particular, the proposed approach has been implemented and tested in the context of the Lustre parallel file system used in production Linux clusters at PNNL. Furthermore, our approach allows active storage application code to take advantage of a modern multipurpose operating system (Linux) rather than the restricted custom OS used in the previous work. Initial experience with processing very large volumes of bioinformatics data validates our approach and demonstrates the potential value of the proposed concept.



Title

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments

Author

Jason Cope1, Michael Oberg1, Henry M. Tufo2, and Matthew Woitaszek1

Author Inst

1University of Colorado-Boulder and 2National Center for Atmospheric Research, USA

Presenter

Matthew Woitaszek

Abstract

In this paper, we examine parallel filesystems for shared deployment across multiple Linux clusters running with different hardware architectures and operating systems. Specifically, we deploy GPFS, Lustre, PVFS2, and TerraFS in our test environment containing Intel Xeon, Intel x86-64, and IBM PPC970 systems. We comment on the recent feature additions of each filesystem, describe our implementation and configuration experiences, and present initial performance benchmark results. Our analysis shows that all of the parallel filesystems outperform a legacy NFS system, but with different levels of complexity. Lustre provides the best performance but requires the most administrative overhead. Three of the systems – GPFS, Lustre, and TerraFS – depend on specific kernel versions that increase administrative complexity and reduce interoperability.



Title

Lustre: Is It Ready for Prime Time?

Author

Steve Woods

Author Inst

MCNC-GCNS, USA

Presenter

Steve Woods

Abstract

When dealing with a large number of Linux nodes in the HPC cluster market, one area that sometimes gets overlooked is the shared disk space among the nodes. This shared disk space can be divided into several areas, which might include:

  • User $HOME space
  • Shared application space
  • Data Grid space
  • Backup/data migration space
  • Shared high speed scratch/tmp space

The decision of which technique to use for the various forms of shared space can be determined by any of a number of requirements. For example, if a site has only 32 Linux nodes and disk activity is at a minimum, then an NFS-mounted shared area coming from the head node, or from a separate node with a disk RAID attached, might be sufficient. But what about those situations where the HPC cluster might be hundreds of nodes that require hundreds of megabytes or gigabytes per second of performance? In these situations an NFS-mounted file system from a head node would not be sufficient, and other techniques would need to be considered to provide a global file system. Currently there are several products available, for example IBRIX, PVFS, GPFS, Sistina GFS, SpinServer, and Lustre, to name a few. Each has its strengths and weaknesses. The one we plan to concentrate on is Lustre.

In our presentation on Lustre we plan to cover the following topics:

  • Intro – brief description of Lustre and its concepts
  • Current hardware configuration and philosophy
  • Current software configuration and its philosophy
  • Site experience – ease or difficulty of use, and reliability
  • Performance – OST to OSC performance, NFS exported performance
  • Conclusion – is it ready and which form of shared space does it best fit in

In this presentation it is hoped that sites that are not intimately experienced with Lustre can get a sense as to whether or not Lustre is worth investigating for their site. As for experienced sites, it could be a place to compare notes and experiences.

   

Systems Papers III: Cluster Management Systems

Title

Concept and Implementation of CLUSTERIX: National Cluster of Linux Systems

Author

Roman Wyrzykowski1, Norbert Meyer2, and Maciej Stroinski2

Author Inst.

1Czestochowa University of Technology, 2Poznan Supercomputing and Networking Center

Presenter

Roman Wyrzykowski

Abstract

This paper presents the concept and implementation of the National Cluster of Linux Systems (CLUSTERIX) - a distributed PC-cluster (or metacluster) of a new generation, based on the Polish Optical Network PIONIER. Its implementation makes it possible to deploy a production Grid environment, which consists of local PC-clusters with 64- and 32-bit Linux machines, located in geographically distant independent centers across Poland. The management software (middleware) developed as Open Source allows for dynamic changes in the metacluster configuration. The resulting system will be tested on a set of pilot distributed applications developed as a part of the project. The project is being implemented by 12 Polish supercomputing centers and metropolitan area networks.



 

Title

HA-Rocks: A Cost-Effective High-Availability System for Rocks-Based Linux HPC Cluster

Author

Tong Liu, Saeed Iqbal, Yung-Chin Fang, Onur Celebioglu, Victor Masheyakhi, and Reza Rooholamini

Author Inst.

Dell, USA

Presenter

Tong Liu

Abstract

Commodity Beowulf clusters are now an established parallel and distributed computing paradigm due to their attractive price/performance. Beowulf clusters are increasingly being used in environments requiring improved fault tolerance and high availability. From the fault-tolerance perspective, the traditional Beowulf cluster architecture has a single master node, which creates a single point of failure (SPOF) in the system. Hence, to meet high-availability requirements, enhancements to the management system are critical. In this paper we propose such enhancements based on the commonly used Rocks management system; we call the result High-Availability Rocks (HA-Rocks). HA-Rocks is sensitive to the level of failure and provides mechanisms for graceful recovery to a standby master node. We also discuss the architecture and failover algorithm of HA-Rocks. Finally, we evaluate failover time under HA-Rocks.



 

Title

A Specialized Approach for HPC System Software

Author

Ron Brightwell1, Suzanne Kelly1, and Arthur B. Maccabe2

Author Inst.

1Sandia National Laboratories, 2University of New Mexico

Presenter

Ron Brightwell

Abstract

This technical presentation will describe our architecture for scalable, high-performance system software. The system software architecture that we have developed is a vital component of a complete system. System software is an important area of optimization that directly impacts application performance and scalability, and one that also has implications beyond performance. System software not only affects the ability of the machine to deliver performance to applications and allow scaling to the full system size, but also has secondary effects that can impact system reliability and robustness. The presentation gives an overview of our system software architecture and provides the details necessary to understand how this architecture impacts performance, scalability, reliability, and usability. We discuss examples of how our architecture addresses each of these areas and present the reasons we have chosen this specialized approach. We conclude with a discussion of the specifics of the implementation of this software architecture for the Sandia/Cray Red Storm system.

   

Systems Papers IV: Monitoring and Detection

Title

Deploying LoGS to Analyze Console Logs on an IBM JS20

Author

James E. Prewett

Author Inst.

HPC@UNM, USA

Presenter

James E. Prewett

Abstract

In early 2005, the Center for High Performance Computing at the University of New Mexico acquired an IBM JS20[2] that has been given the name “Ristra”. The hardware consists of 96 blades, each with two 1.6 GHz PowerPC 970 processors and 4 GB of RAM. The blades are housed in 7 IBM BladeCenter chassis. Myrinet is used as the high-speed, low-latency interconnect. The machine's components also include a management node, which is an IBM x335 system.

This system uses the xCAT cluster management software[3]. The system was configured so that a management node would collect all of the system logs as well as all of the output to the consoles of the individual blades. Unfortunately, this logging of the blades' console output put quite a heavy load on the system, especially the disk. It was our goal to also monitor scheduler logs on this machine by mounting the PBS MOM log directory via NFS on each of the blades. This seemed likely to be quite problematic if we could not reduce the load on the administrative node.

Under certain conditions, the console logs were growing especially quickly. Sometimes the flood of messages indicated an error with the hardware or software on the blades themselves, or with that used to gather and store the console output from the blades. In other cases, the output seemed to indicate normal operation of the monitoring infrastructure of the machine. In either case, the output was rather verbose and was being written to disk, putting a heavy load on the disk subsystem.

In order to solve the problem presented by these log files, we decided to replace the log files (in /var/log/consoles/) on disk with FIFO files that could be monitored by a log analysis tool. We decided to use LoGS as the tool to monitor these log FIFO files as it is capable of finding important messages and reacting to them. LoGS was then used to filter out innocuous messages, store important messages to files on disk, and react to certain conditions that it could repair without human intervention.
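
At the file-system level the mechanism is simple: the on-disk console log is replaced by a named pipe, so console output is consumed as it arrives instead of accumulating on disk. The minimal reader below illustrates the idea in C; the path and the filter strings are illustrative assumptions, and LoGS itself performs far richer pattern matching and actions.

    /* Minimal illustration of the FIFO approach: create a named pipe where the
     * console log used to live and read/filter lines from it as they arrive.
     * The path and the "important" patterns are illustrative assumptions. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(void)
    {
        const char *path = "/var/log/consoles/blade042";  /* example console log path */
        char line[4096];
        FILE *in;

        if (mkfifo(path, 0600) != 0 && errno != EEXIST) {  /* replace the file with a FIFO */
            perror("mkfifo");
            return 1;
        }

        in = fopen(path, "r");   /* blocks until the console logger opens it for writing */
        if (!in) { perror("fopen"); return 1; }

        while (fgets(line, sizeof line, in)) {
            /* Discard innocuous chatter; keep (or act on) messages that matter. */
            if (strstr(line, "error") || strstr(line, "failed"))
                fputs(line, stdout);          /* a real tool would log and react here */
        }
        fclose(in);
        return 0;
    }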



 

Title

Detection of Privilege Escalation for Linux Cluster Security

Author

Michael Treaster, Xin Meng, William Yurcik, Gregory A. Koenig

Author Inst.

NCSA/University of Illinois at Urbana-Champaign, USA

Presenter

Michael Treaster

Abstract

Cluster computing systems can be among the most valuable resources owned by an organization. As a result, they are high profile targets for attackers, and it is essential that they be well-protected. Although there are a variety of security solutions for enterprise networks and individual machines, there has been little emphasis on securing cluster systems despite their great importance.

NVisionCC is a multifaceted security solution for cluster systems built on the Clumon cluster monitoring infrastructure. This paper describes the component responsible for detecting unauthorized privilege escalation. This component enables security monitoring software to detect an entire class of attacks in which an authorized local user of a cluster is able to improperly elevate process privileges by exploiting some kind of software vulnerability. Detecting this type of attack is one of many facets in an all-encompassing cluster security solution.
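
The class of attack being detected leaves a simple signature: a process running with root privileges that is not among those expected to run as root. The toy check below conveys the flavor by walking /proc and reading each process's effective UID; NVisionCC's actual detection logic and data sources are more involved than this sketch.

    /* Toy privilege-escalation check: list processes whose effective UID is 0.
     * A monitoring agent would compare these against a whitelist of expected
     * root daemons and raise an alert on anything unexpected. */
    #include <stdio.h>
    #include <ctype.h>
    #include <dirent.h>

    int main(void)
    {
        DIR *proc = opendir("/proc");
        struct dirent *de;
        char path[300], line[256], name[256];

        if (!proc) { perror("/proc"); return 1; }
        while ((de = readdir(proc)) != NULL) {
            if (!isdigit((unsigned char)de->d_name[0]))
                continue;                               /* only numeric PID directories */
            snprintf(path, sizeof path, "/proc/%s/status", de->d_name);
            FILE *f = fopen(path, "r");
            if (!f) continue;                           /* process may have exited */
            name[0] = '\0';
            while (fgets(line, sizeof line, f)) {
                unsigned int ruid, euid;
                if (sscanf(line, "Name: %255s", name) == 1)
                    continue;
                if (sscanf(line, "Uid: %u %u", &ruid, &euid) == 2 && euid == 0)
                    printf("PID %s (%s) is running with root privileges\n",
                           de->d_name, name);
            }
            fclose(f);
        }
        closedir(proc);
        return 0;
    }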

   

Systems Papers V: Hardware and Systems

Title

Performance of Two-Way Opteron and Xeon Processor-Based Servers for Scientific and Technical Applications

Author

Douglas Pase and James Stephens

Author Inst

IBM, USA

Presenter

Douglas Pase

Abstract

There are three important characteristics that affect the performance of Linux clusters used for High-Performance Computation (HPC) applications. Those characteristics are the performance of the Arithmetic Logic Unit (ALU) or processor core, memory performance and the performance of the high-speed network used to interconnect the cluster servers or nodes. These characteristics are themselves affected by the choice of processor used in the server. In this paper we compare the performance of two servers that are typical of those used to build Linux clusters. Both are two-way servers based on 64-bit versions of x86 processors. The servers are each packaged in a 1U (1.75 inch high) rack-mounted chassis. The first server we describe is the IBM® eServer™ 326, based on the AMD Opteron™ processor. The second is the IBM xSeries™ 336, based on the Intel® EM64T processor. Both are powerful servers designed and optimized to be used as the building blocks of a Linux cluster that may be as small as a few nodes or as large as several thousand nodes.

In this paper we describe the architecture and performance of each server. We use results from the popular SPEC CPU2000 and Linpack benchmarks to present different aspects of the performance of the processor core. We use results from the STREAM benchmark to present memory performance. Finally, we discuss how characteristics of the I/O slots affect the interconnect performance, whether the choice is Gigabit Ethernet, Myrinet, InfiniBand, or some other interconnect.
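
For readers unfamiliar with STREAM, the benchmark characterizes sustainable memory bandwidth with four simple vector kernels; the triad kernel below is representative. The array size is chosen here only to exceed typical cache sizes, and the timing and verification done by the real benchmark are omitted.

    /* The STREAM "triad" kernel: bandwidth is reported as bytes moved per
     * second over three arrays of N doubles (read b and c, write a).  The
     * array size and scalar are illustrative; the real benchmark also times
     * the copy, scale, and add kernels. */
    #include <stdio.h>

    #define N 2000000               /* large enough to defeat on-chip caches */

    static double a[N], b[N], c[N];

    int main(void)
    {
        const double scalar = 3.0;
        long i;

        for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.5; }

        for (i = 0; i < N; i++)
            a[i] = b[i] + scalar * c[i];    /* triad: 24 bytes of traffic per iteration */

        /* Bytes moved per triad pass: 3 * N * sizeof(double). */
        printf("triad moved %.1f MB (a[0]=%g)\n",
               3.0 * N * sizeof(double) / 1e6, a[0]);
        return 0;
    }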



Title

A First Look at BlueGene/L

Author

Martin Margo, Christopher Jordan, Patricia Kovatch, and Phil Andrews

Author Inst.

San Diego Supercomputer Center, USA

Presenter

Martin Margo

Abstract

An IBM BlueGene/L machine achieved 70.72 TeraFlops on the November 2004 Top 500 list to become the most powerful computer in the world as measured by the list. This new machine has “slow” but numerous processors along with two high bandwidth interconnection networks configurable in different topologies. This machine is optimized for specific data-intensive simulations, modeling and mining applications. The San Diego Supercomputer Center (SDSC) recently installed and configured a single rack BlueGene/L system. This system has a peak speed of 5.7 TeraFlops with its 2048 700 MHz processors and 512 GB of memory and a Linpack measurement of 4.6 TeraFlops. SDSC specifically configured this machine to have the maximum number of I/O nodes to provide the best performance for data-intensive applications. In this paper, we discuss how BlueGene/L is configured at SDSC, the system management and user tools and our early experiences with the machine.



 

Title

Deploying an IBM e1350 Cluster

Author

Aron Warren

Author Inst.

HPC@UNM, USA

Presenter

Aron Warren

Abstract

UNM HPC, in its technical presentation, will describe its experience in bringing online a high-density, 126-node, dual-processor IBM e1350 cluster (BladeCenter JS20 + Myrinet 2000) as the first large cluster of its type deployed to academia within the US (Barcelona Supercomputing Center's MareNostrum cluster achieved #4 on the 2004 TOP500 list). An in-depth description of the challenges in landing a high-density cluster will be given, along with the problems encountered in deploying and managing a cluster of this type, including new compilers and software stacks. Highlights of the successes made with this clustering environment will also be shown.

   

Systems Papers VI: Systems and Experiences

Title

To InfiniBand or Not InfiniBand, One Site's Perspective

Author

Steve Woods

Author Inst.

MCNC-GCNS, USA

Presenter

Steve Woods

Abstract

When dealing in the world of high-performance computing, where user computational needs are being met by clusters of Linux-based systems, the question often arises of which interconnect to use to connect dozens, if not hundreds, of systems together. Generally the answer comes from asking a few basic questions:

  • Do the applications being run require multiple nodes i.e. parallel
  • How much communication is being done by the applications
  • What are the sizes of the messages being passed between processes
  • And the tough one, what can you afford

Almost all Linux-based nodes come with Gigabit Ethernet. For many applications the performance of Gigabit Ethernet is sufficient, especially when accompanied by a good-quality switch. But what about those situations where communication latency becomes important, as well as scalability and overall wall-clock performance? In these situations it may become necessary to investigate other forms of interconnect for the cluster. Unfortunately, most of the choices for low-latency, high-bandwidth interconnects, such as Myrinet and Quadrics, are proprietary. Both are very good products, with varying degrees of performance and cost associated with them. In more recent years InfiniBand has come and gone and come back into the spotlight again. But the question is, is it here to stay?

What we plan to present is our experience with InfiniBand, starting with just a few nodes on a small switch and later expanding to encompass all nodes of our 64-node cluster plus support nodes. We plan to cover the following:

  • Brief overview of InfiniBand and its protocols
  • Our current configuration, which includes not only native InfiniBand but also gateways to Gigabit Ethernet
  • Base performance, looking at both bandwidth and latency for various MPI message sizes
  • Application performance: what type of improvement has been seen and how InfiniBand affected the scaling of certain applications
  • Future – where we see InfiniBand and its usage going
  • Conclusion – finally, our opinion of InfiniBand and how valuable it is in the HPC marketplace

From this presentation, it is hoped that sites that are considering forms of connecting cluster nodes other than Ethernet might gain some insight into the viability of InfiniBand as an interconnect medium.



 

Title

The Road to a Linux-Based Personal Desktop Cluster

Author

Wu-Chun Feng

Author Inst.

Los Alamos National Laboratory, USA

Presenter

Wu-Chun Feng

Abstract

The proposed talk starts with background information on how unconstrained power consumption has led to the construction of highly inefficient clusters with increasingly high failure rates. By appropriately and transparently constraining power consumption via a cluster-based power-management tool, we created the super-efficient and highly reliable Green Destiny cluster that debuted three years ago. Since then, Green Destiny has evolved in two different directions: (1) architecturally, into a low-power Orion Multisystems DT-12 personal desktop cluster (October 2004), and (2) via adaptive system software, into a power-aware CAFfeine desk-side cluster based on AMD quad-Opteron compute nodes (November 2004 at SC2004). These dual paths of evolution and their implications will be elaborated upon in this talk.



 

Title

What Do Mambo, VNC, UML and Grid Computing Have in Common?

Author

Sebastien Goasguen, Michael McLennan, Gerhard Klimeck, and Mark S. Lundstrom

Author Inst.

Purdue University, USA

Presenter

Sebastien Goasguen

Abstract

The Network for Computational Nanotechnology (NCN) operates a web computing application portal called the nanoHUB. This portal serves thousands of users from the nanotechnology community, ranging from students to experienced researchers, using both academic and industrial applications. The content of the nanoHUB is highly dynamic and diverse, consisting of video streams, presentation slides, educational modules, research papers, and on-line simulations. This paper presents the core components of the infrastructure supporting the nanoHUB. Committed to the open-source philosophy, the NCN has selected the Mambo content management system and uses it in conjunction with Virtual Network Computing (VNC) to deliver graphical applications to its users. On the back end, these applications run on virtual machines, which provide both a sandbox for the applications and a consistent login mechanism for the NCN user base. User Mode Linux (UML) is used to boot the virtual machines on geographically dispersed resources, and the Lightweight Directory Access Protocol (LDAP) is used to validate users against the NCN registry.

   

Vendor Presentations

 

Title

Cooling for Ultra-High Density Racks and Blade Servers

Author(s)

Richard Sawyer

Author Inst

American Power Conversion (APC), USA

Presenter

Richard Sawyer

Abstract

The requirement to deploy high-density servers within single racks is presenting data center managers with a challenge: vendors are now designing servers that will demand up to 20 kW of cooling if installed in a single rack. With most data centers designed to cool an average of no more than 2 kW per rack, some innovative cooling strategies are required. Planning strategies to cope with ultra-high-power racks are described, along with practical solutions for both new and existing data centers.



Title

Bridging Discovery and Learning at Purdue University through Distributed Computing

Author(s)

Krishna Madhavan

Author Inst

Purdue University, USA

Presenter

Krishna Madhavan

Abstract

While there have been significant advances in the field of distributed and high-performance computing, the diffusion of these innovations into day-to-day science, technology, engineering, and mathematics curricula remains a major challenge. This presentation focuses on some ongoing initiatives led by Information Technology at Purdue, the central IT organization at Purdue University, to narrow this gap between discovery and learning. The innovative design and deployment of distributed computing tools have significant impact on various pedagogical approaches such as problem-based learning and scaffolding. The presentation will not only describe these tools and their pedagogical relevance, but also highlight how they fit into the larger vision of promoting the diffusion of distributed and high-performance computing tools in educational praxis.

   

 

 

Title

AMD's Road to Dual Core

Author(s)

Doug O'Flaherty

Author Inst

AMD, USA

Presenter

Doug O'Flaherty

Abstract

Good architecture is no accident. With silicon planning horizons out years in advance of product delivery, early choices can have an impact on final products. In his presentation, Douglas O'Flaherty, AMD HPC marketing manager, covers the architecture choices that enabled AMD's dual-core processors and how those choices affect performance.



Title

Author(s)

Patrick Geoffray

Author Inst

Myricom, USA

Presenter

Patrick Geoffray

Abstract

Available soon.