| Plenary Presentations
|
| Plenary I
|
Title |
|
Opportunities and Challenges
in High-End Computing for Science and Engineering |
Author(s) |
|
Thom H. Dunning, Jr. |
Author Inst |
|
NCSA/University of Illinois Urbana-Champaign |
Presenter |
|
Thom H. Dunning, Jr. |
| Abstract |
|
Computational modeling and simulation were among
the most significant developments in the practice of scientific
inquiry in the 20th century. They have made significant contributions
to scientific and engineering research programs and are finding
increasing use in a broad range of industrial applications. We
are presently in the midst of a revolution in computing technologies
with an order-of-magnitude increase in computing capability every
three to five years. The most pressing question now is:
What must scientists and engineers do to harness the power of
high-end computing and information technologies to solve the
most critical problems in science and engineering? Realizing
the scientific and engineering advances promised by the revolution
in computing technologies will require a holistic approach to
computational science and engineering. It will require
advances in the theoretical and mathematical sciences leading
to computational models of ever increasing predictive power and
fidelity. It will require close collaboration among
computational scientists, computer scientists, and applied mathematicians
to translate these advances into scientific and engineering applications
that can realize the full potential of high-end computers. It
will require educating a new generation of scientists and engineers
to use computational modeling and simulation to address the challenging
problems that they will face in the 21st century. In this
presentation, we will briefly discuss all of these issues. |
|
|
| Plenary II
|
Title |
|
Transforming
the Sensing and Prediction of Intense Local Weather Through
Dynamic Adaptation: People and Technologies Interacting
with the Atmosphere |
Author(s) |
|
Kelvin Droegemeier |
Author Inst |
|
University of Oklahoma |
Presenter |
|
Kelvin Droegemeier |
| Abstract |
|
Each year across the United States,
floods, tornadoes, hail, strong winds, lightning, and winter
storms—so-called mesoscale weather events—cause
hundreds of deaths, routinely disrupt transportation and
commerce, and result in annual economic losses greater than
$13B. Although mitigating the impacts of such events
would yield enormous economic and societal benefits, the
ability to do so is stifled by rigid IT frameworks that cannot
accommodate the real-time, on-demand, and dynamically
adaptive needs
of mesoscale weather research; its disparate, high-volume
data sets and streams; and the tremendous computational demands
of its numerical models and data assimilation systems.
This presentation describes a major paradigm shift now under way in the
field of meteorology—away from today's environment in which remote
sensing systems, atmospheric prediction models, and hazardous
weather detection systems operate in fixed configurations,
and on fixed schedules largely independent of weather—to one in
which they can change their configuration dynamically in
response to the evolving weather. This transformation involves
the creation of Grid-enabled systems that can operate on
demand and obtain the needed computing, networking, and
storage resources with little prior planning, as weather
and end-user needs dictate. In addition to describing the research and
technology development being performed to establish this capability, I
discuss the associated economic and societal implications of dynamically
adaptive weather systems and the manner in which this new paradigm can
serve as an underpinning for future cyberinfrastructure development. |
|
|
| Plenary III
|
Title |
|
Life, Liberty and the
Pursuit of Larger Clusters |
Author(s) |
|
Mark Seager |
Author Inst |
|
Lawrence Livermore National Laboratory |
Presenter |
|
Mark Seager |
| Abstract |
|
The state of the art in Linux clusters
is O(1K) nodes. Now many institutions are wondering if even
larger clusters can be built. Also, many cluster interconnects
can practically scale to O(4K) to O(6K). In this talk, we offer
some thoughts on the current limitations in Linux clusters
and their impact on scaling up. We also offer some thoughts
on what topics Linux cluster developers should focus on in
order to enable much larger clusters. |
|
|
| Applications
Track Abstracts:
|
| Applications
Papers I:
|
| Title |
|
Performance
Metrics for Ocean and Air Quality Models on Commodity Linux
Platforms |
| Author(s) |
George Delic |
| Author Inst. |
HiPERiSM Consulting |
| Presenter |
George Delic |
| Abstract |
Introduction
This is a report on a project to evaluate industry-standard
Fortran 90/95 compilers for IA-32 Linux commodity platforms
when applied to Air Quality Models (AQMs). The goal is to determine
the optimal performance and workload throughput achievable with
commodity hardware. Only a few AQMs have been successfully converted
to OpenMP (CAMs) or MPI (CMAQ), and considerable work remains
to be done on others. In exploring the potential for parallelism,
it has been interesting to discover the problems with serial
performance on several AQM codes. For this reason we have searched
for more precise metrics of performance as an aid to measuring
progress in performance enhancement. The historical analogy is
the programming environment on Cray architectures, which enabled
the development of performance attributes for either individual
codes or workloads using hardware performance counters. Since
commodity processors also have performance counters, software
interfaces such as PAPI may be used to read them.
This study applied the PAPI library to understand what the delivered
performance is for two AQMs, ISCST3 and AERMOD, and what the
optimal achievable performance can be. For the latter, as a
baseline, two ocean models with good vector character have been
included. These are used to measure the optimal performance to
be expected on commodity hardware with available compiler technology.
In addition to performance metrics (as derived from hardware
performance counter values), some comments on I/O storage performance
are also included because of the special character of I/O requirements
in AQMs. |
|
|
Title |
Cluster Computing
Through an Application-Oriented Computational Chemistry Grid |
Author(s) |
Kent Milfeld |
Author Inst. |
TACC/University of Texas at Austin |
Presenter |
Kent Milfeld |
Abstract |
Over the last 20 years, personal computers and
networking infrastructures have greatly enhanced the working
environment and communication of researchers. Also, Linux clusters
now flourish, providing significant resources for executing parallel
applications. But there is still a gap between desktop environments
of individuals and the wide assortment of Unix-flavored HPC system
environments. Grid technologies are delivering tools to bridge
this gap, especially between HPC systems; but the difficulty
of implementing the infrastructure software (installation, configuration,
etc.) has discouraged adoption of
grid software at the desktop level. Hence, users who employ long-running
parallel applications in their research still log into a grid-enabled
machine to submit batch jobs and manipulate data within a grid.
An infrastructure model adopted by the Computational Chemistry
Grid (CCG) eliminates dependence on grid software at the desktop,
is based on the need to run chemistry applications on HPC systems,
and uses a “client” interface
for job submission. A middleware server with grid software components
is employed to handle the deployment and scheduling of jobs and
resource management transparently. This same infrastructure can
be used to implement other client/server paradigms requiring pre-
and post-processing of application data on the desktop, and application
execution on large high-performance computing (HPC) systems, as
well as small departmental (Linux) clusters. This paper describes
the structure and implementation of the CCG infrastructure and
discusses its adaptation to other client/server application needs.
|
|
|
Title |
A Resource Management
System for Adaptive Parallel Applications in Cluster Environments |
Author(s) |
Sheikh Ghafoor |
Author Inst. |
Mississippi State University |
Presenter |
Sheikh Ghafoor |
Abstract |
Adaptive parallel applications, which can change
resources during execution, promise better system utilization and
increased application performance. Furthermore, they open
the opportunity for developing a new class of parallel applications
driven by unpredictable data and events, capable of amassing
huge resources on demand. This paper discusses requirements for
a resource management system (RMS) to support such applications, including
communication and negotiation of resources. To schedule adaptive
applications, interaction between the applications and the resource
management system is necessary. While managing adaptive applications
is a complex, multidimensional research problem, this paper focuses
only on the support that an RMS requires to accommodate adaptive applications.
An early prototype implementation shows that scheduling of adaptive
applications is possible in a cluster environment and the overhead
of management of applications is low compared to the long running
time of typical parallel applications. The prototype implementation
supports a variety of adaptive parallel applications in addition
to rigid parallel applications.
|
|
| Applications
Papers II:
|
| Title |
Large Scale Simulations
in Nanostructures with NEMO3-D on Linux Clusters |
| Author(s) |
Marek Korkusinski, Faisal Saied, Haiying
Xu, Seungwon Lee, Mohamed Sayeed, Sebastien Goasguen, and Gerhard
Klimeck |
| Author Inst |
Purdue University, USA |
| Presenter |
Faisal Saied |
| Abstract |
NEMO3D is a quantum-mechanics-based simulation
tool created to provide quantitative predictions for nanometer-scale
semiconductor devices. NEMO3D computes the strain field using an
atomistic valence force field method and electronic quantum states
using an atomistic tight-binding Hamiltonian. Target applications
for NEMO3D include semiconductor quantum dots and semiconductor
quantum wires. The atomistic nature of the model, and the need
to go to 100 million atoms and beyond, make this code computationally
very demanding. High performance computing platforms, including
large Linux clusters, are indispensable for this research.
The
key features of NEMO3D, including the underlying physics model,
the application domains, the algorithms, and the parallelization, have
been described in detail previously [1-3]. NEMO3D has been
developed with Linux clusters in mind, and has been ported to
a number of other HPC platforms. Also, a sophisticated graphical
user interface is under development for NEMO3D. This work is
a part of a wider project, the NSF Network for Computational
Nanotechnology (NCN), and the full paper will include more details
on that project (http://nanohub.org/).
The main goal of this paper is to present new capabilities
that have been added to NEMO3D to make it one of the premier
simulation tools for design and analysis of realistically sized
nanoelectronic devices. These recent advances include algorithmic
refinements, performance analysis to identify the best computational
strategies, and memory saving measures. The combined effect of
these enhancements is the ability to increase the strain problem
size from about 20 to 64 million atoms and the electronic state
calculation from 0.5 to 21 million atoms. These two computational
domains correspond to physical device domains of around 15x298x298
nm³ and 15x178x178 nm³, large enough to consider realistic
components of a nano-structured array with imperfections and
irregularities. The key challenges are the reduction of the memory
footprint to allow the initialization of large systems and, numerically,
the extraction of interior, degenerate eigenvectors.
|
|
|
| Title |
High Performance
Algorithms for Scalable Spin-Qubit Circuits with Quantum Dots |
| Author |
John Fettig |
| Author Inst |
NCSA/University of Illinois at Urbana-Champaign,
USA |
| Presenter |
John Fettig |
| Abstract |
This report details improvements made to a code
used for computer-assisted design (CAD) of scalable spin-qubit
circuits based on multiple quantum dots. It provides a brief
scientific
framework as well as an overview of the physical and numerical
model. Then modifications and improvements to the
code based on utilization of PETSc are listed. Then new and old
codes are benchmarked on three NCSA computer clusters. The speed-up
of the code is considerable: about 10 times for the eigenvalue
solver and 2 times for the Poisson equation solver. An example
of code application towards quantum dot modeling is also given.
Finally, conclusions and recommendations for future work are
provided.
|
|
|
| Title |
Parallel Multi-Zone
Methods for Large-Scale Multidisciplinary Computational Physics
Simulations |
| Author |
Ding Li, Guoping Xia, and Charles L. Merkle |
| Author Inst |
Purdue University,
USA |
| Presenter |
Ding Li |
| Abstract |
A parallel multi-zone method for the simulation
of large-scale multidisciplinary applications involving field
equations from multiple branches of physics is outlined. The
equations of mathematical physics are expressed in a unified
form that enables a single algorithm and computational code to
describe problems involving diverse, but closely coupled, physics.
Specific sub-disciplines include fluid and plasma dynamics, electromagnetic
fields, radiative energy transfer, thermal/mechanical stress
and strain distributions and conjugate heat transfer in solids.
Efficient parallel implementation of these coupled physics must
take into account the different number of governing field equations
in the various physical zones and the close coupling inside and
between regions. This is accomplished by implementing the unified
computational algorithm in terms of an arbitrary grid and a flexible
data structure that allows load balancing by sub-clusters. Capabilities
are demonstrated by a trapped vortex liquid spray combustor,
an MHD power generator, combustor cooling in a rocket engine
and a pulsed detonation engine-based combustion system for a
gas turbine. The results show a variety of interesting physical
phenomena and the efficacy of the computational implementation.
|
|
|
| Applications Papers III:
Performance Measurement
|
| Title |
PerfSuite: An Accessible,
Open Source Performance Analysis Environment for Linux Development
and Performance |
| Author(s) |
Rick Kufrin |
| Author Inst |
NCSA/University of Illinois at Urbana-Champaign,
USA |
| Presenter |
Rick Kufrin |
| Abstract |
The motivation, design, implementation, and current
status of a new set of software tools called PerfSuite, which is
targeted to performance analysis of user applications on Linux-based
systems, are described. The primary emphasis of these tools
is ease of use/deployment and portability/reuse, both in implementation
details and in data representation and format. After a
year of public beta availability and production deployment on
Linux clusters that rank among the largest-scale in the country,
PerfSuite is gaining acceptance as a user-oriented and flexible
software tool set that is as valuable on the desktop as it is
on leading-edge terascale clusters.
|
|
|
Title |
Development and
Performance Analysis of a Simulation-Optimization Framework
on TeraGrid Linux Clusters
Characteristics |
Author(s) |
Baha Y. Mirghani, Derek A. Baessler, Ranji S.
Ranjthan, Michael E. Tryby, Nicholas Karonis, Kumar G. Mahinthakumar |
Author Inst. |
North Carolina State, USA |
Presenter |
Baha Y. Mirghani |
Abstract |
A Large Scale Simulation Optimization (LASSO)
framework is being developed by the authors. Linux clusters are
the target platform for the framework, specifically cluster resources
on the NSF TeraGrid. The framework is designed in a modular fashion
that simplifies coupling with simulation model executables, allowing
application of simulation optimization approaches across problem
domains. In this paper, the LASSO framework
is coupled with a parallel groundwater transport simulation model.
Performance is measured using a source history reconstruction
problem and benchmarked against an existing MPI-based implementation
developed previously. Performance results indicate that communication
overhead in the LASSO framework is contributing significantly
to wall times. The authors propose and will conduct several performance
optimizations designed to ameliorate the problem.
|
|
|
Title |
Optimizing Performance
on Linux Clusters Using Advanced Communication Protocols: Achieving
Over 10 Teraflops on an 8.6 Teraflops Linpack-Rated Linux Cluster |
Author(s) |
Manojkumar Krishnan |
Author Inst. |
Pacific Northwest National Laboratory,
USA |
Presenter |
Manojkumar Krishnan |
Abstract |
Advancements in high-performance networks (Quadrics,
Infiniband or Myrinet) continue to improve the efficiency of
modern clusters. However, the average application efficiency
remains a small fraction of the system's peak efficiency.
This paper describes techniques for optimizing application performance
on Linux clusters using Remote Memory Access communication protocols.
The effectiveness of these optimizations is presented in the
context of an application kernel, dense matrix multiplication.
The result was over 10 teraflops achieved on an HP Linux cluster
on which LINPACK performance is measured at 8.6 teraflops.
|
|
|
| Applications Papers IV:
Applications, Visualization
|
Title |
Scalable Visualization
Clusters Using Sepia Technology |
Author(s) |
Jim Kapadia, Glenn Lupton, Steve Briggs |
Author Inst |
Hewlett Packard Corporation, USA |
Presenter |
Jim Kapadia |
| Abstract |
The advent of low-cost commodity components and
open source software has made it possible to build Linux computing
clusters with scalability and performance that put them in the
class of supercomputers. As Linux computing clusters become more
prevalent, they are increasingly being used to solve challenging
scientific and technical problems. Users are now inundated with
vast amounts of data needing visualization for better understanding
and thus insight. Visualization systems need to keep up with advances
in Linux cluster-based computing systems.
In this presentation,
we will describe the SEPIA visualization system architecture,
which leverages Linux clusters and industry standard commodity
components such as PC-class systems, graphics cards, and Infiniband
interconnect. The original technology was developed in conjunction
with the US DOE ASCI program. Early versions of SEPIA systems have
been deployed at several sites worldwide. |
|
|
| Title |
The Case for
an MPI ABI |
| Author(s) |
Greg Lindahl |
| Author Inst |
Pathscale |
| Presenter |
Greg Lindahl |
| Abstract |
MPI is a successful API (applications programming
interface) for parallel programming. As an API, there is maximum
freedom for library implementors, but recompilation is needed
to move from one implementation to another. In a world where
most users compile their own codes, the fact that you usually
need to recompile to run on a different machine is not a problem.
Now
that MPI has become very popular, two situations don't fit
this model. The first is open-source codes for which most users
don't typically compile the application. The second is commercial
codes. The first situation makes codes less usable if a domain
expert (non-computer scientist) has to figure out how to build
the code. The second situation means that portability is limited.
ISVs (independent software vendors) in particular are typically
choosing to test and support only one MPI implementation, which
means that only a limited number of today's high-speed cluster
interconnects and cluster environments are supported. Large,
free applications such as MM5 (meso-scale weather model) and
NWChem (computational chemistry package), which are often not modified by their
users, cannot be distributed in binary form because a large number
of different executables would be needed. This is annoying to
most MM5 and NWChem users.
Several vendors have offered to solve
this problem by selling widely-portable MPI implementations
which support a wide variety of systems, without requiring recompilation
or relinking. Such vendors include Scyld, Scali, Verari, HP,
and Intel. However, none of these implementations seems likely
to become universal, and each only supports a limited number
of cluster interconnects. Not only does this make the recompile
issue worse, but it inhibits the success of new interconnect
hardware.
An alternative approach is to create an ABI -- an application
BINARY interface -- for MPI. An ABI would allow applications
to run on the widest variety of interconnects and MPI implementations
without relinking or recompiling.
An ABI would need to standardize
items which are not standardized in the MPI API. This would
actually increase application and test portability, and would
also improve the quality of MPI implementations.
I suspect (hope?) that the
main barrier to writing an ABI is social. The investment of
an MPI implementation to implement the ABI is modest compared
to the cost of implementing all of MPI. However, projects which
don't feel an ABI is important are unlikely to spend the effort.
An
ABI is not a complete solution to the ISV/precompiled software
issue. Testing issues would likely limit ISV enthusiasm for supporting
their applications on untested interconnects and MPI implementations.
However, such testing would be much more convenient than it is
today, and could be reasonably automated. Testing could also
be reasonably done by customers. |
|
|
| Systems
Track Abstracts:
|
| Systems Papers I: Reliability
|
| Title |
Towards More Reliable
Commodity Clusters: A Software-Based Approach at Run Time |
| Author |
Chung-Hsing Hsu |
| Author Inst |
Los Alamos National Laboratory , USA |
| Presenter |
Chung-Hsing Hsu |
| Abstract |
Though the high-performance computing community
continues to provide better and better support for Linux-based
commodity clusters, cluster end-users and administrators have
become more cognizant of the fact that large-scale commodity
clusters fail quite frequently. The main source of these failures
is hardware (e.g., disk storage, processors, and memory) with
the primary cause being heat. This situation is expected to worsen
as we venture forth into a new millennium with even larger-scale
clusters powered by faster (and/or multi-core) processors.
In
general, a faster processor consumes more energy and dissipates
more heat. Having thousands of such processors complicates the
air flow pattern of the heat dissipated by these processors.
Consequently, cluster builders must resort to exotic cooling
and fault tolerant technologies and facilities to ensure that
the cluster stays cool enough so that it is not perpetually failing.
We consider this approach to cluster reliability as being a reactive
one. In contrast, we propose a complementary approach that more
proactively addresses reliability by more intelligently dealing
with power and cooling issues before they become an issue. Our
preliminary experimental work demonstrates that our approach
can easily be applied to commodity processors and can reduce
heat generation by 30% on average with minimal effect on performance
when running the SPEC benchmarks. |
|
|
| Title |
Towards Cluster
Serviceability |
| Author |
Box Leangsuksun1, Anand Tikotekar1, Makan Pourzandi2,
and Ibrahim Haddad2 |
| Author Inst |
1Louisiana Tech University and 2Ericsson
Research Canada |
| Presenter |
Box Leangsuksun |
| Abstract |
This paper propounds an investigation, a feasibility
study, and performance benchmarking of vital management elements
for critical enterprise and HPC infrastructure. We conduct a
proof-of-concept integration of a high-availability cluster mechanism
with a secure cluster infrastructure. Our proposed architecture
incorporates the Distributed Security Infrastructure (DSI) framework,
an open source project providing secure infrastructure for carrier-grade
clusters, and HA-OSCAR, an open source Linux cluster framework
that meets Reliability, Availability, Serviceability (RAS)
needs. The result is a cluster infrastructure that is compliant
with the Reliability, Availability, Serviceability and Security
(RASS) principles. We conducted an initial feasibility study
and experiment to gauge issues and the degree of success in the
implementation of our proposed RASS framework. We verified the
integration of HA-OSCAR release 1.0 and DSI release 0.3. Although
there was a minimal performance overhead, the benefit of having "RASS" in
mission-critical settings by far outweighs the performance
impact. We plan to extend our proof-of-concept architecture
to meet the needs of production environments. |
|
|
| Title |
Defining and Measuring
Supercomputer Reliability, Availability, and Serviceability
(RAS) |
| Author |
Jon Stearley |
| Author Inst |
Sandia National Laboratories, USA |
| Presenter |
Jon Stearley |
| Abstract |
The absence of agreed definitions and metrics
for supercomputer RAS obscures meaningful discussion of the issues
involved and hinders their solution. This paper provides a survey
of existing practices, and proposes standardized definitions
and measurements. These are modeled after the SEMI-E10 specification
which is widely used in the semiconductor manufacturing industry. |
|
|
| Systems Papers II: File
Systems
|
| Title |
Active Storage
Processing in a Parallel File System |
| Author |
Evan J. Felix, Kevin Fox, Kevin Regimbal, Jarek
Nieplocha |
| Author Inst |
Pacific Northwest National Laboratory, USA |
| Presenter |
Jarek Nieplocha |
| Abstract |
This paper proposes an extension of the traditional
active disk concept by applying it to parallel file systems deployed
in modern clusters. Utilizing the processing power of the disk controller
CPU for processing of data stored on the disk was proposed
in the previous decade. We have extended and deployed this idea
in the context of storage servers of a parallel file system, where
substantial performance benefits can be realized by eliminating
the overhead of data movement across the network. In particular,
the proposed approach has been implemented and tested in the context
of the Lustre parallel file system used in production Linux clusters
at PNNL. Furthermore, our approach allows active storage application
code to take advantage of a modern multipurpose operating system, Linux,
rather than the restricted custom OS used in previous work.
Initial experience with processing very large volumes of bioinformatics
data validates our approach and demonstrates the potential value
of the proposed concept. |
|
|
| Title |
Shared Parallel
Filesystems in Heterogeneous Linux Multi-Cluster Environments |
| Author |
Jason Cope1, Michael Oberg1, Henry M. Tufo2, and
Matthew Woitaszek1 |
| Author Inst |
1University of Colorado-Boulder and 2National
Center for Atmospheric Research, USA |
| Presenter |
Matthew Woitaszek |
| Abstract |
In this paper, we examine parallel filesystems
for shared deployment across multiple Linux clusters running
with different hardware architectures and operating systems.
Specifically, we deploy GPFS, Lustre, PVFS2, and TerraFS in our
test environment containing Intel Xeon, Intel x86-64, and IBM
PPC970 systems. We comment on the recent feature additions of
each filesystem, describe our implementation and configuration
experiences, and present initial performance benchmark results.
Our analysis shows that all of the parallel filesystems outperform
a legacy NFS system, but with different levels of complexity.
Lustre provides the best performance but requires the most administrative
overhead. Three of the systems – GPFS, Lustre, and TerraFS – depend
on specific kernel versions that increase administrative complexity
and reduce interoperability.
|
|
|
| Title |
Lustre: Is It
Ready for Prime Time? |
| Author |
Steve Woods |
| Author Inst |
MCNC-GCNS, USA |
| Presenter |
Steve Woods |
| Abstract |
When dealing with large numbers of Linux nodes
in the HPC cluster market, one area that sometimes gets overlooked
is shared disk space among the nodes. This shared disk
space can be divided into several areas, which might include:
- User $HOME space
- Shared application space
- Data Grid space
- Backup/data migration space
- Shared high-speed scratch/tmp space
The decision of what technique to use for the
various forms of shared space can be determined by any of a number
of requirements. For example, if a site has only 32 Linux nodes
and disk activity is at a minimum, then an NFS-mounted shared
area coming from the head node or a separate node that has a
disk RAID attached might be sufficient. But what about those
situations where the HPC cluster might be hundreds of nodes that
require hundreds of megabytes or gigabytes per second of performance?
In these situations an NFS-mounted file system from a head node
would not be sufficient. Other techniques would need to be considered
to provide a global file system. Currently there are several
products available, for example IBRIX, PVFS, GPFS, Sistina GFS,
SpinServer, and Lustre, to name a few. Each has its strengths
and weaknesses. The one we plan to concentrate on is Lustre.
In our presentation on Lustre we plan to cover the following topics:
- Intro – brief description of Lustre and its concepts
- Current hardware configuration and philosophy
- Current software configuration and its philosophy
- Site experience – ease or difficulty of use, and reliability
- Performance – OST to OSC performance, NFS exported performance
- Conclusion – is it ready, and which form of shared space does it best fit?
In this presentation it is hoped that sites that are not intimately
experienced with Lustre can get a sense as to whether or not
Lustre is worth investigating for their site. As for experienced
sites, it could be a place to compare notes and experiences.
|
|
|
| Systems Papers III: Cluster
Management Systems
|
Title |
Concept and Implementation of CLUSTERIX: National
Cluster of Linux Systems |
Author |
Roman Wyrzykowski1, Norbert Meyer2, and Maciej
Stroinski2 |
Author Inst. |
1Czestochowa University of Technology, 2Poznan
Supercomputing and Networking Center |
Presenter |
Roman Wyrzykowski |
Abstract |
This paper presents the concept and implementation
of the National Cluster of Linux Systems (CLUSTERIX) - a distributed
PC-cluster (or metacluster) of a new generation, based on the
Polish Optical Network PIONIER. Its implementation makes it possible
to deploy a production Grid environment, which consists of local
PC-clusters with 64- and 32-bit Linux machines, located in geographically
distant independent centers across Poland. The management software
(middleware) developed as Open Source allows for dynamic changes
in the metacluster configuration. The resulting system will be
tested on a set of pilot distributed applications developed as
a part of the project. The project is being implemented by 12
Polish supercomputing centers and metropolitan area networks. |
|
|
Title |
HA-Rocks: A Cost-Effective High-Availability
System for Rocks-Based Linux HPC Cluster |
Author |
Tong Liu, Saeed Iqbal, Yung-Chin Fang, Onur Celebioglu,
Victor Masheyakhi, and Reza Rooholamini |
Author Inst. |
Dell, USA |
Presenter |
Tong Liu |
Abstract |
Commodity Beowulf clusters are now an established
parallel and distributed computing paradigm due to their attractive
price/performance. Beowulf clusters are increasingly being used
in environments requiring improved fault tolerance and high availability.
From the fault tolerance perspective, the traditional Beowulf cluster
architecture has a single master node, which creates a single point
of failure (SPOF) in the system. Hence, to meet high availability
requirements, enhancements to the management system are critical.
In this paper we propose such enhancements based on the commonly
used Rocks management system; we call the result high-availability Rocks
(HA-Rocks). HA-Rocks is sensitive to the level of failure and
provides mechanisms for graceful recovery to a standby master
node. We also discuss the architecture and failover algorithm
of HA-Rocks. Finally, we evaluate failover time under HA-Rocks. |
|
|
| Title |
A Specialized Approach for HPC System Software |
Author |
Ron Brightwell¹, Suzanne Kelly¹, and Arthur B.
Maccabe² |
Author Inst. |
¹Sandia National Laboratories, ²University of New
Mexico |
Presenter |
Ron Brightwell |
Abstract |
This technical presentation will describe our architecture for
scalable, high performance system software. The system software
architecture that we have developed is a vital component of a complete
system. System software is an important area of optimization that
directly impacts application performance and scalability, and one
that also has implications beyond performance. System software
not only impacts the ability of the machine to deliver performance
to applications and allow scaling to the full system size, but
also has secondary effects that can impact system reliability and
robustness. We present an overview of our system software
architecture and provide important details necessary to understand
how this architecture impacts performance, scalability,
reliability, and usability. We discuss examples of how our architecture
addresses each of these areas and present reasons that we have
chosen this specialized approach. We conclude with a discussion
of the specifics of the implementation of this software architecture
for the Sandia/Cray Red Storm system. |
|
|
| Systems Papers IV: Monitoring
and Detection
|
Title |
Deploying LoGS to Analyze Console Logs on an
IBM JS20 |
Author |
James E. Prewett |
Author Inst. |
HPC@UNM, USA |
Presenter |
James E. Prewett |
Abstract |
In early 2005, the Center for High Performance
Computing at the University of New Mexico acquired an IBM JS20[2]
that has been given the name “Ristra”.
The hardware consists of 96 blades, each with two 1.6 GHz PowerPC
970 processors and 4 GB of RAM. The blades are housed in 7 IBM
BladeCenter chassis. Myrinet is used as the high-speed,
low-latency interconnect. Included in the machine’s components is a
management node, which is an IBM x335 system.
This system uses the xCAT cluster management software[3]. The system was configured
so that a management node would collect all of the system logs
as well as all of the output to the console of the individual
blades. Unfortunately, this logging of the blades’ console
output put quite a heavy load on the system, especially the disk.
It was our goal to also monitor scheduler logs on this machine
by mounting the PBS MOM log directory via NFS to each of the
blades. It seemed this would be quite problematic if we could
not reduce the load on the administrative node.
Under certain conditions, the console logs were growing especially quickly.
Sometimes the flood of messages indicated an error with the
hardware or software on the blades themselves, or with that used
to gather and store the console output from the blades. In other
cases, the output seemed to indicate normal operation of the
monitoring infrastructure of the machine. In either case, the
output was rather verbose and was being written to disk. This
was putting a heavy load on the disk subsystem.
To solve the problem presented by these log files, we decided
to replace the log files (in /var/log/consoles/) on disk with
FIFO files that could be monitored by a log analysis tool. We
decided to use LoGS as the tool to monitor these log FIFO files
as it is capable of finding important messages and reacting to
them. LoGS was then used to filter out innocuous messages, store
important messages to files on disk, and react to certain conditions
that it could repair without human intervention. |
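The FIFO-based filtering described in this abstract can be illustrated with a minimal Python sketch. It is not LoGS and does not use LoGS's rule language; the rule patterns, the `classify`, `process`, and `watch_fifo` names, and the choice to store unmatched messages by default are all assumptions made for illustration.

```python
import os
import re

# Hypothetical actions and rules; the patterns are examples only and are
# not taken from the paper or from real console output.
IGNORE, STORE, REACT = "ignore", "store", "react"

RULES = [
    (re.compile(r"heartbeat ok"), IGNORE),
    (re.compile(r"machine check|kernel panic", re.IGNORECASE), REACT),
]


def classify(line, rules=RULES):
    """Return the action of the first rule that matches the line."""
    for pattern, action in rules:
        if pattern.search(line):
            return action
    return STORE  # assumption: unknown messages are kept for review


def process(lines, store, react, rules=RULES):
    """Drop innocuous lines, store the rest, and react to critical ones."""
    for line in lines:
        line = line.rstrip("\n")
        action = classify(line, rules)
        if action == IGNORE:
            continue
        store(line)
        if action == REACT:
            react(line)


def watch_fifo(path, store, react):
    """Read console output from a FIFO instead of a regular log file."""
    if not os.path.exists(path):
        os.mkfifo(path)
    while True:
        # open() blocks until a writer opens the FIFO; when the writer
        # closes it, the loop reopens and waits for the next writer.
        with open(path, "r") as fifo:
            process(fifo, store, react)
```

Because the FIFO holds no data on disk, only the lines passed to `store` ever reach the file system, which is the load reduction the abstract is after.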
|
|
Title |
Detection of Privilege Escalation for Linux Cluster
Security |
Author |
Michael Treaster, Xin Meng, William Yurcik, Gregory
A. Koenig |
Author Inst. |
NCSA/University of Illinois at Urbana-Champaign,
USA |
Presenter |
Michael Treaster |
Abstract |
Cluster computing systems can be among the most
valuable resources owned by an organization. As a result, they
are high-profile targets for attackers, and it is essential that
they be well-protected. Although there are a variety of security
solutions for enterprise networks and individual machines, there
has been little emphasis on securing cluster systems despite
their great importance.
NVisionCC is a multifaceted security
solution for cluster systems built on the Clumon cluster monitoring
infrastructure. This paper describes the component responsible
for detecting unauthorized privilege escalation. This component
enables security monitoring software to detect an entire class
of attacks in which an authorized local user of a cluster is
able to improperly elevate process privileges by exploiting some
kind of software vulnerability. Detecting this type of attack
is one of many facets in an all-encompassing cluster security
solution. |
|
|
| Systems Papers V: Hardware
and Systems
|
| Title |
Performance of
Two-Way Opteron and Xeon Processor-Based Servers for Scientific
and Technical Applications |
| Author |
Douglas Pase and James Stephens |
| Author Inst |
IBM, USA |
| Presenter |
Douglas Pase |
| Abstract |
There are three important characteristics that
affect the performance of Linux clusters used for High-Performance
Computation (HPC) applications. Those characteristics are the
performance of the Arithmetic Logic Unit (ALU) or processor core,
memory performance and the performance of the high-speed network
used to interconnect the cluster servers or nodes. These characteristics
are themselves affected by the choice of processor used in the
server. In this paper we compare the performance of two servers
that are typical of those used to build Linux clusters. Both
are two-way servers based on 64-bit versions of x86 processors.
The servers are each packaged in a 1U (1.75 inch high) rack-mounted
chassis. The first server we describe is the IBM® eServer™ 326,
based on the AMD Opteron™ processor.
The second is the IBM xSeries™ 336, based on the Intel® EM64T
processor. Both are powerful servers designed and optimized to
be used as the building blocks of a Linux cluster that may be as
small as a few nodes or as large as several thousand nodes.
In this paper we describe the architecture and performance of each
server. We use results from the popular SPEC CPU2000 and Linpack
benchmarks to present different aspects of the performance of
the processor core. We use results from the STREAM benchmark
to present memory performance. Finally, we discuss how characteristics
of the I/O slots affect the interconnect performance, whether
the choice is Gigabit Ethernet, Myrinet, InfiniBand, or some
other interconnect. |
|
|
Title |
A First Look at BlueGene/L |
Author |
Martin Margo, Christopher Jordan, Patricia Kovatch,
and Phil Andrews |
Author Inst. |
San Diego Supercomputer Center, USA |
Presenter |
Martin Margo |
Abstract |
An IBM BlueGene/L machine achieved 70.72 TeraFlops on the November 2004
Top 500 list to become the most powerful computer in the world as measured
by the list. This new machine has “slow” but numerous processors along with
two high bandwidth interconnection networks configurable in different
topologies. This machine is optimized for specific data-intensive
simulations, modeling and mining applications. The San Diego Supercomputer
Center (SDSC) recently installed and configured a single rack BlueGene/L
system. This system has a peak speed of 5.7 TeraFlops with its 2048 700 MHz
processors and 512 GB of memory and a Linpack measurement of 4.6 TeraFlops.
SDSC specifically configured this machine to have the maximum number of I/O
nodes to provide the best performance for data-intensive applications. In
this paper, we discuss how BlueGene/L is configured at SDSC, the system
management and user tools and our early experiences with the machine. |
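The peak-speed figure quoted in this abstract can be checked with a short calculation. The sketch below assumes each processor completes 4 floating-point operations per clock cycle; that value is not stated in the abstract and is an assumption chosen because it reproduces the quoted 5.7 TeraFlops. The variable names are illustrative only.

```python
# Figures taken from the abstract.
processors = 2048
clock_hz = 700e6          # 700 MHz
memory_gb = 512
linpack_tflops = 4.6

# Assumption (not stated in the abstract): 4 floating-point
# operations per processor per clock cycle.
flops_per_cycle = 4

peak_tflops = processors * clock_hz * flops_per_cycle / 1e12
efficiency = linpack_tflops / peak_tflops
memory_per_processor_mb = memory_gb * 1024 / processors

print(f"peak: {peak_tflops:.2f} TeraFlops")
print(f"Linpack efficiency: {efficiency:.0%}")
print(f"memory per processor: {memory_per_processor_mb:.0f} MB")
```

Under that assumption, the Linpack measurement corresponds to roughly four fifths of peak, and each processor has a quarter of a gigabyte of memory.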
|
|
Title |
Deploying an IBM e1350 Cluster |
Author |
Aron Warren |
Author Inst. |
HPC@UNM, USA |
Presenter |
Aron Warren |
Abstract |
In this technical presentation, UNM HPC will describe
its experience in bringing online a high-density, 126-node, dual-processor
IBM e1350 cluster (BladeCenter JS20 + Myrinet 2000),
the first large cluster of its type deployed to academia
within the US (Barcelona Supercomputing Center's MareNostrum
cluster achieved #4 in the 2004 TOP500 list). The challenges
in landing a high-density cluster will be described in depth,
along with the problems encountered in deploying and managing
a cluster of this type, including new compilers and software stacks.
Highlights of the successes achieved with this clustering environment
will also be shown. |
|
|
| Systems Papers VI: Systems
and Experiences
|
Title |
To InfiniBand
or Not InfiniBand, One Site's Perspective |
Author |
Steve Woods |
Author Inst. |
MCNC-GCNS, USA |
Presenter |
Steve Woods |
Abstract |
In the world of high-performance computing,
where users' computational needs are being met by
clusters of Linux-based systems, the question often arises of
what interconnect to use to connect dozens if not hundreds
of systems together. Generally, the answer comes from asking
a few basic questions:
- Do the applications being run require multiple nodes, i.e., are they
parallel?
- How much communication is being done by the applications?
- What are the sizes of the messages being passed between processes?
- And the tough one: what can you afford?
Almost all Linux-based nodes
come with Gigabit Ethernet. For many applications the performance
of Gigabit Ethernet is sufficient, especially when accompanied
by a good-quality switch. But what about situations where
communication latency becomes important, as well as scalability
and overall wall-clock performance? In these situations it may
become necessary to investigate other forms of interconnect for
the cluster. Unfortunately, most of the choices for low-latency,
high-bandwidth interconnects are proprietary, such as
Myrinet and Quadrics. Both are very good products, with varying
degrees of performance and cost associated with them. In
recent years InfiniBand has come and gone and come back into
the spotlight again. But the question is, is it here to stay?
We plan to present our experience with InfiniBand, starting
with just a few nodes on a small switch and later expanding to
encompass all nodes of our 64-node cluster plus support nodes.
We plan to cover the following:
- Brief overview of InfiniBand and its protocols
- Our current configuration, which includes not only native InfiniBand
but also gateways to Gigabit Ethernet
- Base performance, looking at both bandwidth and latency for various MPI
message sizes
- Application performance: what type of improvement has been seen and how
InfiniBand affected scaling of certain applications
- Future: where we see InfiniBand and its usage going
- Conclusion: our opinion of InfiniBand and how valuable it is in the HPC
marketplace
From this presentation, it is hoped that sites
considering forms of
connecting cluster nodes other than Ethernet might gain some
insight into the viability of InfiniBand as an interconnect
medium. |
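The interplay between latency, bandwidth, and message size mentioned in this abstract is often approximated with a simple linear cost model: transfer time is a fixed latency plus message size divided by bandwidth. The Python sketch below uses that common approximation; it is not the measurement method of the presentation, and the two parameter sets are hypothetical placeholders rather than measured values for any real interconnect.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: fixed latency plus size over bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def effective_bandwidth(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Bandwidth actually achieved for a message of the given size."""
    return n_bytes / transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s)


def half_bandwidth_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which half of the peak bandwidth is reached."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical parameters for illustration only, not measurements.
links = {
    "link A (higher latency)": (50e-6, 100e6),
    "link B (lower latency)": (5e-6, 800e6),
}

for name, (latency, bandwidth) in links.items():
    for size in (64, 4096, 1048576):
        achieved = effective_bandwidth(size, latency, bandwidth)
        print(f"{name}: {size} bytes -> {achieved / 1e6:.1f} MB/s")
```

In this model small messages are dominated by latency and large messages by bandwidth, which is why the message-size question in the list above matters when choosing an interconnect.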
|
|
Title |
The Road to a Linux-Based
Personal Desktop Cluster |
Author |
Wu-Chun Feng |
Author Inst. |
Los Alamos National Laboratory, USA |
Presenter |
Wu-Chun Feng |
Abstract |
The proposed talk starts with background information
on how unconstrained power consumption has led to the construction
of highly inefficient clusters with increasingly high failure
rates. By appropriately and transparently constraining power
consumption via a cluster-based, power-management tool, we created
the super-efficient and highly reliable Green Destiny cluster
that debuted three years ago. Since then, Green Destiny has
evolved in two different directions: (1) architecturally into
a low-power Orion Multisystems DT-12 personal desktop cluster
(October 2004) and (2) via adaptive system software into a power-aware
CAFfeine desk-side cluster based on AMD quad-Opteron compute
nodes (November 2004 at SC2004). These dual paths of evolution and
their implications will be elaborated upon in this talk. |
|
|
Title |
What Do Mambo, VNC, UML and Grid Computing Have in
Common? |
Author |
Sebastien Goasguen, Michael McLennan, Gerhard Klimeck,
and Mark S. Lundstrom |
Author Inst. |
Purdue University, USA |
Presenter |
Sebastien Goasguen |
Abstract |
The Network for Computational Nanotechnology (NCN) operates
a web computing application portal called the nanoHUB. This portal
serves thousands of users from the nanotechnology community,
ranging from students to experienced researchers, using both
academic and industrial applications. The content of the nanoHUB
is highly dynamic and diverse, consisting of video streams, presentation
slides, educational modules, research papers and on-line simulations.
This paper presents the core components of the infrastructure
supporting the nanoHUB. Committed to the open source philosophy,
the NCN has selected the Mambo content management system, and
uses it in conjunction with Virtual Network Computing (VNC) to
deliver graphical applications to its users. On the backend,
these applications run on virtual machines, which provide both
a sandbox for the applications and a consistent login mechanism
for the NCN user base. User Mode Linux (UML) is used to boot
the virtual machines on geographically dispersed resources,
and Lightweight Directory Access Protocol (LDAP) is used to validate
users against the NCN registry. |
|
|
| Vendor
Presentations
|
|
|
Title |
Cooling for Ultra-High
Density Racks and Blade Servers |
Author(s) |
Richard Sawyer |
Author Inst |
American Power Conversion (APC), USA |
Presenter |
Richard Sawyer |
| Abstract |
The requirement to deploy high-density servers
within single racks is
presenting data center managers with a challenge; vendors are now designing
servers which will demand up to 20 kW of cooling if installed in a single
rack. With most data centers designed to cool an average of no more than
2 kW per rack, some innovative cooling strategies are required. Planning
strategies to cope with ultra-high power racks are described along with
practical solutions for both new and existing data centers. |
|
|
Title |
Bridging Discovery and Learning at Purdue University through
Distributed Computing |
Author(s) |
Krishna Madhavan |
Author Inst |
Purdue University, USA |
Presenter |
Krishna Madhavan |
| Abstract |
While there have been significant advances in the field of distributed
and high performance computing, the diffusion of these innovations
into day-to-day science, technology, engineering, and mathematics
curricula continues to remain a major challenge. This presentation
focuses on some on-going initiatives led by Information
Technology at Purdue, the central IT organization at Purdue University, to narrow
this gap between discovery and learning. The innovative design
and deployment of distributed computing tools have significant
impact on various pedagogical theories such as problem-based
learning and scaffolding approaches. The presentation will
not only describe these tools and their pedagogical relevance,
but also highlight how they fit in with the larger vision
of promoting the diffusion of distributed and high performance
computing tools in educational praxis. |
|
|
|
|
Title |
AMD's Road to
Dual Core |
Author(s) |
Doug O'Flaherty |
Author Inst |
AMD, USA |
Presenter |
Doug O'Flaherty |
| Abstract |
Good architecture is no accident. With silicon
planning horizons out years in advance of product delivery, early
choices can have an impact on final products. In his presentation,
Douglas O'Flaherty, AMD HPC marketing manager, covers the architecture
choices that enabled AMD's dual-core processors and
how those choices affect performance. |
|
|
Title |
|
Author(s) |
Patrick Geoffray |
Author Inst |
Myricom, USA |
Presenter |
Patrick Geoffray |
| Abstract |
Available soon. |
|
|