This is an archived page of the 2001 conference


Linux Clusters: the HPC Revolution

Plenary, Applications Development, and Systems Administration Track Abstracts
Last updated: 12 April 2001

Plenary Sessions

Title Libertie, Egalitie, Fraternatie: Viva Linux Revolution!
Author Dan Reed
Author Inst NCSA
Presenter Dan Reed
Abstract Commodity processors and storage, very high-performance WAN and LAN interconnects, Linux and open source HPC and Grid software are transforming the way we develop and deploy high-performance computing resources. Concurrently, a new model of nationally integrated, tightly coupled scientific data archives and instruments is emerging. This talk outlines a new world view of distributed data, computing, and network resources for 21st century science and engineering.
Title Clusters: What's all the fuss about anyway?
Author Tim Mattson
Author Inst Intel Corporation
Presenter Tim Mattson
Abstract Supercomputing people started building clusters 15 to 20 years ago. So why is it that, after 15 years as the supercomputer of choice for economically disadvantaged academics, clusters have shot up in importance to become the dominant architecture for High Performance Computing? In this talk, we'll explore this transition and put cluster computing technology into its larger context in the universe of supercomputing. This context will give us a platform to jump off from as we look at projections for HPC clusters for the next few years. Finally, I will "rain on my own parade" and talk about challenges we face that could stop the whole cluster revolution dead in its tracks.

Applications Track Abstracts:

Applications Session I: Performance Issues - Looking Under the Hood
Title Using PAPI for Hardware Performance Monitoring on Linux Clusters
Author Jack Dongarra, Kevin London, Shirley Moore, and Phil Mucci
Author Inst University of Tennessee-Knoxville
Presenter Shirley Moore
Abstract PAPI is a specification of a cross-platform interface to hardware performance counters on modern microprocessors. These counters exist as a small set of registers that count events, which are occurrences of specific signals related to a processor's function. Monitoring these events has a variety of uses in application performance analysis and tuning. The PAPI specification consists of a standard set of events deemed most relevant for application performance tuning, together with high-level and low-level sets of routines for accessing the counters. The high-level interface simply provides the ability to start, stop, and read sets of events, and is intended for the acquisition of simple but accurate measurements by application engineers. The fully programmable low-level interface provides sophisticated options for controlling the counters, such as setting thresholds for interrupt on overflow, as well as access to all native counting modes and events, and is intended for third-party tool writers or users with more sophisticated needs.
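The start/stop/read usage pattern of the high-level interface can be sketched with a toy counter class; the names below are illustrative only, not PAPI's actual C routines, and the "events" are simulated rather than read from hardware registers.

```python
class ToyCounters:
    """Toy model of a high-level start/stop/read counter interface.

    Real PAPI exposes C routines over hardware registers for a standard
    event set (total cycles, floating-point operations, ...); here we
    just count calls to a user-visible "event" hook to show the pattern.
    """
    def __init__(self, event_names):
        self.counts = {name: 0 for name in event_names}
        self.running = False

    def start(self):               # like starting the hardware counters
        self.running = True

    def event(self, name):         # stand-in for a hardware event firing
        if self.running:
            self.counts[name] += 1

    def stop(self):                # stop counting and return the totals
        self.running = False
        return dict(self.counts)

# Usage mirrors the high-level interface: start, run the instrumented
# region, stop and read the accumulated event totals.
counters = ToyCounters(["FLOP", "LOAD"])
counters.start()
for _ in range(10):
    counters.event("FLOP")
counters.event("LOAD")
totals = counters.stop()
print(totals)  # {'FLOP': 10, 'LOAD': 1}
```

The low-level interface adds controls (overflow thresholds, native events) on top of this same start/stop/read core.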

PAPI has been implemented on a number of platforms, including Linux/x86 and Linux/IA64. The Linux/x86 implementation requires a kernel patch which provides a driver for the hardware counters. The driver memory maps the counter registers into user space and allows virtualizing the counters on a per-process or per-thread basis. The kernel patch is being proposed for inclusion in the main Linux tree. The PAPI library provides access on Linux platforms not only to the standard set of events mentioned above but also to all the Linux/x86 and Linux/IA64 native events.

PAPI has been installed and is in use, either directly or through incorporation into third-party end-user performance analysis tools, on a number of Linux clusters, including the New Mexico LosLobos cluster and Linux clusters at NCSA and the University of Tennessee being used for the CGrADS project.
Title The Web100 Project: The Global Impact of Eliminating the World Wide Wait
Author Matt Mathis
Author Inst PSC
Presenter Matt Mathis
Abstract The goal of the web100 project is to drive the wide deployment of advanced TCP implementations that can support high performance applications without intervention from network wizards. This is accomplished through extensive internal instrumentation of the stack itself, plus TCP autotuning, etc. This will permit simple tools to "autotune" TCP as well as diagnose other components of the overall system: inefficient applications and sick Internet paths. Web100 will bring better observability to these last two classes of problems, leading to long-term overall improvements in both applications and the underlying infrastructure. These changes will ultimately bring high performance networking to the scientific masses.
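The core of TCP autotuning is sizing the send/receive window to the path's bandwidth-delay product; without it, a default window caps throughput far below the link rate. A quick sketch of that arithmetic (the path numbers below are illustrative, not from the web100 project):

```python
def required_window_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: the buffering TCP needs to keep a path full."""
    return int(bandwidth_bits_per_s * rtt_s / 8)

# An illustrative transcontinental gigabit path: 1 Gbit/s, 70 ms RTT.
bdp = required_window_bytes(1e9, 0.070)
print(bdp)                                # 8750000 bytes, ~8.75 MB of buffer

# With an untuned 64 KB default window, throughput is window/RTT:
default_window = 64 * 1024
achievable = default_window / 0.070 * 8   # bits/s
print(round(achievable / 1e6, 1))         # ~7.5 Mbit/s on a 1 Gbit/s link
```

This is the gap "network wizards" traditionally closed by hand, and that autotuning closes automatically.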

My talk will describe the web100 project in detail, followed by an outline of a number of unresolved research, engineering and deployment problems that are likely to become critical as better TCP implementations are deployed.
Applications Session II: Libraries for Application Performance
Title Performance of the MP_Lite Message-Passing Library on Clusters
Author Dave Turner
Author Inst Ames Laboratory
Presenter Dave Turner
Abstract MP_Lite is a light-weight message-passing library designed to deliver the maximum performance to applications in a portable and user-friendly manner. It supports a subset of the most common MPI commands. MP_Lite can boost the communication rate by 30% over Fast Ethernet and by up to 60% over Gigabit Ethernet compared to other MPI implementations, all while maintaining a much lower latency. This is accomplished by streamlining the flow of the data at all stages, minimizing the memory-to-memory copying that results from buffering messages. An OS-bypass module based on the M-VIA project is being developed that will provide even lower latencies and higher bandwidths. Channel bonding two Fast Ethernet cards per machine in a PC cluster using MP_Lite provides a nearly ideal doubling of the bandwidth while only adding a small amount to the overall cost of a cluster.
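The near-ideal doubling from channel bonding follows from a simple latency-plus-bandwidth cost model: striping the payload across two links halves the bandwidth term while the (small) latency term is unchanged. A toy model, with an assumed latency figure rather than measured MP_Lite numbers:

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s, channels=1):
    """Toy cost model: fixed latency plus payload striped across bonded channels."""
    return latency_s + nbytes / (bandwidth_bytes_per_s * channels)

fast_ethernet = 100e6 / 8      # 100 Mbit/s expressed in bytes/s
latency = 50e-6                # an assumed 50 microsecond latency
msg = 1_000_000                # a 1 MB message

t1 = transfer_time(msg, latency, fast_ethernet, channels=1)
t2 = transfer_time(msg, latency, fast_ethernet, channels=2)
print(round(t1 / t2, 2))       # close to 2.0 for large messages
```

For short messages the latency term dominates and the speedup shrinks, which is why low latency matters as much as raw bandwidth.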
Title Cluster Programming with Shared Memory on Disk
Author Sarah E. Anderson, B. Kevin Edgar, David H. Porter & Paul R. Woodward
Author Inst Laboratory for Computational Science & Engineering (LCSE), University of Minnesota
Presenter Sarah E. Anderson
Abstract We will describe an implementation of our PPM (Piecewise-parabolic method) hydrodynamics solver adapted to run efficiently and robustly on a cluster of Linux Itanium systems. The target problem will be computed on a 64 CPU (32 node) cluster at NCSA, and will be the largest such problem we have ever attempted on any system, a computation on a mesh size of 2048 x 2048 x 2048 zones.

The advent of clusters with high performance interconnects offers the opportunity to compute problems in new ways. Most previous cluster implementations have left domain decomposed independent work contexts in memory, sending only an updated "halo" of boundary information to neighboring nodes. On these new interconnect/CPU balanced clusters, our algorithm has enough separation between data production and re-use to completely overlap the read or write of two complete contexts with the update of a third. Any node may update any context, resulting in a coarse-grained load balancing capability. Leaving the contexts resident on disk gives us robustness, with essentially continuous checkpointing. In addition, we can stop and restart a computation on any number of nodes. This permits any subset of the cluster to run "out of core", solving problems larger than would fit in even the whole cluster's memory. Further, we implement a high-performance remote I/O capability, giving us the ability to fetch, or write back, a context resident on any remote node's local disk.
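The "any node may update any context" scheme amounts to a shared work queue over disk-resident contexts. A minimal sketch, with a dict standing in for the disk store and round-robin workers standing in for whichever node frees up first (all names here are illustrative, not the authors' library):

```python
from collections import deque

# Contexts live in a shared store (a dict standing in for disk); since the
# update is written straight back, the store is a continuous checkpoint and
# a restart on any number of workers just re-reads it.
store = {cid: {"step": 0} for cid in range(6)}

def run_timestep(store, n_workers):
    queue = deque(store)                 # every context is up for grabs
    work_done = {w: 0 for w in range(n_workers)}
    w = 0
    while queue:
        cid = queue.popleft()            # any worker may take any context
        store[cid]["step"] += 1          # "update", written straight back
        work_done[w] += 1
        w = (w + 1) % n_workers          # round-robin stands in for the
                                         # first worker to become free
    return work_done

done = run_timestep(store, n_workers=4)
print(done)                                          # {0: 2, 1: 2, 2: 1, 3: 1}
print(all(c["step"] == 1 for c in store.values()))   # True
```

Because work is pulled rather than statically assigned, a slow or failed node simply takes fewer contexts: the coarse-grained load balancing described above.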

Our application is based on a small Fortran or C callable library, offering routines to simplify master/worker task queue management & communication. The library also implements asynchronous remote I/O, reading and writing named data objects residing anywhere on the cluster. To the application programmer, it offers the basic interlock-free read/write capabilities of a shared file system without giving up the performance provided by the high-speed interconnect.
Applications Session III: Experience with Discipline-Specific Cluster Construction
Title Building a High Performance Linux Cluster for Large-Scale Geophysical Modeling
Author Hans-Peter Bunge, Mark Dalton
Author Inst Bunge - Princeton University (Dept. of Geosciences), Dalton - Cray Research
Presenter Hans-Peter Bunge
Abstract Geophysical models of seismic sound wave propagation, solid state convection in the Earth's mantle or magneto hydrodynamic simulations of the Earth's magnetic field generation promise great advances in our understanding of the complex processes that govern the Earth's interior. They also rank among some of the most demanding calculations computational physicists can perform today, exceeding the limitations of some of the largest high-performance computational systems by a factor of ten to one hundred. In order to progress, a significant reduction in computational cost is required. Off-the-shelf high-performance Linux clusters allow us to achieve this cost reduction by exploiting price advantages in mass-produced PC hardware. Here we review our experience of building a 140-processor Pentium II, fully switched, 100BaseT Ethernet Linux cluster targeted specifically at large-scale geophysical modeling. The system was funded in part by the National Science Foundation and has been in operation at Princeton's Geosciences Department for more than two years. During this time we experienced nearly 100 percent system uptime and have been able to apply the system to a large range of geophysical modeling problems. We will present performance results and will explore some of the scientific results that have been accomplished on the system to date.
Title Aeroacoustic and Turbulent Flow Simulations Using a Cluster of Workstations
Author Anirudh Modi and Lyle N. Long
Author Inst Institute for High Performance Computing Applications (IHPCA) Pennsylvania State University
Presenter Anirudh Modi
Abstract The COst effective COmputing Array (COCOA) and its successor, COCOA-2, are a Penn State Department of Aerospace Engineering initiative to bring low-cost parallel computing to the departmental level. COCOA is a 50-processor Linux cluster of off-the-shelf PCs (dual PII-400s) with 12.5 GB of memory connected via fast ethernet (100 Mbit/sec), whereas COCOA-2 is a 40-processor rackmounted cluster (dual PIII-800s) with 20 GB of memory connected via two fast-ethernet adaptors (using Linux channel bonding). Each of the clusters has 100 GB of usable disk space. The PCs run RedHat Linux 7.0 with MPI (both LAM MPI and MPICH) and C/C++/FORTRAN for parallel programming, and DQS for queueing jobs. COCOA cost us $100,000 (1998 US dollars), whereas COCOA-2 cost us a meagre $60,000 (2001 US dollars). Owing to the simplicity of using such commonly available open-source software and commodity hardware, these clusters are managed effortlessly by the research students themselves. The choice of hardware suited to our applications, and the problems encountered during the building of the clusters, are also summarized. A wide range of large, complex aeroacoustics and turbulent Computational Fluid Dynamics (CFD) problems, such as unsteady separated flow over helicopters and complex ship geometries, are being run successfully on COCOA, making it an ideal platform for the debugging and development of our research codes. Several problems utilizing large amounts of memory (2-5 GB) have been found to run very well on these clusters, even compared with traditional supercomputers. MPI performance for these clusters has also been tested thoroughly to enable us to write parallel programs that are well suited to such high-latency clustering environments. Comprehensive benchmarks have also been run on the clusters, showing their effectiveness at tackling current computational problems.
Several research members are using these clusters very successfully on a daily basis for developing, testing, and running their parallel programs, something that required privileged access to a National Supercomputing Center not so long ago.
Applications Session IV: Application Performance Analyses
Title Benchmarking Production Codes on Beowulf Clusters: the Sissa Case Study
Author Stefano Cozzini
Author Inst INFM, udr Sissa (TS ITALY)
Presenter Stefano Cozzini
Abstract In the Condensed Matter group at Sissa (Trieste), we routinely use a number of parallel codes to perform scientific research in computational condensed matter physics. The codes are developed in house and implement all the modern computational techniques. In particular, we can currently classify three main classes of parallel applications: (i) first-principles or ab-initio codes, where the quantum description of the electronic properties of matter is taken into account; (ii) classical Molecular Dynamics (MD) codes; and (iii) Quantum Monte Carlo (QMC) techniques. Codes belonging to different classes also differ from the computational point of view. In this presentation we show the behavior of a representative suite of these three classes of codes on two kinds of Beowulf clusters at our disposal. The first machine is an Intel-based cluster using a Myrinet network, built here at Sissa. The second is an Alpha-based cluster connected through a QSNET network, hosted at the supercomputer center at Cineca (Italy). A careful performance analysis was carried out for all the applications in terms of the computational characteristics of the benchmarking codes. Our measurements on the two clusters reveal a rich and interesting scenario: for instance, some of the bottlenecks experienced by certain classes of codes on the Intel cluster are easily overcome on the Alpha cluster, while for other classes the Intel architecture is still preferable in terms of performance/price ratio. Performance results on the Beowulf clusters are also compared with data obtained on other parallel machines available to us, including the Origin3000, T3E, and SP3. In light of all the results we collected, we are now able to draw some indications for future acquisitions of computational power.
Title Performance of Tightly Coupled Linux Cluster Simulation using PETSc of Reaction and Transport Processes During Corrosion Pit Initiation
Author Eric Webb, Jay Alameda, William Gropp, Joshua Gray, and Richard Alkire
Author Inst UIUC ChemE, NCSA, ANL
Presenter Joshua Gray
Abstract The most dangerous forms of corrosion occur unexpectedly and often out of sight of ordinary inspection methods. Therefore, remote monitoring is increasingly installed on bridges, tunnels, waste repositories, pipelines and marine structures, for example. Predicting material failure based on field data, however, depends upon accurate understanding of corrosion processes, which involve phenomena that span more than ten orders of magnitude in time and distance scales. Consequently, the situation represents an extreme challenge.

In this work, we consider a seminal moment: the initiation of a small corrosion pit that possesses the ability to grow into a larger pit, and then to form crevices and cracks that may cause catastrophic failure. Improved understanding of mechanisms of pit initiation is needed. Direct laboratory measurement of local events during pit initiation is surprisingly difficult. The complexity of the system creates a situation where experiments seldom can be compared directly with simulation results, not to mention with each other. Multiple scientific hypotheses therefore emerge. Simulations that accurately emulate experimental observations are essential in order to reduce the number of experiments needed to verify scientific understanding of the system, as well as to design corrosion monitoring technology. The role of sulfur-rich inclusions on pit initiation on a surface of stainless steel was simulated numerically as well as investigated experimentally. Mechanistic hypotheses suggested by the experimental data were simulated in numerical models in order to test their validity.

In this work, we describe our experience in solving the mass balance and species transport relationships for an electrically neutral solution, in the form of a system of coupled, non-linear partial differential and algebraic equations. This system was solved numerically using a finite difference method: second-order centered finite differences in two dimensions transform the partial differential equations into a system of non-linear algebraic equations, and second-order forward- and backward-difference approximations are used on the boundaries. A fully implicit scheme was implemented to step forward in time. To solve the resulting system of equations, we used the Portable, Extensible Toolkit for Scientific Computation (PETSc), which provided a robust parallel computing architecture, numerical software libraries and data structures, and sparse matrix storage techniques. In particular, we describe our experience in moving our PETSc-based code from the NCSA Origin Array to the emerging Linux cluster architecture, and the code and solver performance that resulted from our work on the Linux cluster.
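Each implicit time step of such a scheme yields a coupled nonlinear algebraic system F(u) = 0, which is typically attacked with a Newton iteration; solving those linearized steps robustly, with sparse storage and in parallel, is the role PETSc plays here. A tiny two-unknown Newton sketch of the idea (this toy system is illustrative, not the authors' corrosion model, and PETSc is not used):

```python
# F(u) = 0 is the nonlinear algebraic system an implicit step produces;
# Newton's method solves J(u) du = -F(u) repeatedly until the residual
# is tiny. This 2x2 toy uses Cramer's rule where PETSc would use a
# sparse parallel linear solve.

def F(u):
    x, y = u
    return [x * x + y - 2.0, x + y * y - 2.0]

def J(u):                      # analytic Jacobian of F
    x, y = u
    return [[2 * x, 1.0], [1.0, 2 * y]]

def newton(u, tol=1e-12, max_it=50):
    for _ in range(max_it):
        f = F(u)
        if max(abs(v) for v in f) < tol:
            break
        a, b = J(u)
        det = a[0] * b[1] - a[1] * b[0]
        du0 = (-f[0] * b[1] + f[1] * a[1]) / det   # solve J du = -f
        du1 = (-f[1] * a[0] + f[0] * b[0]) / det
        u = [u[0] + du0, u[1] + du1]
    return u

root = newton([1.5, 0.5])
print([round(v, 6) for v in root])  # [1.0, 1.0]
```

In the full application the Jacobian is a large sparse matrix and the linear solves dominate the cost, which is exactly where the cluster's parallelism is spent.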

Through this work, we will show that the dual strategy of taking advantage of the object-oriented nature of C++ to create chemistry and geometry classes, which simplified the task of testing hypotheses, and of employing PETSc as our solver allowed a straightforward transition to the Linux cluster architecture, while maintaining code performance good enough to simulate complex electrochemical phenomena. This approach provides an excellent starting point for more detailed computational investigations of electrochemical phenomena in the future.
Applications Session V: Applications on Clusters
Title A Parallel Chemical Reactor Simulation Using Cactus
Author Karen Camarda, Yuan He, and Ken Bishop
Author Inst U. Kansas, Department of Chemical and Petroleum Engineering
Presenter Karen Camarda
Abstract Efficient numerical simulation of chemical reactors is an active field of research, both in industry and academia. As members of the Alliance Chemical Engineering Application Technologies (AT) Team, we have developed a reactor simulation for the partial oxidation of ortho-xylene to phthalic anhydride, an important industrial compound. This problem is of interest to chemical engineers for both theoretical and practical reasons. Theoretically, one is trying to understand the kinetics of the reactions involved. Currently, none of the several kinetic models that have been proposed is able to reproduce both the temperature profile inside the reactor and the spectrum of products observed in industry. Practically, an accurate simulation of this reaction would allow industrial chemical engineers to optimize plant operations, while ensuring a safe working environment.

We perform our simulation using a pseudo-homogeneous, non-isothermal, packed bed catalytic reactor model. The equations solved are a set of diffusive-advective partial differential equations, each with a generation term introduced by the chemical reaction model used. In general, these generation terms couple the equations in a nonlinear way, as is the case with the model we implement.

The discrete equations to be solved are derived by finite-differencing the partial differential equations to obtain the Crank-Nicolson representation, which is then linearized. Because solving the resulting system of equations is computationally expensive, an efficient parallel implementation is desired.
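For a single 1-D diffusion equation the Crank-Nicolson step reduces to one tridiagonal solve per time step; a minimal self-contained sketch (this toy is illustrative of the discretization only, not the reactor model, which couples many such equations through nonlinear generation terms):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system: a=sub-, b=main, c=super-diagonal, d=rhs."""
    n = len(d)
    c2, d2 = c[:], d[:]
    c2[0] /= b[0]; d2[0] /= b[0]
    for i in range(1, n):
        m = b[i] - a[i] * c2[i - 1]
        c2[i] = c[i] / m
        d2[i] = (d[i] - a[i] * d2[i - 1]) / m
    x = d2[:]
    for i in range(n - 2, -1, -1):
        x[i] -= c2[i] * x[i + 1]
    return x

def crank_nicolson_step(u, r):
    """One CN step of u_t = u_xx with u=0 boundaries; r = dt/dx^2."""
    n = len(u)
    a = [-r / 2] * n
    b = [1 + r] * n
    c = [-r / 2] * n
    d = []
    for i in range(n):           # explicit half of the average
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        d.append(u[i] + (r / 2) * (left - 2 * u[i] + right))
    return thomas(a, b, c, d)    # implicit half: tridiagonal solve

u = [0.0, 0.0, 1.0, 0.0, 0.0]    # a spike of heat mid-domain
for _ in range(50):
    u = crank_nicolson_step(u, r=0.5)
print(max(u) < 0.05)             # True: the spike has diffused away
```

In two dimensions with advection and coupled species, the matrix is no longer tridiagonal, which is what makes a parallel solver worthwhile.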

In order to achieve parallelism and portability with a minimum of development effort, we have ported our existing reactor model to the Cactus computational environment. To solve the linear systems that result, we have also ported a solver that uses a conjugate gradient method.

In this paper, we discuss the nature of the equations solved and the algorithm used to solve them. We then present a performance analysis of running this code on the Linux cluster at the National Center for Supercomputing Applications (NCSA).
Title Massively Parallel Visualization on Linux Clusters with Rocketeer Voyager
Author Robert Fiedler & John Norris
Author Inst UIUC, Center for Simulation of Advanced Rockets
Presenter Robert Fiedler
Abstract This paper describes Rocketeer Voyager, a 3-D scientific visualization tool which processes in parallel a series of HDF output dumps from a large scale simulation. Rocketeer reads data defined on many types of grids and displays translucent surfaces and isosurfaces, vectors as glyphs, surface and 3-D meshes, etc. An interactive version is first used to view a few snapshots, orient the image, and save a set of graphical operations that will be applied by the batch tool to the entire set of snapshots. The snapshots may reside on separate or shared file systems. Voyager broadcasts the list of operations, reads the data ranges from all snapshots to determine the color scales, and then processes the snapshots concurrently to produce hundreds of frames for animation in little more time than it would take for the interactive tool to display a single image.
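The batch design described above is essentially two passes: a first scan over all snapshots to find the global data range (so every frame shares one color scale), then an embarrassingly parallel pass rendering each snapshot independently. A toy sketch of that structure (data and "rendering" are stand-ins, not Rocketeer's actual pipeline):

```python
# Pass 1 scans every snapshot for the global data range so that all
# frames share a single color scale; pass 2 maps each snapshot to
# normalized "pixel" values independently, so it parallelizes trivially.
snapshots = [[0.0, 2.0], [1.0, 5.0], [3.0, 4.0]]   # toy output dumps

lo = min(min(s) for s in snapshots)   # pass 1: global minimum
hi = max(max(s) for s in snapshots)   # pass 1: global maximum

def render(snapshot, lo, hi):
    # pass 2: normalize against the *global* scale, not the per-frame one
    return [round((v - lo) / (hi - lo), 2) for v in snapshot]

frames = [render(s, lo, hi) for s in snapshots]
print(frames)  # [[0.0, 0.4], [0.2, 1.0], [0.6, 0.8]]
```

Normalizing per frame instead would make colors jump between animation frames, which is why the range pass comes first.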

Rocketeer is based on the Visualization Toolkit, which uses OpenGL to exploit hardware graphics acceleration. On a linux cluster, particularly a heterogeneous one, the nodes may have different graphics hardware or none at all. To circumvent defects in images rendered using Mesa, CSAR contracted with Xi Graphics, Inc. to develop an efficient and flawless X-server that runs in a "virtual frame buffer" and installs on all nodes in the cluster with little administrator intervention. Even though the virtual frame buffer does not take advantage of any graphics hardware, individual images are rendered as quickly as they are on all but the fastest graphics workstations.

Systems Administration Track Abstracts:

Systems Session I: Tools for Building Clusters
Title SCE : A Fully Integrated Software Tool for Beowulf Cluster System
Author Putchong Uthayopas, Sugree Phatanapherom, Thara Angskun, Somsak Sriprayoonsakul
Author Inst Parallel Research Group, CONSYL, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
Presenter Putchong Uthayopas
Abstract One of the problems with the wide adoption of clusters for mainstream high performance computing is the difficulty of building and managing the system. There have been many efforts to solve this problem by building fully automated, integrated software stacks from several well-known open source packages. The problem is that these packages come from many sources and were never designed to work together as a truly integrated system.

With the experience and tools developed building many clusters at our site, we decided to build an integrated software tool that is easy to use for the cluster user community. This software tool, called SCE (Scalable Computing Environment), consists of a cluster builder tool, a complex system management tool (SCMS), scalable real-time monitoring, web-based monitoring software (KCAP), a parallel Unix command, and a batch scheduler. These tools run on top of our cluster middleware, which provides cluster-wide process control and many other services. MPICH is also included. All tools in SCE are designed to be truly integrated, since all of them except MPI and PVM were built by our group. SCE also provides more than 30 APIs for accessing system resource information, controlling remote process execution, ensemble management, and more. These APIs, and the interaction among software components, allow users to extend and enhance SCE in many ways. SCE is also designed to be very easy to use: most of the installation and configuration is automated through a complete GUI and Web interface. This paper will discuss the current SCE design, implementation, and experiences. SCE is expected to be available as a developer version in June.
Title Node Abstraction Techniques for Linux Installation
Author Sean Dague & Rich Ferri
Author Inst IBM
Presenter Sean Dague
Abstract Every Linux distribution is different. This is both a strength, as different distributions are better suited to different environments, and a weakness, as someone familiar with one distribution may be lost when presented with a different one. The differences hurt even more when writing an application for Linux, as a developer may make assumptions that he or she believes are valid for Linux in general but that in reality are only true for certain distributions.

Lui (Linux Utility for cluster Installation) is an open source project whose goal is to be a universal cluster installer for all distributions and architectures. It has existed in the community since April of 2000. In redesigning for the 2.0 release, we spent time researching where Linux distributions diverge in order to build an abstraction layer over the more contentious areas of difference. This presentation will look at the various ways one could abstract these differences, and the path that the Lui project has taken for the 2.0 release. We will investigate abstractions necessary both for different Linux distributions, as well as different system architectures.

This presentation will discuss the data abstractions one can use to support multiple distributions and architectures, and how we used these abstractions in the design of Lui 2.0.
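One common shape for such an abstraction layer is a table of per-distribution facts hidden behind a common query interface, so installer code never branches on the distribution name directly. A toy sketch in that spirit (the table entries and class names are illustrative, not Lui's actual design):

```python
# Per-distribution facts live in one table; installer code asks questions
# through a common interface instead of scattering distro checks around.
DISTRO_FACTS = {
    "redhat": {"pkg_format": "rpm", "net_config": "/etc/sysconfig/network-scripts"},
    "debian": {"pkg_format": "deb", "net_config": "/etc/network"},
}

class Node:
    def __init__(self, distro):
        if distro not in DISTRO_FACTS:
            raise ValueError("unsupported distribution: " + distro)
        self.facts = DISTRO_FACTS[distro]

    def package_format(self):
        return self.facts["pkg_format"]

    def network_config_dir(self):
        return self.facts["net_config"]

print(Node("redhat").package_format())  # rpm
print(Node("debian").package_format())  # deb
```

Supporting a new distribution or architecture then means adding one table entry rather than touching every code path.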
Systems Session II: Production Clusters
Title Setting Up and Running a Production Linux Cluster at Pacific Northwest National Laboratory
Author Gary Skouson, Ryan Braby
Author Inst Pacific Northwest National Laboratory
Presenter Gary Skouson
Abstract With the low price and increasing performance of commodity computer hardware, it is interesting to study the viability of using clusters of relatively inexpensive computers to produce a stable system capable of meeting the current demands on a high performance computer. A 192-processor cluster was installed to test and develop methods that would make the PC cluster a workable alternative to other commercial systems. By comparing PC clusters with the cluster systems sold commercially, it became apparent that the tools to manage a PC cluster as a single system were not as robust or as well integrated as on some commercial systems. This presentation will focus on the problems encountered and the solutions used to stabilize this cluster for both production and development use. These included the use of extra hardware, such as remote power control units and multi-port adapters, to provide remote access to both the system console and system power. A Giganet cLAN fabric was also used to provide a high-speed, low-latency interconnect. Software solutions were used for resource management, job scheduling and accounting, parallel filesystems, remote network installation, and system monitoring. Although there are still some tools missing for debugging hardware problems, the PC cluster continues to be very stable and useful for our users.
Title High Throughput Linux Clustering at Fermilab
Author Steven C. Timm, L. Giacchetti, D. Skow
Author Inst Fermilab
Presenter Steven C. Timm
Abstract Computing in experimental High Energy Physics is a natural fit for coarse-grained parallel computing structures. This report will describe the development and management of computing farms at Fermilab. We will describe the hardware and configuration of our 1000-processor cluster, its interface to petabyte-scale network-attached storage, the batch software we have developed, and the cluster management and monitoring tools that we use.
Systems Session III: Accounting / Monitoring
Title SNUPI: A Grid Accounting and Performance System Employing Portal Services and RDBMS Back-End
Author Victor Hazlewood, Ray Bean, Ken Yoshimoto
Author Inst SDSC
Presenter Ray Bean
Abstract SNUPI, the System, Network, Usage and Performance Interface, provides a portal interface to system management, monitoring, and accounting data for heterogeneous computer systems, including Linux clusters. SNUPI provides data collection tools, a recommended RDBMS schema design, and Perl-DBI scripts suitable for portal services to deliver reports at the system, user, job, and process level for heterogeneous systems across the enterprise. The proposed paper and presentation will describe the background of accounting systems for UNIX, detail the process and batch accounting (JOBLOG) capabilities available for Linux, and describe the collection tools, RDBMS schema, and portal scripts that make up the Open Source SNUPI grid accounting and performance system as employed on the Linux cluster and other systems at NPACI/SDSC.
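The RDBMS-backed design can be sketched in miniature with sqlite3: a job log table plus the kind of per-user usage report a portal script would render. The table layout below is hypothetical, for illustration only, not SNUPI's published schema:

```python
import sqlite3

# A minimal job-accounting schema (hypothetical, not SNUPI's actual one)
# and the core accounting query: CPU-hours per user across the site.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE joblog (
    user TEXT, host TEXT, cpus INTEGER, wall_hours REAL)""")
db.executemany("INSERT INTO joblog VALUES (?, ?, ?, ?)", [
    ("alice", "cluster1", 16, 2.0),
    ("alice", "cluster1", 32, 1.0),
    ("bob",   "cluster2",  8, 4.0),
])

rows = db.execute("""SELECT user, SUM(cpus * wall_hours)
                     FROM joblog GROUP BY user ORDER BY user""").fetchall()
print(rows)  # [('alice', 64.0), ('bob', 32.0)]
```

Once usage records from heterogeneous systems land in one schema, system-, job-, and process-level reports are all variations on this aggregate query.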
Title Cluster Monitoring at NCSA
Author Tom Roney
Author Inst UIUC/NCSA
Presenter Tom Roney
Abstract System monitoring and performance monitoring at the National Center for Supercomputing Applications have always been treated as separate arenas, due to the nature of their different functions. The function of a system monitor has been that of a virtual system operator, monitoring for events that depict a system fault and also automating the management of such events. The function of a performance monitor has been to collect data from different layers of the system -- hardware, kernel, service, and application layers -- for administrators and users who collect such data for diagnostic purposes in order to finely tune the performance of the different system layers. The collected performance data has never been associated with the system-monitoring framework. However, the legacy of this division of interest dissolves within the new arena of cluster monitoring. This paper explains why this is so, and how the National Center for Supercomputing Applications has merged the arenas of system monitoring and performance monitoring in a cluster environment.
Systems Session IV: Lessons Learned
Title Linux Cluster Security
Author Neil Gorsuch
Author Inst NCSA
Presenter Neil Gorsuch
Abstract Modern Linux clusters are under increasing security threats. This paper will discuss various aspects of cluster security, including: firewalls, ssh, kerberos, stateful and stateless packet filtering, honeypots, intrusion detection systems, penetration testing, and newer kernel security features. Example packet filtering rule sets for various configurations will be detailed along with the rationale behind them.
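Stateless packet filtering is, at its core, first-match evaluation over an ordered rule list with a default-deny fallback. A toy rule evaluator showing the model (the rules below are illustrative only, not the paper's actual rule sets, and real filters match on far more fields):

```python
# First-match evaluation over an ordered rule list: allow ssh to the
# head node, drop all other inbound traffic (default deny / fail closed).
RULES = [
    {"proto": "tcp", "dport": 22,   "action": "accept"},   # ssh
    {"proto": "any", "dport": None, "action": "drop"},     # default deny
]

def filter_packet(proto, dport, rules=RULES):
    for rule in rules:
        proto_ok = rule["proto"] in ("any", proto)
        port_ok = rule["dport"] is None or rule["dport"] == dport
        if proto_ok and port_ok:
            return rule["action"]      # first matching rule wins
    return "drop"                      # no rule matched: fail closed

print(filter_packet("tcp", 22))    # accept
print(filter_packet("tcp", 6000))  # drop
print(filter_packet("udp", 53))    # drop
```

Because rule order decides the outcome, the rationale behind each rule's position matters as much as the rule itself, which is why the paper walks through example sets.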
Title Lessons Learned from Proprietary HPC Cluster Software
Author James B. White III
Author Inst Oak Ridge National Laboratory
Presenter James B. White III
Abstract Production high-performance-computing (HPC) centers have common needs for infrastructure software, but different centers can have very different strategies for meeting these needs. Even within a single center, systems from different vendors often present very different strategies. The open-cluster community has the opportunity to define de facto standards for cluster infrastructure software. To this end, we summarize the basic needs of production HPC centers, and we describe the strategies for meeting these needs used by the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory. The CCS has two large, proprietary clusters, an IBM RS/6000 SP and a Compaq AlphaServer SC. The SP and SC lines have dominated recent large HPC procurements, and they present the most direct competition to high-end open clusters. We describe our experience with the infrastructure software for these proprietary systems, in terms of successes the open-cluster community may want to emulate and mistakes they may want to avoid.
Systems Session V: Something Different
Title Cheap Cycles from the Desktop to the Dedicated Cluster: Combining Opportunistic and Dedicated Scheduling with Condor
Author Derek Wright
Author Inst U. Wisconsin
Presenter Derek Wright
Abstract Science and industry are increasingly reliant on large amounts of computational power. With more CPU cycles, larger problems can be solved and more accurate results obtained. However, large quantities of computational resources have not always been affordable.

For over a decade, the Condor High Throughput Computing System has scavenged otherwise wasted CPU cycles from desktop workstations. These inconsistently available resources have proven to be a significant source of computational power, enabling scientists to solve ever more complex problems. Condor efficiently harnesses existing resources, reducing or eliminating the need to purchase expensive supercomputer time or equivalent hardware.

Dedicated clusters of commodity PC hardware running Linux are becoming widely used as computational resources. The cost to performance ratio for such clusters is unmatched by other platforms. It is now feasible for smaller groups to purchase and maintain their own clusters.

Software for controlling these clusters has, to date, used dedicated scheduling algorithms. These algorithms assume constant availability of resources to compute fixed schedules. Unfortunately, due to hardware or software failures, dedicated resources are not always available over the long-term. Moreover, these dedicated scheduling solutions are only applicable to certain classes of jobs, and can only manage dedicated clusters or large SMP machines.

Condor overcomes these limitations by combining aspects of dedicated and opportunistic scheduling in a single system. This paper will discuss Condor's research into dedicated scheduling, how elements of opportunism are used and integrated, and why this approach provides a better computational resource for the end-user. Our experience managing parallel MPI jobs alongside serial applications will be addressed. Finally, future work in the area of merging dedicated and opportunistic scheduling will be explored.
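The combined policy can be caricatured in a few lines: parallel jobs need guaranteed, simultaneous nodes from the dedicated pool, while serial jobs happily scavenge desktop cycles and fall back to dedicated nodes. This is a sketch of the idea only, not Condor's actual matchmaking:

```python
# Toy combined scheduler: MPI jobs get dedicated nodes or wait for a
# guarantee; serial jobs go opportunistic first, dedicated as fallback.
def schedule(jobs, dedicated_free, desktops_free):
    placements = {}
    for name, kind, need in jobs:
        if kind == "mpi":
            if need <= dedicated_free:
                placements[name] = ("dedicated", need)
                dedicated_free -= need
            else:
                placements[name] = ("queued", 0)   # must wait for guarantees
        else:  # serial job: scavenge a desktop, else use a spare dedicated node
            if desktops_free > 0:
                placements[name] = ("desktop", 1)
                desktops_free -= 1
            elif dedicated_free > 0:
                placements[name] = ("dedicated", 1)
                dedicated_free -= 1
            else:
                placements[name] = ("queued", 0)
    return placements

jobs = [("sim1", "mpi", 8), ("sweep1", "serial", 1),
        ("sweep2", "serial", 1), ("sim2", "mpi", 16)]
placements = schedule(jobs, dedicated_free=16, desktops_free=1)
print(placements)
# {'sim1': ('dedicated', 8), 'sweep1': ('desktop', 1),
#  'sweep2': ('dedicated', 1), 'sim2': ('queued', 0)}
```

The payoff of mixing the two modes is visible even in the toy: serial work soaks up otherwise idle cycles without starving the parallel jobs of their guaranteed nodes.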

By using both desktop workstations and dedicated clusters, Condor harnesses all available computational power to enable the best possible science at a low cost.
Title An Innovative Inter-connect for Clustering Linux Systems
Author John M. Levesque
Author Inst Times N Systems
Presenter John M. Levesque
Abstract Times N Systems, a systems company in Austin, Texas, has a unique interconnect strategy. Multiple Intel nodes are connected via a large shared memory using fiber optics; the hardware runs either W2K or Linux. Individual nodes can access shared memory by directly addressing data beyond their local memory. This collection of hardware, nodes, and interconnect is referred to as a team.

The current system has a latency of 2.3 microseconds to shared memory, and the next SMN (Shared Memory Node) will see half this latency. Other enhancements to the current system include 1) increasing shared memory from 1 Gbyte to 16 Gbytes, 2) increasing the number of nodes from 16 to 128, and 3) increasing the 65 Mbytes/sec bandwidth to 2 Gbytes/sec. These improvements will ship by the end of this year.
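Using the figures quoted above, a simple latency-plus-bandwidth cost model shows where each improvement pays off: small transfers are latency-bound, large ones bandwidth-bound. The message size below is an arbitrary example, not a vendor benchmark:

```python
def transfer_s(nbytes, latency_s, bw_bytes_per_s):
    """Simple cost model: time = latency + size / bandwidth."""
    return latency_s + nbytes / bw_bytes_per_s

# Figures from the abstract: 2.3 microseconds and 65 Mbytes/sec today;
# the announced upgrade halves the latency and raises bandwidth to 2 GB/s.
today = transfer_s(1_000_000, 2.3e-6, 65e6)
next_gen = transfer_s(1_000_000, 1.15e-6, 2e9)
print(round(today * 1e3, 2))     # ~15.39 ms for a 1 MB transfer today
print(round(next_gen * 1e3, 2))  # ~0.5 ms after the upgrade
```

For a 1 MB transfer the bandwidth term dominates completely, so the ~30x bandwidth increase matters far more than the halved latency; for a few-byte synchronization the reverse holds.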

The scientific and technical offering will include ISV-supplied Fortran and C compilers and debuggers, plus MPI and performance tools from TNS. MPI has been optimized to use the SMN for both message passing and synchronization. The Times N System is an excellent I/O performer as well: disks can be allocated as SMD disks and striped across all processors in the team. Additionally, TNS has ported GFS, a Linux global file system, to the TNS team.

The talk will concentrate on the overall architecture of the Times N System, illustrating how popular scientific codes will benefit from the fastest processors connected with a low latency interconnect.