Minnesota Technolog
Institute of Technology, Board of Publications, University of Minnesota

Constructing a Computational Colossus

A Visit to the Laboratory for Computational Science and Engineering
by Tom Ruwart and Paul Woodward

Ten years ago, in 1986, the state of the art in high performance computing was defined by 1 GigaFlop/s supercomputer systems, 1 GigaByte 14-inch disk drives with 9 MegaByte per second interfaces, local area networks running at 10 MegaBits per second, and graphics engines capable of displaying images of 1280 x 1024 pixels on a single computer monitor at roughly 1 frame per second. A short 10 years later the fortunate among us are now working with new computer architectures, storage subsystems, high-speed communications technology, and high-performance computer graphics capabilities which make the supercomputers of 1986 look like PCs. In the high-performance computing world of today, clusters of shared memory multi-processors (SMPs) run at speeds in the tens of GigaFlop/s, 3.5-inch disk drives with dual 100 MegaByte per second interfaces hold 8.7 GigaBytes each and can be combined into single disk array subsystems with TeraByte capacity and 500 MegaByte per second transfer rates, local area networks run at 1000 MegaBits per second, and computer graphics systems display images with 3200 x 2400 pixel resolution at 15 frames per second. Researchers at the Laboratory for Computational Science and Engineering (LCSE) have played and continue to play an active role in the development of these technologies as well as applications and system software which exploit them.

The Laboratory for Computational Science and Engineering (LCSE) is a new facility located in the basement of the Electrical Engineering and Computer Science building on the University's East Bank campus. It is directed by Professor Paul R. Woodward, of the Astronomy Department, and by assistant directors Professor Matthew T. O'Keefe, Electrical Engineering, and Thomas M. Ruwart, Astronomy.

The LCSE has a broad mandate to develop innovative high-performance computing technologies and capabilities in computational science and engineering.

Formed in June of 1995, the LCSE provides a facility in which innovative hardware and system software solutions to problems in computational science and engineering can be tested and applied. The LCSE is built around strong and long-standing collaborations between Laboratory researchers and the computer industry. It also has a mandate for outreach to industries which now use or would like to use high-performance computing to expand their capabilities. Although the Laboratory builds new combinations of computing elements and experiments with new computing paradigms, the LCSE combines these functions with applications of the new technologies, and thus provides significant high-performance computing resources to its users.

The LCSE has a unique relationship with several leading-edge computer technology companies which provide the LCSE with state-of-the-art computer hardware and software on a loaner basis or, in some cases, as donations to the University of Minnesota. Loaned equipment is returned at the end of the loan period, typically 9 to 12 months, and is replaced by newer, current equipment. Although the University does not own this loaned equipment, these loans have distinct advantages. The equipment is continually upgraded, and it is maintained by the lending company. The benefit of this arrangement for the sponsoring company is access to the LCSE environment for collaborative work. Such a collaboration might be a proof-of-concept demonstration of the capabilities of the new equipment for demanding applications from federally funded research projects of LCSE investigators. Those research projects benefit, in turn, from use of the latest equipment in the LCSE. This synergy of government and industry, catalyzed by the LCSE, can result in new research capabilities for the government and in new product concepts for the participating industries. Bringing together industry- and government-sponsored research in a development lab, rather than in a production computing facility, gives the LCSE its special flavor.

The LCSE has its roots in the research group of its Director, Paul Woodward, and in the associated work in the group of Assistant Director Matt O'Keefe. Research grants from the National Science Foundation (NSF), the Department of Energy (DoE), NASA, and the Office of Naval Research (ONR) provide the core applications which drive developments in the LCSE. A special research project with Silicon Graphics and the Army Research Laboratory which involves Distributed and Virtual Shared Memory technology has resulted in a 12-processor Silicon Graphics Power Challenge machine with 2 Gbyte memory being provided in the lab. Recently, the NSF awarded to the LCSE a MetaCenter Regional Alliance grant which explicitly integrates the LCSE into the NSF Supercomputer Centers Program. In addition to industrial and K-12 outreach components, this MetaCenter Regional Alliance focuses on the new supercomputer architecture represented by clusters of shared memory multiprocessors (SMPs), like the LCSE's two major Silicon Graphics machines. The LCSE is open to suggestions from IT faculty for further collaborative projects which can leverage the lab's resources and enhance its activities. Some representative projects carried out by LCSE researchers in collaboration with the computer industry are described briefly below. These projects began before the founding of the LCSE and provided a major impetus for the formation of this new lab.

The CHALLENGE® Array Project

The CHALLENGE® Array Project was conceived in 1993 out of the need to perform a single Computational Fluid Dynamics (CFD) calculation that was too large to fit on any computer on the planet at that time. The Woodward Research Group worked with Silicon Graphics, Inc., to construct an array of 16 CHALLENGE® computer systems, each with twenty 100 MHz R4400 processors and 1.75 GigaBytes of memory, for a total of 320 processors and 28 GigaBytes of main memory. The computer systems were then connected together via a 3D torus using a network of 20 FDDI rings.

At an LCSE open house, 37 disk arrays were connected to a single Silicon Graphics machine to test read and write capacities. (Photo by Reed Lauer)

A three-dimensional simulation of homogeneous, compressible fluid turbulence on 1024 x 1024 x 1024 mesh points, or more than one billion volume elements, was spread across this array of computers, and the calculation was set in motion. After five days of continuous computation, the simulation had generated more than 500 GigaBytes of data and had performed at a rate of more than 4.9 GigaFlop/s. Putting this into perspective, the same CFD code ran at roughly 7 GigaFlop/s on the 512-node Thinking Machines CM-5 at the University, the second largest supercomputer of its kind.
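The figures quoted for the CHALLENGE® Array run can be cross-checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the inputs come from the text above, but the derived per-processor and per-mesh-point values are illustrative, assume decimal units (1 GigaByte = 10^9 bytes), and treat the quoted "more than" figures as if they were exact.

```python
# Back-of-the-envelope figures for the CHALLENGE Array run.
# Inputs are the numbers quoted in the article; derived values are
# illustrative only and assume decimal units (1 GB = 1e9 bytes).

SYSTEMS = 16               # CHALLENGE systems in the array
CPUS_PER_SYSTEM = 20       # R4400 processors per system
MEM_PER_SYSTEM_GB = 1.75   # GigaBytes of memory per system

MESH_EDGE = 1024           # mesh points along each axis
SUSTAINED_GFLOPS = 4.9     # quoted lower bound on the sustained rate
OUTPUT_GB = 500            # quoted lower bound on the data generated
RUN_DAYS = 5               # length of the run

total_cpus = SYSTEMS * CPUS_PER_SYSTEM
total_mem_gb = SYSTEMS * MEM_PER_SYSTEM_GB
mesh_points = MESH_EDGE ** 3

# Derived, illustrative quantities.
mflops_per_cpu = SUSTAINED_GFLOPS * 1000 / total_cpus
mem_bytes_per_point = total_mem_gb * 1e9 / mesh_points
avg_output_mb_per_s = OUTPUT_GB * 1000 / (RUN_DAYS * 86400)

print(f"processors:              {total_cpus}")
print(f"main memory (GB):        {total_mem_gb}")
print(f"mesh points:             {mesh_points:,}")
print(f"MFlop/s per processor:   {mflops_per_cpu:.1f}")
print(f"memory bytes per point:  {mem_bytes_per_point:.1f}")
print(f"average output (MB/s):   {avg_output_mb_per_s:.2f}")
```

The totals reproduce the 320 processors and 28 GigaBytes stated above, and the mesh works out to 1,073,741,824 points, which is the "more than one billion volume elements" of the simulation.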

Research on the CHALLENGE® Array continues at the LCSE with the Woodward Research Group as part of a project involving the Army Research Lab (ARL) in Aberdeen, Maryland, and Silicon Graphics. A 12-processor POWER CHALLENGE® machine with 2 GigaBytes of main memory, provided for this research project by ARL, is a core part of the LCSE equipment infrastructure. Together with an 8-processor POWER Onyx on extended loan from Silicon Graphics, Inc., a small "array" of shared memory multiprocessor machines (SMPs) can be built and customized for research use in the LCSE. Work on SMP clusters in the LCSE is also directed towards support of this new computing paradigm at the National Center for Supercomputing Applications (NCSA), through the NSF MetaCenter Regional Alliance program, and at the Department of Energy's Los Alamos and Livermore laboratories, through the DoE Accelerated Strategic Computing Initiative (ASCI).

The M.A.X. Project

In April of 1994, the I/O Systems Research Group from Electrical Engineering led by Professor Matt O'Keefe and Tom Ruwart, of the Woodward Research Group, constructed an experiment to test the Maximum Achievable transfer (Xfer) rate (MAX) of a Silicon Graphics Onyx computer system. For this test, 31 Ciprico disk array controllers were connected to an Onyx at the Army High Performance Computing Research Center (AHPCRC) on 31 independent 20 MegaByte per second SCSI channels. Each disk array was then turned on, one at a time, until all 31 were transferring data. The maximum sustained transfer rate was measured at 509.8 MegaBytes per second. Furthermore, the speedup was linear from 2 to 31 disk array controllers and showed no signs of dropping off.
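To put the measured rate in context, it can be compared with the nominal aggregate bandwidth of the SCSI channels. This is a minimal sketch using only the numbers quoted above; the function name `channel_utilization` is hypothetical, and the comparison assumes that each channel's 20 MegaByte per second rating is its theoretical peak.

```python
# Compare the measured M.A.X. transfer rate with the nominal aggregate
# bandwidth of the SCSI channels. Inputs are quoted in the article;
# treating 20 MB/s as each channel's theoretical peak is an assumption.

def channel_utilization(controllers: int,
                        channel_mb_per_s: float,
                        measured_mb_per_s: float) -> dict:
    """Return nominal peak, fraction of peak achieved, and rate per controller."""
    peak = controllers * channel_mb_per_s
    return {
        "peak_mb_per_s": peak,
        "fraction_of_peak": measured_mb_per_s / peak,
        "per_controller_mb_per_s": measured_mb_per_s / controllers,
    }

stats = channel_utilization(controllers=31,
                            channel_mb_per_s=20.0,
                            measured_mb_per_s=509.8)

print(f"nominal peak:     {stats['peak_mb_per_s']:.1f} MB/s")
print(f"fraction of peak: {stats['fraction_of_peak']:.1%}")
print(f"per controller:   {stats['per_controller_mb_per_s']:.2f} MB/s")
```

Under that assumption the 31 channels offer 620 MegaBytes per second in total, so the measured 509.8 MegaBytes per second is a little over 80 percent of the nominal peak.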

The M.A.X. project was only one of a number of development projects involving fast I/O subsystems which have been carried out over the years by researchers now at the LCSE. The most recent of these projects involved the demonstration of a TeraByte disk file system at the Supercomputing '95 conference and in the LCSE in December 1995. This time 37 disk arrays, mostly from MTI, a sponsor company based in Chicago, were connected to a single Silicon Graphics machine. Disk reads of 520 MByte/sec and writes of 200 MByte/sec were demonstrated at an LCSE open house in the week before Christmas.

The applications driving these I/O system developments in the LCSE involve the interactive visualization of very large data sets either from computer simulations or from physical experiments. The computer simulations include turbulent convection in the sun (Woodward's group), circulations in the world's oceans (O'Keefe's group), and convection in the earth's mantle (Yuen's group). Observational data sets include the digitized Palomar sky survey, from Roberta Humphreys' Automated Plate Scanner project in the Astronomy Department, and digitized microscopy of the brains of rats and mice, from George Wilcox's Brain Project in the Department of Pharmacology.

The PowerWall

Developments, like those outlined above, in supercomputing power and in the I/O subsystems which store the data from computer simulations have made possible very high resolution simulations and the voluminous data sets which record the results. However, computer graphics display technologies have not kept pace with these trends. In the last decade, high resolution computer monitors have grown from 1280x1024 pixels to only 1600x1200 pixels, an embarrassingly meager advance for so long a time. The PowerWall Project addresses this issue by parallelizing the display of high resolution images and/or animation data across multiple computer systems running multiple graphics engines. These drive multiple rear projection monitors that illuminate a single large screen. The result is a single display with a resolution of 3200x2400 pixels (or more, given enough equipment).
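The idea of spreading one large image across several projectors can be sketched as a simple tiling calculation. The example below is illustrative only: the article does not state the panel layout, so the 2 x 2 grid of 1600x1200 panels is an assumption, and the names `Tile` and `tile_wall` are hypothetical rather than part of any PowerWall software.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    """The region of the full image assigned to one projector panel."""
    col: int
    row: int
    x0: int       # left edge of the region, in wall pixels
    y0: int       # top edge of the region, in wall pixels
    width: int
    height: int


def tile_wall(wall_w: int, wall_h: int, cols: int, rows: int) -> List[Tile]:
    """Split a wall_w x wall_h image into a cols x rows grid of equal tiles."""
    if wall_w % cols or wall_h % rows:
        raise ValueError("wall size must divide evenly among the panels")
    tile_w, tile_h = wall_w // cols, wall_h // rows
    return [
        Tile(c, r, c * tile_w, r * tile_h, tile_w, tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed layout: a 2 x 2 grid of panels covering a 3200x2400 wall.
tiles = tile_wall(3200, 2400, cols=2, rows=2)
for t in tiles:
    print(t)
```

With this assumed layout each graphics engine renders one 1600x1200 region, which matches the largest single-monitor resolution mentioned above. More panels would simply mean a larger grid.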

The first PowerWall was demonstrated at the Supercomputing '94 conference in Washington, D.C., in November 1994. The construction and demonstration of that prototype system was a collaboration of the research groups of Woodward and O'Keefe with Silicon Graphics, with support from the AHPCRC, the DoE, NSF, NASA, and ONR. Since that demonstration, Silicon Graphics has included the PowerWall in its product literature, and a number of PowerWalls have been proposed or are in the process of being constructed at various sites around the country. The first of these PowerWalls is now in the LCSE, funded as part of a recent NSF CISE Research Infrastructure grant through the Minnesota Supercomputer Institute (MSI) and the Department of Computer Science. This PowerWall is available to all MSI researchers.

Current Projects

The three projects listed above serve to give a flavor of the work which forms the principal focus of the LCSE. Each involves extending the limits of high performance computation utilizing the combined efforts of University researchers and industry. In the SMP cluster computing project, a government lab, ARL, is also involved. Collaborations of industry, government labs, and the University are increasing in importance at the LCSE because of the effectiveness which can be achieved when all these resources are applied in a coordinated fashion to a technological problem. LCSE researchers have proposals pending for two such programs, the NASA Grand Challenge Applications program and the DoE Accelerated Strategic Computing Initiative. One active current project serves to illustrate this category.

Transcontinental Distributed Computing on Clusters of SMPs

In developing application software for SMP clusters like the Silicon Graphics Power Challenge Array, LCSE researchers have devised methods for restructuring more traditional application codes so that the relatively high latencies and low bandwidths of the SMP cluster network do not substantially reduce code performance. These methods, if aggressively applied, can lead to tolerance of latencies of 1/10 second and network bandwidths of as little as 5 to 10 MByte/sec. These relaxed network requirements could be used to allow the implementation of a very low cost network, or they could be exploited to allow extension of the SMP cluster network over transcontinental distances using ATM OC-3 channels with bandwidths above 5 MByte/sec. This second alternative is of great interest to agencies like the NSF, DoE, and DoD, which must distribute high-performance computing resources among widely separated centers, but which would nevertheless like to combine these resources upon occasion in order to achieve unique computing capabilities. In collaboration with NCSA, the Army Research Laboratory, and Silicon Graphics, the LCSE is working to demonstrate this capability on tightly coupled fluid dynamical simulations.
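One way to see why such latencies and bandwidths can be tolerated is a simple cost model in which each exchange costs a fixed latency plus the message size divided by the bandwidth. The sketch below is not the LCSE method, only a common textbook model. The 1 MegaByte message size and 0.5 second compute time are hypothetical values chosen for illustration, and the code assumes that communication can be fully overlapped with computation.

```python
# A simple latency-plus-bandwidth model of one network exchange.
# This is a generic illustration, not the restructuring method used by
# LCSE researchers. Message size and compute time are hypothetical.

def exchange_time(message_bytes: float,
                  latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Time for one exchange: fixed latency plus transfer time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def is_hidden(compute_s: float, comm_s: float) -> bool:
    """True if computation lasts long enough to cover the exchange,
    assuming the two can proceed at the same time."""
    return compute_s >= comm_s


LATENCY_S = 0.1        # 1/10 second, as quoted in the article
BANDWIDTH = 5e6        # 5 MByte/sec, the low end quoted in the article
MESSAGE_BYTES = 1e6    # hypothetical boundary-data message size
COMPUTE_S = 0.5        # hypothetical computation time between exchanges

comm_s = exchange_time(MESSAGE_BYTES, LATENCY_S, BANDWIDTH)
print(f"exchange time: {comm_s:.2f} s")
print(f"hidden behind computation: {is_hidden(COMPUTE_S, comm_s)}")
```

In this model, a code that does enough independent work between exchanges sees little or no slowdown from the network, which is the sense in which the requirements on the network are relaxed.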

Summary

The LCSE is engaged in a wide range of research projects. This new lab focuses mainly upon collaborative projects in which government-sponsored basic research at the University of Minnesota can benefit from involvement of industry and the development of new high-performance computing technology at the LCSE.

If you are interested in becoming involved in the LCSE programs, contact Ms. Julia Sytine at 625-4097.


For more information, visit the web site for the Laboratory for Computational Science and Engineering.
