
Transferring data at record speed

MOVIES AND popular music are moving from tape to disk, but tape is still on a roll at the San Diego Supercomputer Centre (SDSC) at the University of California, San Diego. The centre's huge, updated tape storage system has demonstrated its effectiveness by transferring data at 828 megabytes per second. "Nobody in academia has such a fast data archive system," said Phil Andrews, program director for High-End Computing at SDSC.

Supercomputers are ranked by their raw computational speed, but rapid movement of increasingly large data sets has become vital to researchers in scientific disciplines from anatomy and astronomy, to climatology and particle physics. Data sets in those fields have grown over the past decade, by multiples of 1,000, from megabytes (millions of bytes) to gigabytes, to terabytes. Some data sets are now approaching or exceeding one petabyte (one quadrillion bytes).

Even though tape costs a fraction of what disk costs, supercomputer centres have been investing heavily in the more expensive disk storage because its data-transfer rates are at least 10 times higher than those of what had been the best tape systems. At the same time, however, supercomputer centres are struggling to pay for the disk capacity required to keep up with the explosive growth of scientific data.

"While we are expanding our disk storage capacity, we're also adding new higher-density tapes, faster tape drives and more of them, and other technology in a highly tuned, extremely capable data management system," said Andrews.

New technology at SDSC involved in the tape-to-disk data transfers includes tape drives from StorageTek (Storage Technology Corp.), switches from Brocade Communications Systems, fibre channel adapters from QLogic, and end-to-end systems and technology from Sun Microsystems.

The combined effect of the new technology has been a reduction, from days to hours, in the time needed to transfer multi-terabyte data sets from SDSC's tape-storage system to its IBM Blue Horizon supercomputer, which has a peak speed of 1.7 teraflops (trillion floating point operations per second).
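To put "days to hours" in rough numbers, the short Python sketch below estimates how long an idealized transfer would take at a sustained rate. The 10-terabyte size and the 100-megabytes-per-second comparison rate are illustrative assumptions; the 828-megabytes-per-second figure is the rate SDSC reported. The function name `transfer_hours` is hypothetical, and the estimate ignores tape mounts, seeks, and other overhead.

```python
# Decimal units: 1 megabyte = 10**6 bytes, 1 terabyte = 10**12 bytes.
MB = 10**6
TB = 10**12


def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the idealized transfer time in hours at a sustained rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 3600


# Assumption: a 10-terabyte data set, moved at two sustained rates.
data_set = 10 * TB
for label, rate in [("100 MB/s", 100 * MB), ("828 MB/s", 828 * MB)]:
    print(f"{label}: {transfer_hours(data_set, rate):.1f} hours")
# prints:
# 100 MB/s: 27.8 hours
# 828 MB/s: 3.4 hours
```

Under these assumptions, a transfer that would occupy more than a day at the slower rate finishes in under four hours at the reported rate.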

"Our supercomputer centre is dealing with more multi-terabyte data sets such as the 10-terabyte Digital Sky astronomy project and very fast transfers will allow astronomers to make discoveries faster," said Andrews. "With tape, we have achieved a data-transfer rate that very few people realized was feasible."

No industry association keeps track of data-transfer speed records. "As far as the use of tape drives or multiple tape drives is concerned, the data transfer rates of most academic groups have been well under 100 megabytes per second," said John Marshal, program manager at StorageTek. "The groups that have achieved data transfer rates in the range of 800 megabytes per second have not used tape: they have done it with much larger investments in disk drives, switches, and routers."

SDSC is replacing its 20-gigabyte-capacity tape cartridges with 200-gigabyte native capacity tape cartridges manipulated robotically in five silos with a total storage capacity of six petabytes.

The centre also installed 24 StorageTek T9940B tape drives, each of which can transfer data to and from tape at roughly 30 megabytes per second without data compression. (Compressing data increases its effective rate of transmission.) With roughly twofold data compression, SDSC has measured peak speeds of over 60 megabytes per second on the new tape drives.
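As a rough check on how these per-drive figures relate to the 828-megabytes-per-second result, the Python sketch below multiplies them out. It assumes all 24 drives stream in parallel at the quoted rates, which the article does not state; the variable and function names are illustrative only.

```python
DRIVES = 24             # T9940B drives installed at SDSC
NATIVE_MB_S = 30        # approximate per-drive rate without compression
COMPRESSED_MB_S = 60    # approximate per-drive peak with ~2x compression
MEASURED_MB_S = 828     # aggregate rate reported by SDSC


def aggregate_rate(drives: int, per_drive_mb_s: float) -> float:
    """Aggregate rate if every drive streams in parallel (an assumption)."""
    return drives * per_drive_mb_s


native_total = aggregate_rate(DRIVES, NATIVE_MB_S)
compressed_total = aggregate_rate(DRIVES, COMPRESSED_MB_S)
implied_per_drive = MEASURED_MB_S / DRIVES

print(f"All drives, uncompressed: {native_total} MB/s")
print(f"All drives, ~2x compression: {compressed_total} MB/s")
print(f"Implied average per drive at 828 MB/s: {implied_per_drive} MB/s")
# prints:
# All drives, uncompressed: 720 MB/s
# All drives, ~2x compression: 1440 MB/s
# Implied average per drive at 828 MB/s: 34.5 MB/s
```

On these assumptions, the reported rate falls between the uncompressed and fully compressed aggregates, which would be consistent with data that compresses only modestly on average.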

(The actual measured data-transfer rate depends on the degree to which data is compressed, the type of software used, and the capacity of the associated hardware involved in the transfer.)

As part of its data-management system, SDSC has also installed Sun Microsystems' SAM-FS Advanced Storage Management and QFS High Performance Shared File System software to provide maximum scalability, performance, and throughput for the most data-intensive applications. SDSC has also installed Sun's High Performance Computing Storage Area Network (SAN) storage solution.

Such SANs allow multiple computers, using a range of operating systems, to share data seamlessly. The disk capacity attached to SDSC's SAN will be increased from 50 terabytes to 500 terabytes by 2003.
