Big Data, Big Tape

Data volumes are exploding. A recent IEEE Spectrum article explaining "the DNA data deluge" highlights one part of the problem, and cloud computing providers, whether public or private, are already facing it. Where to put all (or even some) of these data?

The best answer, in part, is a classic one: tape. Did you think tape storage was dead? Think again.

The basic problem is that gigabytes and terabytes aren't enough to hold the flood of data generated in more and more industries. Petabyte-range storage requirements are becoming commonplace (the Large Hadron Collider generates about 30 petabytes per year), and now some organizations are trying to cope with exabytes. What to do?

Tape! One excellent example: the IBM TS3500 Enterprise Tape Library equipped with IBM TS1140 Enterprise Tape Drives. Each TS1140 tape cartridge can hold 4 TB uncompressed. A maximally configured TS3500 complex translates that into 2.7 exabytes of accessible data (note that this figure is for compressed data, so it is not directly comparable to the 4 TB uncompressed number). You can keep much more, of course, if you're willing to store and fetch tapes manually outside the library, or if you get another TS3500 complex.
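To put those numbers in perspective, here is a rough back-of-the-envelope sketch of how many cartridges they imply. It uses only the 4 TB uncompressed figure, the 30 petabytes per year, and the 2.7 exabytes quoted above. The 3:1 compression ratio is an assumption made for illustration (the figure above says only "compressed"), and the function name is hypothetical. Decimal units are assumed throughout.

```python
import math

# Decimal storage units, in bytes (an assumption; vendors typically quote decimal units).
TB = 10**12
PB = 10**15

# TS1140 cartridge capacity, uncompressed, as quoted in the text.
CARTRIDGE_NATIVE_BYTES = 4 * TB


def cartridges_needed(data_bytes, compression_ratio=1.0):
    """Return the number of cartridges needed to hold data_bytes.

    compression_ratio is how many bytes of source data fit in one byte
    of tape (1.0 means no compression). Hypothetical helper for illustration.
    """
    effective_capacity = CARTRIDGE_NATIVE_BYTES * compression_ratio
    return math.ceil(data_bytes / effective_capacity)


# One year of Large Hadron Collider output (about 30 PB), stored uncompressed.
lhc_year = cartridges_needed(30 * PB)

# 2.7 EB (written as 2700 PB to stay in integers), ASSUMING 3:1 compression.
full_complex = cartridges_needed(2700 * PB, compression_ratio=3.0)

print(f"LHC, one year, uncompressed: {lhc_year:,} cartridges")
print(f"2.7 EB at an assumed 3:1:    {full_complex:,} cartridges")
```

Under these assumptions, a year of LHC data fits on 7,500 cartridges, and 2.7 exabytes works out to 225,000 cartridges. Change the compression ratio and the second number changes with it, which is why compressed capacity figures deserve a second look.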

OK, but isn't tape slow? Yes and no. If you want fast random read-write access, then you'll (also) need other types of storage such as magnetic hard disks, flash memory, and other types of electronic memory. Unfortunately they cost more than tape, at least in the multi-petabyte and exabyte ranges. (Enterprise tape tends to have an initial fixed cost and then comparatively low marginal costs. In other words, it has excellent economies of scale. Sound familiar?) The trick is to place data in "tiers" according to business requirements, with the most frequently accessed data situated on the more expensive (and more performance-appropriate) tiers. IBM and zEnterprise (particularly z/OS) happen to do that really well.
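As a toy illustration of the tiering idea, the sketch below assigns a data set to a tier based on how often it is accessed. Everything here is hypothetical: the tier names, the thresholds, and the function are invented for the example, and this is not how z/OS or any IBM product actually makes placement decisions. Real policies weigh many more business requirements than access frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_accesses_per_month: float  # hypothetical threshold for this tier


# Ordered from most expensive (fastest) to least expensive (tape).
# The thresholds are made-up numbers, chosen only to make the example concrete.
TIERS = [
    Tier("flash", 1000),
    Tier("disk", 10),
    Tier("tape", 0),
]


def choose_tier(accesses_per_month):
    """Return the name of the first (most expensive) tier whose threshold is met."""
    if accesses_per_month < 0:
        raise ValueError("accesses_per_month must not be negative")
    for tier in TIERS:
        if accesses_per_month >= tier.min_accesses_per_month:
            return tier.name
    # Unreachable while the last tier has a threshold of 0.
    raise RuntimeError("no tier matched")


for name, rate in [("order lookup index", 5000), ("last quarter's reports", 50), ("2005 archive", 0.1)]:
    print(f"{name}: {choose_tier(rate)}")
```

The point is only the shape of the policy: hot data earns the expensive tier, and everything else drifts down toward tape.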

But tape isn't actually slow for sequentially accessed data. In fact, it's rather high performance. And a lot of the world's data are like that, trickling or flooding in as "streams." There are also ways to make sense of such data in sequential-friendly ways. IBM's Scalable Architecture for Financial Reporting (SAFR) is an excellent example.

Tape also happens to be very energy-efficient. Cartridges consume no power themselves, they store compactly, and all they ask for on the shelf is a reasonable climate.

Is tape technology advancing? Yes, certainly. IBM previewed the next two generations (page 5) to follow the TS1140. According to IBM, the next generation should double (or more) the uncompressed cartridge capacity, and the generation after that should do much the same. Data transfer rates should increase by 44% to 50% with each generation. ESCON will fade into the sunset, with applause, and FCoE connectivity will make an appearance.
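Those growth rates compound, and it is worth a quick sketch to see what they imply. The example below uses only the figures above: 4 TB today, capacity at least doubling per generation, and transfer rates rising 44% to 50% per generation. The transfer rate is kept relative to today's drive (today = 1.0) because no absolute rate is quoted here. The function name is hypothetical.

```python
def project(base, factor, generations):
    """Compound a per-generation growth factor over a number of generations."""
    return base * factor ** generations


GENERATIONS = 2  # the two previewed generations

# Capacity: 4 TB today, at least doubling each generation (a lower bound).
capacity_tb = [project(4, 2, g) for g in range(GENERATIONS + 1)]

# Transfer rate relative to today's drive, at the low and high ends of the range.
rate_low = project(1.0, 1.44, GENERATIONS)
rate_high = project(1.0, 1.50, GENERATIONS)

# Time to stream a full cartridge scales as capacity divided by transfer rate.
capacity_growth = capacity_tb[-1] / capacity_tb[0]
fill_time_best = capacity_growth / rate_high
fill_time_worst = capacity_growth / rate_low

print("Capacity by generation (TB, at least):", capacity_tb)
print(f"Transfer rate after two generations: {rate_low:.2f}x to {rate_high:.2f}x")
print(f"Time to stream a full cartridge: {fill_time_best:.2f}x to {fill_time_worst:.2f}x")
```

If the preview holds, cartridges reach at least 16 TB after two generations while transfer rates only slightly more than double, so streaming an entire cartridge end to end would take somewhat less than twice as long as it does today.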

IBM pioneered tape storage in 1952 with the IBM 726 and its breakthrough vacuum column loading system. Tape is the oldest form of digital storage still in widespread use, and the fundamental challenge hasn't changed much. Back in the early 1950s the U.S. Social Security Administration faced big data challenges of its own, and tape came to the rescue, displacing 80-column card storage. Tape is still the best approach for sequential storage and archiving of massive amounts of data, and its future is bright.

by Timothy Sipples June 28, 2013 in Future



The postings on this site are our own and don’t necessarily represent the positions, strategies or opinions of our employers.
© Copyright 2005 the respective authors of the Mainframe Weblog.