VLDB Vision

Richard Winter


EMC's Jim Rothnie discusses current trends and influences in storage systems

A New Era in Large-Scale Storage Architecture

VLDB architects and planners traditionally ignore everything about storage except its capacity and price. Most think of it as "just a bunch of disks" and focus their energy on such matters as DBMS selection and database design.
      But there is a recent trend toward more capable storage systems: systems that can add value in performance, availability, and storage management, and systems that can affect the sharability and accessibility of data across the enterprise. It appears that storage architecture is being elevated to a strategic issue not only in VLDB implementation but also in the enterprise infrastructure as a whole.
      Feeling that this is a significant trend in itself, and curious about what the industry's leading storage provider plans to do about the immense volumes of data storage that are accumulating in many organizations, I arranged to interview Jim Rothnie. As EMC's senior vice president and chief marketing technical officer, Rothnie directs product planning that influences the storage architectures and products on which many VLDBs will be implemented.
      For the next installment in my series of interviews with the individuals shaping the future of the very large database, I talked with Rothnie about trends and directions in storage architecture and developments at EMC.

Winter: Some respondents to our VLDB Survey indicate that they will be managing 100TB of data within another 18 to 36 months. If current practices continue, that could easily translate into several hundred terabytes of storage. Do you concur with the forecast that data volumes will continue to increase at spectacular rates?

Rothnie: Yes, I do concur that storage system requirements will continue to grow at extraordinary rates. That growth rate has been at about 100 percent annually for most of this decade. In the year 2000 alone, users will install more storage capacity than in the entire decade of the 1990s.
      In fact, there are some forces operating that promise to accelerate the growth further--in particular, the rapid shift toward the storage of more unstructured data, such as images for Internet access.
      And we do have many customers today with 10TB of disk storage; we will certainly see customers with more than 100TB of disk in 1998.
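
The claim that the year 2000 alone will outstrip the whole of the 1990s follows from simple compounding, provided the 100 percent figure is read as yearly installations doubling each year. That reading is an assumption, as are the function name and the arbitrary units in this minimal sketch:

```python
# Assumption: "100 percent annual growth" means the capacity installed in each
# year is double the capacity installed in the year before. Units are arbitrary
# (the amount installed in 1990 is taken as 1.0).

def yearly_installs(first_year, last_year, growth=1.0, base=1.0):
    """Return {year: capacity installed in that year} under compound growth."""
    return {
        year: base * (1.0 + growth) ** (year - first_year)
        for year in range(first_year, last_year + 1)
    }

installs = yearly_installs(1990, 2000)
decade_total = sum(v for year, v in installs.items() if 1990 <= year <= 1999)
year_2000 = installs[2000]

print(f"Installed 1990-1999: {decade_total:.0f} units")
print(f"Installed in 2000:   {year_2000:.0f} units")
```

Under steady doubling, any single year's installations exceed the sum of all earlier years, so the statement holds for any decade, not just this one.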

Winter: What storage system requirements come with those volumes?
Rothnie: I think they lie mainly in information management, information protection, and information accessibility or sharing.
      Information management becomes difficult because of the huge quantities involved. For example, if you have 100TB of important information, you generally don't want to manage it by making specific allocation decisions concerning each of the 25,000 4GB disk drives that make up your system.
      Protection is another crucial element. To a first approximation, all 100TB of information are important to the owner. So the storage system must be able to keep it all available in the face of equipment failures, natural disasters, software errors, and even planned events like equipment upgrades or data center consolidations.
      Finally, there is accessibility or information sharing. You want all this information to be available to any appropriate application.
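
The 25,000-drive figure in the answer above can be checked with a few lines. This is only a sketch; the `copies` parameter is a hypothetical addition to show how quickly the drive count grows once mirrors are kept.

```python
import math

def drives_needed(capacity_tb, drive_gb, copies=1):
    """Drives required to hold `copies` full copies of the data.

    Uses decimal units (1 TB = 1,000 GB), which is what the interview's
    figure implies.
    """
    total_gb = capacity_tb * 1000 * copies
    return math.ceil(total_gb / drive_gb)

single_copy = drives_needed(100, 4)          # 100TB on 4GB drives
with_one_mirror = drives_needed(100, 4, 2)   # hypothetical: one extra copy

print(single_copy, with_one_mirror)
```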

Winter: What do you see as the major changes coming in high-end storage technology and architecture over the next few years to address those requirements? Do you expect that storage systems will be significantly different in some way, or will they be incrementally faster, more efficient versions of what we have today?
Rothnie: I do think that there are major changes coming. In effect, such changes are demanded by the volumes of data, the increasing requirements for its availability, and the new ways in which it will be accessed.
      That all leads to a need for an architectural revolution that we refer to as "enterprise storage." We use that term to describe storage deployed as a common pool shared by a group of servers of various types--including mainframes--and numbering potentially in the hundreds. Enterprise storage replaces the bad old way of doing things in which storage was deployed as a peripheral to a single machine, subjecting stored information to increasingly daunting problems in management, protection, and sharing.
      Enterprise storage really represents a back-end network providing both connectivity and highly reliable data persistence to connected servers.

Winter: Do you see a connection between this enterprise storage concept and data warehousing? For example, consider the process of extracting data from operational systems and incorporating it into the warehouse for decision support and analysis.
Rothnie: I think there is a relationship there, but what we have in mind here does not require establishing a particular storage format that is designed for accumulating everything. It's really intended to take data stored in the format of the storing platform and then make that available to other consumers across the enterprise. So it's akin to some of the data warehousing notions, but the method is somewhat different.

Winter: So the concept is that data stored in the format of one operational system, say under DB2 on the mainframe, could actually be accessed in place by other systems.
Rothnie: That is exactly right. We have a product on the market today, called DataReach, which deals with the context in which there is enterprise storage connected to an MVS system, a Unix system, and other platforms. If there is DB2 data stored there supporting some operational system, that data on the enterprise storage platform can be accessed directly by the Unix system.
      Of course, an important application for doing that, as you suggested before, is the loading of a data warehouse that might be used then for analytic purposes to understand the meaning of that operational data.

Winter: Data availability seems to be an area ripe for change. On the one hand, there is a growing intolerance in business for any kind of downtime. On the other hand, it is a big problem to back up and recover the really large databases--the ones that are in the terabytes today and moving into the tens of terabytes. What do you see as the direction in data availability?
Rothnie: Data availability is a critical issue for many IT organizations in the late '90s, and I believe that storage systems can provide the key to getting that problem solved.
      A quiet revolution has been taking place in the most forward-looking of the global 2000 IT organizations for a couple of years now. These companies have been deploying storage-based remote mirroring to ensure large database availability. By far the largest number of these have been deploying an EMC solution called Symmetrix Remote Data Facility. We have installed more than 2,000 licenses to this product since 1995. The idea behind this product is this: Users keep an absolutely current copy of a database at a remote site so that it is available for recovery instantly in the case of an outage at the primary site. This remote copy can also be used to drive tape backup (if you want to do that) without any operational system outage and without any additional load on the application host or network. We employ that method today in our EDM backup system. Other backup vendors are also adapting their software to connect to Symmetrix Remote Data Facility.

Winter: Is this accomplished by sending images of changed data blocks over some sort of long-haul network? And if so, is it practical even for high transaction volume systems?
Rothnie: That's exactly how it is done--and it is used, for example, in high-rate demand deposit accounting systems. There is, of course, a limit to the distance over which that is practical because of speed-of-light considerations. But over a distance of a few hundred miles, it is in fact very common; we have many customers doing that.

Winter: Now the performance of this solution is largely independent of database size; it's a function of transaction volume and size.
Rothnie: That's right. It deals with differentials, not the total size of stored data.
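
A back-of-the-envelope estimate shows why database size drops out of the picture. The sketch below assumes every changed block is shipped once and ignores protocol overhead; the function name and the sample workload are hypothetical, not EMC figures.

```python
def mirror_link_mbit_per_s(writes_per_s, block_bytes):
    """Sustained link rate (megabits per second) needed to ship each changed block once.

    Database size does not appear as a parameter: only the write rate and
    the size of each changed block matter.
    """
    return writes_per_s * block_bytes * 8 / 1_000_000

# Hypothetical workload: 500 block writes per second, 4,096-byte blocks.
rate = mirror_link_mbit_per_s(500, 4096)
print(f"{rate:.3f} Mbit/s")
```

The same workload needs the same link whether the database behind it holds 1TB or 100TB.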

Winter: Do companies employing this approach also mirror data locally?
Rothnie: Yes, they do. They often mirror at both ends of the connection. In addition, they frequently employ another product we have called TimeFinder, which creates still another mirror within a local site. This additional mirror is frequently broken away from the primary mirror to drive some offline process such as data warehouse loading or backup.
      The whole scheme is built around mechanisms in Symmetrix that deal with differential data and deal very smoothly with reestablishing a broken link. Updates that occur during the period when the link was broken are remembered and applied after the link has been reestablished. That kind of very flexible handling of copies is going to be a significant paradigm for many IT operations; it allows operations to be resilient in the face of unplanned outages but also to deal in an effective way with what really accounts for most outages today: planned downtime that occurs because of the need to move operational data into some secondary purpose.
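
The "remember and apply later" behavior described above can be illustrated with a toy model. This is a sketch of the general idea of tracking changed blocks, not Symmetrix's actual mechanism; the class and method names are invented for illustration.

```python
class MirrorPair:
    """Toy model of a primary volume and its mirror joined by a link."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.link_up = True
        self.pending = set()  # block numbers changed while the link was down

    def write(self, block_no, data):
        """Write to the primary; forward to the mirror or remember the block."""
        self.primary[block_no] = data
        if self.link_up:
            self.mirror[block_no] = data
        else:
            self.pending.add(block_no)

    def split(self):
        """Break the link, for example to run a backup from the mirror."""
        self.link_up = False

    def reestablish(self):
        """Restore the link, copying only the blocks that changed meanwhile."""
        copied = len(self.pending)
        for block_no in self.pending:
            self.mirror[block_no] = self.primary[block_no]
        self.pending.clear()
        self.link_up = True
        return copied


pair = MirrorPair(num_blocks=1000)
pair.write(1, b"before split")
pair.split()
pair.write(2, b"first update")
pair.write(2, b"second update")  # same block again: still one pending entry
pair.write(7, b"another block")
copied = pair.reestablish()
print(f"Blocks copied on resync: {copied} of {len(pair.primary)}")
```

Only the blocks touched during the split are copied, which is why resynchronization time tracks the volume of changes rather than the size of the volume.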

Winter: Speaking of that, another area in which availability requirements seem to be rising rapidly is data warehousing. The scheme you described with remote mirrors, maintaining four or five copies of the data, sounds like it would be more readily applied for mission-critical transaction processing systems. Do you see people going in the same direction for data warehousing?
Rothnie: Yes, we do, because those systems are becoming increasingly fundamental to business operations. Particularly where they overlap with an operational data store, they really are mission critical in themselves. The cost of making additional copies of data, even in terabyte-scale systems, is affordable when you compare it with the cost of nonavailability, even for short periods. We have seen, across the whole storage market, price/performance gains of 20 to 30 percent annually. It turns out that the affordability of that storage yields more and more uses involving making more copies. Many customers conclude that additional copies are by far the most cost-effective way of dealing with availability requirements.

Winter: What can you say about EMC's product direction? What's ahead in terms of new architecture and capabilities?
Rothnie: Our products will continue to focus on the enterprise storage model. You will see continuing emphasis on software-implemented value-added capabilities in the areas of information sharing, information protection, and information management. We will apply this particularly to customer environments in which fiber channel networks connect servers to storage. This is a significant change in the storage market that's occurring particularly in 1998 and 1999. We call this environment an enterprise storage network. It will extend the physical distances and the number of connected hosts well beyond what has been possible in the past.

Winter: What do you see as the major implications of those changes for customers?
Rothnie: Well, I think the deployment of the enterprise storage notion will profoundly change the nature of information management.
      We are extending the distances. Today, in the SCSI world, we are limited to 25 meters [between the storage device and the host]. That is changing right now to 500 meters and it will be 10 kilometers within 12 months. Picture a situation in which there are servers all over a campus sharing information on a storage subsystem at a central location. The enterprise can then manage and protect this information much more effectively than if the storage were scattered all over the area.
      In addition, we are greatly increasing the number of connected systems. Today, for practical reasons, you can't connect more than about 30 hosts to an enterprise storage configuration. But with an enterprise storage network, you can connect hundreds of servers. Think about what that means in a corporation's campus setting. You no longer have terabytes of data scattered around departments and buildings. We have many customers who are profoundly concerned. They may be concerned about the IT department's fiduciary responsibility to control the data and protect it, but they have little idea of what it is, where it is, or whether or not it is being protected.

Winter: So if you have hundreds of servers scattered throughout multiple buildings, an enterprise storage network allows you to concentrate the associated storage in one physical location. Because all this storage is now in one location, you can protect it better physically, you can manage it better, you can more easily maintain copies for data protection onsite and offsite, and so on.
Rothnie: Physical protection, data management, and data protection all exhibit strong economies of scale, both in equipment cost and in management cost.

Winter: Even though the data is physically centralized under this model, is the enterprise storage network designed to deliver I/O performance that is the same as if the devices were local to the servers accessing them? Is it the idea that transfer rates and network switching times are similar to what you would see inside a single computer room?
Rothnie: Yes, and it's more than just an architectural principle--it's a reality. Over those distances [10 km], speed-of-light considerations and switching times have a negligible impact on large-scale storage latency.
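
A rough calculation supports the point about propagation delay. The sketch below uses two assumptions that are not from the interview: a signal speed of about 2.0e8 meters per second in the cable (roughly two-thirds of the speed of light in a vacuum) and a mechanical disk access time on the order of 10 milliseconds. Switching time is ignored.

```python
# Assumptions (not from the interview): signals travel through the cable at
# roughly 2.0e8 m/s, and one mechanical disk access takes about 10 ms.
SIGNAL_SPEED_M_PER_S = 2.0e8
DISK_ACCESS_US = 10_000.0

def round_trip_us(distance_m):
    """Propagation delay in microseconds for a request and its reply."""
    return 2.0 * distance_m / SIGNAL_SPEED_M_PER_S * 1e6

# Cable lengths mentioned in the interview.
for label, meters in [("25 m", 25), ("500 m", 500), ("10 km", 10_000)]:
    delay = round_trip_us(meters)
    share = delay / DISK_ACCESS_US * 100.0
    print(f"{label:>6}: {delay:8.2f} us round trip, {share:.3f}% of one disk access")
```

Under these assumptions, even the 10-kilometer case adds a delay that is small next to the time the disk itself takes to service a request.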

Winter: To take an example, suppose you had 10TB of data scattered over 100 locations on a campus, each with its own server or servers. And let's suppose you want to move it to a single location, leave all the servers in place, and have them access the data over an enterprise storage network. What is that likely to do to your cost of storing data? Does your overall cost double or triple or what?
Rothnie: The overall cost is probably going to be cut in half--or perhaps to a third of what it was before. Incidentally, there is an excellent International Data Corp. (IDC) white paper that details the various considerations in this kind of decision.

Winter: In addition to enterprise storage networks, are there any other particular milestones or events you see coming in large-scale data storage in 1998?
Rothnie: One important market trend is a wave of consolidation of NT storage. Many companies we deal with have a few hundred NT servers with 20 to 40 GB of online storage each. And they are finding that this is an abysmal arrangement with regard to data management, data protection, data security, and data sharing. And as the data becomes increasingly important to the enterprise, this issue rises in significance.
      We already have a number of forward-looking customers who are connecting all these systems to centrally deployed and managed enterprise storage systems. And I think that will change the dynamics of NT server deployment.
      It reminds me of what happened when Symmetrix was first introduced into the Unix market. At that time, the conventional wisdom was that there would be no demand for Unix storage systems larger than 50GB. As you know, that has now completely changed. In our customer base, the Symmetrix systems connected to Unix servers have--on average--the same capacity as those in mainframe environments. There is no distinction.
      My guess is that you will see the same pattern as NT servers penetrate the enterprise market.
 
Winter's post-interview note: Jim Rothnie has described a sweeping change in the architecture of enterprise computing based on nothing less than an industrywide rethinking of the role of storage. Rather than buying storage for a server, Rothnie sees a world in which companies buy storage for the enterprise. Except as limited for security purposes, the content of every disk drive is available directly, via a fiber channel network, to every one of the enterprise's servers within a 10 kilometer radius. Terabytes of files and databases read and written by hundreds of autonomously operating servers can be, if desired, managed with standard processes for backup, recovery, and sharing. And all this is accomplished at a cost two to three times lower than today's costs through economies of scale in equipment and storage management.
      Clearly, our notions of disk storage system architecture and function--which have been fundamentally the same for about 35 years--are undergoing a major change.

Richard Winter is a specialist in large database technology and implementation and president of Boston-based Winter Corp. You can reach him via email at richard.winter@wintercorp.com or by fax at (617) 338-4499.
 

 

 

Copyright 1997 Miller Freeman Inc. All Rights Reserved
Redistribution without permission is prohibited.
