
With the 1998 VLDB Survey Program results in, we can now report a historic milestone. For the first time, the world's largest in-production, commercial database--a 16.8TB, DB2-based system at United Parcel Service (UPS)--is running on an off-the-shelf relational DBMS.
For those of you who have watched the relational saga unfold, this moment is a truly dramatic one. When E. F. Codd published his revolutionary paper in 1970, the relational concept was a radical idea that few people thought would actually work. Indeed, many professionals of the day would have scoffed at the thought of the world's largest databases running on relational engines.
Twenty-eight years after Codd's seminal paper, relational database engines dominate the entire database scene, including the very largest systems. Of course, relational database vendors can't rest on their laurels: object-oriented technology is knocking on the door. But for this year at least, we can say that relational database management more or less rules the VLDB industry.
There is also a historic "nonmilestone" in the '98 program. For the second year in a row, mainframes show no sign of losing ground in the VLDB marketplace. Not only are mainframes far from dead, but they don't even look endangered. For high-end decision support, Unix is the leading choice. But according to VLDB Survey Program data, mainframes never owned this market in the first place. Mainframes continue to dominate for large-scale, mission-critical OLTP.
For the second year in a row, we nominate NT as the biggest VLDB phenomenon that didn't happen. Although no one forecasted terabyte-sized NT databases for early '98, over the past three years, the drumbeat of NT database scalability has been growing steadily louder. We're ready to create separate categories and awards in the program for NT, but thus far, there has been no reason to do so. In fact, the '98 program garnered only one qualifying NT response, although the entry was impressive: 542GB of data in Microsoft SQL Server running on a DG AViiON server. This may be a harbinger of things to come, but we're not there yet.
Meanwhile, just like last year, the databases are a whole lot bigger. (We hope that the vendors will quickly be able to make their engines smarter, because we're going to need all the optimization they can deliver.) The biggest ones more than doubled in size over the last 12 months. And for the first time, we're into double-digit terabytes for raw data; some organization may even reach the 25TB mark by year's end.
With companies expanding because of strong financial performances, favorable economic conditions, mergers, and acquisitions, VLDBs have become de rigueur for many corporations. Now more than ever, the need to better understand, plan for, implement, and manage the systems that transform raw data into savvy business intelligence is a top priority. Regardless of whether the company is new, mature, or well established, the corporate database is among the most prized components of the enterprise. The Winter VLDB Survey Program, which assists companies in improving business results and reducing risks, speaks directly to this compelling business need.
The VLDB Survey Program, now in its fourth year, assesses the methodologies, product choices, and practices of the world's large database installations. The survey, which is cosponsored by Database Programming & Design, is a central component of the Winter Research and Recognition Program, a research service devoted exclusively to tracking trends and issues in large database technology. Program reports and publications feature fact-based research about using large-scale data resources to implement successful and timely business strategies. The VLDB Survey Program has these objectives:
Build a reservoir of actionable knowledge about very large databases
Assist VLDB professionals in raising organizational productivity, managing risk, and reaching strategic objectives
Identify the largest databases in the world and celebrate those responsible for them.
Interest in the survey program has grown steadily and spread to publications and industries outside of computing. Articles in Computerworld, Information Week, PC Week, Chicago Software News, and Bank Technology News cited the 1997 survey program, and it was even covered on television.
Preparation for the 1998 program began in the fall of 1997, four months before the launch. Winter Corp. drafted the 1998 questionnaire and circulated it for review among staff members, outside consultants, and representatives of the leading database, storage, and system vendors. The final survey contained a demographic section and 12 multipart questions on topics such as DBMS choice, hardware and storage environments, database size and usage, database and system architectures, growth patterns, and workload activity.
The data collection campaign was launched in December 1997 and completed in February 1998. One of its objectives was to distribute the questionnaire as widely as possible. As with each preceding campaign, Winter Corp. increased the number and variety of distribution techniques. Program information was mailed to former participants, conference attendees, and database professionals listed in commercial mailing directories, followed by telemarketing calls to select recipients. The survey program was also listed in The Data Administrator's Newsletter (TDAN), an Internet-based publication for corporate data managers. One of the high points in publicizing the campaign was the article on the program that appeared in Information Week in early February.
This year, the Internet played a leading part in distributing the questionnaire and collecting the data. Seventy percent of participants either downloaded the survey electronically or printed it out locally. Not surprisingly, the Internet was the primary conduit for the international sites, which represented 14 countries and contributed more than 30 percent of the surveys received this year. In a typical week, there were about 4,300 hits to the Winter Corp. Web site, 15 to 20 percent of which originated abroad.
The 1998 program used three metrics to identify the leading installations:
Most data, including user data, summaries, aggregates, and indexes, but excluding freespace and redundancy. You may see articles about databases that are larger than those in the survey program, but their figures will include disk space allotted to freespace and redundancy. The program assesses size based only on the dimensions of usable database components.
Most rows, records, or objects.
Peak online activity. For OLTP systems, this metric represents the highest number of transactions per second (TPS). For decision-support systems, the criterion applies to the most concurrent, online (not batch), in-flight user queries, reports, and updates.
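The first metric can be illustrated with a short sketch. It assumes a simple breakdown of disk usage into the components named above; the class and field names are hypothetical, and the freespace and redundancy values are invented for illustration only. The usable figures are the JCPenney breakdown reported later in this article.

```python
from dataclasses import dataclass


@dataclass
class SpaceBreakdown:
    """Disk usage of one database in gigabytes (field names are hypothetical)."""
    user_data: float
    summaries_and_aggregates: float
    indexes: float
    freespace: float = 0.0
    redundancy: float = 0.0  # for example, mirrored copies

    def survey_size(self) -> float:
        """Size as the program counts it: usable components only."""
        return self.user_data + self.summaries_and_aggregates + self.indexes

    def allocated_size(self) -> float:
        """Size as a raw disk tally would count it."""
        return self.survey_size() + self.freespace + self.redundancy


# Usable figures: JCPenney (560GB detail, 22GB summaries, 20GB indexes).
# Freespace and redundancy are made-up placeholder values.
db = SpaceBreakdown(user_data=560, summaries_and_aggregates=22, indexes=20,
                    freespace=100, redundancy=602)

print(db.survey_size())     # the figure the survey would rank
print(db.allocated_size())  # the larger figure a disk-based count would report
```

The gap between the two numbers is why published sizes based on allocated disk can look larger than the sizes ranked in the program.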
Separate awards were given for decision-support and transaction processing systems, and Unix environments were distinguished from traditional operating platforms. This year, Winter Corp. also differentiated systems by database architecture. Participants were asked to characterize their system as centralized, distributed, or federated. A federated database is implemented on multiple engines that operate autonomously but presents a single, integrated image of the database to users and applications. Because federated systems differ significantly from conventional databases, in instances where the Grand Prize (first place) winner was a federated system, Winter Corp. also announced a nonfederated Grand Prize winner. Based on these definitions, the '98 program had 12 database categories and 17 Grand Prize winners. The awards ceremony occurred in March at the VLDB Summit in Beverly Hills, Calif.
Most Data and Most Rows in a Transaction Processing System, All Environments, Federated Architecture
Sheer volume is the metric that captures the most imaginations, and by that measure the largest database of any entrant in the '98 program, by far, is the one run by UPS. This unrivaled system weighs in at 16.8TB--11.3TB of data and 5.5TB of indexes. (Here's another way of describing how big the database is at UPS: Each DBA is responsible for supporting 216 billion records.) Some of you will recall the twin UPS Grand Prize winners in this category last year; UPS has combined these two databases and three others into a single federated system called the Package Level Detail Repository (PLDR). Richard Bader, UPS database analyst, submitted the survey.
The PLDR "virtual" database comprises five distinct nodes and functions: pickup information, internal scanning data, package delivery details, premium delivery service, and a repository for service exceptions. Two pieces of software, both developed by UPS, provide an integrated view of the database: Internal users access the Online Custom Automation System program while external customers use the Full Visibility Tracking system. Anyone who has tracked a package through the UPS package center network will confirm the merits of this remarkable system.
PLDR summaries and aggregates are housed in a separate data warehouse. The database is implemented in IBM's DB2 and hosted on a five-node cluster of SMPs, including multiple IBM S/390 machines and a Hitachi Data Systems 427. There are 45 processors in the total configuration. IBM RAMAC and EMC 5500 and 5430 devices provide storage.
Also contributing to PLDR's expansion is its steady growth rate over the course of 1997. What's more, UPS reports that the system is still growing; Bader projects that the system will swell by 20 percent next year.
Table 1 shows that many of the other winners in this category are repeat winners from 1997. Telstra, Experian, and IZB Software--all well-known names from past programs--again finish among the leaders.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Data (GB) |
|---|---|---|---|---|---|---|
| 1 | UPS (PLDR; federated) | DB2 | IBM S/390, Hitachi Skyline | SMP | EMC, IBM | 11,090 |
| 1 | Telstra | DB2 | Hitachi Skyline | Cluster | HDS, EMC, IBM | 4,350 |
| 2 | Experian (credit reporting) | DB2 | Hitachi Skyline 62 | SMP | EMC | 2,739 |
| 3 | SK Telecom | DB2 | IBM S/390 | MPP | HDS, IBM | 1,955 |
| 4 | IZB Software | Adabas | Amdahl 5995 | SMP | EMC, Comparex | 1,310 |
| 5 | Caixa Economica Federal | CA-IDMS | IBM S/390 | SMP | EMC, HDS, IBM | 661 |
| 6 | Victoria's Secret Catalogue | DB2 | Amdahl | SMP | IBM, EMC | 660 |
| 7 | DBS Systems Corp. | Oracle RDB | Digital VAX7830 | MPP | DEC | 631 |
| 8 | Mitsukoshi Information Service Co. | Teradata | NCR 5100 | MPP | EMC | 600 |
| 9 | CentreLink | CCA Model 204 | IBM S/390 | Cluster | IBM, EMC | 582 |
| 10 | Metromail Corp. | Oracle | Sequent Symmetry 5000 | SMP | EMC | 405 |

TABLE 1. Database size, all environments, transaction processing systems.

UPS also gained a second Grand Prize for most rows in a transaction processing system, any environment. PLDR tops the field with a whopping 324 billion rows of data, a figure more than six times that of the next contender. Telstra walked off with a Grand Prize for most rows in a centralized database, while the second place citation was again awarded to Experian.
Most Data and Most Rows in a Transaction Processing System, All Environments
A champion in its own right, Telstra also earned a pair of Grand Prizes. The Melbourne, Australia, telecommunications company doubled its laurels from 1997. Michel Antoine, manager of capacity planning and tuning, submitted the survey.
The Telstra installation is a four-year-old customer billing system. Its 4.35TB of data is a hefty 35 percent increase from the 3.2TB reported last year. Telstra's second Grand Prize was for most rows--an impressive 51 billion. In this database, a single online table 1.2TB in size contains a copy of every customer invoice issued over the past three years. Thus the Telstra database contains a single table that's a bona fide VLDB in its own right.
IBM DB2 is the mainstay DBMS at Telstra; it's distributed among a three-node cluster of Hitachi Data Systems Skyline and IBM S/390 machines. There are 20 processors in the cluster and a total of 13.5TB of memory involved. Hitachi, along with IBM and EMC, provides storage devices for the system. In peak times, the system can process 300 TPS.
Table 1 and Table 2 illustrate the extraordinary strength and staying power of IBM and mainframe-class products in transaction processing applications. At the core of many Global 1000 companies are tried-and-true mainframe solutions. DB2, Adabas, CA-IDMS, Model 204, and others are performing the daily operations of many of the world's most sophisticated and successful companies.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Rows/Records (m) |
|---|---|---|---|---|---|---|
| 1 | UPS (PLDR; federated) | DB2 | IBM S/390, Hitachi Skyline | SMP | EMC, IBM | 324,000 |
| 1 | Telstra | DB2 | Hitachi Skyline | Cluster | HDS, EMC, IBM | 51,000 |
| 2 | Experian (credit reporting) | DB2 | Hitachi Skyline 62 | SMP | EMC | 15,018 |
| 3 | SK Telecom | DB2 | IBM S/390 | MPP | HDS, IBM | 5,870 |
| 4 | Caixa Economica Federal | CA-IDMS | IBM S/390 | SMP | EMC, HDS, IBM | 5,000 |
| 5 | Deere & Co. | DB2 | IBM RS/6000 | SMP | IBM | 2,503 |
| 6 | Metromail Corp. | Oracle | Sequent Symmetry 5000 | SMP | EMC | 2,500 |
| 7 | IZB Software | Adabas | Amdahl 5995 | SMP | EMC, Comparex | 1,900 |
| 8 | DBS Systems Corp. | Oracle RDB | Digital VAX7830 | MPP | DEC | 915 |
| 9 | Victoria's Secret Catalogue | DB2 | Amdahl | SMP | IBM, EMC | 800 |
| 10 | CentreLink | CCA Model 204 | IBM S/390 | Cluster | IBM, EMC | 734 |

TABLE 2. Most rows, all environments, transaction processing systems.

Leading all Unix-based transaction processing systems for honors in database size is Mitsukoshi Information Service Co. Ltd. of Tokyo. This 600GB database is implemented in NCR Teradata and hosted on a WorldMark 5100. The system configuration contains four nodes and 32 processors; storage is provided by two EMC disk drives. Seiichiro Honda, manager of analyst relations for NCR Japan, submitted the survey on behalf of the company.
Mitsukoshi Information Service is the financial arm of The Mitsukoshi, one of the premier department stores in Japan. For fiscal year 1997, The Mitsukoshi reported annual revenues of $5.8 billion, making it the second-highest grossing department store in that country. The Mitsukoshi database comprises 100GB of user data, 400GB of summaries and aggregates, and 100GB of indexes. It contains 400 million rows/records of data. 1997 was a banner year for the database, which doubled in size in 12 months. Mitsukoshi predicts an even greater increase for 1998, forecasting that the database will triple in size by the end of the year.
Although the system performs a mixed workload of transaction processing and decision support, its primary use is transaction processing. On average, it processes 100 TPS, a figure that more than doubles to 220 TPS during heightened activity.
Table 3 shows the remaining winners in this category. Notice that a unique combination of DBMS and system hardware supports each winner. At this time, no specific vendor of these components has established dominance in Unix-based transaction processing activity. However, in terms of storage, EMC is clearly the leader.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Data (GB) |
|---|---|---|---|---|---|---|
| 1 | Mitsukoshi Information Service Co. | Teradata | NCR 5100 | MPP | EMC | 600 |
| 2 | Metromail Corp. | Oracle | Sequent Symmetry 5000 | SMP | EMC | 405 |
| 3 | Deere & Co. | DB2 | IBM RS/6000 | SMP | IBM | 356 |
| 4 | Chase Manhattan Bank | Informix-Dynamic Server | HP 9000 | SMP | EMC | 175 |

TABLE 3. Database size, Unix only, transaction processing systems.

A new face in the crowd this year is Deere & Co., the winner for rows in a transaction processing, Unix-only environment. Deere & Co. was awarded a Grand Prize for having exactly 2,502,739,507 rows in its database. The system is a hybrid comprising primarily DB2 with some additional Oracle and SQL Server components. Martin Spratt, project manager at the company, submitted the winning survey.
Because its database is federated, Deere & Co. uses IBM's DataJoiner software to provide an integrated view of the various database modules. DataJoiner does not contain the actual data. Instead, it serves as a large, intelligent metadata catalog for globally distributed physical tables and indexes.
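The idea of a catalog that holds metadata but no data can be sketched in a few lines. This is a minimal illustration of the general concept, not a description of how DataJoiner itself works; the class names, method names, and table names are all hypothetical. The three engine names come from the hybrid configuration described above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TableLocation:
    engine: str         # member DBMS that physically stores the table
    physical_name: str  # name of the table on that engine (hypothetical)


class FederationCatalog:
    """Metadata only: maps logical table names to where the data lives."""

    def __init__(self) -> None:
        self._locations: Dict[str, TableLocation] = {}

    def register(self, logical_name: str, location: TableLocation) -> None:
        self._locations[logical_name] = location

    def route(self, logical_names: Iterable[str]) -> Dict[str, List[str]]:
        """Group the requested tables by the engine that must serve them."""
        by_engine: Dict[str, List[str]] = {}
        for name in logical_names:
            location = self._locations[name]  # KeyError if not registered
            by_engine.setdefault(location.engine, []).append(location.physical_name)
        return by_engine


catalog = FederationCatalog()
catalog.register("parts", TableLocation("DB2", "PROD.PARTS"))
catalog.register("dealers", TableLocation("Oracle", "SALES.DEALERS"))
catalog.register("warranty", TableLocation("SQL Server", "dbo.Warranty"))

plan = catalog.route(["parts", "warranty", "dealers"])
print(plan)
```

A user or application names only the logical tables; the catalog decides which autonomous engine answers for each one, which is what lets a federation present a single image of the database.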
The Deere & Co. installation runs on a two-node cluster of IBM RS/6000 machines, and the company plans to move to four-way SMP capability. Big Blue disk drives also provide most of the storage capacity. In the year and a half since it went into production, the system has ballooned from 7,000 transactions per week to its current range of 30,000 to 45,000 transactions per week.
Spratt reports that sometime next year he expects to add IMS and VSAM databases, now only part of the development environment, to the federation. He characterizes the future configuration as a "virtual database geoplex" that will be nearly twice the size of the current database.
Table 4 offers clear evidence of the steadily expanding dimensions of VLDBs. Last year in this category, The Handleman Co. took the Grand Prize for a database containing 1,300 million rows. This year, Deere & Co. and the second-place finisher, Metromail Corp., almost double that mark with 2,503 and 2,500 million rows, respectively.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Rows/Records (m) |
|---|---|---|---|---|---|---|
| 1 | Deere & Co. | DB2 | IBM RS/6000 | SMP | IBM | 2,503 |
| 2 | Metromail Corp. | Oracle | Sequent Symmetry 5000 | SMP | EMC | 2,500 |
| 3 | Mitsukoshi Information Service Co. | Teradata | NCR 5100 | MPP | EMC | 400 |
| 4 | Chase Manhattan Bank | Informix-Dynamic Server | HP 9000 | SMP | EMC | 344 |

TABLE 4. Most rows, Unix only, transaction processing systems.

Peak Online Activity in a Transaction Processing System, All Environments
In the category of peak online workload in all environments, the Grand Prize goes to Roadway Express. Roadway is a seasoned VLDB Survey Program participant whose database the other challengers could not match this year. This mixed-usage system primarily performs OLTP and can process 1,820 TPS. In fact, even under average conditions, Roadway Express executes 650 TPS, a figure that would also have captured first place for the company. Kevin Carracher, manager of development support services, supplied the winning survey with assistance from Chris Orlowski, a consultant with Caliber Technology Inc.
The Roadway Express database is a shipment management system that has been in production for 11 years. The system uses CCA's Model 204 on an IBM S/390 platform with seven processors. Most of the data by far--133GB--is kept at the detail level, with 1.4GB of summaries and aggregates and just 25GB of indexes. The database contains 336 million rows of data. IBM and Hitachi Data Systems provide the storage.
If you want to know where the frenzied OLTP workload activity is taking place, Table 5 reveals the industries on the edge of the envelope. Three of the top five companies--Telstra, Pacific Telecom, and SK Telecom--are telecommunications organizations. Government and banking/financial services are also well represented among the leaders.
| Rank | Organization | DBMS | Processor | Architecture | Storage | TPS |
|---|---|---|---|---|---|---|
| 1 | Roadway Express Inc. | CCA Model 204 | IBM S/390 | SMP | IBM, HDS | 1,820 |
| 2 | Telstra | DB2 | Hitachi Skyline | Cluster | HDS, EMC, IBM | 300 |
| 2 | Pacific Telecom Inc. | CA-IDMS | Amdahl 5995 M | SMP | IBM, Spectris, EMC | 300 |
| 3 | CentreLink | CCA Model 204 | IBM S/390 | Cluster | IBM, EMC | 272 |
| 4 | SK Telecom | DB2 | IBM S/390 | MPP | HDS, IBM | 250 |
| 5 | UPS (PLDR) | DB2 | IBM S/390, Hitachi Skyline | SMP | EMC, IBM | 220 |
| 6 | Caixa Economica Federal | CA-IDMS | IBM S/390 | SMP | EMC, HDS, IBM | 205 |
| 7 | Progressive Corp. | CA-IDMS | IBM S/390 | SMP | EMC | 140 |
| 8 | IZB Software | Adabas | Amdahl 5995 | SMP | EMC, Comparex | 135 |
| 9 | DBS Systems Corp. | Oracle RDB | Digital VAX7830 | MPP | DEC | 110 |
| 10 | Metromail Corp. | Oracle | Sequent Symmetry 5000 | SMP | EMC | 92 |

TABLE 5. Peak online activity, all environments, transaction processing systems.

The next winner is a veteran program participant, but 1998 marks its first appearance in the Grand Prize winner's circle. We are proud to bestow not just one, but two Grand Prizes on Metromail Corp. The Metromail database, a centralized transaction processing system, achieves distinction in two categories: most rows or records and highest online workload for a Unix-based system. Brian Foreman, DBA for the company, submitted the winning survey.
The Metromail database contains names, addresses, phone numbers, and other public information about more than 100 million U.S. households. Commercial organizations use the data for direct marketing purposes and nonprofit organizations use it for fund-raising activities. Metromail uses Oracle aboard Sequent Symmetry 5000 servers, with EMC 5500 devices providing storage. The database contains 179GB of user data and 226GB of indexes. There are no summaries or aggregates because the applications running against the system require data at the detail level (actual names, addresses, and so on).
Metromail captures its first Grand Prize for the 2.5 billion rows of data in the database. The achievement can be traced to the growth of the database, which nearly doubled in size over the past year. Metromail earns its second Grand Prize for peak online workload: the system averages 26 TPS but peaks at more than three times that rate, 92 TPS.
Table 6 shows the other winners in this category. This list illustrates the diversity in row design among large databases. Furthermore, the number of rows does not necessarily correlate with database size. Notice that the Metromail database, which is two-thirds the size of the Mitsukoshi installation, contains more than six times as many rows (2,500 million versus 400 million, per Table 4).
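The Metromail and Mitsukoshi comparison can be checked with a little arithmetic. The sketch below uses the size and row figures reported in Tables 3 and 4; the variable and function names are illustrative only.

```python
# (data in GB, rows in millions), taken from Tables 3 and 4
sites = {
    "Metromail": (405, 2_500),
    "Mitsukoshi": (600, 400),
}


def rows_per_gb(data_gb: float, rows_millions: float) -> float:
    """Row density: millions of rows stored per gigabyte of data."""
    return rows_millions / data_gb


metromail_gb, metromail_rows = sites["Metromail"]
mitsukoshi_gb, mitsukoshi_rows = sites["Mitsukoshi"]

size_ratio = metromail_gb / mitsukoshi_gb      # how the sizes compare
row_ratio = metromail_rows / mitsukoshi_rows   # how the row counts compare

for name, (gb, rows) in sites.items():
    print(f"{name}: {rows_per_gb(gb, rows):.2f} million rows per GB")
print(f"size ratio {size_ratio:.3f}, row ratio {row_ratio:.2f}")
```

Row density differs by roughly an order of magnitude between the two sites, which is the point about row design: one database holds many short detail rows, the other far fewer, wider ones.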
| Rank | Organization | DBMS | Processor | Architecture | Storage | TPS |
|---|---|---|---|---|---|---|
| 1 | Metromail Corp. | Oracle | Sequent Symmetry 5000 | SMP | EMC | 92 |
| 2 | Chase Manhattan Bank | Informix-Dynamic Server | HP 9000 | SMP | EMC | 25 |
| 3 | Mitsukoshi Information Service Co. | Teradata | NCR 5100 | MPP | EMC | 15 |
| 4 | Deere & Co. | DB2 | IBM RS/6000 | SMP | IBM | 11 |

TABLE 6. Peak online activity, Unix only, transaction processing systems.

Now under a new name, the next blue ribbon company repeats as a double Grand Prize winner. The Dialog Corp., formerly Knight Ridder Information, achieves distinction in two categories. In the realm of decision support in any environment, this federated installation led all participants in database size and most rows. Shelley Giles, programmer analyst, entered the winning questionnaire.
Weighing in at an imposing 6.3TB, the Dialog system is a commercial information retrieval and document delivery service that draws information from many different types of data--bibliographic, company directory, patent, newspaper, trademark, chemical, and more. Over the past 12 months, 50 billion new rows of data were added to the system to reach the 150 billion-row milestone.
One of the Dialog system's unique characteristics is that it uses a proprietary database management system that has evolved significantly during its 26 years in operation. The DBMS runs on an SMP system comprising a seven-processor Hitachi Data Systems GX8724 box plus four uniprocessor Sun SPARC servers. Three primary operating systems support the system: VM/CMS for online retrieval, MVS for file updating, and Unix for user access to the file servers. Storage is provided by a medley of devices: EMC, Hitachi, IBM, and Sun disk devices for DASD and Kubic Multi CD-ROM for offline storage.
Table 7 underscores how Unix is the preferred platform for large decision-support installations. When we assess the sheer amount of data, the Dialog mainframe-based system is the largest DSS site. However, it is a federated system. In terms of centralized or distributed DSS installations, the four largest and seven of the top 10 run on Unix platforms. Unquestionably, Unix is the operating environment of choice for meeting high-end decision-support requirements.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Data (GB) |
|---|---|---|---|---|---|---|
| 1 | The Dialog Corp. (federated) | Proprietary | Hitachi GX8724, Sun SPARC | SMP | EMC, HDS, IBM, Sun, Kubic | 6,300 |
| 1 | Sears (SPRS) | Teradata | NCR 5100 | MPP | EMC | 4,630 |
| 2 | HCIA | Informix-Dynamic Server | Sun 6000 | SMP | Seagate, Quantum | 4,500 |
| 3 | Wal-Mart Stores Inc. | Teradata | NCR 5100 | MPP | Seagate | 4,422 |
| 4 | Tele Danmark A/S | DB2 | IBM RS/6000 | MPP | IBM | 2,840 |
| 5 | CitiCorp | DB2 | IBM SP | MPP | IBM | 2,468 |
| 6 | MCI (database marketing) | Informix-Dynamic Server EP | IBM SP | MPP | IBM | 1,884 |
| 7 | NDC Health Information Services | Oracle | Sequent NUMA-Q 2000 | NUMA | EMC, DG Clariion | 1,850 |
| 8 | Dayton Hudson Corp. | NonStop SQL | Tandem Himalaya | MPP | Tandem | 1,315 |
| 9 | Sprint | Teradata | NCR 5100 | MPP | NCR | 1,300 |
| 10 | Ford Motor Co. | Oracle | Sequent NUMA-Q 2000 | NUMA | EMC | 1,200 |

TABLE 7. Database size, all environments, decision-support systems.

Table 8 provides even more evidence that VLDB dimensions are expanding rapidly. Figure 1 compares the figures in this category between the 1997 and 1998 programs. To eliminate any aberrations, we'll disregard the largest and the smallest site from each top 10 list. Last year, the second largest site had 20 billion rows, the number nine winner had 6 billion, and the average for the top 10 sites was 16.6 billion. This year, the second largest site has 50 billion rows, the number nine winner has more than 9 billion, and the average row count for the category is 19.7 billion.
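The averaging method described here, dropping the largest and smallest entries before taking the mean, can be reproduced from the Table 8 row counts. The sketch below assumes that all eleven listed sites (including the federated Grand Prize winner) form the starting set; the exact set used for the published average is not stated, so treat the result as an approximation.

```python
# Row counts in millions, as listed in Table 8 (both rank-1 entries included)
table_8_rows = [
    150_000, 50_000, 33_000, 24_000, 16_345, 15_277,
    10_100, 10_000, 9_744, 9_600, 8_229,
]


def trimmed_mean(values: list) -> float:
    """Mean after discarding the single largest and single smallest value."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim both ends")
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)


plain_mean = sum(table_8_rows) / len(table_8_rows)
trimmed = trimmed_mean(table_8_rows)

print(f"plain mean:   {plain_mean / 1000:.1f} billion rows")
print(f"trimmed mean: {trimmed / 1000:.1f} billion rows")
```

Under that assumption the trimmed mean comes out at about 19.8 billion rows, in line with the roughly 19.7 billion quoted above, while the untrimmed mean is pulled well above 30 billion by the single 150-billion-row entry.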
| Rank | Organization | DBMS | Processor | Architecture | Storage | Rows/Records (m) |
|---|---|---|---|---|---|---|
| 1 | The Dialog Corp. (federated) | Proprietary | Hitachi GX8724, Sun SPARC | SMP | EMC, HDS, IBM, Sun, Kubic | 150,000 |
| 1 | Wal-Mart Stores Inc. | Teradata | NCR 5100 | MPP | Seagate | 50,000 |
| 2 | Sears (SPRS) | Teradata | NCR 5100 | MPP | EMC | 33,000 |
| 3 | Dayton Hudson Corp. | NonStop SQL | Tandem Himalaya | MPP | Tandem | 24,000 |
| 4 | MCI (database marketing) | Informix-Dynamic Server EP | IBM SP | MPP | IBM | 16,345 |
| 5 | Catalina Marketing Corp. | Red Brick | Digital Alpha 8400 | SMP | EMC, MTI | 15,277 |
| 6 | Tele Danmark A/S | DB2 | IBM RS/6000 | MPP | IBM | 10,100 |
| 7 | HCIA | Informix-Dynamic Server | Sun 6000 - 1000 | SMP | Seagate, Quantum | 10,000 |
| 8 | CitiCorp | DB2 | IBM SP | MPP | IBM | 9,744 |
| 9 | VarTec Telecom Inc. | MS SQL Server | DG AViiON 3600 | SMP | DG Clariion | 9,600 |
| 10 | Sears (data warehouse) | Informix-Dynamic Server EP | IBM SP-SMP | MPP | IBM | 8,229 |

TABLE 8. Most rows, all environments, decision-support systems.

Most Data and Most Rows in a Decision-Support System, All Environments
Winter Corp. was pleased to confer another two Grand Prize awards on Sears, Roebuck and Co. Sears outpaces all centralized or distributed systems in database size in two categories: all environments and Unix environments only. Jean Brizzolara, systems manager for Sears, submitted this survey.
The Sears system is known as the Strategic Performance Reporting System (SPRS). Designed for decision support, SPRS is the single authoritative source for the company for merchandising information such as sales, inventory, and margin analysis.
Sears received two Grand Prizes for the amount of data in SPRS, 4.63TB. This figure breaks down into 4.3TB of user data and 330GB of summaries and aggregates. Within SPRS, a single table contains 40 percent of the data; it contains weekly inventory information down to the SKU level for each Sears store, distribution center, and warehouse! Not only is the overall size of the database extraordinary, but one table, on its own, contains nearly 2TB of data. Counting the disk allotted for freespace and redundancy, the database approaches the 10TB mark, more than double the size of the data alone.
SPRS is implemented in Teradata and runs on a 48-node NCR WorldMark 5100M system with 384 processors. EMC provides storage for the system.
Table 9 shows how the top participants in this category provided the closest competition in the '98 program. In amount of data, Sears, HCIA, and Wal-Mart differed by only 2 to 3 percent.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Data (GB) |
|---|---|---|---|---|---|---|
| 1 | Sears (SPRS) | Teradata | NCR 5100 | MPP | EMC | 4,630 |
| 2 | HCIA | Informix-Dynamic Server | Sun 6000 | SMP | Seagate, Quantum | 4,500 |
| 3 | Wal-Mart Stores Inc. | Teradata | NCR 5100 | MPP | Seagate | 4,422 |
| 4 | Tele Danmark A/S | DB2 | IBM RS/6000 | MPP | IBM | 2,840 |
| 5 | CitiCorp | DB2 | IBM SP | MPP | IBM | 2,468 |
| 6 | MCI (database marketing) | Informix-Dynamic Server EP | IBM SP | MPP | IBM | 1,884 |
| 7 | NDC Health Information Services | Oracle | Sequent NUMA-Q 2000 | NUMA | EMC, DG Clariion | 1,850 |
| 8 | Sprint | Teradata | NCR 5100 | MPP | NCR | 1,300 |
| 9 | Ford Motor Co. | Oracle | Sequent NUMA-Q 2000 | NUMA | EMC | 1,200 |
| 10 | Acxiom Corp | Oracle | Digital 8400 | Cluster | DEC | 1,125 |

TABLE 9. Database size, Unix only, decision-support systems.

Another repeat winner from last year is Wal-Mart Stores Inc. The company captured top honors in two categories: most rows in a decision-support system (centralized or distributed) in all environments as well as in Unix environments only. This mixed-purpose system is a merchandising data warehouse implemented in Teradata and supported by a 96-node NCR WorldMark system. Seagate Barracuda drives provide more than 16TB of DASD. Randy Salley, director of IS, entered the survey.
Wal-Mart outpaces all other entries by reporting a colossal 50 billion rows of data in the system. This figure represents a 150 percent explosion from 20 billion in 1997. This astonishing growth corresponds with a gigantic increase in database size. Over the last 12 months, the database nearly doubled, leapfrogging from 2.4TB in early 1997 to a prodigious 4.4TB a year later. What's more, Wal-Mart reports that the database is undergoing voracious growth and projects another 50 percent gain by 1999.
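Growth figures like these are simple percent changes, and a small helper makes them easy to verify. The function name is illustrative; the inputs are the Wal-Mart row counts given above and the Telstra data sizes reported earlier in this article.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage increase (or decrease, if negative) from old to new."""
    if old == 0:
        raise ValueError("percent change from zero is undefined")
    return (new - old) / old * 100


# Wal-Mart rows, in billions: 20 in 1997, 50 in 1998
walmart_rows = percent_change(20, 50)

# Telstra data, in terabytes: 3.2 last year, 4.35 this year
telstra_data = percent_change(3.2, 4.35)

print(f"Wal-Mart rows: {walmart_rows:.0f} percent increase")
print(f"Telstra data:  {telstra_data:.1f} percent increase")
```

Note that a 150 percent increase means the row count is now two and a half times its earlier value, not one and a half times.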
Table 10 shows the leaders in this category. The companies on this list characterize the types of industries making decision-support activities an integral part of their operations. At these sites, users rely on their systems to compose customer profiles, track sales incentives, understand and anticipate buying patterns, hone customer service techniques, and so on. Three out of the top four companies and half of those in the top 10 are retail businesses; the telecommunications industry is also well represented.
| Rank | Organization | DBMS | Processor | Architecture | Storage | Rows/Records (m) |
|---|---|---|---|---|---|---|
| 1 | Wal-Mart Stores Inc. | Teradata | NCR 5100 | MPP | Seagate | 50,000 |
| 2 | Sears (SPRS) | Teradata | NCR 5100 | MPP | EMC | 33,000 |
| 3 | MCI (database marketing) | Informix-Dynamic Server EP | IBM SP | MPP | IBM | 16,345 |
| 4 | Catalina Marketing Corp. | Red Brick | Digital Alpha 8400 | SMP | EMC, MTI | 15,277 |
| 5 | Tele Danmark A/S | DB2 | IBM RS/6000 | MPP | IBM | 10,100 |
| 6 | HCIA | Informix-Dynamic Server | Sun 6000 - 1000 | SMP | Seagate, Quantum | 10,000 |
| 7 | CitiCorp | DB2 | IBM SP | MPP | IBM | 9,744 |
| 8 | Sears (data warehouse) | Informix-Dynamic Server EP | IBM SP-SMP | MPP | IBM | 8,229 |
| 9 | Walgreen Co. | HOPS | Digital Alpha 4100 | SMP | DEC | 6,600 |
| 10 | Union Pacific Railroad | Teradata | NCR 5100M | MPP | Seagate | 6,112 |

TABLE 10. Most rows, Unix only, decision-support systems.

For decision-support systems, the VLDB Survey Program defines peak online activity as the highest number of concurrent, online, in-flight queries, reports, and updates. Outdistancing all other contenders in this consideration is JCPenney. The company earned a pair of Grand Prizes for its ability to execute 784 concurrent online processes. This figure exceeded that of every other entrant, both across all platforms and in Unix-only environments. John Mayrack, systems development manager, supplied the winning survey.
The JCPenney database is a customer-centric data warehouse that performs a mixed workload of ad hoc queries and data maintenance processes. It is implemented in Teradata and hosted on an NCR WorldMark 5100M system with 12 nodes and 96 processors. EMC 5000 and 5430 devices provide storage for the system. Most of the data--560GB--is at the detail level. There are 22GB of summary and aggregate data plus an additional 20GB of indexes, for a total of 602GB.
Keep an eye on this database. Not only did it grow by 50 percent last year, but JCPenney projects an even greater expansion, 70 percent, for 1998. And over the next three years, the company predicts the database will triple in size, easily propelling it over the terabyte bar.
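As a rough check on these projections, the sketch below totals the reported component sizes and applies the stated growth rates. It assumes both projections start from the current 602GB total and takes a terabyte as 1,000GB; the article does not state either convention.

```python
# Component sizes reported for the JCPenney warehouse, in gigabytes.
detail_gb = 560
summary_gb = 22
index_gb = 20
total_gb = detail_gb + summary_gb + index_gb

# Assumption: a terabyte is taken as 1,000GB.
TERABYTE_GB = 1000

# Assumption: both projections are applied to the current total.
projected_1998_gb = total_gb * 1.70   # 70 percent expansion projected for 1998
tripled_gb = total_gb * 3             # tripling predicted over three years

print(f"Current total: {total_gb}GB")
print(f"After 70 percent growth: {projected_1998_gb:.0f}GB")
print(f"After tripling: {tripled_gb}GB")
print(f"Tripled size exceeds a terabyte: {tripled_gb > TERABYTE_GB}")
```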
One significant observation you can glean from Table 11 is Unix's supremacy in handling high online workload levels for DSS. Apart from The Dialog Corp.'s use of MVS and a minor presence of enduring TOS/System 3600 combos, every winning site in this category uses Unix. Combined with Unix's proven strength in supporting large data volumes and high row counts, this finding confirms Unix as the platform for DSS.
No less obvious is the DBMS best suited for high workload activity. Tables 11 and 12 both reveal Teradata and Oracle as the premier choices. The Dialog Corp.'s proprietary solution excluded, every top 10 winner uses one of these two DBMSs.
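To make the DBMS split concrete, the DBMS column of Table 11 can be tallied directly. This is only an illustrative sketch; the list is transcribed by hand from the table in rank order, and the variable names are hypothetical.

```python
from collections import Counter

# DBMS column of Table 11 (all environments), transcribed in rank order.
table_11_dbms = [
    "Teradata", "Teradata", "Teradata", "Oracle", "Proprietary",
    "Teradata", "Oracle", "Oracle", "Teradata", "Oracle",
]

counts = Counter(table_11_dbms)

# Print each DBMS with its number of top 10 sites, most common first.
for dbms, sites in counts.most_common():
    print(f"{dbms}: {sites}")
```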
| Rank | Organization | DBMS | Processor | Architecture | Storage | Concurrent Queries |
|---|---|---|---|---|---|---|
| 1 | JCPenney | Teradata | NCR 5100M | MPP | EMC | 784 |
| 2 | SBC Corp. | Teradata | NCR 5100 | MPP | NCR | 750 |
| 3 | AT&T | Teradata | NCR 3600 | MPP | Seagate | 600 |
| 4 | Fidelity Systems Co. | Oracle | Sun e10000, e6000 | Cluster | Sun | 500 |
| 5 | The Dialog Corp. | Proprietary | Hitachi GX8724, Sun SPARC | SMP | EMC, HDS, IBM, Sun, Kubic | 500 |
| 6 | SNCF | Teradata | NCR 5100M | MPP | EMC | 300 |
| 7 | Hewlett-Packard | Oracle | HP 9000 | SMP | HP | 300 |
| 8 | Experian (financial database marketing) | Oracle | SGI Challenge XL | SMP | Amdahl | 256 |
| 9 | NCR Corp. | Teradata | NCR 3600, NCR 5100M | MPP | Symbios | 214 |
| 10 | UPS (data warehouse) | Oracle | HP 9000 | SMP | EMC | 180 |

TABLE 11. Peak online activity, all environments, decision-support systems.

| Rank | Organization | DBMS | Processor | Architecture | Storage | Concurrent Queries |
|---|---|---|---|---|---|---|
| 1 | JCPenney | Teradata | NCR 5100M | MPP | EMC | 784 |
| 2 | SBC Corp. | Teradata | NCR 5100 | MPP | NCR | 750 |
| 3 | Fidelity Systems Co. | Oracle | Sun e10000, e6000 | Cluster | Sun | 500 |
| 4 | Hewlett-Packard | Oracle | HP 9000 | SMP | HP | 300 |
| 5 | SNCF | Teradata | NCR 5100M | MPP | EMC | 300 |
| 6 | Experian (financial database marketing) | Oracle | SGI Challenge XL | SMP | Amdahl | 256 |
| 7 | NCR Corp. | Teradata | NCR 3600, NCR 5100M | MPP | Symbios | 214 |
| 8 | UPS (data warehouse) | Oracle | HP 9000 | SMP | EMC | 180 |
| 9 | Boeing | Teradata | NCR 5100M | MPP | Symbios | 150 |
| 10 | National Association of Securities Dealers | Oracle | Sequent NUMA-Q | NUMA | EMC | 150 |

TABLE 12. Peak online activity, Unix only, decision-support systems.

Several shifts in the industry are readily apparent from the 1998 data. As we mentioned earlier, the first is the growing presence of Unix at decision-support sites. In the 1997 program, 68 percent of the DSS participants were running on a Unix platform; this year, the number leaped to 87 percent. However, Unix has only a minor presence at transaction processing installations. In the 1997 program, 21 percent of OLTP installations were hosted on Unix platforms. A year later, this number decreased slightly to 19 percent.
Another development is the increased visibility of the federated database architecture. In the 1997 program, only one participant, National Processing Co., reported a federated database. One year later, that number has jumped to seven, of which four run transaction processing applications exclusively or primarily and three are used for decision support.
It goes without saying that VLDBs just keep getting bigger and bigger. Consider how transaction processing systems have expanded in the past 12 months. Over the course of 1997, UPS combined two systems of more than 3TB each with several other systems to form one monstrous 16TB federated system, the survey program's first double-digit terabyte site. Fellow OLTP leaders Telstra and Experian grew by 36 percent and an incredible 56 percent to reach the 4.3TB and 2.7TB marks, respectively.
The boundaries of the DSS world are stretching as well. The Sears database, no shrinking violet last year at 1.3TB, catapulted into the top tier in the '98 program: The Sears system grew to more than three and a half times its previous size, reaching a formidable 4.63TB. As part of this growth, Sears added 550 percent more rows for a total of 33 billion. Wal-Mart, a perennial leader in the VLDB Survey Programs, almost doubled its data content and more than doubled its row count to reach the 4.42TB and 50-billion-row marks.
The answer to the question is already clear: up and up and up. The report on the 1997 program a year ago ("Giants Walk the Earth," September 1997) predicted the presence of a 9TB database in the '98 program. This musing actually underestimated database growth in practice. At 16.8TB, the UPS package tracking and delivery system exceeded our prediction by more than 60 percent! What other company--or companies--will surpass the 10TB mark next year?
In addition to celebrating the ever-expanding VLDB frontier, we as database specialists must also capitalize on this trend in practice. From these leading implementations, we can learn how to identify the risks, master the critical elements, and understand the success factors of VLDBs by building a common knowledge base about large databases. We begin collecting data for the next campaign in September 1998. Be a participant in the program and find out how to guide your VLDB into terabyte territory.
More information about the Winter VLDB Survey is available at www.wintercorp.com.
Richard Winter is president and Kathy Auerbach is research program manager of Winter Corp. in Boston, an international consulting practice that advises executives on large database strategies, parallel architectures, risk management, and critical implementation projects. You can reach them by email at richard.winter@wintercorp.com and kauerbach@wintercorp.com, respectively, or by telephone at (617) 695-1800.
Participating in the VLDB Survey Program helps you create industry recognition for yourself, your database team, and your company. The program gives you an opportunity to tell customers, prospects, and competitors about your success in implementing and operating a large database. At the same time, you're helping to build a body of knowledge about the best practices in the VLDB industry.
The program creates far-reaching professional visibility via an ongoing press relations campaign. Furthermore, Winter Corp. posts all winners' names, company information, and database accomplishments on our Web site for a year. Achievements in the program serve as an excellent source of material for company collateral, advertising, press releases, Web site content, and so on.
All program winners receive a variety of awards, including gourmet chocolate, airline miles, framed certificates, and crystal plaques. We'll also send you a free copy of the Members Report, a summary of program highlights and research findings. If you're a Grand Prize winner, you'll also receive a complimentary Winter Corp. technology briefing at a location of your choosing.