
The role of data warehouses and data marts in strategic decision making is increasing rapidly; practically everyone in the business community plans to implement one at some point. Some companies are asking their software vendors to bundle a decision-support solution either as part of their standard offering or as an option, while others want to build their own solutions.
Although many vendors now offer bundled tools for building a data warehouse/data mart solution, many customers are discovering that they need to do some work themselves, mainly because each company has unique corporate culture, environment, and requirements.
Creating a DSS requires building a database as well as an application. The degree of work involved varies by the number of application subject areas and their scope. While the enterprise data warehouse-type of DSS spans multiple subject areas, the data mart solution usually involves a single subject area.
As early implementers have found out the hard way, creating an enterprise data warehouse with clean and consistent data is neither easy nor cheap. The project scope, especially in companies with high volumes of data from multiple sources, is usually enormous. As a result, companies are now looking for a DSS strategy that will comprise multiple subject areas but at the same time allow for an iterative implementation.
Implementing a corporatewide DSS requires a lot of good planning. According to a recent survey from Sentry Market Research Inc., managing expectations was rated as the most challenging aspect of building such a system. Another 1996 survey by the Data Warehouse Institute shows that the number of enterprise DSS users usually doubles in the first six months of production. Therefore, choosing a scalable implementation methodology is a major goal. In this article, Ill present a way of accomplishing that goal while minimizing the costs and risks of an enterprisewide DSS project.
Current Implementation Methodologies
The two most common methodologies for implementing a multitiered, corporatewide DSS are the top-down and bottom-up approaches. Among the software vendors, these two approaches to building a DSS constitute a major difference factor for the preferred choice of implementation.
As shown in Figure 1, in a top-down approach one big corporate database (the data warehouse) containing the historical data is created from the operational systems, followed by smaller, subject-specific databases (the data marts). The danger in this approach because of its large scope is that building from the top down may not serve each departments specific needs. When that happens, the end-user applications are created with what is available rather what is required. Furthermore, one of the most difficult tasks is getting all the key people from different departments to agree on the same interpretation of the data before building any applications. During this phase, it is not uncommon to encounter risky company politics when deficiencies are uncovered in the current system.


Going Hybrid
As discussed earlier, building many independent data marts without a common architecture makes creating a single, corporate data warehouse for the enterprise much harder down the line. But it is also a well-known fact that building a full-blown, easy-to-use data warehouse is riskier and more expensive than building a data mart, even for corporations with deep pockets. So how can you increase the chances of putting a DSS in production within a reasonable timeframe while minimizing the efforts needed to build an enterprise data warehouse later? The answer is to combine the strengths of the top-down and bottom-up approaches into a method I call the hybrid approach. In this approach, one inclusively designs for the enterprise data warehouse and department data mart. However, the data mart is built first with the data warehouse following later. (This method is also referred to as a combination approach by H. S. Gill and P. C. Rao in their Official Guide to Data Warehousing [Que Corp., 1996.])
The degree to which the enterprise and department data models are taken during the design process is one of the key characteristics of the hybrid approach. With this approach, it is not necessary to create a fully completed enterprise data model at the beginning--a time- and resource-consuming effort that only corporations with big budgets can undertake. However, it is necessary to identify all major problem areas that eventually will be included in the total enterprise solution. For high-level enterprise design, we only need to know which key fields will relate to each other. But for the initial subject area chosen, a detailed low-level design with granular data needs to be completed. In other words, a data mart subset of the enterprise data model should be agreed upon by the key people before the department-specific data mart is built. At this point, the data naming conventions, types, and sizes must be clearly defined for the data mart at hand only. These conventions will function as the standards for b
uilding all DSSs in the enterprise. As a result, you will be building a strong foundation for the corporate data model one data mart at a time. The result is that scalability neednt require big budgets and the goal of creating a common enterprise data model becomes achievable.
There are a couple different schools of thought on which subject area should be addressed first; some prefer the first subject to be in a low- to medium-priority area, while others think it should be in a high-priority area. This issue is less important with the hybrid approach because of its ability to provide better scalability. However, in order to entice more project sponsors and attract more user interest, choosing a highly visible problem area will do a better job of proving the importance of having an enterprise DSS.
As shown in Figure 3, in the hybrid approach, the DSS is designed at the top and bottom levels simultaneously; when completed, data marts are rolled up into an enterprise data warehouse. With this approach, the effort required to build an data warehouse is greatly reduced because all the data models for the data marts will be designed with a corporate strategy in mind.

Migration to Enterprise Level
An important point must be stressed here: When the value of data marts has been proven, you must avoid building too many of them (generally no more than two to three) without implementing an enterprise data warehouse. The main reason for this limit is to preserve your ability to create a common data model for the enterprise by which smaller marts can be refreshed, maintained, and tuned for performance. Thats where the strength of the top-down side of the hybrid approach--the reduction of data replication from the operational systems--comes in. Furthermore, you also want to centralize the data extraction and removal process by having one common, shared repository. Data cleaning and transformation, which make up most of the unexpected cost of a DSS, can now be minimized. The final result is less maintenance and better performance.
After satisfactory results are observed from the initial data mart, you need to decide how to grow the existing DSS system. Before you build additional data marts, the existing data model should be examined to decide if new data elements are needed. The creation of another subject area for a new data mart can be completed in the same machine and database; depending on the number of new data elements needed, you can create a new data mart model or modify the existing one. There should be very little data duplication among the data marts so that a high degree of data integration can be achieved for the corporate data warehouse. Therefore, if you choose subject areas such that they are almost mutually exclusive, it will be much easier to integrate these areas when building the enterprise watrehouse. You must also carefully consider the fact that there should be only one interpretation (format and semantics) of data elements among the multiple subject-area data marts.
Now youre at a point where you can build the multitier enterprise data warehouse by moving the data model, representing the detail data of each data mart, to a different machine. Rolling up an enterprise data warehouse is simply a matter of taking this shared data model to an higher level, because its key relationships were established during the high-level data warehouse design. The effort needed to achieve this task is streamlined by the proper partitioning of the data mart subject areas, and the task of managing the common enterprise data (the identification and interpretation of the new detail data elements) was performed as each data mart was built. Clearly, the effort to migrate existing data marts to an enterprise data warehouse would be very tedious using the bottom-up approach. Opportunistic data marts built without any intention of enterprise integration become isolated systems and maintenance nightmares.
After the common detail data is moved to an enterprise data warehouse, the performance of the system--which contains mostly summarized data at the data mart level--will be enhanced. In his book The Data Warehouse Toolkit (Wiley & Sons, 1996), Ralph Kimball emphasizes this point when he concludes The use of prestored summaries (aggregates) is the single most effective tool the data warehouse designer has to control performance.
To create an enterprise data model, some suggest the use of smart algorithms to scan the metadata and discover common elements at the data mart level. In my opinion, this suggestion is valid only when one is uncertain about how the current data models are structured and maintained. This situation is typically encountered with systems built with the bottom-up approach, where creating a common enterprise data model is an afterthought. The hybrid approach attempts to prevent this situation from happening by keeping a closer eye on the enterprise data model during the design of each data model.
Another point to keep in mind is that choosing a scalable database management system is crucial. Given that the size of the data and the number of users increase very rapidly in enterprise systems, selecting a database that can run in multiple operating systems will greatly decrease the growing pains. Planning for growth now will pay off in the future when you decide to increase the capabilities of the your system.
The Big Choice
Given the importance of a scalable architecture for the enterprise DSS, one of the major considerations during the planning phase is deciding which implementation option to use: top down, bottom up, or hybrid. Understanding the implications of choosing one approach over the other will influence how the DSS project takes shape over time. When this decision is based on the right information, managing expectations during the DSS project will be much easier and less stressful for the project team and company sponsors.
Scalable solutions are built by design. Therefore, it is imperative to identify the methods that will ensure you have such a design. Because of its ability to control cost and risk without jeopardizing the creation of an enterprise data model, the hybrid approach should be the methodology of choice for most companies. It is suitable for early adopters as well as for more conservative users.
Tulu Tanrikorur is a software development manager at MSC Inc. He has more than 10 years of experience in designing and building large-scale commercial applications. He can be reached at tulu@mindspring.com.