Enterprise-Ready DSS: The Hybrid Approach

By Tulu Tanrikorur

Using a combined top-down/bottom-up DSS methodology, you can ensure the success of departmental data marts today as well as that of the enterprise data warehouse tomorrow

The role of data warehouses and data marts in strategic decision making is increasing rapidly; practically everyone in the business community plans to implement one at some point. Some companies are asking their software vendors to bundle a decision-support solution either as part of their standard offering or as an option, while others want to build their own solutions.
      Although many vendors now offer bundled tools for building a data warehouse/data mart solution, many customers are discovering that they need to do some work themselves, mainly because each company has unique corporate culture, environment, and requirements.
      Creating a DSS requires building a database as well as an application. The degree of work involved varies by the number of application subject areas and their scope. While the enterprise data warehouse-type of DSS spans multiple subject areas, the data mart solution usually involves a single subject area.
      As early implementers have found out the hard way, creating an enterprise data warehouse with clean and consistent data is neither easy nor cheap. The project scope, especially in companies with high volumes of data from multiple sources, is usually enormous. As a result, companies are now looking for a DSS strategy that will comprise multiple subject areas but at the same time allow for an iterative implementation.
      Implementing a corporatewide DSS requires a lot of good planning. According to a recent survey from Sentry Market Research Inc., managing expectations was rated as the most challenging aspect of building such a system. Another 1996 survey by the Data Warehouse Institute shows that the number of enterprise DSS users usually doubles in the first six months of production. Therefore, choosing a scalable implementation methodology is a major goal. In this article, Ill present a way of accomplishing that goal while minimizing the costs and risks of an enterprisewide DSS project.

Current Implementation Methodologies
The two most common methodologies for implementing a multitiered, corporatewide DSS are the top-down and bottom-up approaches. Among the software vendors, these two approaches to building a DSS constitute a major difference factor for the preferred choice of implementation.
      As shown in Figure 1, in a top-down approach one big corporate database (the data warehouse) containing the historical data is created from the operational systems, followed by smaller, subject-specific databases (the data marts). The danger in this approach because of its large scope is that building from the top down may not serve each departments specific needs. When that happens, the end-user applications are created with what is available rather what is required. Furthermore, one of the most difficult tasks is getting all the key people from different departments to agree on the same interpretation of the data before building any applications. During this phase, it is not uncommon to encounter risky company politics when deficiencies are uncovered in the current system.

      In a bottom-up approach (see Figure 2), data marts are built first by focusing on one specific business area. The problem here is that multiple data marts built in this manner become isolated systems, and because each mart requires the extraction of operational data from different sources, the risk of redundancy is high. As a result, the total time required to manage the data marts is more than initially anticipated.

      However, the biggest drawback of the bottoms-up approach is the lack of data integration across the whole company. In other words, the migration to a common data warehouse model will be more difficult. (In his book Building the Data Warehouse [Wiley & Sons, 1996], Bill Inmon defines the word migration as a plan addressing the issues of transferring data out of existing systems.) Thus, the effort to create an enterprise data model for a data warehouse evokes the need to plan for more scalability at the first opportunity--where the data from multiple sources can be cleaned and refreshed centrally.
      Fortunately, there is another emerging methodology that marries these two methods: the hybrid approach.

Going Hybrid
As discussed earlier, building many independent data marts without a common architecture makes creating a single, corporate data warehouse for the enterprise much harder down the line. But it is also a well-known fact that building a full-blown, easy-to-use data warehouse is riskier and more expensive than building a data mart, even for corporations with deep pockets. So how can you increase the chances of putting a DSS in production within a reasonable timeframe while minimizing the efforts needed to build an enterprise data warehouse later? The answer is to combine the strengths of the top-down and bottom-up approaches into a method I call the hybrid approach. In this approach, one inclusively designs for the enterprise data warehouse and department data mart. However, the data mart is built first with the data warehouse following later. (This method is also referred to as a combination approach by H. S. Gill and P. C. Rao in their Official Guide to Data Warehousing [Que Corp., 1996.])
      The degree to which the enterprise and department data models are taken during the design process is one of the key characteristics of the hybrid approach. With this approach, it is not necessary to create a fully completed enterprise data model at the beginning--a time- and resource-consuming effort that only corporations with big budgets can undertake. However, it is necessary to identify all major problem areas that eventually will be included in the total enterprise solution. For high-level enterprise design, we only need to know which key fields will relate to each other. But for the initial subject area chosen, a detailed low-level design with granular data needs to be completed. In other words, a data mart subset of the enterprise data model should be agreed upon by the key people before the department-specific data mart is built. At this point, the data naming conventions, types, and sizes must be clearly defined for the data mart at hand only. These conventions will function as the standards for b uilding all DSSs in the enterprise. As a result, you will be building a strong foundation for the corporate data model one data mart at a time. The result is that scalability neednt require big budgets and the goal of creating a common enterprise data model becomes achievable.
      There are a couple different schools of thought on which subject area should be addressed first; some prefer the first subject to be in a low- to medium-priority area, while others think it should be in a high-priority area. This issue is less important with the hybrid approach because of its ability to provide better scalability. However, in order to entice more project sponsors and attract more user interest, choosing a highly visible problem area will do a better job of proving the importance of having an enterprise DSS.
      As shown in Figure 3, in the hybrid approach, the DSS is designed at the top and bottom levels simultaneously; when completed, data marts are rolled up into an enterprise data warehouse. With this approach, the effort required to build an data warehouse is greatly reduced because all the data models for the data marts will be designed with a corporate strategy in mind.

      Some people prefer to create a complete enterprise data model instead of complete subsets of it (as in the top-down approach ). This type of development is analogous to waterfall development (sometimes known as the big bang method) where each task has to be fully completed before moving on to the next one. In contrast, starting a DSS system with subsets of the enterprise data model is more analogous to incremental development (also known as an iterative method) where tasks can be revisited to add more functionality. The leading software development companies discovered long ago that the incremental approach to building software is more manageable and less risky. This principle holds true in building DSSs as well. As Inmon explains, The most successful implementations of the architected environment have been those where the data warehouse has been built one iteration at a time.
      Essentially, the hybrid approach represents an example of applying incremental development by subject area in a DSS. Not surprisingly, this practice is now being actively pursued by major vendors like IBM, Oracle, and Computer Associates.
      Although there is now a general consensus on the usefulness of an iterative implementation, the scope of the iteration step and its simplicity are influenced by the chosen method. In the hybrid approach, iteration takes place over a single subject area at each pass whose key relations in the enterprise data model have already been established. When an iterative implementation is applied to the top-down approach, however, the chances are likely that it will span multiple subject areas in the whole enterprise model--leading to a slow, risky, and costly process.

Migration to Enterprise Level
An important point must be stressed here: When the value of data marts has been proven, you must avoid building too many of them (generally no more than two to three) without implementing an enterprise data warehouse. The main reason for this limit is to preserve your ability to create a common data model for the enterprise by which smaller marts can be refreshed, maintained, and tuned for performance. Thats where the strength of the top-down side of the hybrid approach--the reduction of data replication from the operational systems--comes in. Furthermore, you also want to centralize the data extraction and removal process by having one common, shared repository. Data cleaning and transformation, which make up most of the unexpected cost of a DSS, can now be minimized. The final result is less maintenance and better performance.
      After satisfactory results are observed from the initial data mart, you need to decide how to grow the existing DSS system. Before you build additional data marts, the existing data model should be examined to decide if new data elements are needed. The creation of another subject area for a new data mart can be completed in the same machine and database; depending on the number of new data elements needed, you can create a new data mart model or modify the existing one. There should be very little data duplication among the data marts so that a high degree of data integration can be achieved for the corporate data warehouse. Therefore, if you choose subject areas such that they are almost mutually exclusive, it will be much easier to integrate these areas when building the enterprise watrehouse. You must also carefully consider the fact that there should be only one interpretation (format and semantics) of data elements among the multiple subject-area data marts.
      Now youre at a point where you can build the multitier enterprise data warehouse by moving the data model, representing the detail data of each data mart, to a different machine. Rolling up an enterprise data warehouse is simply a matter of taking this shared data model to an higher level, because its key relationships were established during the high-level data warehouse design. The effort needed to achieve this task is streamlined by the proper partitioning of the data mart subject areas, and the task of managing the common enterprise data (the identification and interpretation of the new detail data elements) was performed as each data mart was built. Clearly, the effort to migrate existing data marts to an enterprise data warehouse would be very tedious using the bottom-up approach. Opportunistic data marts built without any intention of enterprise integration become isolated systems and maintenance nightmares.
      After the common detail data is moved to an enterprise data warehouse, the performance of the system--which contains mostly summarized data at the data mart level--will be enhanced. In his book The Data Warehouse Toolkit (Wiley & Sons, 1996), Ralph Kimball emphasizes this point when he concludes The use of prestored summaries (aggregates) is the single most effective tool the data warehouse designer has to control performance.
      To create an enterprise data model, some suggest the use of smart algorithms to scan the metadata and discover common elements at the data mart level. In my opinion, this suggestion is valid only when one is uncertain about how the current data models are structured and maintained. This situation is typically encountered with systems built with the bottom-up approach, where creating a common enterprise data model is an afterthought. The hybrid approach attempts to prevent this situation from happening by keeping a closer eye on the enterprise data model during the design of each data model.
      Another point to keep in mind is that choosing a scalable database management system is crucial. Given that the size of the data and the number of users increase very rapidly in enterprise systems, selecting a database that can run in multiple operating systems will greatly decrease the growing pains. Planning for growth now will pay off in the future when you decide to increase the capabilities of the your system.

The Big Choice
Given the importance of a scalable architecture for the enterprise DSS, one of the major considerations during the planning phase is deciding which implementation option to use: top down, bottom up, or hybrid. Understanding the implications of choosing one approach over the other will influence how the DSS project takes shape over time. When this decision is based on the right information, managing expectations during the DSS project will be much easier and less stressful for the project team and company sponsors.
      Scalable solutions are built by design. Therefore, it is imperative to identify the methods that will ensure you have such a design. Because of its ability to control cost and risk without jeopardizing the creation of an enterprise data model, the hybrid approach should be the methodology of choice for most companies. It is suitable for early adopters as well as for more conservative users.

Tulu Tanrikorur is a software development manager at MSC Inc. He has more than 10 years of experience in designing and building large-scale commercial applications. He can be reached at tulu@mindspring.com.  


 
search - home - archives - contacts - site index
 

Copyright 1997 Miller Freeman Inc. All Rights Reserved
Redistribution without permission is prohibited.

Questions? Comments? We would love to hear from you!