
Back in the 1970s when the Vietnam War, sit-ins, and "turning on" were a part of current history, my hippie film teacher told the class that the biggest rush of all wasn't drugs but knowledge and learning. Although it sounded square at the time, his words were true, and today I find it one of the biggest allures of working in the computing field. There's so much to learn just to keep pace. One of the reasons there's so much to learn is that there's a huge number of areas involved in DSS and an incredible amount of research being done in each area.
DSS research, for example, covers everything from computing architectures to analytical models to visual perception--even language theory. In the last couple of years, there have been major advances in logical and physical dimensional database design, schema integration, data compression, optimizers, data quality, analytical algorithms, image and text pattern matching, data visualization, agent automation, natural language interfaces, storage, bus and memory architectures, intraquery parallelization, repositories, change management, data cleansing, CASE, business rule managers, communications from email to Web publishing, forecasting, image processing, and more.
There is, however, a downside to the research-heavy nature of the field. With the exception of university-based efforts, the fruits of industrial research, which accounts for a large part of new deployable technology, must be commercialized (that is, sold). To do this, vendor marketing departments take over and craft messages that serve the interests of individual vendors--not consumers. In other words, software and hardware vendors form the main conduit for information about new DSS technology.
Two main distortions result. First, computing tools are positioned as solutions, answers, or ends when they are really a means to an end. Second, tools are overly differentiated. For example, data mining tools and OLAP tools are frequently positioned as if they were different tools that do different things and are only vaguely related, even though mining tools are complementary to OLAP tools, rely on many of the same functions, and ought to be used together with them for a variety of tasks. And many tools are advertised as if they can be consumed with beer and bread, which is apparent every time you read the trade journals, attend a conference, or speak with a vendor.
Take the term OLAP, for example. It was a clever marketing term that referred to multidimensional modeling and analysis. Although it was coined to sound like a serious scientific term that did not denote any tool in particular, it came to be associated with a particular class of data storage and analysis tools, namely OLAP tools. To confuse matters more, it wasn't long before another class of tools came along: ROLAP tools. Logically speaking, they did the same thing but differed in physical implementation strategy, storing data in a merchant RDBMS instead of an optimized multidimensional format. And of course there was debate as to which was better: ROLAP, traditional OLAP (now referred to as MOLAP), or their hybrid, HOLAP. As I pointed out in an earlier column, this debate is ridiculous. The real issue is how to tune the storage and calculation of multidimensional data for a particular set of use scenarios.
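To make the point concrete, here is a minimal sketch in Python. It is not any vendor's implementation. The data, the dimension names, and the two helper functions are all invented for illustration. It holds the same tiny set of sales figures in a row-per-fact layout (roughly what a relational store would hold) and in a dense array addressed by dimension position (roughly what a multidimensional store would hold), then asks both the same question.

```python
# Hypothetical toy data: sales by (product, region). All names are illustrative.
PRODUCTS = ["widgets", "gadgets"]
REGIONS = ["east", "west"]

# Relational-style layout: one row per fact, as in a fact table.
fact_rows = [
    ("widgets", "east", 10),
    ("widgets", "west", 20),
    ("gadgets", "east", 5),
    ("gadgets", "west", 15),
]

# Multidimensional-style layout: a dense array addressed by dimension position.
cube = [
    [10, 20],  # widgets: east, west
    [5, 15],   # gadgets: east, west
]


def total_by_product_rows(rows, product):
    """Scan the fact rows and sum the measure for one product."""
    return sum(amount for p, _region, amount in rows if p == product)


def total_by_product_cube(cells, product):
    """Find the product's slice by position and sum across regions."""
    return sum(cells[PRODUCTS.index(product)])


# The logical question is the same; only the physical access path differs.
for name in PRODUCTS:
    assert total_by_product_rows(fact_rows, name) == total_by_product_cube(cube, name)
```

Both functions answer the same logical question. Which layout is faster or cheaper depends on data volume, sparsity, and query mix, which is exactly the tuning question that matters.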
Data mining is no better. It too is a clever marketing term. Data mining refers to the use of quantitative methods for estimating relationships between two or more variables for the purposes of understanding and prediction. It is most definitely an outgrowth of statistics. And while some of the practices differ from those of traditional statisticians (relying more on data-based models than parametric models), it is a gross distortion of reality to give the collection of techniques an entirely new name as if it represented a brand new technology (like the combustion engine relative to the horse and buggy). The implicit promise of data mining was that if you buy a data mining tool, your problems will be solved. And of course, well-paid consultants get to point out that it's the consultant, not the tool, that's important. The reality is that data mining is no magic bullet. And consumers get angry when promises go unfulfilled.
This is not to say that there haven't been some great advances in technology for OLAP, data mining, the Web, and so on, just that it ought to be easier to understand what the actual advances are and easier to put them to productive use. The difficulty is that it is in the vendors' interest to "productize" problems so their products can become "solutions." New product categories are continually popping up, and end users are constantly bewildered by the dizzying array of new products and new product categories that they seem to need to solve their problems. And because industry conferences tend to reflect the jargon created by the vendors, regardless of how heavy a presence the vendors actually have at the conference, it's hard to get a jargon-free education anywhere. With all this noise, it is difficult to see what is real and what is hype, and difficult to assimilate all this newness into a consistent, stable framework.
Data warehousing is perhaps the worst offender. Consider some of these new "technical" terms:
Data warehouse: a database for decision support, which in some circles has been around for 25 to 30 years
Data marts: subject area-specific data stores for decision support
Operational data stores: decision-support databases for operational or tactical decisions
The real issues boil down to the optimal distribution of storage and computing. And depending on the particular needs of an organization, the optimal configuration may allow for transaction and decision-support processing to take place in the same database, or there may be a clear separation. Assuming there's a clear separation, there may be a single decision-support database, there may be many connected databases without a central one, or there may be a hub-and-spoke configuration--and the possibilities go on.
Imagine that the ideal computing configuration corresponds to a color. There could be any of a nearly limitless variety of optimal colors for a particular situation, and a framework for defining the optimal color would map the situation conditions to the color spectrum. Yet these new "technical" terms would make it appear that there are only a small number of color options and that we have to use these colors in particular. In other words, a data mart is just one color on the color spectrum. For the record, I have nothing against particular configurations of decision support-oriented databases, but I do resist the Kantian notion of objectifying these functional notions into concrete "things that we need to have." People don't necessarily need a data warehouse or a data mart; they need to solve business problems. And for many companies, the information needed to solve problems (such as point of sale and customer attribute data), combined with the costs of storing and processing data, means that the optimal solution to their problems involves some form of decision support-dedicated database (or databases, as the case may be). I think of a term like "data warehouse" as a metaphor for a particular optimal form of computing architecture, as if one did a cluster analysis of optimized computing configurations and discovered three clusters corresponding to warehouses, marts, and operational data stores.
The problem with the way things are now is that, for example, you get attendees at conferences asking whether they should be building a data warehouse, a data mart, or an operational data store as if these were fundamentally distinct things, like baryons and mesons. It's not that there aren't useful questions to ask, but it all gets buried under the new technology buzzwords. For an industry where learning is so paramount, it's important that the learning process be made as efficient as possible.
Using a physical training metaphor, it's as if you wanted to train your whole body (learn about all DSS functions) by going to a fully equipped gym. Except there is no one gym. Instead each equipment vendor has its own partial gym for which you have a brochure selling you the benefits of its particular equipment. (Imagine: an Oracle gym, a Microsoft gym, an IBM gym ...) This would mean you have to figure out the attributes of each piece of equipment made by each vendor and figure out, somehow, what you need to do to train your whole body (making sure not to rely too heavily on the equipment vendors to learn this). Then you have to create a workout (learn about a DSS solution) by hopping from gym to gym using bits and pieces of different vendors' equipment. What a nightmare!
This is why I have always taken a functional approach. The real issue is what are we trying to do--what problem are we solving? The technology is a means to an end, not an end in itself. What we need is a forum for end users, analysts, IT managers, and executives that is centered around the core (relatively permanent, unchanging, and transcendent) functional issues of decision support and oriented toward particular business problems such as changing organizations, rapid growth, performance measures, and so on. By framing the debate in terms of things that users do, the debate frees itself from the constant flux of new products and takes the tone of a steady, guiding force. Users may even begin to control the direction of product development as vendors participate in the forums.
As a result, with the backing of Dave Stodder, Database Programming & Design, and Miller Freeman Inc., I am pleased to announce the creation of the DSS Summit. The DSS Summit will address leading-edge issues in decision support from a functional perspective. This year's Summit will take place on October 19 to 21 in Chicago. The two major business problems that will be addressed are "changing organizations," which covers everything from corporate reorganizations to mergers and acquisitions, and "performance measures," which covers employee and business process performance. Presenters will address topics from the business, technical, and end-user perspectives. There will also be presentations on a variety of leading-edge topics from rich content mining, intelligent agents, realtime decision support, and data quality to the integration of OLAP, data mining, and visualization.
In addition to presentations, we will try to provide for showcase demonstrations of each of the conference themes. Also, we will send out a questionnaire to each of the participating vendors that asks them to identify the technical and business problems that they solve, especially those pertaining to the conference themes. And we will encourage them to come to the Summit prepared to demonstrate how they solve the problems they claim to solve. (We are also planning to provide a means for small startup vendors with great, nonmainstream technology and small budgets to show off their stuff.) We will then post their responses to some forum (location to be determined) so that conference attendees will be able to inquire about the problems that are of concern to them and be led to the appropriate vendors.
Our ultimate goal is to optimize the flow of information between vendors, researchers, analysts, consultants, and end users so the constant flow of new technologies can be assimilated by users (and feedback provided to vendors), within a stable, problem solving-oriented framework. In this way, the returns to mental training will become as efficient as the returns to physical training.
Erik Thomsen is an author, lecturer, researcher, and consultant focusing on OLAP and decision-support applications. He is cofounder of the Cambridge, Mass.-based consultancy Dimensional Systems and author of the book OLAP Solutions (John Wiley & Sons, 1997). You can reach him via email at erik@dimsys.com.