
OVERVIEW OF APIS
Tensor and MD-API are somewhat similar in what they do and do not
provide a client programmer but they have some high-level differences as
well. Basic to each of them is the querying of multidimensional (MD)
structural metadata and data. Generally speaking, they have similar
metadata models (with some exceptions). Neither provides services beyond
the querying of MD structural metadata and fetching/updating of data
values. Significantly, the services not provided include execution of
DDL (defining computational metadata and defining structural metadata)
and administration (MD security, invoking database loading, and
calculating). Only Tensor provides even the most rudimentary
computational metadata.
The APIs use somewhat different terminology for the same concepts as
well. Table 1 provides an imprecise guide to these terminology
differences.
| TABLE 1. API terminology. | ||
| Tensor | MD-API | Concept |
| Axis | Edge | Dimension of an n-dimensional query result, as distinguished from a dimension in the queried hypercube |
| Cube, schema | MD schema | Collection of variables and measures; imprecise in that the MD-API's MD schema covers Tensor's cube as well as its schema |
| Cube | Hypercube | Query object |
OLAP METADATA
Standard metadata for diverse server products is a key benefit of both
APIs. The OLAP metadata exposed by the two interfaces is mostly similar.
Both APIs recognize six distinct types of metadata: dimensions,
hierarchies, levels, members, measures, and properties (which are
usually called "member attributes" in database jargon). Dimensions may
have multiple hierarchies but need not have any hierarchical structure
across members. Each API also identifies time and measures as special
types of dimensions. Measures appear on a special "measures dimension,"
but otherwise may be structured like members of any other dimension.
Hierarchies may each have multiple levels, although each API assigns a
different meaning to the term "level." Each hierarchy must have least
one level-like entity, although the exact requirements vary. They both
also use the term "property" to refer to what is logically a
one-dimensional measure applying to some subset of members in a
dimension. (Because both APIs refer to application-level attributes like
price as well as intrinsic member properties like name as properties,
where the distinction is important I'll use the terms "application
member properties" and "intrinsic member properties," respectively, and
"member properties" to refer to their union.) Both APIs also allow
measures and properties of various types, including numerical, text, and
date.
The APIs' property entities are logically similar, but they have some
differences as well. In the MD-API, a property may exist for all members
in a dimension, hierarchy, or level. In Tensor, properties can exist
only for levels--if a property were associated with all members of a
dimension, there would be a separate property for each level.
The APIs address hierarchies and levels within hierarchies differently.
The MD-API does not require the root level/elements of a hierarchy to consist of a single root or tip member, while
Tensor assumes the existence of a single root member to every hierarchy.
Each API also provides the concept of a default hierarchy for a
dimension. (Such a hierarchy is required in Tensor and optional in the
MD-API.) Tensor also provides an optional default member for a
dimension; a query can rely on this default if it is present.
The treatment of levels marks the greatest differences between the two
APIs. Tensor defines a level as those members found at a particular
distance from the root level of a hierarchy. Depending on a provider,
these levels may or may not be named. However, in an irregular
hierarchy, leaves will be at different distances from the root; Tensor
lacks the important abilities to describe members in terms of distance
from the leaves and describe levels independently of hierarchical
distances.
In contrast, in the MD-API, a level is a named collection of members
within the hierarchy not defined strictly by hierarchical distance. Each
member in a hierarchy must be on some level, and each pair of adjacent
levels must be clearly "higher" and "lower" relative to the others. The
MD-API provides the ability to describe members by depth from root,
height from leaves, and named level, which may provide three distinct
and useful groupings. Unfortunately, the definition of levels allows
members within a level themselves to have parent-child relationships,
and there is no defined mapping from hierarchical distances to named
levels if named levels were not defined by the server.
The two APIs model multiple related hypercubes somewhat differently.
Tensor provides the catalog and schema as organizing principles for its
cubes, and a Tensor schema follows the block multicube model. Each cube
models a hypercube, where the measures appear to have common
dimensionality. Separate hypercubes are related along common dimensions
within the schema. The MD-API provides the "MD schema," which follows a
series multicube model. Each schema has a single measures dimension and
a collection of other dimensions; each measure is dimensioned
independently.
There is no fundamental difference in expressive value between these two
forms. OLAP queries frequently combine measures that have different
natural dimensionality. Such queries are transparent in the MD-API, with
its one measures dimension per schema; in Tensor, with multiple measures
dimensions per schema, when members from separate hypercubes are
involved in a query the client must explicitly join the cubes.
Tensor also provides "virtual cubes," which are prejoins of two or more
other cubes or virtual cubes (possibly with additional manipulations or
transformations specified by a DBA, as virtual cubes are constructed
outside Tensor).
OLAP QUERY FUNCTIONALITY
There is significant overlap in query capability between the APIs, along
with major differences. Both support:
PROGRAMMING TO THE APIS
The fine-grained, fully object-oriented approach of the MD-API presents
an entirely different programming interface to queries than the
relatively coarse object and text query-language approach of Tensor.
Greater similarities exist between their approaches to fetching result
data, but these approaches are still programmatically quite different.
Metadata in the MD-API. In the MD-API, metadata is provided in the form
of objects that identify and relate themselves. Structural metadata,
including dimensions, hierarchies, levels, and properties, are
first-class objects obtained through simple method calls on containing
objects, while members (including measures) are queried by constructing
and executing a MemberQuery object (discussed later).
Each metadata object knows its name and other relevant attributes. A
method call on the connection returns its collection of dimensions; for
each dimension, a method returns its set of hierarchies, and so on.
Dimensions, hierarchies, levels, and the connection each have an
associated collection of member property objects as member properties
may exist at any of these scopes. Properties (application and intrinsic)
may also be queried by name within each scope.
Important intrinsic member property information will be found at various
scopes. For example, a unique number of children intrinsic property may
exist in each hierarchy. To obtain the number of children along the
hierarchy for a set of members simultaneously, the instance of this
named property obtained from that hierarchy is added to a query for the
members.
Query-oriented metadata objects are also defined for the MD-API. For
example, ValueDescriptors are named objects obtained from the connection
that request a particular representation of query result data. "Value"
is one (meaning the raw value) example; other objects might request a
formatted string representation.
A client must traverse the metadata prior to querying for any cell
values because metadata objects are components of any cell data query.
The minimum a client could do is obtain the dimensions and one
ValueDescriptor; a realistic client will initially traverse more objects
than that, although it needn't do so all at once (and the resource
issues involved in obtaining all members of a schema will often preclude
it). The number of dimensions, hierarchies, and levels combined will
likely be pretty small, so obtaining all of them is unlikely to be a
drag on performance.
Metadata in Tensor. In Tensor, each type of metadata entity (catalogs,
schemas, cubes, dimensions, hierarchies, levels, members, measures, and
properties) is a separate schema rowset obtained through the same
IDBSchemaRowset interfaces that OLE DB provides for tabular (non-OLAP)
providers. As with tabular schema rowsets, a client may restrict the set
of metadata rows returned by specifying a set of column restrictions on
the IDBSchemaRowset when obtaining rows through
IDBSchemaRowset::GetRowset.
Tensor provides catalogs and schemas as organizing principles for the
remainder of the metadata; these principles will be the primary
restriction criteria. Generally, schema rowsets can be restricted by all
the criteria that apply to their enclosing metadata type. For example,
to obtain the hierarchies available for a dimension, a client would
construct an array of restrictions specifying the appropriate catalog,
schema, and dimension and use it to obtain the rowset of hierarchies
from the Hierarchies schema rowset. The Members schema rowset must be
restricted to a single level's worth of members when queried.
Each metadata entity (each dimension in the Dimensions rowset, each
hierarchy in the Hierarchies rowset, and so on) occupies its own row of
that set. Each entity is ultimately identified by its xxx_
UNIQUE_NAME (DIMENSION_UNIQUE_NAME, MEMBER_UNIQUE_NAME, for example),
which must contain enough information to identify it completely within
the cube. For example, if the member name New York were to occur in two
different levels, a provider would construct a unique name for each
(perhaps 'Geog'.'State'.New York' and 'Geog'.'City'.New York'). Each
rowset provides a set of attributes about that type of entity.
Because members may exist in multiple hierarchies and be on a different
level in each hierarchy, a member will have a unique row for each
hierarchy on which it appears--in each of these rows, it will have the
same MEMBER_NAME but a different MEMBER_UNIQUE_NAME that must encode
hierarchy information.
In addition to being able to restrict members based on catalog, schema,
dimension, hierarchy, and level (as well as member name), members may be
restricted on a "tree operation" column that uses bit fields to identify
hierarchical relationships. A client may use these to request the
parents, children, or siblings of the members given by the restrictions,
as well as the given members. For example, restricting on the member
unique name [Geog].[ByLocation].[Illinois] and (MDTREEOP_
SELF + MDTREEOP_CHILDREN) will return the member rows for Chicago,
Peoria, and Springfield.
A Properties schema rowset provides access to all cell properties and
all application and intrinsic member properties.
MD-API MemberQueries. The MemberQuery object forms the basis for all
queries of member-related information--including data and property
values--in the MD-API. A MemberQuery is a fairly simple concept: some
query that, when evaluated, results in a set of members and related
properties. Each MemberQuery may execute alone or as part of a cube.
As discussed previously, MD-API members--by virtue of their potential
quantity--are obtained through MemberQuery objects by executing the
member query and opening a result buffer. The result buffer provides
access to the member objects and associated properties. Members may be
queried by name, hierarchical relationship, property values, and data
values.
To address the many ways a set of members can be specified, each
MemberQuery includes a large number of specification and specification
operations, including: select, add, and remove members from some scope
or hierarchical relation; selectively filter using ValueExpressions
(which represent the comparison operations); and sort the members by
associated data (through ValueExpressions), hierarchy, or name.
MemberQueries are also the vehicle for obtaining member-related
information. Properties associated with the members in a MemberQuery may
be added and removed. For example, the display names and number of
children in a hierarchy would be obtained along with the member objects
in a MemberQuery by adding the property objects representing these
objects.
Metadata queries will likely restrict themselves to obtaining members
and their intrinsic properties via hierarchical relationships or
name-based ValueExpressions. Data queries would use all operations
available.
CUBE QUERIES
The APIs are quite different in how they specify the form of a query.
When a query is specified, however, a client uses it basically the same
way in both approaches: The query is executed, a result buffer is
opened, and the client iterates through the result rowsets/buffer to
fetch data. The resulting rowsets or buffers each contain separate
cube-cell and axis or edge structures.
To examine cube queries, let's use a common example for both APIs. Given
dimensions of product, customer, channel, and time, the measures Sales
and Units are dimensioned by product, customer, channel, and time; the
measure Product Cost is dimensioned by product and time; and the measure
Ship Cost is dimensioned by customer and time. This approach leads to
three cubes called Sales, ProdCost, and ShipCost. Query 1 against this
subset is to retrieve the units, sales, product cost, and shipping cost
for the children of product 'Toys', customer 'DeptStoreA', and '1995',
suppressing products that have entirely empty result cells. Query 2 is
to drill down on the first product member returned while keeping the
original products. For each of these queries, we're interested only in
retrieving formatted text for the cell contents.
The MD-API approach. In the MD-API, queries are formed by creating a
cube object with edges that can be removed or added as necessary,
orienting its component MemberQuery objects on these edges, setting the
state of the MemberQuery objects to the desired query, adding the
ValueDescriptors that describe the desired cell data representation(s),
and setting desired cube-edge properties. In the MD-API a cube is
primarily an organizer for MemberQuery objects, so most of the work of
creating a cube query involves specifying the MemberQueries. Suppressed
empty-member and zero-member combinations are indicated by setting the
corresponding property of the cube edge.
The client requests a particular representation of the result through by
adding the appropriate ValueDescriptor object to the cube. For example,
the connection's ValueDescriptor named "Formatted Text" would be used as
part of the query to retrieve the right cell representation. The steps
required to create these queries in the MD-API are shown in Listing 1
(in pseudocode; translate to COM or Java depending on implementation).
| LISTING 1. Forming and executing a query in the MD-API. |
|
Query 1 Preconditions: 1. Dimension objects for product, measures, customer, and time identified 2. Member objects ToysMember, Sales, Units, ProductCost, ShippingCost, DeptStoreAMember, Y1995Member identified 3. Value descriptor object FormattedTextValueDescriptor identified 4. Product hierarchy object ProdHierarchy identified Steps to create Query 2: Note that when the overall query was set up through a pattern of steps, modifying the query involved only modifying the part that changed. |
| LISTING 2. Executing a query in Tensor. |
|
Query 1: Preconditions: 1. Member names [Units], [Sales], [ProdCost], [ShipCost], [Toys], [1995] and [DeptStoreA] must be MEMBER_UNIQUE_NAMES, and [Toys], [1995], [DeptStoreA] must be the same MEMBER_UNIQUE_NAME in each cube. 2. An ICommand IC interface is obtained from the connection 3. An ICommandText ICT interface is obtained from the ICommand 4. The following QueryString must be obtained:
SELECT
NON EMPTY (HIERARCHIZE( Product.Children ([Toys]))) ON AXIS(0),
[SalesCube].[Units], [SalesCube].[Sales], [ProdCostCube].[ProdCost], [ShipCostCube].[ShipCost] ON AXIS(1)
FROM
SalesCube, ProdCostCube, ShipCostCube
WHERE
{[1995], [DeptStoreA]}
CELL PROPERTIES FORMATTED_VALUE
Query: Query 2: SELECT
NON EMPTY (HIERARCHIZE( {Product.Children ([Toys]), Product.Children([ActionFigures]} )) ON AXIS(0),
{[SalesCube].[Units], [SalesCube].[Sales], [ProdCostCube].[ProdCost], [ShipCostCube].[ShipCost]} ON AXIS(1)
FROM
SalesCube, ProdCostCube, ShipCostCube
WHERE
{[1995], [DeptStoreA]}
CELL PROPERTIES FORMATTED_VALUE
Query: |
WRITE CAPABILITY
Tensor specifies support for writing data back to the server through the
usual rowset update. Cell values may be changed by using the
IRowsetChange interface obtained indirectly from the data set. MD
updates and deletes are similar to table updates except that the member
structure is not affected--deleting the row containing the empties the
cell (inserts are forbidden). Whether any dependent stored cells are
actually recalculated is left to the provider's implementation.
As part of OLE DB, Tensor operates within the framework of Microsoft
Transaction Server (MTS). A client may use any MTS services provided by
the service provider when updating data. At press time, the draft MD-API
did not include write capability. The OLAP Council does plan, however,
to address this area in the near future, and vendor extensions to
provide it will doubtless appear in the first implementations.
PLATFORM FACTORS
The initial platforms supported by each API vary. As part of OLE DB,
Tensor will available on Windows 95 and NT clients. Furthermore,
Microsoft plans to extend ADO to encapsulate Tensor; through ADO support
for Java, Tensor functionality will be available as native Java layered
on top of OLE DB.
The MD-API is currently specified for two different implementation
platforms, COM and Java. Windows 95 and NT clients can use COM
implementations, and any Java Virtual Machine can use a Java
implementation without additional layering. The OLAP Council plans to
specify a CORBA implementation of the MD-API as well.
CLIENT APPLICATION SUITABILITY
Tensor and MD-API diverge greatly in the types of applications they
support, although both APIs provide the means to pose sophisticated
queries.
OLAP clients can belong to one of two basic categories: dynamic clients
and reporting clients. A dynamic client allows the user to orient
dimensions arbitrarily on the axes or edges of a display (pivoting,
rotating, and so on), drill up and down on members, sort results in
various ways, and interactively and incrementally refine queries in
general. A reporting client is one in which a static query, perhaps
parameterized, is executed upon command. The more logic involved in
composing a query beyond simply filling parameter values into a
predefined query, the more a client is a dynamic browser than a
reporting client. (GUI report builders are similar to dynamic clients in
that they require a flexible internal representation for the query, and
require the ability to save queries as well.) The OLAP client universe
includes both categories; OLAP servers simplify the generation of canned
reports as well as dynamic views. And while these are archetypes of
action, a dynamic client should be able to save queries/views of
particular interest for future reference.
The MD-API provides much greater support for dynamic clients than
Tensor. All elements of a query are represented as objects, and those
objects may be incrementally modified with the primitives that match
what client tools try to provide. With the notable exception of the
modification history of each MemberQuery, the objects are
self-describing. Attaching GUI controls to the query objects and filling
GUI metadata controls from their API analogs are straightforward. The
major drawback of the MD-API is its lack of standard persistent
representation for the query objects; performing that work requires a
nontrivial amount of code in the client.
Tensor, in contrast, provides greater support for sophisticated
reporting clients and power users who can generate their own queries.
Text strings are trivial to load and save, and the expression language
is powerful. If a client can assume particular metadata, it can ignore
metadata calls completely. For example, default dimension members allow
static queries to be reused in various contexts. However, to provide
fine-grained control of queries--including dimension orientation and
incremental modification of parts of a query--clients will need to
implement the same types of query objects provided natively by the
MD-API and transform the rows of metadata into an appropriate form.
SIMILARLY DIFFERENT
The MD-API and Tensor represent two strikingly different techniques for
accomplishing largely the same tasks. In this article I did not attempt
to declare a "winner," but rather I tried to explore the strengths and
areas of needed improvement in approach. Ideally, the examination has
provided information helpful for your own evaluation and use of either
or both these interfaces.
George Spofford is a cofounder of Dimensional Systems, a Cambridge, Mass.-based consultancy. You can reach him at george@dimsys.com.