Data Warehousing: Supporting
Business Intelligence
by Jonathan G. Geiger, Cutter Consortium
Business intelligence (BI) is the set of processes and data structures
used to understand a company's business environment and support
strategic analysis and decision making. This article describes
the business value that BI capabilities provide, the architecture
needed to support the environment, and a sound approach for building
and managing it.
Business Value
BI has become popular because companies recognize that it provides
bottom-line benefits. One of the interesting facets of BI is that,
in itself, it doesn't do anything; it is merely a store of
information that is applied by people and systems to derive benefits.
This sometimes makes it difficult to justify the sizable investment
in the architecture that a sustainable BI environment requires.
Companies need to recognize that without such an investment, however,
they will not be in a position to leverage one of their most important
assets: information.
The most significant generic benefit of the BI environment is
that the data warehouse collects a single, consolidated,
enterprise view of the data. Although there are
technical efficiency benefits, the major beneficiaries of the
single store of information are the business users. With this
consolidated information as a base, strategic analysts can get
to the data they need much more easily, use the same figures as the
basis of their analysis, and have a common understanding of business
terms. A data warehouse built to support customer relationship
management, for example, will enable companies to know how many
customers they have and the profitability of each customer. Armed
with this information, a company can make sound business decisions
affecting both customer segments and individual customers.
Architecture
The corporate information factory is representative of a conceptual
architecture needed to support BI [1]. This architecture creates
two distinct stores of information, each with a different set
of objectives and a different design. The data warehouse stores
the enterprise's consolidated BI information. It includes
historical information to facilitate trend analysis and is updated
through a controlled set of processes, never through individual
transactions. The objective of the data warehouse is to serve
as a collection and dissemination point for the data. It collects
data from wherever that data may exist through a data acquisition
process, and it sends data to the data marts through a data delivery
process. The term "relational" is often used to describe
the design of the data warehouse.
Data marts are smaller data stores that are populated with data
from the warehouse and built to answer a specific set of business
questions or support a specific business function. The marts often
contain summarized data, and their objective is to provide the
business users with a store of information that can be easily
and quickly accessed and traversed. The terms "dimensional"
or "star schema" are often used to describe the most
popular type of data mart.
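
As an illustration of the dimensional design just described, the following is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names (customer_dim, date_dim, sales_fact), the columns, and the sample rows are hypothetical and are not taken from any particular data mart; the point is only the shape: one fact table of measures keyed to small descriptive dimension tables.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a tiny, hypothetical star schema and load a few sample rows."""
    conn.executescript("""
        CREATE TABLE customer_dim (
            customer_key INTEGER PRIMARY KEY,
            name         TEXT,
            segment      TEXT
        );
        CREATE TABLE date_dim (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER,
            month    INTEGER
        );
        -- The fact table holds measures plus one key per dimension.
        CREATE TABLE sales_fact (
            customer_key INTEGER REFERENCES customer_dim(customer_key),
            date_key     INTEGER REFERENCES date_dim(date_key),
            revenue      REAL,
            cost         REAL
        );
    """)
    conn.executemany(
        "INSERT INTO customer_dim VALUES (?, ?, ?)",
        [(1, "Acme", "Enterprise"), (2, "Bolt", "Small business")],
    )
    conn.executemany(
        "INSERT INTO date_dim VALUES (?, ?, ?)",
        [(20240101, 2024, 1), (20240201, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [
            (1, 20240101, 1000.0, 600.0),
            (1, 20240201, 500.0, 300.0),
            (2, 20240101, 200.0, 150.0),
        ],
    )


def profit_by_segment(conn):
    """Answer one business question: profit per customer segment."""
    rows = conn.execute("""
        SELECT c.segment, SUM(f.revenue - f.cost) AS profit
        FROM sales_fact AS f
        JOIN customer_dim AS c ON c.customer_key = f.customer_key
        GROUP BY c.segment
        ORDER BY c.segment
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_sales_mart(connection)
    print(profit_by_segment(connection))
```

A business question such as "profit by segment" then becomes a single join from the fact table to one dimension, which is what makes this kind of mart easy to access and traverse.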
Some of the benefits of this architecture include flexibility,
durability, scalability, maintainability, and reusability. These
benefits are only available with the investment in the infrastructure,
and companies need to resist the temptation to demand a business
deliverable from the architecture itself; the architecture provides
the foundation. When an office building is built, much of the
cost goes into the foundation, plumbing, wiring, etc. The visible
deliverables are the individual offices, but they would not be
possible without the investment in the infrastructure. Furthermore,
if the infrastructure is built with the future in mind, we can
rearrange offices and move internal walls without needing to reconstruct
the building.
Within the BI architecture, segregation of the data warehouse
and the data marts protects the business users from changes to
the warehouse. The components of the architecture facilitate growth
(scalability), as each component can be addressed independently.
The data warehouse, combined with the data acquisition processes
that feed it, is designed to accommodate additional sources of
data (flexibility) without changing the architecture (durability).
Conforming dimensions (components that can be used in multiple
data marts) and summaries provide reusability.
With a central store of information, as the business community's
needs change, the data is often readily available, needing only
to be loaded into a new data mart to meet the new requirements
(maintainability).
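
The reusability point can be made concrete with a short sketch. It assumes a hypothetical shared customer dimension (CUSTOMER_DIM) and two hypothetical sets of facts, one for sales and one for support; none of these names come from a real system. Because both marts roll up through the same dimension, "segment" means the same thing in each.

```python
from collections import defaultdict

# Hypothetical conforming dimension: a single definition of each customer's
# segment, shared by every data mart that reports on customers.
CUSTOMER_DIM = {
    "C1": {"name": "Acme", "segment": "Enterprise"},
    "C2": {"name": "Bolt", "segment": "Small business"},
}


def summarize_by_segment(facts, measure, customer_dim=CUSTOMER_DIM):
    """Roll up one numeric measure by segment using the shared dimension."""
    totals = defaultdict(float)
    for fact in facts:
        segment = customer_dim[fact["customer_id"]]["segment"]
        totals[segment] += fact[measure]
    return dict(totals)


# Two different marts, built from different facts, reuse the same dimension.
sales_facts = [
    {"customer_id": "C1", "revenue": 1000.0},
    {"customer_id": "C2", "revenue": 200.0},
    {"customer_id": "C1", "revenue": 500.0},
]
support_facts = [
    {"customer_id": "C1", "tickets": 3},
    {"customer_id": "C2", "tickets": 5},
]

sales_mart = summarize_by_segment(sales_facts, "revenue")
support_mart = summarize_by_segment(support_facts, "tickets")
```

Adding a third mart under this sketch would mean supplying new facts only; the dimension, and therefore the shared business vocabulary, is reused unchanged.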
The data acquisition process is a complex set of activities that
collects data from various sources and loads it into the data
warehouse. This process often consumes the majority of the development
effort. The most complicated part of this process involves making
business decisions concerning the source of data to be used, the
quality expectations to be met, and how data from multiple sources
will be integrated. The data delivery process filters, formats,
and delivers data from the data warehouse to the data marts. The
data warehouse itself is maintained by an enterprise data management
function to ensure that performance, reliability, and quality
expectations are met.
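
The two processes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CustomerRecord fields, the source names, the quality rule (a record needs an id and a name), and the integration rule (the most trusted source wins) are all hypothetical stand-ins for the business decisions the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    name: str
    region: str


def acquire(sources, priority):
    """Data acquisition: collect, check, and integrate source records.

    sources  -- dict mapping a source name to a list of raw dicts
    priority -- source names ordered from most to least trusted
    """
    warehouse = {}
    rejected = []
    for source_name in priority:
        for raw in sources.get(source_name, []):
            customer_id = str(raw.get("id", "")).strip()
            name = str(raw.get("name", "")).strip()
            region = str(raw.get("region", "")).strip().upper()
            # Assumed quality expectation: id and name are mandatory.
            if not customer_id or not name:
                rejected.append((source_name, raw))
                continue
            # Assumed integration rule: keep the first (most trusted) version.
            warehouse.setdefault(
                customer_id, CustomerRecord(customer_id, name, region)
            )
    return warehouse, rejected


def deliver(warehouse, region):
    """Data delivery: filter and format warehouse data for one data mart."""
    return sorted(
        (record.customer_id, record.name)
        for record in warehouse.values()
        if record.region == region
    )
```

For example, calling acquire with a "crm" source ranked ahead of a "billing" source keeps the CRM name when both describe the same customer, and deliver(warehouse, "EAST") would then hand a regional mart only the rows it needs.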
The architecture includes other important components. Meta data
is information about the data in the warehouse and the data marts
and how it is used. The decision support interface is a set of
end-user access tools used to obtain and navigate through the
data. The enterprise portal is the interface through which the
users get to the data warehouse and data marts. Just as a person
in an office doesn't think about the building's foundation,
plumbing, and wiring, this portal should be designed to meld the
BI capabilities with business processes so that the architecture
itself disappears into the background.
Methodology
Building a sustainable BI environment requires a program orientation
to ensure that the investment in the architecture and infrastructure
is made. With this investment, companies can realize the benefits
previously cited and be in a position to add new capabilities
very quickly. The methodology itself is iterative, with each project
typically scheduled for completion within three to six months.
Each BI effort is a project within the program. It begins with
planning and initiation, in which the scope is defined, the expectations
are set, and the project plan is developed. The next two phases,
"getting data in" and "getting information out,"
may be executed in tandem. Within getting data in, the data warehouse
is designed and the data acquisition process is built. Within
getting information out, the data marts are designed, the data
delivery processes are built, and the end-user access facilities
are built. The last phase of each project is deployment, in which
the BI capability is moved into a production environment. Once
the warehouse is built, it must be maintained and managed to ensure
that growth is accommodated, performance expectations are met, and
the business continues to derive value from it.
Summary
Successful BI initiatives require a program orientation, a sustainable
and flexible architecture, and an iterative development methodology.
The program orientation ensures that all of the individual efforts
are coordinated and that the work performed in one project is
leveraged in subsequent projects. The architecture needs to isolate
the data warehouse, which serves as the collection point for data
residing in the operational systems, and the data marts, which
serve as the primary access point for the business community.
The methodology consists of program management activities, as
well as compatible "getting data in" and "getting
information out" phases in the individual projects.