15 Oct 2009
How to Build an Effective Data Warehouse.
Within the past 20 years, companies have accumulated multitudes of data from their operations- data that gives them insight into process efficiencies, consumer preferences, product effectiveness, etc. According to popular belief, this information is said to double every 18 months. The problem here is not a lack of abundance, it is that the available data is not readily usable for strategic decision making. It needs compiled, harvested and manipulated with the ability to be reported on in a minutes notice for a manager to make an informed decision. The question for many IT departments is, how do we do this?
What are some techniques that you’ve used in the past that have enabled your company to build and maintain an effective data warehouse? What are some obstacles we should avoid?


October 15th, 2009 at 2:28 am
I think much of the problem has arisen from IT using the incorrect system to house the data. Many companies are using operational systems, such as inventory, order processing and accounting which are not designed or intended to provide strategic information. Only correctly designed decision support systems or informational systems can provide the type of strategic information that managers need to make informed decisions.
October 15th, 2009 at 3:58 am
Good point Michael! I agree that many of us are focusing systems that can’t adequately house and report on the data we need.
Rob Rasor of IT Specialist Group also suggests the following:
Data can become useless when it becomes dated. So, immediate use is extremely important and very mission critical. When we deal with data warehousing, I actually start with data harvesting. I want to make sure my customers obtain accurate data quickly. Then, building an effective warehouse to be able to utilize the data is my next step. Most of my customers want information on their customers and who they are verses necessarily what they are buying. This leads to multiple statistical analysis options once they know who the customers are. We have had success with simple solutions like SQL or even MySQL databases. Anything that marries the warehouse with the software for analysis through easy transfer methodologies have been successful for us. But, such data must be relational and married to the customer or all is for nothing.
October 15th, 2009 at 4:29 am
I agree with Michael and would add that many companies decide to use their generic IT resources/people (cost, required buy-in from IT Department) to do what is often a very focused skill based job. Just because you know how to use a database and even understand the source database doesn’t mean you can do BI.
October 15th, 2009 at 9:38 am
Looking back, we would have been more successful if we identified and defined upfront, the need for an operations group to handle data fixes (e.g. what we called ‘errors’). We were tight on the coding and technology front, but less rigorous on the operational components.
But similar to other folks, we did have an initial data discovery phase, whereby a group of subject matter experts and interested parties (e.g. superusers) worked on the data analysis and identification of data/fields with our technology team. This helped both organizations tremendously in anticipating later data hiccups
October 15th, 2009 at 10:01 am
In addition to these great comments, the DW needs to be supported by business leaders, culture, sponsor who will see the importance of clear and common defintions of measurements with the DW/BI team. It’s natural for business leaders to want to define custom measurements/criteria in order to make outcomes as favorable as possible. If the business users cannot control the measurement definitions for say something like revenue, and they find the results unfavorable, they will be incented to declare the data invalid. If these tendencies are not somehow kept in check, and if the DW cannot clearly demonstrate data accuracy back to source systems, then DW personnel are likely spend a good deal of time justifying the program’s value.
October 15th, 2009 at 10:42 am
The problem is not necessarily in IT. In most environments there is still a communication gap between the business leaders and IT. Where there is no gap, it may be the tools that are the problem. Knowledge that can be acted on needs to be timely, or the opportunity is lost. Where there is a gap, IT does not always know enough to move in the direction the business users want and the business users do not know what can be harvested. Effective BI relies not only on a good Data Warehouse, but also, on an effective collaborative process for mining the data. I have seen it done effectively in environments where certain business users are given only the responsibility of mining the data (with the help of IT of course) so that they would create opportunities for decision makers to act. In the manufacturing or pharmaceutical realm, it is sort of like having research or technology teams building the materials that are then used to develop products by product development teams.
October 15th, 2009 at 4:25 pm
There are many slants to this one. Possibly the most effective way to NOT build an effective Data Warehouse is to set out to build a Data Warehouse – i.e. the old ‘build it and they will come’ idea. I’ve worked at two fairly large companies where massive IT led warehouse initiatives have become expensive white elephants. The common thread was that the IT team saw themselves as Data Warehouse builders and were disconnected from BI delivery. I suspect that is where some of Kimball’s thinking comes from – focus on BI delivery, conform your dimensions, but don’t focus too much on the plumbing as it isn’t the end game.
October 15th, 2009 at 11:45 pm
There are many challenges as everyone correctly points out Access, scalability, high cost, poor performance). Using a system to store and access data that was never designed for it is at the core. In addition, using systems that require heavy manipulation (indexing and tuning) to get the high performance that real Business Intelligence requires makes true insights and pure ad hoc user community queries impossible. We have had over 20 years of experience in what not to do (due to our exposure to fortune 500 co’s). But in essence by using our column based, load and go corporate information factory approach we help companies solve many of the traditional issues and enable real BI. Anyone interested in a specific explanation is welcome to email me daniel.schwartz@sand.com
October 16th, 2009 at 2:04 am
Great comments! In addition to using the wrong systems, I especially agree that most of the problem is due to the communication gap between IT and business managers. The IT department may not be aware of the day-to-day decisions that business leaders have to make and the business leaders don’t know if and how IT can help them. There certainly needs to be more collaboration between the two to explain the benefits of the correct system and to make it more user-friendly so that managers can engage directly with it.
October 16th, 2009 at 2:12 am
As a sidenote, MIKE2.0 is currently working on an open-source Data Warehousing Solution and could use some of your insight – http://mike2.openmethodology.org/wiki/Data_Warehousing_Solution_Offering. We’re open to your suggestions to improve this offering and encourage your dialogue about it.
October 17th, 2009 at 5:32 am
Hi – lots of really good comments. Lots of places I have been that have issues take very much a source-driven approach to Data Warehousing as opposed to figuring out what is needed from a business perspective. The DW winds up just looking like a staging area. Its really important to start from the usage side when formulating requirements while making sure to apply backend activities like data profiling in your project methodology to mitigate risks of delivery failure.
Has anyone on this thread applied semantic integration technologies in this space? Its something I am working on a bit now with a client.
October 22nd, 2009 at 7:38 am
I would add that a key challenge is the general lack of people that have a of deep understanding and appreciation of data/information. I’m referring to the abilities to:
- understand data structures, can truly think dimensionally, and know how they can enhance the agility of organisations to
adapt to change
- can explicitly link data to corporate performance outcomes and communicate this to both IT and the business
- understand and effectively manage the data lifecycle
I know a few who have these skills and they are worth their weight in gold. I never attempt to deliver a data warehouse without having at least two people like this in my team.
Like Brenda above,I also support the MIKE2.0 open-source Data Warehousing Solution work. Now if a can just have an open source tool that extracts, conforms and stores my data automatically …
October 23rd, 2009 at 10:49 pm
The definition of what constitutes a DW is open ended and it changes over time as the data is used (or not). Does it include the operational databases? Many times yes. Does it include report writers, extractors, and other tools to get at the data? Probably yes. Does it require staging databases? Probably yes but very costly assumption.
The complete picture of the DW cannot be defined at a point in time, it must evolve together with the business. The CEO of a new business can state one morning that for sure he/she needs a specified report every two hours to run the business. This is a good reason to dedicate resources and create an efficient process just for that. At another opportunity the same CEO would state the need to look for up to date data on a moment’s notice not knowing what data until the need arises. Good reason to have an expert on call by the CEO that can understand the urgent needs and translate them into an actual formatted report. Over time, executives form opinions on the growing new business and formalize set of data requirements and processes to run the business effectively. Good opportunity to expand on the notion of DW and include the tools and data needed to satisfy them.
The whole issue of DW is the same old notion of sharing data… hence databases! read vs update balance! known requirements vs not yet known!
At no time any data should be staged, replicated, indexed, manipulated without an implicit need to justify the costly move.
A good corporate data model should evolve to include the needs of a DW. Yes, you can separate low update data (or read only) from frequently updated data and treat the two differently with indexes and all sorts of replication and staging to answer potential and known retrieval needs and performance but it is not a given that DW is not operational and vice versa. The way I see it there is one comprehensive data model that serves all corporate data needs.
Mr Mudd and Dan Schwartz: very strong points you made.
The page is short and to conclude it it: we are all repeating the elephant and the blind story… there is only one data model and there are many ways of looking at it, using it, and naming it.