Archive for August 22nd, 2007

Bridging the gap between structured and unstructured data

Wednesday, August 22nd, 2007

We’re often asked to compare approaches to managing structured and unstructured data, and attempts to bridge the gap between the two.  Traditionally, the technology practitioners who worried about unstructured data have been an entirely different group from those who worried about structured data.

In fact, there are three types of data: structured, unstructured and a hybrid (records-oriented) grouping of semi-structured data.  They have much in common and are all part of the enterprise information landscape.  In order to look at ways to leverage the relative strengths of the different types of data, it is important to first understand how they are used.

There are three primary applications of data within most enterprises.

The first is in support of operational processes.  In the case of structured data, these processes are usually complex from a system perspective but often quite transactional from a human perspective.  In the case of semi-structured and unstructured data, there is often less system intervention or interpretation of the data with a heavy reliance on human interpretation.

Secondly, each of the three is used for analysis.  In the case of structured data, it is easy to understand how the analysis is undertaken.  With semi-structured/record data, analysis can be divided into aggregation of the structured components and a manual analysis of the free text.  With unstructured data, analysis is usually restricted to searching for like terms and manually evaluating the documents.
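
As a rough illustration of the split described for semi-structured records, the sketch below aggregates a structured field and separately pulls out free-text notes for a person to review.  The record layout, field names and helper functions are all hypothetical and are not taken from MIKE2.0.

```python
# Hypothetical semi-structured records: fixed fields plus a free-text note.
records = [
    {"region": "North", "amount": 120.0, "note": "Customer reported late delivery"},
    {"region": "South", "amount": 80.0, "note": "No issues"},
    {"region": "North", "amount": 50.0, "note": "Late delivery again, refund requested"},
]


def aggregate_by(records, key, value):
    """Sum a numeric field grouped by a structured field (the automated part)."""
    totals = {}
    for rec in records:
        totals[rec[key]] = totals.get(rec[key], 0.0) + rec[value]
    return totals


def notes_matching(records, term):
    """Return free-text notes containing the term, ready for manual evaluation."""
    term = term.lower()
    return [rec["note"] for rec in records if term in rec["note"].lower()]


totals = aggregate_by(records, "region", "amount")
late = notes_matching(records, "late delivery")
```

The aggregation can run unattended, while the matching notes still need a human reader to decide what they mean.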

Finally, all three types of data are used as a reference to back up decisions and provide an audit trail for operational processes.

MIKE2.0 recommends approaches to governance, architecture and integration which are independent of the structure of the data itself.

The majority of effort associated with all data, regardless of its form, is gaining access to it at the time when it’s needed.  In all three cases, there are processes to look up or search the data: SQL for structured data, lookups for semi-structured and tree-oriented folders for unstructured.  Increasingly, the techniques for finding all three types are converging in one set of processes called Enterprise Search.
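
A minimal sketch of that convergence, assuming a toy setup: one search function sits in front of an SQL query, a record lookup and a walk over a folder tree.  The table name, record layout, in-memory folder tree and function names are illustrative assumptions only, and a real Enterprise Search product would work quite differently.

```python
import sqlite3


def search_structured(conn, term):
    """Structured data: query a (hypothetical) customers table with SQL."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE name LIKE ?", (f"%{term}%",)
    ).fetchall()
    return [("structured", name) for (name,) in rows]


def search_records(records, term):
    """Semi-structured data: scan the free-text note of each keyed record."""
    return [
        ("semi-structured", rec_id)
        for rec_id, rec in records.items()
        if term.lower() in rec["note"].lower()
    ]


def search_folders(tree, term, path=""):
    """Unstructured data: walk a folder tree (nested dicts) of document text."""
    hits = []
    for name, child in tree.items():
        full = f"{path}/{name}"
        if isinstance(child, dict):
            hits.extend(search_folders(child, term, full))
        elif term.lower() in child.lower():
            hits.append(("unstructured", full))
    return hits


def enterprise_search(conn, records, tree, term):
    """One entry point that fans out to all three kinds of data."""
    return (
        search_structured(conn, term)
        + search_records(records, term)
        + search_folders(tree, term)
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Acme Ltd",), ("Globex",)])
records = {
    "R-1": {"note": "Call from Acme about invoice"},
    "R-2": {"note": "Routine check"},
}
tree = {"contracts": {"acme.txt": "Master agreement with Acme Ltd", "other.txt": "Unrelated"}}

results = enterprise_search(conn, records, tree, "acme")
```

Each hit is tagged with the kind of data it came from, so the caller sees one result list regardless of where the match was found.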

Ironically, despite the power of search, successful implementations still depend on common metadata and the use of a single enterprise metadata model.  Again, MIKE2.0 takes the information architect through these requirements in a lot of detail.

In the future, organisations can expect to keep all three forms of data (structured, semi-structured records and unstructured documents) in the same repositories.  However, there is no need to wait for this future utopia to begin leveraging all three in the same applications and managing them in a common way.
