Archive for September, 2007
There is huge interest from clients in enterprise search, with the focus being how to create useful applications that go beyond documents or web pages. Increasingly, we’re seeing organizations that have invested in metadata for regulatory compliance discovering the value of this asset using search technologies and techniques.
The original web experience was intended to be click-based, navigating via a number of hubs to any point on the internet, but the last five years have seen the majority of users move to a language-based approach, starting with a site like Google or Yahoo. The example I often use is the rain radar: when setting out to a meeting in a city, I’ll often check to see if rain is coming. In Melbourne I can navigate from the www.bom.gov.au website to the radar, but it’s faster for me to type “Melbourne weather radar” into Google, with the added benefit that I can use the same interface when I’m in Auckland, Singapore, New York, etc.
At work, users are still in the late ’90s, relying on incomplete intranets and a poorly maintained web of links. The problem is primarily access to the structured repositories and, even more importantly, access to the structures of those repositories (i.e., the metadata).
In many cases, banks have been the early adopters of metadata repositories followed by insurers and then the very large government departments. The main driver for these repositories has been compliance and (for banks) risk (Basel II). These repositories are enormously rich in content, but extremely difficult to interface to the rest of the organization’s information. Search can be the solution and I recommend the following three steps:
1. Interface to metadata repositories
In a bank, a user should be able to search for “Risk Weighted Asset” and find not only the relevant documents but also a list of the systems and databases that contain relevant data as well as appropriate controls, processes and business rules. It isn’t difficult to build interfaces between structured metadata and the search tools.
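As a rough illustration of such an interface, here is a minimal Python sketch. The repository contents, the `MetadataEntry` fields and the `search_metadata` function are all hypothetical and invented for this example; a real metadata repository would expose its own schema and API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataEntry:
    """One business term as it might be recorded in a metadata repository."""
    term: str
    systems: List[str] = field(default_factory=list)
    databases: List[str] = field(default_factory=list)
    controls: List[str] = field(default_factory=list)


# Hypothetical repository contents, invented purely for illustration.
REPOSITORY = [
    MetadataEntry(
        term="Risk Weighted Asset",
        systems=["Credit Risk Engine"],
        databases=["risk_dw.rwa_summary"],
        controls=["Monthly RWA reconciliation"],
    ),
    MetadataEntry(
        term="Customer Exposure",
        systems=["Loan Origination"],
        databases=["lending.exposure"],
    ),
]


def search_metadata(query: str, repository: List[MetadataEntry]) -> List[MetadataEntry]:
    """Return entries whose term contains every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [e for e in repository if all(w in e.term.lower() for w in words)]


hits = search_metadata("risk weighted asset", REPOSITORY)
for entry in hits:
    print(entry.term, entry.systems, entry.databases, entry.controls)
```

In practice the results would be merged with the document hits from the search engine, so the user sees systems, databases and controls alongside the relevant documents.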
2. Interface to master data
The next step is to build an interface that allows the user to type “Assets Walmart 2005” and find, via the metadata, appropriate queries which can then be launched in a BI tool (e.g., Business Objects or Cognos). This is part of my view that search should be the kick-off point for all information analysis. Again, this sounds difficult but really isn’t: you can use the metadata repository to define the dimensions of search and emulate hints (i.e., “Did you mean xyz”) to help if the user is almost on target.
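One way to picture this is the short Python sketch below. It assumes the dimension values have already been exported from the metadata repository into a dictionary; the `DIMENSIONS` contents and the `interpret` function are hypothetical. It uses the standard library’s `difflib.get_close_matches` for the hints, which is only one of several ways to do fuzzy matching.

```python
import difflib

# Hypothetical dimension values, as they might be exported from a metadata repository.
DIMENSIONS = {
    "measure": ["Assets", "Liabilities", "Revenue"],
    "customer": ["Walmart", "Target", "Costco"],
    "year": ["2004", "2005", "2006"],
}

# Lower-cased lookup per dimension so matching is case-insensitive.
LOOKUPS = {d: {v.lower(): v for v in values} for d, values in DIMENSIONS.items()}


def interpret(query):
    """Map query words to dimension values; collect hints for near misses."""
    matched, hints = {}, []
    for word in query.split():
        key = word.lower()
        exact = [(d, lookup[key]) for d, lookup in LOOKUPS.items() if key in lookup]
        if exact:
            dimension, value = exact[0]
            matched[dimension] = value
            continue
        # No exact hit: look for the closest known value across all dimensions.
        all_values = {k: v for lookup in LOOKUPS.values() for k, v in lookup.items()}
        close = difflib.get_close_matches(key, list(all_values), n=1, cutoff=0.8)
        if close:
            hints.append(f'Did you mean "{all_values[close[0]]}"?')
    return matched, hints


print(interpret("Assets Walmart 2005"))
print(interpret("Asets Walmrt 2005"))
```

The matched dimensions could then be used to select a predefined query to launch in the BI tool, while the hints are shown back to the user.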
3. Better analysis of the quality of search
The search index increasingly becomes an asset in its own right. Using the techniques in MIKE2.0, we can do constant health checks on the usability and relevance of the search index itself.
The last 5 years have seen a rapid advancement in open source software development. The reason it works so well is that open source isn’t just about freely available code; it’s a different approach to developing software that can leverage a huge resource pool of available talent. In some cases, an open source product may be of higher quality than a commercially developed product.
Open Source Information Development
Open source content development is the other area that has exploded. Wikipedia is the best example of collaborative development of content where authors build upon the work of others to release a product that anyone can edit.
What about open source data? We’re starting to see it too. A great example is the OpenStreetMap project, where individuals are “mapping the world” by building a repository of geodata and point of interest overlay in a wiki. OpenStreetMap looks to form a credible competitor to the government geodata providers – with the costly ones in Europe providing a particularly attractive target. Some of the ideas about what it may achieve in the future are even more exciting.
Software Development vs. Information Development
When developing open source code or information there are many similarities. The biggest differences have to do with commit rights and release cycles. Whereas code is released in cycles, wikis tend to have content changing all the time. This provides maximum value in terms of encouraging contributions but results in instability.
The other issue is that of authorization to contribute. In a code model there are typically controls around commit rights and a test process to ensure the code developed matches the planned specification.
To get more open source Information Development, we think a hybrid model makes sense in some scenarios. This is what we’ve done with the open methodology framework in MIKE2.0 in an attempt to add stability and reliability. While there are certainly some downsides, we believe that in some cases it can provide the best model for development.
Where Else Might it Apply?
The best case for the approach seems to be related to the development of standards as a means for effective collaboration. I can imagine a few nightmare scenarios for open source information development related to personal privacy, but by looking at the geodata providers we may see other candidates – such as credit agencies – that will face competition through a variation of the open model.
In this post, I want to remind readers that information is not abstract; it is something real and follows the laws of physics. Information theory talks about encoding information using predictable patterns, which have to be represented by some type of device. Such a device would typically use electrical energy in some form to represent each bit, so it is no surprise that conservation of energy laws apply: that is, creating one piece of information must destroy another.
Why does this matter? Information is finite and discovering one thing inevitably means that something else is either lost or in some way reduced in value. Anyone with a physics background might think of this as an extension of the Heisenberg uncertainty principle.
From a business management perspective, it means that the enterprise customer list cannot be used in an infinite number of ways without degenerating the value of the content. While intuitively true, I argue that it is also mathematically true, through the fact that applying information such as customer details also derives information about its application. Deriving information about its application must reduce information (significant or otherwise) from elsewhere. Usually, this reduction is significant – the process of finding out a customer needs a new service usually reduces the confidence in earlier analysis and limits the ability to target the same customer in other ways.
It took about 25 years for the ARPA initiative to evolve into the web and about 10 more years for the techniques and technologies that make up web 2.0 to arise. Although it’s a bit early, we’ll probably start to see the momentum around web 3.0 before the end of the decade. Web 2.0 was a bottom-up movement and Enterprise 2.0 is about making use of these capabilities. We may see Enterprise 3.0 connect with Web 3.0 earlier and even help drive the need for new technologies. Besides the typical “vice drivers”, what would drive demand for Enterprise 3.0 from the business side?
- Fixing healthcare – an avalanche of costs is breaking current systems, whether in countries like the US or in those applying more socialized models.
- Enabling virtual shoring – organizations will want to improve their use of offshoring, outsourcing and physically separated staff. “Virtual shoring” in virtual worlds can help provide the answer through better collaboration and greatly reduced travel costs.
- Bank of the Future – firms and individuals will continue to strive for new ways to raise capital, manage liquidity and hedge risk. Retail consumers will want a richer and more interactive experience that simplifies their lives through use of technology, not a branch. Institutions will want a richer technology experience too – as well as the ability to bring in information from all types of sources in the decisioning process.
On the supply side, the fundamental changes will be open source, globalization and technology exponentiation factors.
In areas such as health care, we’ll see these factors work together. A virtual treatment room may be a way to reduce cost by providing access to. Increasingly, individuals don’t have pensions and they will look for innovative product options that make long-term health care affordable.
Two technology areas stand out as supporting these new models:
- User Interactivity – the current interface needs a major upgrade. Visualization will become much, much richer and collaboration easier.
- Information Development – information currency will continue to get more valuable. This will be driven by more systems and types of content, greater abstraction levels and the fact that decisions will increasingly be made in an automated fashion, independent of human intervention.
So what might it look like? Think of an enhanced version of Second Life combined with everything Google is doing on steroids. Second Life for individuals to act in new virtual communities for commerce, healthcare and education; Google to link it back to the real world. If this sounds far off, it’s not. Look at what organizations such as IBM, Dell and ABN Amro are doing in the virtual world. Some of it’s very early (and some would argue PR oriented) but it’s happening.
The concept of a single version of the truth has gained currency in Information Management to the point of being a mantra and I believe it is appropriate to introduce a few words of caution.
If I can be philosophical for a moment, Information Management theory is starting to overturn many old views of the world, including the way physics describes objects. This isn’t a long bow to draw for Information Management professionals: imagine a white statue in the park. If you put on rose colored glasses, what color is the statue? If you get everyone in the park to put on rose colored glasses, what color is the statue? If you cover the statue in rose colored cellophane? If you paint the statue with rose colored paint?
Survey any group and you are likely to get different answers to the color of the statue under the different circumstances I’ve outlined here. If you at least said that painting the statue changed its color then you are admitting that it is the information (in this case color) that you receive that is important rather than what might exist in any deeper layers.
Those same rose colored glasses can apply to the enterprise data warehouse. One version of the truth that forces everyone to see the data through rose colored glasses does not make the data rose colored! Accountants, however, have thought long and hard about this and have rules for how you can clean up variances that can’t be reconciled. The most important thing is to ensure that all observers agree rather than just observe the same result, and that includes reasonable outsiders, executives and analysts.
In the wake of a recent adverse court finding for a major media company in Australia, I was interviewed by The Age on the business dangers of email. I argued that communication within and between companies should be embraced rather than feared, but proper governance inevitably meant that one-on-one emails were not the best way to manage this unstructured content.
You can read the full article at http://www.theage.com.au/news/business/teamwork-avoids-dangers-of-oneonone-emails/2007/09/09/1189276544211.html or listen to the podcast at the same site.
In Small Worlds Data Transformation Measures, Rob wrote about the challenges of data modelling in today’s complex, federated enterprise. This is explained through an overview of Graph Theory, which provides the foundation for the relational modelling techniques first developed by Ted Codd over 30 years ago.
Relational Theory has been one of the stalwarts of software engineering. It is governed by Codd’s rules, which have fundamentally stayed intact despite the rapid advances in other areas of software engineering – a testament to their effectiveness and simplicity.
While evolutions have taken place over time and there have been some variations to approach (e.g. dimensional modelling), the changes have built on the relational theory foundation and abided by its design principles.
But is it time for a change? Are some of the issues we are seeing today the result of the foundation starting to crumble due to complexity? Or is it that there are so many violations of Codd’s Rules? While the latter is certainly a contributing factor, it may be that relational theory is starting to wear under the weight of our modern systems infrastructures – and the issues will continue to get worse. Whereas there does not appear to be an equivalent approach to relational theory that will address the issues we see today, we think Small Worlds Theory and Web 2.0 may provide some ideas for a new approach.
Small Worlds Theory helps provide a rationale for a different approach to modelling information. Small Worlds Theory tells us that for a complex system to be manageable it must be designed as an efficient network and that many systems (biological, social or technological) follow this approach. Although the information across organizations is highly federated, it does not inter-relate through an efficient network. As opposed to building a single enterprise data model, it is the services model that includes the modelling of “data in motion” that should be incorporated into the comprehensive approach.
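To make “efficient network” a little more concrete, here is a small Python sketch that measures the average number of hops between systems in a network, one common indicator in small-world analysis. The two example networks (a ring of 12 systems, and the same ring with two extra links) are invented for illustration and are not taken from any real enterprise.

```python
from collections import deque


def average_path_length(graph):
    """Mean number of hops between all pairs of connected nodes (BFS from each node)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring(n):
    """n systems, each linked only to its two neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_link(graph, a, b):
    """Add an undirected link between systems a and b."""
    graph[a].add(b)
    graph[b].add(a)


chain = ring(12)
shortcut = ring(12)
add_link(shortcut, 0, 6)  # two hypothetical "shortcut" interfaces
add_link(shortcut, 3, 9)

print(round(average_path_length(chain), 2))
print(round(average_path_length(shortcut), 2))
```

A handful of well-placed links shortens the average path noticeably, which is the intuition behind designing information flows as a network rather than a chain of point-to-point interfaces.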
In addition to better modelling of federated data, new techniques should also bring in unstructured content. This includes the information from the “informal network” such as that developed in wikis and blogs. While there are standards to add structure to unstructured content, their uptake has been slow. People prefer a quick and easy approach to classification, especially for content that is more informal in nature.
Therefore, the approach may involve the use of categories and taxonomies to bring together collaborative forms of communication and link them to the formal network. Both Andi Rindler and Jeremy Thomas have discussed some work we are doing in this area on the MIKE2.0 project in their blog posts. We’re also starting to see the implementation of some very cool ideas for dynamically bringing together tagging concepts, such as the Tagline Generator.
In summary, whereas an approach based on a mathematical foundation is required to provide a solution equivalent to Codd’s and there is a grand vision for a “semantic web”, we may chip away at the problem through a variety of techniques. Just as Search is already providing a common mechanism for data access, other techniques may help with information federation and unstructured content.
If an organization is going to move towards a Center of Excellence model for Information Development, we are sometimes asked: “does it make sense for this to be done offshore?” This seems a logical question, as organizations are increasingly moving their delivery capabilities offshore, especially for large application development and systems integration projects.
Although we encourage organizations to think about Information Development as a competency analogous to application development, it isn’t just something you can give to a separate group – it is a cultural change that must go across the company. While expertise can certainly be brought in from the outside, it’s also a capability that must exist internally.
Offshore Information Development should incorporate the following principles:
- It is the governance standards, policies and processes that enable an Information Development approach. These are the same in an offshore or onshore model.
- An Information Development team can be physical (i.e. a dedicated team) or virtual (i.e. members have other significant roles). In most cases there is a combination of dedicated and shared resources.
- Any sizeable offshore team will need representation in the Information Development Center of Excellence.
- Information Development crosses business boundaries and requires participation from senior execs to line staff. Therefore, it is not a delivery capability that can be built completely offshore.
- The organizational model will evolve over time and individuals in assigned roles are typically needed to drive the transition to new organizational models.
In summary, organizations should make sure they have a strong onshore capability for Information Development, even if much of their development occurs offshore. Whatever the delivery model, the key to success is Information Governance through open and common standards, architectures and policies.
One of the difficult aspects of Information Development is that organizations cannot “start over” – they need to fix the issues of the past. This means that the move to a new model must incorporate a significant transition from the old world.
Most organizations have a very poorly defined view of their current state information architecture: models are undefined, quality is unknown and ownership is unclear. A product that models the Enterprise Information Architecture and provides a path for transitioning to the future state would therefore be extremely valuable.
Its capabilities could be grouped into 2 categories:
Things that can be done today, but typically through multiple products
- Can be used to define data models and interface schemas
- Provides GUI-based transformation mapping such as XML schema mapping or O-R mapping
- Is able to profile data and content to identify accuracy, consistency or integrity issues in a once-off or ongoing fashion
- Takes the direct outputs of profiling and incorporates these into a set of transformation rules
- Helps identify data-dependent business rules and classifies rule metadata
- Has an import utility to bring in common standards
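The profiling capability in the list above can be sketched in a few lines of Python. The records, column names and the `profile` function are hypothetical and purely illustrative; commercial profiling tools go much further, but the basic checks (missing values, invalid formats, duplicate keys) look something like this.

```python
import re
from collections import Counter

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"customer_id": "C001", "country": "AU", "postcode": "3000"},
    {"customer_id": "C002", "country": "au", "postcode": "30O0"},
    {"customer_id": "C002", "country": "NZ", "postcode": None},
]


def profile(records, column, pattern=None, unique=False):
    """Summarize one column: missing values, distinct values, and optional checks."""
    values = [r.get(column) for r in records]
    present = [v for v in values if v is not None]
    report = {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if pattern is not None:
        # Accuracy check: values that do not match the expected format.
        report["invalid"] = [v for v in present if not re.fullmatch(pattern, v)]
    if unique:
        # Integrity check: values that should be unique but repeat.
        report["duplicates"] = sorted(v for v, n in Counter(present).items() if n > 1)
    return report


print(profile(RECORDS, "postcode", pattern=r"\d{4}"))
print(profile(RECORDS, "country", pattern=r"[A-Z]{2}"))
print(profile(RECORDS, "customer_id", unique=True))
```

Each finding (for example, the lower-case country code) is a candidate for a transformation rule, which is the hand-off from profiling to transformation described in the list.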
New capabilities typically not seen in products today
- An ability to assign value to information based on its economic value within an organization
- Provides an information requirements gathering capability that includes drill-down and traceability mapping across requirements
- Provides a data mastering model that shows overlaps of information assets across the enterprise and rules for its propagation
- Provides an ownership model to assign individual responsibility for different areas of the information architecture (e.g. data stewards, data owners, CIO)
- Has a compliance feature that can be run to check adherence to regulations and recommended best practices
- Provides a collaborative capability for users to work together for better Information Governance
In summary, this product would be like an advanced profiling tool, enterprise architecture modelling tool and planning, budgeting and forecasting tool in one. It would be a major advantage to organizations on their path to Information Development.
Today’s solutions for Active Metadata Integration and Model Driven Development seem to provide the starting point for this next generation product. Smaller software firms such as MetaMatrix provided some visionary ideas to begin to move organizations to model driven Information Development. The bi-directional metadata repositories provided by the major players such as IBM and Informatica are a big step in the right direction. There is, however, a significant opportunity for a product that can fill the gap that exists today.