Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology

Archive for the ‘Business Intelligence’ Category

by: Alandduncan
13  May  2014

Is Your Data Quality Boring?

Is this the kind of response you get when you mention to people that you work in Data Quality?!

Let’s be honest here. Data Quality is good and worthy, but it can be a pretty dull affair at times. Information Management is something that “just happens”, and folks would rather not know the ins-and-outs of how the monthly Management Pack gets created.

Yet I’ll bet that they’ll be right on your case when the numbers are “wrong”.


So here’s an idea. The next time you want to engage someone in a discussion about data quality, don’t start by discussing data quality. Don’t mention the processes of profiling, validating or cleansing data. Don’t talk about integration, storage or reporting. And don’t even think about metadata, lineage or auditability. Yaaaaaaaaawn!!!!

Instead of concentrating on telling people about the practitioner processes (which of course are vital, and fascinating no doubt if you happen to be a practitioner), think about engaging in a manner that is relevant to the business community, using language and examples that are business-oriented. Make it fun!

Once you’ve got the discussion flowing in terms of the impacts, challenges and inhibitors that get in the way of successful business operations, then you can start to drill into the underlying data issues and their root causes. More often than not, a data quality issue is symptomatic of a business process failure rather than being an end in itself. By fixing the process problem, the business user gains a benefit, and the data is enhanced as a by-product. Everyone wins (and you didn’t even have to mention the dreaded DQ phrase!)

Data Quality is a human thing – that’s why it’s hard. As practitioners, we need to be communicators. Lead the thinking, identify the impact and deliver the value.

Now, that’s interesting!

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Alandduncan
12  May  2014

The Information Management Tube Map

Just recently, Gary Allemann posted a guest article on Nicola Askham’s Blog, which made an analogy between Data Governance and the London Tube map. (Nicola is also on Twitter. See also Gary Allemann’s blog, Data Quality Matters.)

Up until now, I’ve always struggled to think of a way to represent all of the different aspects of Information Management/Data Governance; the environment is multi-faceted, with the interconnections between the component capabilities being complex and not hierarchical. I’ve sometimes alluded to there being a network of relationships between elements, but this has been a fairly abstract concept that I’ve never been able to adequately illustrate.

And in a moment of perspiration, I came up with this…

I’ll be developing this further as I go but in the meantime, please let me know what you think.

(NOTE: following on from Seth Godin’s plea for more sharing of ideas, I am publishing the Information Management Tube Map under Creative Commons License Attribution Share-Alike V4.0 International. Please credit me where you use the concept, and I would appreciate it if you could reference back to me with any changes, suggestions or feedback. Thanks in advance.)

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Alandduncan
29  Mar  2014

Now that’s magic!

When I was a kid growing up in the UK, Paul Daniels was THE television magician. With a combination of slick high drama illusions, close-up trickery and cheeky end-of-the-pier humour, (plus a touch of glamour courtesy of The Lovely Debbie McGee TM), Paul had millions of viewers captivated on a weekly basis and his cheeky catch-phrases are still recognised to this day.

Of course, part of the fascination of watching a magician perform is to wonder how the trick works. “How the bloody hell did he do that?” my dad would splutter as Paul Daniels performed yet another goofy gag or hair-raising stunt (no mean feat, when you’re as bald as a coot…) But most people don’t REALLY want to know the inner secrets, and even fewer of us are inspired to spray a riffle-shuffled pack of cards all over granny’s lunch, stick a coin up our nose or grab the family goldfish from its bowl and hide it in the folds of our nether-garments. (Um, yeah. Let’s not go there…)

Penn and Teller are great of course, because they expose the basic techniques of really old, hackneyed tricks and force more innovation within the magician community. They’re at their most engaging when they actually do something that you don’t get to see the workings of. Illusion maintained, audience entertained.

As data practitioners, I think we can learn a few of these tricks. I often see us getting too hot-and-bothered about differentiating data, master data, reference data, metadata, classification scheme, taxonomy, dimensional vs relational vs data vault modelling etc. These concepts are certainly relevant to our practitioner world, but I don’t necessarily believe they need to be exposed at the business-user level.

For example, I often hear business users talking about “creating the metadata” for an event or transaction, when they’re talking about compiling the picklist of valid descriptive values and mapping these to the contextualising descriptive information for that event (which by my reckoning, really means compiling the reference data!). But I’ve found that business people really aren’t all that bothered about the underlying structure or rigour of the modelling process.

That’s our job.

There will always be exceptions. My good friend and colleague Ben Bor is something of a special case and has the talent to combine data management and magic.

But for the rest of us mere mortals, I suggest that we keep the deep discussion of data techniques for the Data Magic Circle, and just let the paying customers enjoy the show….

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: John McClure
06  Mar  2014

Grover: A Business Syntax for Semantic English

Grover is a semantic annotation markup syntax based on the grammar of the English language. Grover is related to the Object Management Group’s Semantics of Business Vocabulary and Rules (SBVR), explained later. Grover syntax assigns roles to common parts of speech in the English language so that simple and structured English phrases are used to name and relate information on the semantic web. By having as clear a syntax as possible, the semantic web is more valuable and useful.

An important open-source tool for semantic databases is SemanticMediaWiki that permits everyone to create a personal “wikipedia” in which private topics are maintained for personal use. The Grover syntax is based on this semantic tool and the friendly wiki environment it delivers, though the approach below might also be amenable to other toolsets and environments.

Basic Approach. Within a Grover wiki, syntax roles are established for classes of English parts of speech.

  • Subject:noun(s) -- verb:article/verb:preposition -- Object:noun(s)

refines the standard Semantic Web pattern:

  • SubjectURL -- PredicateURL -- ObjectURL

while in a SemanticMediaWiki environment, with its relative URLs, this is the pattern:

  • (Subject) Namespace:pagename -- (Predicate) Property:pagename -- (Object) Namespace:pagename.


In a Grover wiki, topic types are nouns (more precisely, nounal expressions) and are treated as concepts. Every concept is defined by a specific semantic database query, these queries being the foundation of a controlled enterprise vocabulary. In Grover every pagename is the name of a topic and every pagename includes a topic-type prefix. Example: Person:Barack Obama and Title:USA President of the United States of America, two topics related together through one or more predicate relations, for instance “has:this”. Wikis are organized into ‘namespaces’: each page name is prefixed with a namespace name, which functions equally as a topic-type name. Additionally, an ‘interwiki prefix’ can indicate the URL of the wiki where a page is located, in a manner compatible with the Turtle RDF language.

Nouns (nounal expressions) are the names of topic-types and/or of topics; in ontology-speak, nouns are class resources or individual resources, but rarely are nouns defined as property resources (and thereby used as a ‘predicate’ in the standard Semantic Web pattern mentioned above). This noun requirement is a systemic departure from today’s free-for-all that allows nouns to be part of the name of predicates, leading to the construction of ontologies that are problematic from the perspective of common users.

Verbs. In a Grover wiki, “property names” are an additional ontology component forming the bedrock of a controlled semantic vocabulary. Being pages in the “Property” namespace means these are prefixed with the namespace name, “Property”. However, the XML namespace is directly implied; for instance, has:this implies a “has” XML Namespace. The full pagename of this property is “Property:has:this”. The tenses of a verb (infinitive, past, present and future) are each an XML namespace, meaning there are separate have, has, had and will-have XML Namespaces. The modalities of a verb, may and must, are also separate XML Namespaces. Lastly, the negation forms for verbs (involving not) are additional XML Namespaces.

The “verb” XML Namespace name is only one part of a property name. The other part of a property name is either a preposition or a grammatical article. Together, these comprise an enterprise’s controlled semantic vocabulary.

As in English grammar, prepositions are used to relate an indirect object, or the object of a preposition, to a subject in a sentence. Example: “John is at the Safeway” uses a property named “is:at” to yield the triple Person:John -- is:at -- Store:Safeway. There are approximately one hundred English prepositions possible for any particular verbal XML Namespace. Examples: had:from, has:until and is:in.
As in English grammar, articles such as “a” and “the” are used to relate direct objects or predicate nominatives to a subject in a sentence. As for prepositions above, articles are associated with a verb XML Namespace. Examples: has:a, has:this, has:these, had:some, has:some and will-have:some.
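
To make the pattern concrete, here is a minimal Python sketch of how a Grover statement might be pulled apart into its topic types and its verb and preposition (or article) parts. This is only an illustration: the names GroverTriple, GroverProperty, split_pagename and parse_statement are invented for the example and are not part of Grover or SemanticMediaWiki, and the sketch assumes statements are written with " -- " between the three parts.

```python
from typing import NamedTuple, Tuple


class GroverProperty(NamedTuple):
    verb: str      # the verb namespace, e.g. "is", "has", "will-have"
    relator: str   # the preposition or article, e.g. "at", "this", "some"


class GroverTriple(NamedTuple):
    subject_type: str
    subject_name: str
    prop: GroverProperty
    object_type: str
    object_name: str


def split_pagename(pagename: str) -> Tuple[str, str]:
    """Split 'Person:John' into its topic-type prefix and its topic name."""
    topic_type, _, name = pagename.strip().partition(":")
    return topic_type, name


def parse_statement(statement: str) -> GroverTriple:
    """Parse 'Subject -- verb:relator -- Object' (assumed layout) into parts."""
    subject, predicate, obj = (part.strip() for part in statement.split(" -- "))
    verb, _, relator = predicate.partition(":")
    return GroverTriple(
        *split_pagename(subject),
        GroverProperty(verb, relator),
        *split_pagename(obj),
    )


triple = parse_statement("Person:John -- is:at -- Store:Safeway")
print(triple.subject_type, triple.subject_name)  # Person John
print(triple.prop.verb, triple.prop.relator)     # is at
print(triple.object_type, triple.object_name)    # Store Safeway
```

Splitting on the first colon only is a deliberate choice in this sketch, so that a nested name such as Concept:Term:Adopted definition keeps "Concept" as its topic type and the rest as its name.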

Adjectives. In a Grover wiki, definitions in the “category” namespace include adjectives, such as “Public” and “Secure”. These categories are also found in a controlled modifier vocabulary. The category namespace also includes definitions for past participles, such as “Secured” and “Privatized”. Every adjective and past participle is a category in which any topic can be placed. A third subclass of modifiers includes ‘adverbs’, categories in which predicate instances are placed.

That’s about all that’s needed to understand Grover, the Business Syntax for Semantic English! Let’s use the Grover syntax to implement a snippet from the Object Management Group’s Semantics of Business Vocabulary and Rules (SBVR) which has statements such as this for “Adopted definition”:

adopted definition
Definition: definition that a speech community adopts from an external source by providing a reference to the definition.
Necessities: (1) The concept ‘adopted definition’ is included in Definition Origin. (2) Each adopted definition must be for a concept in the body of shared meanings of the semantic community of the speech community.


Now we can use Grover’s syntax to ‘adopt’ the OMG’s definition for “Adopted definition”.
Concept:Term:Adopted definition -- is:within -- Concept:Definition
Concept:Term:Adopted definition -- is:in -- Category:Adopted
Term:Adopted definition -- is:a -- Concept:Term:Adopted definition
Term:Adopted definition -- is:also -- Concept:Term:Adopted definition
Term:Adopted definition -- is:of -- Association:Object Management Group
Term:Adopted definition -- has:this -- Reference:
Term:Adopted definition -- must-be:of -- Concept:Semantic Speech Community
Term:Adopted definition -- must-have:some -- Concept:Reference
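
As a rough illustration of how such statements could be handled by software, the Python sketch below reads the eight lines above and separates plain assertions from statements whose verb begins with "must-". Treating the "must-" prefix as the marker of a requirement is an assumption made for this example; the function name split_statement is likewise invented here.

```python
STATEMENTS = """
Concept:Term:Adopted definition -- is:within -- Concept:Definition
Concept:Term:Adopted definition -- is:in -- Category:Adopted
Term:Adopted definition -- is:a -- Concept:Term:Adopted definition
Term:Adopted definition -- is:also -- Concept:Term:Adopted definition
Term:Adopted definition -- is:of -- Association:Object Management Group
Term:Adopted definition -- has:this -- Reference:
Term:Adopted definition -- must-be:of -- Concept:Semantic Speech Community
Term:Adopted definition -- must-have:some -- Concept:Reference
"""


def split_statement(line: str):
    """Return (subject, predicate, object) for one 'S -- P -- O' line."""
    subject, predicate, obj = (part.strip() for part in line.split(" -- "))
    return subject, predicate, obj


facts = []         # statements that simply assert something
requirements = []  # statements using a "must-" verb (assumed marker)

for line in STATEMENTS.strip().splitlines():
    subject, predicate, obj = split_statement(line)
    verb = predicate.split(":")[0]
    target = requirements if verb.startswith("must-") else facts
    target.append((subject, predicate, obj))

print(len(facts), "facts,", len(requirements), "requirements")
for subject, predicate, obj in requirements:
    print(subject, "--", predicate, "--", obj)
```

Keeping the two lists apart mirrors the SBVR split between a definition and its necessities, without needing anything beyond the statement text itself.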

This simplified but structured English permits the widest possible segment of the populace to participate in constructing and perfecting an enterprise knowledge base built upon the Resource Description Framework.

More complex information can be specified on wikipages using standard wiki templates. For instance to show multiple references on the “Term:Adopted definition” page, the “has:this” wiki template can be used:
Multi-lingual text values and resource references would be as follows, using the wiki templates (a) {{has:this}} and (b) {{skos:prefLabel}}
{{has:this |@=en|@en=Reference:}}
{{skos:prefLabel|@=en;de|@en=Adopted definition|@de=Angenommen definition}}

One important feature of the Grover approach is its modification of our general understanding about how ontologies are built. Today, ontologies specify classes, properties and individuals; a data model emerges from listings of range/domain axioms associated with a property’s definition. Instead under Grover, an ontology’s data models are explicitly stated with deontic verbs that pair subjects with objects; this is an intuitively stronger and more governable approach for such a critical enterprise resource as the ontology.

Category: Business Intelligence, Enterprise Content Management, Enterprise Data Management, Enterprise2.0, Information Development, Semantic Web
No Comments »

by: Alandduncan
04  Mar  2014

The (Data) Doctor Is In: ADD looks for a data diagnosis…

Being a data management practitioner can be tough.

You’re expected to work your data quality magic, solve other people’s data problems, and help people get better business outcomes. It’s a valuable, worthy and satisfying profession. But people can be infuriating and frustrating, especially when the business user isn’t taking responsibility for the quality of their own data.

It’s a bit like being a Medical Doctor in general practice.

The patient presents with some early indicative symptoms. The MD then performs a full diagnosis and recommends a course of treatment. It’s then up to the patient whether or not they take their MD’s advice…

AlanDDuncan: “Doctor, Doctor. I get very short of breath when I go upstairs.”
MD: “Yes, well. Your Body Mass Index is over 30, you’ve got consistently high blood pressure, your heartbeat is arrhythmic, and cholesterol levels are off the scale.”
ADD: “So what does that mean, doctor?”
MD: “It means you’re fat, you drink like a fish, you smoke like a chimney, your diet consists of fried food and cakes and you don’t do any exercise.”
ADD: “I’m Scottish.”
MD: “You need to change your lifestyle completely, or you’re going to die.”
ADD: “Oh. So, can you give me some pills?….”

If you’re going to get healthy with your data, you’re going to have to put the pies down, step away from the Martinis and get off the couch, folks.

Category: Business Intelligence, Data Quality, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Robert.hillard
17  Dec  2013

Turning decision making into a game

Organisations are more complex today than ever before, largely because of the ability that technology brings to support scale, centralisation and enterprise-wide integration.  One of the unpleasant side effects of this complexity is that it can take too long to get decisions made.

With seemingly endless amounts of information available to management, the temptation to constantly look for additional data to support any decision is too great for most executives.  This is even without the added fear of making the wrong decision and hence trying to avoid any decision at all.

While having the right information to support a choice is good, in a world of Big Data where there are almost endless angles that can be taken, it is very hard to rule a line under the data and say “enough is enough”.

Anyone who has ever studied statistics or science would know that the interpretation of results is something that needs to be based on criteria that have been agreed before the data is collected.  Imagine if the efficacy of a drug could be defined after the tests had been conducted: inevitably the result would be open to subjective interpretation.


The application of game mechanics to business is increasingly popular with products in the market supporting business activities such as sales, training and back-office processing.

Decision making, and the associated request for supporting data, is another opportunity to apply game mechanics.  Perhaps a good game metaphor to use is volleyball.

As most readers will know, volleyball allows a team to hit a ball around their own side of the court in order to set it up for the best possible return to the opposition.  However, each team is only allowed to have three hits before returning it over the net, focusing the team on getting it quickly to the best possible position.

Management decision making should be the same.  The team should agree up-front what the best target position would be to get the decision “ball” to and make sure that everyone is agreed on the best techniques to get the decision there.  The team should also agree on a reasonable maximum number of “hits” or queries to be allowed before a final “spike” is made.


That might mean for job interviews there will be no more than three interviews.  For an investment case there will be no more than two meetings and three queries for new data.  For a credit application there will be no more than two queries for additional paperwork.

The most important aspect of improving the speed of decision making is the setting of these rules before the data for the decision is received.  It is too tempting once three interviews with a job candidate have been completed to think that “just one more” will answer all the open questions.  There will always be more data that could make an investment case appear more watertight.  It is always easier to ask another question than to provide an outright yes or no to a credit application or interview candidate.
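
For teams that track decisions in software, the rule of fixing the number of hits in advance could be sketched in Python roughly as follows. The class DecisionRally and its method names are hypothetical, invented for this illustration; the limit of two queries is taken from the credit application example above.

```python
class DecisionRally:
    """Tracks queries ("hits") against a limit agreed before any data arrives."""

    def __init__(self, topic: str, max_hits: int) -> None:
        self.topic = topic
        self.max_hits = max_hits   # agreed up-front, never raised later
        self.hits = []             # the queries asked so far
        self.outcome = None        # "yes" or "no" once the spike is made

    @property
    def hits_left(self) -> int:
        return self.max_hits - len(self.hits)

    def hit(self, query: str) -> int:
        """Record one more query and return how many hits remain."""
        if self.outcome is not None:
            raise RuntimeError("decision already made")
        if self.hits_left == 0:
            raise RuntimeError("no hits left: a yes or no is now required")
        self.hits.append(query)
        return self.hits_left

    def spike(self, approved: bool) -> str:
        """Make the final yes/no decision."""
        self.outcome = "yes" if approved else "no"
        return self.outcome


rally = DecisionRally("credit application", max_hits=2)
rally.hit("request proof of income")
rally.hit("request bank statement")

try:
    rally.hit("just one more document")
except RuntimeError as err:
    print("Refused:", err)

print("Decision:", rally.spike(approved=True))
```

The point of the sketch is that the limit lives in the constructor, so it has to be chosen before the first query is made rather than negotiated afterwards.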

But simply setting rules doesn’t leverage the power of gamification.  There needs to be a spirit of a shared goal.  Everyone on the decision side of the court needs to be conscious that decision volleyball means that each time they are “hitting” the decision to the next person they are counting down to one final “spike” of a yes or no answer.  The team are all scored regardless of whether they are the first to hit the “decision ball” or the last to make the “decision spike”.

Shared story

Games, and management, are about more than a simple score.  They are also about shared goals and an overarching narrative. The storyline needs to be compelling and keep participants engaged in between periods of intense action.

For management decisions, it is important that the context has an engaging goal that can be grasped by all.  In the game of decision volleyball, this should include the challenge and narrative of the agile organisation.  The objective is not just to make the right decision, but also along the way to set things up for the final decision maker to achieve the “spike” with the best chance of a decision which is both decisive and right.

The game of decision volleyball also has the opportunity to bring the information providers, such as the business intelligence team, into the story as well.  Rather than simply providing data upon request, without any context, they should be engaged in understanding the narrative that the information sits in and how it will set the game up for that decisive “spike”.

Category: Business Intelligence
No Comments »

by: John McClure
07  Sep  2013

Data Cubes and LOD Exchanges

Recently the W3C issued a Candidate Recommendation that caught my eye about the Data Cube Vocabulary, which claims to be both very general and useful for data sets such as survey data, spreadsheets and OLAP. The Vocabulary is based on SDMX, an ISO standard for exchanging and sharing statistical data and metadata among organizations, so there’s a strong international impetus towards consensus about this important piece of Internet 3.0.

Linked Open Data (LOD) is, as the W3C says, “an approach to publishing data on the web, enabling datasets to be linked together through references to common concepts.” This is an approach based on the W3C’s Resource Description Framework (RDF), of course. So the foundational ontology for actually implementing this quite worthwhile but grandiose LOD vision is undoubtedly to be this Data Cube Vocabulary, which very handily enables the exchange of semantically annotated HTML tables. Note that relational government data tables can now be published as “LOD data cubes” which are shareable with a public adhering to this new world-standard ontology.

But as the key logical layer in this fast-coming semantic web, Data Cubes very well may affect the manner an Enterprise ontology might be designed. Start with the fact that Data Cubes are themselves built upon several more basic RDF-compliant ontologies:

The Data Cube Vocabulary says that every Data Cube is a Dataset that has a Dataset Definition (more a “one-off ontology” specification). Any dataset can have many Slices of a metaphorical, multi-dimensional “pie” of the dataset. Within the dataset itself and within each slice are unbounded masses of Observations – each observation has values for not only the measured property itself but also any number of applicable key values — that’s all there’s to it, right?

Think of an HTML table. A “data cube” is a “table” element, whose columns are “slices” and whose rows are “keys”. “Observations” are the values of the data cells. This is clear but now the fun starts with identifying the TEXTUAL VALUES that are within the data cells of a given table.
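
The table analogy can be sketched in a few lines of Python. This follows the simplified picture given here (columns as slices, rows as keys, cells as observations) rather than the full Data Cube Vocabulary, and the class names, the function cube_from_table and the sample figures are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Observation:
    row_key: str      # the "key" (row) the cell belongs to
    slice_name: str   # the "slice" (column) the cell belongs to
    value: float      # the value held in the data cell


@dataclass
class DataCube:
    name: str
    observations: List[Observation] = field(default_factory=list)

    def slice(self, slice_name: str) -> Dict[str, float]:
        """Return one column of the table as {row key: value}."""
        return {
            obs.row_key: obs.value
            for obs in self.observations
            if obs.slice_name == slice_name
        }


def cube_from_table(name: str, header: List[str], rows: List[list]) -> DataCube:
    """Turn a header plus rows (first cell = row key) into observations."""
    cube = DataCube(name)
    for row in rows:
        row_key, *cells = row
        for slice_name, value in zip(header[1:], cells):
            cube.observations.append(Observation(row_key, slice_name, value))
    return cube


# Hypothetical two-by-two table of survey figures.
header = ["region", "2012", "2013"]
rows = [["North", 10.0, 12.0], ["South", 7.0, 9.5]]

cube = cube_from_table("hypothetical survey", header, rows)
print(len(cube.observations))  # 4
print(cube.slice("2013"))      # {'North': 12.0, 'South': 9.5}
```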

Here is where “Dataset Descriptions” come in: these are associated with an HTML table element or an LOD dataset. These describe all possible dataset dimension keys and the different kinds of properties that can be named in an Observation. Text attributes, measures, dimensions, and coded properties are all provided, and all are sub-properties of rdf:Property.

This is why a Dataset Description is a “one-off ontology”, because it defines only text and numeric properties and, importantly, no classes of functional things. So with perfect pitch, the Data Cube Vocabulary virtually requires Enterprise Ontologies to ground their property hierarchy with the measure, dimension, code, and text attribute properties.

Data Cubes define just a handful of classes like “Dataset”, “Slice”, “SliceKey” and “Observation”. How are these four classes best inserted to an enterprise’s ontology class hierarchy? “Observation” is easy: it should be the base class of all observable properties, that is, all and only textual properties. “SliceKey” is a Role that an observable property can play. A “Slice” is basically an annotated rdf:Bag, mediated by skos:Collection at times.

A “Dataset” is a hazy term applicable to anything classifiable as data objects or as data structures, that is, “a set of data” is merely an aggregate collection of data items just as a data object or data structure is. Accordingly, a Data Cube “dataset” class might be placed at or near the root of a class hierarchy, but it’s clearer to establish it as a subclass of an html:table class.

There’s more to this topic saved for future entries — all those claimed dependencies need to be examined.

Category: Business Intelligence, Information Development, Semantic Web
No Comments »

by: Phil Simon
02  Sep  2013

The Downside of DataViz

ITWorld recently ran a great article on the perils of data visualization. The piece covers a number of companies, including Carwoo, a startup that aims to make car buying easier. The company has been using dataviz tool Chartio for a few months. From the article:

Around a year ago, Rimas Silkaitis, a product manager at Carwoo, started looking for a better way to handle the many requests for data visualizations that his co-workers were making.

He looked at higher end products, like those from GoodData and Microstrategy. “Then I realized, hey, we’re a startup, we don’t have that kind of money,” he said. “That’s when we found Chartio.”

Now, most of the 40-person company–except sales and customer service, which have their own tools–have access to Chartio.

Silkaitis said he worries a bit about users misinterpreting data and creating bad visualizations, but he’s implemented procedures that seem to be working so far.

It starts with new hires. “Anybody that comes on new to the company, I sit them down and walk them through our data model and give them a tutorial on how Chartio works,” he said.

There are several key lessons in this piece related to intelligent data management, dataviz, and Big Data. Let’s review them.

DataViz Is Easier Than Ever

Over the last ten years, we have seen a proliferation of easy-to-use tools in many areas, and dataviz is no exception. Today, one need not be a coder or work in the IT department to build powerful, interactive data visualization tools. Dragging and dropping and slicing and dicing are more prevalent than ever. Chartio is just one of dozens or hundreds of user-friendly applications that can make data come to life.

DataViz Can Be Abused

Often we look at visual representations of data and the required decision or trend seems obvious. But is it? Is the data or the dataviz masking what’s really going on? Are we seeing another example of Simpson’s Paradox?

Even with Small Data, there was tremendous potential for statistical abuse. You can multiply that by 1,000 thanks to Big Data.
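
For readers who haven’t met Simpson’s Paradox, a short Python sketch shows how a chart of totals can point the opposite way from every subgroup. The counts below are the kidney-stone treatment figures widely used to teach the paradox; treat them here purely as an illustration, and note that the helper names rate and overall are invented for the example.

```python
# (successes, patients) for each treatment, split by stone size
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes: int, patients: int) -> float:
    return successes / patients


def overall(treatment: str) -> float:
    """Success rate with both subgroups pooled together."""
    groups = results[treatment].values()
    return rate(sum(s for s, _ in groups), sum(n for _, n in groups))


for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size} stones: A={a:.3f} B={b:.3f} -> better: {'A' if a > b else 'B'}")

a_all, b_all = overall("A"), overall("B")
print(f"all patients: A={a_all:.3f} B={b_all:.3f} -> better: {'A' if a_all > b_all else 'B'}")
```

Treatment A comes out ahead within each stone size, yet B comes out ahead once the groups are pooled, because the two treatments were given to very different mixes of cases. A single bar chart of the pooled totals would hide exactly that.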

Democratized DataViz Will Result in Some Bad Visualizations…and More

Some people lament the state of book publishing. Andrew Keen is one of them. Now that anyone can do it, everyone is doing it. One of the results: many self-published books look downright awful.

And the same holds true with data visualization. There are many truly awful ones out there. All else being equal, a bad dataviz will result in a bad decision. Period.

DataViz Guarantees Nothing

Even organizations that deploy powerful contemporary dataviz solutions guarantee nothing. The “right” decision still needs to be executed correctly and in a reasonable period of time.

But even if all of these dominoes fall, an organization still falls short of anything near 100-percent certainty of success. The world doesn’t stand still and plenty of other business realities should shatter existing delusions.

Simon Says: DataViz Requires Effective Communication and Education

Kudos to Silkaitis for understanding the need for employee training and education around Carwoo’s data. Without the requisite background, it’s easy for employees to abuse data–and make poor business decisions as a result. User-friendly tools are fine and dandy, but don’t think for a minute even the friendliest of tools obviates the need for occasional in-person communication.


What say you?

Category: Business Intelligence, Information Management, Information Value
No Comments »

by: Ocdqblog
14  Aug  2013

Let the Computers Calculate and the Humans Cogitate

Many organizations are wrapping their enterprise brain around the challenges of business intelligence, looking for the best ways to analyze, present, and deliver information to business users.  More organizations are choosing to do so by pushing business decisions down in order to build a bottom-up foundation.

However, one question coming up more frequently in the era of big data is what should be the division of labor between computers and humans?

In his book Emergence: The Connected Lives of Ants, Brains, Cities, and Software, Steven Johnson discussed how the neurons in our human brains are only capable of two hundred calculations per second, whereas the processors in computers can perform millions of calculations per second.

This is why we should let the computers do the heavy lifting for anything that requires math skills, especially the statistical heavy lifting required by big data analytics.  “But unlike most computers,” Johnson explained, “the brain is a massively parallel system, with 100 billion neurons all working away at the same time.  That parallelism allows the brain to perform amazing feats of pattern recognition, feats that continue to confound computers—such as remembering faces or creating metaphors.”

As the futurist Ray Kurzweil has written, “humans are far more skilled at recognizing patterns than in thinking through logical combinations, so we rely on this aptitude for almost all of our mental processes. Indeed, pattern recognition comprises the bulk of our neural circuitry.  These faculties make up for the extremely slow speed of human neurons.”

“Genuinely cognizant machines,” Johnson explained, “are still on the distant technological horizon, and there’s plenty of reason to suspect they may never arrive.  But the problem with the debate over machine learning and intelligence is that it has too readily been divided between the mindless software of today and the sentient code of the near future.”

But even if increasingly more intelligent machines “never become self-aware in any way that resembles human self-awareness, that doesn’t mean they aren’t capable of learning.  An adaptive information network capable of complex pattern recognition could prove to be one of the most important inventions in all of human history.  Who cares if it never actually learns how to think for itself?”

Business intelligence in the era of big data and beyond will best be served if we let both the computers and the humans play to their strengths.  Let’s let the computers calculate and the humans cogitate.

Category: Business Intelligence
No Comments »

by: John McClure
01  Aug  2013

Making Microsense of Semantics

Like many, I’m one who’s been around since the cinder block days, once entranced by shiny Tektronix tubes stationed nearby a dusty card sorter. After years using languages as varied as Assembler through Scheme, I’ve come to believe the shift these represented, from procedural to declarative, has greatly improved the flexibility of the software organizations produce.

Interest has now moved towards an equally flexible representation of data. In the ‘old’ days when an organization wanted to collect a new data-item about, say, a Person, then a new column would first be added by a friendly database administrator to a Person Table in one’s relational database. Very inflexible.

The alternative, now widely adopted, reduces databases to a simple formulation, one that eliminates Person and other entity-specific tables altogether. These “triple-stores” basically have just three columns (Subject, Predicate and Object) in which all data is stored. Triple-stores are often called ‘self-referential’ because, first, the type of a Subject of any row in a triple-store is found in a different row (not column) in the triple-store and, second, definitions of types are found in different rows of the triple-store. The benefits? Not only is the underlying structure of a triple-store unchanging, but also stand-alone metadata tables (tables describing tables) are unnecessary.
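
A toy version in Python makes the idea tangible. This is a sketch only, not how any real triple-store product is implemented; the class TripleStore, its methods, and the sample subjects and predicates are all invented for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A toy in-memory store: one list of (subject, predicate, object) rows."""

    def __init__(self) -> None:
        self.rows: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.rows.append((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return rows matching the given fields; None acts as a wildcard."""
        return [
            row for row in self.rows
            if (subject is None or row[0] == subject)
            and (predicate is None or row[1] == predicate)
            and (obj is None or row[2] == obj)
        ]


store = TripleStore()
store.add("alice", "type", "Person")   # the subject's type is just another row
store.add("Person", "type", "Class")   # and so is the definition of that type
store.add("alice", "name", "Alice")

# A new data item needs no new column: it is simply one more row.
store.add("alice", "shoe_size", "38")

print(store.match(subject="alice", predicate="type"))
print(len(store.match(subject="alice")))
```

Compare the last add call with the relational route described earlier: nothing about the structure of the store changed when the new data item arrived.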

Why? Static relational database tables do work well enough to handle transactional records whose dataitems are usually well-known in advance; the rate of change in those business processes is fairly low, so that the cost of database architectures based on SQL tables is equally low. What, then, is driving the adoption of triple-stores?

The scope of business functions organizations seek to automate has enlarged considerably: the source of new information within an organization is less frequently “forms” completed by users and now more frequently raw text from documents, tweets, blogs, emails, newsfeeds and other ‘social’ web and internal sources, which have been produced, received and/or retrieved by organizations.

Semantic technologies are essential components of Natural Language Processing (NLP) applications which extract and convert, for instance, all proper nouns within a text into harvestable networks of “information nodes” found in a triple-store. In fact during such harvesting, context becomes a crucial variable that can change with each sentence analyzed from the text.

Bringing us to my primary distinction between really semantic and non-semantic applications: really semantic applications mimic a human conversation, where the knowledge of an individual in a conversation is the result of a continuous accrual of context-specific facts, context-specific definitions, even context-specific contexts. As a direct analogy, Wittgenstein, a modern giant of philosophy, calls this phenomenon Language Games to connote that one’s techniques and strategies for analysis of a game’s state and one’s actions are not derivable in advance; they come only during the play of the game, i.e., during processing of the text corpora.

Non-semantic applications on the other hand, are more similar to rites, where all operative dialogs are pre-written, memorized, and repeated endlessly.

This analogy to human conversations (to ‘dynamic semantics’) is hardly trivial; it is a dominant modelling technique among ontologists as evidenced by development of, for instance, Discourse Representation Theory (among others, e.g., legal communities have a similar theory, simply called Argumentation) whose rules are used to build Discourse Representation Structures from a stream of sentences that accommodate a variety of linguistic issues including plurals, tense, aspect, generalized quantifiers, anaphora and others.

“Semantic models” are an important path towards a more complete understanding of how humans, when armed with language, are able to reason and draw conclusions about the world. Relational tables, however, in themselves haven’t provided similar insight or re-purposing in different contexts. This fact alone is strong evidence that semantic methods and tools must be prominent in any organization’s technology plans.

Category: Business Intelligence, Information Development, Information Strategy, Semantic Web
1 Comment »
