Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology
Collapse Expand Close

To join, please contact us.

Improve MIKE 2.0
Collapse Expand Close
Need somewhere to start? How about the most wanted pages; or the pages we know need more work; or even the stub that somebody else has started, but hasn't been able to finish. Or create a ticket for any issues you have found.

Archive for the ‘Data Quality’ Category

by: Ocdqblog
14  Dec  2014

Open Data Grey Areas

In a previous post, I discussed some data quality and data governance issues associated with open data. In his recent blog post How far can we trust open data?, Owen Boswarva raised several good points about open data.

“The trustworthiness of open data,” Boswarva explained, “depends on the particulars of the individual dataset and publisher. Some open data is robust, and some is rubbish. That doesn’t mean there’s anything wrong with open data as a concept. The same broad statement can be made about data that is available only on commercial terms. But there is a risk attached to open data that does not usually attach to commercial data.”

Data quality, third-party rights, and personal data were three grey areas Boswarva discussed. Although his post focused on a specific open dataset published by an agency of the government of the United Kingdom (UK), his points are generally applicable to all open data.

Data Quality

As Boswarva remarked, the quality of a lot of open data is high even though there is no motivation to incur the financial cost of verifying the quality of data being given away for free. The “publish early even if imperfect” principle also encourages a laxer data quality standard for open data. However, “the silver lining for quality-assurance of open data,” Boswarva explained is that “open licenses maximize re-use, which means more users and re-users, which increases the likelihood that errors will be detected and reported back to the publisher.”

Third-Party Rights

The issue of third-party rights raised by Boswarva was one that I had never considered. His example was the use of a paid third-party provider to validate and enrich postal address data before it is released as part of an open dataset. Therefore, consumers of the open dataset benefit from postal validation and enrichment without paying for it. While the UK third-party providers in this example acquiesced to open re-use of their derived data because their rights were made clear to re-users (i.e., open data consumers), Boswarva pointed out that re-users should be aware that using open data doesn’t provide any protection from third-party liability and, more importantly, doesn’t create any obligation on open data publishers to make sure re-users are aware of any such potential liability. While, again, this is a UK example, that caution should be considered applicable to all open data in all countries.

Personal Data

As for personal data, Boswarva noted that while open datasets are almost invariably non-personal data, “publishers may not realize that their datasets contain personal data, or that analysis of a public release can expose information about individuals.” The example in his post centered on the postal addresses of property owners, which without the names of the owners included in the dataset, are not technically personal data. However, it is easy to cross-reference this with other open datasets to assemble a lot of personally identifiable information that if it were contained in one dataset would be considered a data protection violation (at least in the UK).


Tags: , ,
Category: Data Quality, Information Governance
No Comments »

by: Ocdqblog
12  Nov  2014

Open Data opens eyes to Data Quality and Data Governance

Calls for increased transparency and accountability lead government agencies around the world to make more information available to the public as open data. As more people accessed this information, it quickly became apparent that data quality and data governance issues complicate putting open data to use.

“It’s an open secret,” Joel Gurin wrote, “that a lot of government data is incomplete, inaccurate, or almost unusable. Some agencies, for instance, have pervasive problems in the geographic data they collect: if you try to map the factories the EPA regulates, you’ll see several pop up in China, the Pacific Ocean, or the middle of Boston Harbor.”

A common reason for such data quality issues in the United States government’s data is what David Weinberger wrote about “The keepers of the site did not commit themselves to carefully checking all the data before it went live. Nor did they require agencies to come up with well-formulated standards for expressing that data. Instead, it was all just shoveled into the site. Had the site keepers insisted on curating the data, deleting that which was unreliable or judged to be of little value, would have become one of those projects that each administration kicks further down the road and never gets done.”

Of course, the United States is not alone in either making government data open (about 60 countries have joined the Open Government Partnership) or having it reveal data quality issues. Victoria Lemieux recently blogged about data issues hindering the United Kingdom government’s Open Data program in her post Why we’re failing to get the most out of open data.

One of the data governances issues Lemieux highlighted was data provenance. “Knowing where data originates and by what means it has been disclosed,” Lemieux explained, “is key to being able to trust data. If end users do not trust data, they are unlikely to believe they can rely upon the information for accountability purposes.” Lemieux explained that determining data provenance can be difficult since “it entails a good deal of effort undertaking such activities as enriching data with metadata, such as the date of creation, the creator of the data, who has had access to the data over time. Full comprehension of data relies on the ability to trace its origins. Without knowledge of data provenance, it can be difficult to interpret the meaning of terms, acronyms, and measures that data creators may have taken for granted, but are much more difficult to decipher over time.”

I think the bad press about open data is a good thing because open data is opening eyes to two basic facts about all data. One, whenever data is made available for review, you will discover data quality issues. Two, whenever data quality issues are discovered, you will need data governance to resolve them. Therefore, the reason we’re failing to get the most out of open data is the same reason we fail to get the most out of any data.


Tags: , ,
Category: Data Quality, Information Governance
No Comments »

by: Ocdqblog
29  Oct  2014

Big Data needs Data Quality Too

The second of the five biggest data myths debunked by Gartner is many IT leaders believe that the huge volume of data that organizations now manage makes individual data quality flaws insignificant due to the law of large numbers.

Their view is that individual data quality flaws don’t influence the overall outcome when the data is analyzed because each flaw is only a tiny part of the mass of big data. “In reality,” as Gartner’s Ted Friedman explained, “although each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there are more flaws than before because there is more data. Therefore, the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organizations use in a big data context comes from outside, or is of unknown structure and origin. This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data.”

“Convergence of social, mobile, cloud, and big data,” Gartner’s Svetlana Sicular blogged, “presents new requirements: getting the right information to the consumer quickly, ensuring reliability of external data you don’t have control over, validating the relationships among data elements, looking for data synergies and gaps, creating provenance of the data you provide to others, spotting skewed and biased data. In reality, a data scientist job is 80% of a data quality engineer, and just 20% of a researcher, dreamer, and scientist.”

This aligns with Steve Lohr of The New York Times reporting that data scientists are more often data janitors since they spend from 50 percent to 80 percent of their time mired in the more mundane labor of collecting and preparing unruly big data before it can be mined to discover the useful nuggets that provide business insights.

“As the amount and type of raw data sources increases exponentially,” Stefan Groschupf blogged, “data quality issues can wreak havoc on an organization. Data quality has become an important, if sometimes overlooked, piece of the big data equation. Until companies rethink their big data analytics workflow and ensure that data quality is considered at every step of the process—from integration all the way through to the final visualization—the benefits of big data will only be partly realized.”

So no matter what you heard or hoped, the truth is big data needs data quality too.


Tags: ,
Category: Data Quality
No Comments »

by: Ocdqblog
12  Aug  2014

What Movie Ratings teach us about Data Quality

In previous posts on this blog, I have discussed aspects of data quality using travel reviews, the baseball strike zoneChristmas songs, the wind chill factor, and the bell curve.

Grab a big popcorn and a giant soda since this post discusses data quality using movie ratings.

The Following Post has been Approved for All Audiences

In the United States prior to 1922, every state and many cities had censorship boards that prevented movies from being shown in local theaters on the basis of “immoral” content. What exactly was immoral varied by censorship board.

In 1922, the Motion Picture Association of America (MPAA), representing all the major American movie studios, was formed in part to encourage the movie industry to censor itself. Will Hays, the first MPAA president, helped develop what came to be called the Hays Code, which used a list of criteria to rate movies as either moral or immoral.

For three decades, if a movie failed the Hays Code most movie theaters around the country would not show it. After World War II ended, however, views on movie morality began to change. Frank Sinatra received an Oscar nomination for his role as a heroin addict in the 1955 drama The Man with the Golden Arm. Jack Lemmon received an Oscar nomination for his role as a cross-dressing musician in the 1959 comedy Some Like It Hot. Both movies failed the Hays Code, but were booked by movie theaters based on good reviews and became big box office hits.

Then in 1968, a landmark Supreme Court decision (Ginsberg v. New York) ruled that states could “adjust the definition of obscenity as applied to minors.” Fearing the revival of local censorship boards, the MPAA created a new rating system intended to help parents protect their children from obscene material. Even though the ratings carried no legal authority, parents were recommended to use it as a guide in deciding what movies their children should see.

While a few changes have occurred over the years (most notably adding PG-13 in 1984), these are the same movie ratings we know today: G (General Audiences, All Ages Admitted), PG (Parental Guidance Suggested, Some Material may not be Suitable for Children), PG-13 (Parents Strongly Cautioned, Some Material may be Inappropriate for Children under 13), R (Restricted, Under 17 requires Accompanying Parent or Adult Guardian), and NC-17 (Adults Only, No One 17 and Under Allowed). For more on these ratings and how they are assigned, read this article by Dave Roos.

What Movie Ratings teach us about Data Quality

Just like the MPAA learned with the failure of the Hays Code to rate movies as either moral or immoral, data quality can not simply be rated as good or bad. Perspectives about the quality standards for data, like the moral standards for movies, changes over time. For example, consider how big data challenges traditional data quality standards.

Furthermore, good and bad, like moral and immoral, are ambiguous. A specific context is required to help interpret any rating. For the MPAA, the specific context became rating movies based on obscenity from the perspective of parents. For data quality, the specific context is based on fitness for the purpose of use from the perspective of business users.

Adding context, however, does not guarantee everyone will agree with the rating. Debates rage over the rating given to a particular movie. Likewise, many within your organization will disagree with the quality assessment of certain data. So the next time your organization calls a meeting to discuss its data quality standards, you might want to grab a big popcorn and a giant soda since the discussion could be as long and dramatic as a Peter Jackson trilogy.


Tags: ,
Category: Data Quality
No Comments »

by: Ocdqblog
29  Jul  2014

Do you know good information when you hear it?

When big data is discussed, we rightfully hear a lot of concerns about signal-to-noise ratio. Amidst the rapidly expanding galaxy of information surrounding us every day, most of what we hear sounds like meaningless background static. We are so overloaded but underwhelmed by information, perhaps we are becoming tone deaf to signal, leaving us hearing only noise.

Before big data burst onto the scene, bursting our eardrums with its unstructured cacophony, most of us believed we had data quality standards and therefore we knew good information when we heard it. Just as most of us believe we would know great music when we hear it.

In 2007, Joshua Bell, a world-class violinist and recipient of that year’s Avery Fisher Prize, an award given to American musicians for outstanding achievement in classical music, participated in an experiment for The Washington Post columnist Gene Weingarten, whose article about the experiment won the 2008 Pulitzer Prize for feature writing.

During the experiment, Bell posed as a street performer donned in jeans and a baseball cap and played violin for tips in a Washington, D.C. metro station during one Friday morning rush hour. For 45 minutes he performed 6 classical music masterpieces, some of the most elegant music ever written on one of the most valuable violins ever made.

Weingarten described it as “an experiment in context, perception, and priorities—as well as an unblinking assessment of public taste: In a banal setting at an inconvenient time, would beauty transcend? Each passerby had a quick choice to make, one familiar to commuters in any urban area where the occasional street performer is part of the cityscape: Do you stop and listen? Do you hurry past with a blend of guilt and irritation, aware of your cupidity but annoyed by the unbidden demand on your time and your wallet? Do you throw in a buck, just to be polite? Does your decision change if he’s really bad? What if he’s really good? Do you have time for beauty?”

As Bell performed, over 1,000 commuters passed by him. Most barely noticed him at all. Very few stopped briefly to listen to him play. Even fewer were impressed enough to toss a little money into his violin case. Although three days earlier he had performed the same set in front of a sold out crowd at Boston Symphony Hall where ticket prices for good seats started at $100 each, at the metro station Bell earned only $32.17 in tips.

“If a great musician plays great music but no one hears,” Weingarten pondered, “was he really any good?”

That metro station during rush hour is an apt metaphor for the daily experiment in context, perception, and priorities facing information development in the big data era. In a fast-paced business world overcrowded with fellow travelers and abuzz with so much background noise, if great information is playing but no one hears, is it really any good?

Do you know good information when you hear it? How do you hear it amidst the noise surrounding it?


Tags: , , , ,
Category: Data Quality, Information Development
No Comments »

by: Alandduncan
19  Jul  2014

Data Quality Profiling Considerations

Data profiling is an excellent diagnostic method for gaining additional understanding of the data. Profiling the source data helps inform both business requirements definition and detailed solution designs for data-related project, as well as enabling data issues to be managed ahead of project implementation.

Profiling of a data set will be measured with reference to and agreed Data Quality Dimensions (e.g. per those proposed in the recent DAMA white paper).

Profiling may be required at several levels:

• Simple profiling with a single table (e.g. Primary Key constraint violations)
• Medium complexity profiling across two or more interdependent tables (e.g. Foreign Key violations)
• Complex profiling across two or more data sets, with applied business logic (e.g. reconciliation checks)

Note that field-by-field analysis is required to truly understand the data gaps.

Any data profiling analysis must not only identify the issues and underlying root causes, but must also identify the business impact of the data quality problem (measured by effectiveness, efficiency, risk inhibitors). This will help identify any value in remediating the data – great for your data quality Business Case. Root cause analysis also helps identify any process outliers and and drives out requirements for remedial action on managing any identified exceptions.

Be sure to profile your data and take baseline measures before applying any remedial actions – this will enable you to measure the impact of any changes.

I strongly recommend Data Quality Profiling and root-cause analysis to be undertaken as an initiation activity as part of all data warehouse, master data and application migration project phases.

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Metadata
No Comments »

by: Alandduncan
01  Jul  2014

Information Requirements Gathering: The One Question You Must Never Ask!

Over the years, I’ve tended to find that asking any individual or group the question “What data/information do you want?” gets one of two responses:

“I don’t know.” Or;

“I don’t know what you mean by that.”

End of discussion, meeting over, pack up go home, nobody is any the wiser. Result? IT makes up the requirements based on what they think the business should want, the business gets all huffy because IT doesn’t understand what they need, and general disappointment and resentment ensues.

Clearly for Information Management & Business Intelligence solutions, this is not a good thing.

So I’ve stopped asking the question. Instead, when doing requirements gathering for an information project, I go through a workshop process that follows the following outline agenda:

Context setting: Why information management / Business Intelligence / Analytics / Data Governance* is generally perceived to be a “good thing”. This is essentially a very quick précis of the BI project mandate, and should aim at putting people at ease by answering the question “What exactly are we all doing here?”

(*Delete as appropriate).

Business Function & Process discovery: What do people do in their jobs – functions & tasks? If you can get them to explain why they do those things – i.e. to what end purpose or outcome – so much the better (though this can be a stretch for many.)

Challenges: what problems or issues do they currently face in their endeavours? What prevents them from succeeding in their jobs? What would they do differently if they had the opportunity to do so?

Opportunities: What is currently good? Existing capabilities (systems, processes, resources) are in place that could be developed further or re-used/re-purposed to help achieve the desired outcomes?

Desired Actions: What should happen next?

As a consultant, I see it as part of my role to inject ideas into the workshop dialogue too, using a couple of question forms specifically designed to provoke a response:

“What would happen if…X”

“Have you thought about…Y”

“Why do you do/want…Z”.

Notice that as the workshop discussion proceeds, the participants will naturally start to explore aspects that relate to later parts of the agenda – this is entirely ok. The agenda is there to provide a framework for the discussion, not a constraint. We want people to open up and spill their guts, not clam up. (Although beware of the “rambler” who just won’t shut up but never gets to the point…)

Notice also that not once have we actively explored the “D” or “I” words. That’s because as you explore the agenda, any information requirements will either naturally fall out of the discussion as it proceed, or else you can infer the information requirements arising based on the other aspects of the discussion.

As the workshop attendees explore the different aspects of the session, you will find that the discussion will touch upon a number of different themes, which you can categorise and capture on-the-fly (I tend to do this on sheets of butchers paper tacked to the walls, so that the findings are shared and visible to all participants.). Comments will typically fall into the following broad categories:

* Functions: Things that people do as part of doing business.
* Stakeholders: people who are involved (including helpful people elsewhere in the organisation – follow up with them!)
* Inhibitors: Things that currently prevent progress (these either become immediate scope-change items if they are show-stoppers for the current initiative, or else they form additional future project opportunities to raise with management)
* Enablers: Resources to make use of (e.g. data sets that another team hold, which aren’t currently shared)
* Constraints: “non-negotiable” aspects that must be taken into account. (Note: I tend to find that all constraints are actually negotiable and can be overcome if there is enough desire, money and political will.)
* Considerations: Things to be aware of that may have an influence somewhere along the line.
* Source systems: places where data comes from
* Information requirements: Outputs that people want

Here’s a (semi) fictitious example:

e.g. ADD: “What does your team do?”

Workshop Victim Participant #1: “Well, we’re trying to reconcile the customer account balances with the individual transactions.”

ADD: And why do you wan to do that?

Workshop Victim Participant #2: “We think there’s a discrepancy in the warehouse stock balances, compared with what’s been shipped to customers. The sales guys keep their own database of customer contracts and orders and Jim’s already given us dump of the data, while finance run the accounts receivables process. But Sally the Accounts Clerk doesn’t let the numbers out under any circumstances, so basically we’re screwed.”

Functions: Sales Processing, Contract Mangement, Order Fulfilment, Stock Management, Accounts Receivable.
Stakeholders: Warehouse team, Sales team (Jim), Finance team.
Inhibitors: Finance don’t collaborate.
Enablers: Jim is helpful.
Source Systems: Stock System, Customer Database, Order Management, Finance System.
Information Requirements: Orders (Quantity & Price by Customer, by Salesman, by Stock Item), Dispatches (Quantity & Price by Customer, by Salesman, by Warehouse Clerk, by Stock Item), Financial Transactions (Value by Customer, by Order Ref)

You will also probably end up with the attendees identifying a number of immediate self-assigned actions arising from the discussion – good ideas that either haven’t occurred to them before or have sat on the “To-Do” list. That’s your workshop “value add” right there….

Workshop Victim Participant #1: “I could go and speak to the Financial Controller about getting access to the finance data. He’s more amenable to working together than Sally, who just does what she’s told.”

Happy information requirements gathering!

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Ocdqblog
29  Jun  2014

Overcoming Outcome Bias

What is more important, the process or its outcome? Information management processes, like those described by the MIKE2.0 Methodology, drive the daily operations of an organization’s business functions as well as support the tactical and strategic decision-making processes of its business leaders. However, an organization’s success or failure is usually measured by the outcomes produced by those processes.

As Duncan Watts explained in his book Everything Is Obvious: How Common Sense Fails Us, “rather than the evaluation of the outcome being determined by the quality of the process that led to it, it is the observed nature of the outcome that determines how we evaluate the process.” This is known as outcome bias.

While an organization is enjoying positive outcomes, such as exceeding its revenue goals for the current fiscal period, outcome bias basks processes in a rose-colored glow. Information management processes must be providing high-quality data to decision-making processes, which business leaders are using to make good decisions. However, when an organization is suffering from negative outcomes, such as a regulatory compliance failure, outcome bias blames it on broken information management processes and poor data quality that lead to bad decision-making.

“Judging the merit of a decision can never be done simply by looking at the outcome,” explained Jeffrey Ma in his book The House Advantage: Playing the Odds to Win Big In Business. “A poor result does not necessarily mean a poor decision. Likewise a good result does not necessarily mean a good decision.”

“We are prone to blame decision makers for good decisions that worked out badly and to give them too little credit for successful moves that appear obvious after the fact,” explained Daniel Kahneman in his book Thinking, Fast and Slow.

While risk mitigation is an oft-cited business justification for investing in information management, Kahneman also noted how outcome bias can “bring undeserved rewards to irresponsible risk seekers, such as a general or an entrepreneur who took a crazy gamble and won. Leaders who have been lucky are never punished for having taken too much risk. Instead, they are believed to have had the flair and foresight to anticipate success. A few lucky gambles can crown a reckless leader with a halo of prescience and boldness.”

Outcome bias triggers overreactions to both success and failure. Organizations that try to reverse engineer a single, successful outcome into a formal, repeatable process often fail, much to their surprise. Organizations also tend to abandon a new process immediately if its first outcome is a failure. “Over time,” Ma explained, “if one makes good, quality decisions, one will generally receive better outcomes, but it takes a large sample set to prove this.”

Your organization needs solid processes governing how information is created, managed, presented, and used in decision-making. Your organization also needs to guard against outcomes biasing your evaluation of those processes.

In order to overcome outcome bias, Watts recommended we “bear in mind that a good plan can fail while a bad plan can succeed—just by random chance—and therefore judge the plan on its own merits as well as the known outcome.”


Tags: , , ,
Category: Data Quality, Information Development
No Comments »

by: Alandduncan
24  May  2014

Data Quality Profiling: Do you trust in the Dark Arts?

Why estimating Data Quality profiling doesn’t have to be guess-work

Data Management lore would have us believe that estimating the amount of work involved in Data Quality analysis is a bit of a “Dark Art,” and to get a close enough approximation for quoting purposes requires much scrying, haruspicy and wet-finger-waving, as well as plenty of general wailing and gnashing of teeth. (Those of you with a background in Project Management could probably argue that any type of work estimation is just as problematic, and that in any event work will expand to more than fill the time available…).

However, you may no longer need to call on the services of Severus Snape or Mystic Meg to get a workable estimate for data quality profiling. My colleague from QFire Software, Neil Currie, recently put me onto a post by David Loshin on, which proposes a more structured and rational approach to estimating data quality work effort.

At first glance, the overall methodology that David proposes is reasonable in terms of estimating effort for a pure profiling exercise – at least in principle. (It’s analogous to similar “bottom/up” calculations that I’ve used in the past to estimate ETL development on a job-by-job basis, or creation of standards Business Intelligence reports on a report-by-report basis).

I would observe that David’s approach is predicated on the (big and probably optimistic) assumption that we’re only doing the profiling step. The follow-on stages of analysis, remediation and prevention are excluded – and in my experience, that’s where the real work most often lies! There is also the assumption that a pre-existing checklist of assessment criteria exists – and developing the library of quality check criteria can be a significant exercise in its own right.

However, even accepting the “profiling only” principle, I’d also offer a couple of additional enhancements to the overall approach.

Firstly, even with profiling tools, the inspection and analysis process for any “wrong” elements can go a lot further than just a 10-minute-per-item-compare-with-the-checklist, particularly in data sets with a large number of records. Also, there’s the question of root-cause diagnosis (And good DQ methods WILL go into inspecting the actual member records themselves). So for contra-indicated attributes, I’d suggest a slightly extended estimation model:

* 10mins: for each “Simple” item (standard format, no applied business rules, fewer that 100 member records)
* 30 mins: for each “Medium” complexity item (unusual formats, some embedded business logic, data sets up to 1000 member records)
* 60 mins: for any “Hard” high-complexity items (significant, complex business logic, data sets over 1000 member records)

Secondly, and more importantly – David doesn’t really allow for the human factor. It’s always people that are bloody hard work! While it’s all very well to do a profiling exercise in-and-of-itself, the result need to be shared with human beings – presented, scrutinised, questioned, validated, evaluated, verified, justified. (Then acted upon, hopefully!) And even allowing for the set-aside of the “Analysis” stages onwards, then there will need to be some form of socialisation within the “Profiling” phase.

That’s not a technical exercise – it’s about communication, collaboration and co-operation. Which means it may take an awful lot longer than just doing the tool-based profiling process!

How much socialisation? That depends on the number of stakeholders, and their nature. As a rule-of-thumb, I’d suggest the following:

* Two hours of preparation per workshop ((If the stakeholder group is “tame”. Double it if there are participants who are negatively inclined).
* One hour face-time per workshop (Double it for “negatives”)
* One hour post-workshop write-up time per workshop
* One workshop per 10 stakeholders.
* Two days to prepare any final papers and recommendations, and present to the Steering Group/Project Board.

That’s in addition to David’s formula for estimating the pure data profiling tasks.

Detailed root-cause analysis (Validate), remediation (Protect) and ongoing evaluation (Monitor) stages are a whole other ball-game.

Alternatively, just stick with the crystal balls and goats – you might not even need to kill the goat anymore…

Category: Business Intelligence, Data Quality, Enterprise Data Management, Information Development, Information Governance, Information Management, Information Strategy, Information Value, Master Data Management, Metadata
No Comments »

by: Ocdqblog
22  May  2014

Things Interrupting the Internet of Things

I have written about several aspects of the Internet of Things (IoT) in previous posts on this blog (The Internet of HumansThe Quality of Things, and Is it finally the Year of IoT?). Two recent articles have examined a few of the things that are interrupting the progress of IoT.

In his InformationWeek article Internet of Things: What’s Holding Us Back, Chris Murphy reported that while “companies in a variety of industries — transportation, energy, heavy equipment, consumer goods, healthcare, hospitality, insurance — are getting measurable results by analyzing data collected from all manner of machines, equipment, devices, appliances, and other networked things” (of which his article notes many examples), there are things slowing the progress of IoT.

One of its myths, Murphy explained, “is that companies have all the data they need, but their real challenge is making sense of it. In reality, the cost of collecting some kinds of data remains too high, the quality of the data isn’t always good enough, and it remains difficult to integrate multiple data sources.”

The fact is a lot of the things we want to connect to IoT weren’t built for Internet connectivity, so it’s not as simple as just sticking a sensor on everything. Poor data quality, especially timeliness, but also completeness and accuracy, is impacting the usefulness of the data transmitted by things currently connected to sensors. Other issues Murphy explores in his article include the lack of wireless network ubiquity, data integration complexities, and securing IoT data centers and devices against Internet threats.

“The clearest IoT successes today,” Murphy concluded, “are from industrial projects that save companies money, rather than from projects that drive new revenue. But even with these industrial projects, companies shouldn’t underestimate the cultural change they need to manage as machines start telling veteran machine operators, train drivers, nurses, and mechanics what’s wrong, and what they should do, in their environment.”

In his Wired article Why Tech’s Best Minds Are Very Worried About the Internet of Things, Klint Finley reported on the findings of a survey about IoT from the Pew Research Center. While potential benefits of IoT were noted, such as medical devices monitoring health and environmental sensors detecting pollution, potential risks were highlighted, security being one of the most immediate concerns. “Beyond security concerns,” Finley explained, “there’s the threat of building a world that may be too complex for our own good. If you think error messages and applications crashes are a problem now, just wait until the web is embedded in everything from your car to your sneakers. Like the VCR that forever blinks 12:00, many of the technologies built into the devices of the future may never be used properly.”

The research also noted concerns about IoT expanding the digital divide, ostracizing those who can not, or choose not to, participate. Data privacy concerns were also raised, including the possibility of dehumanizing the workplace by monitoring employees through wearable devices (e.g., your employee badge being used to track your movements).

Some survey respondents hinted at a division similar to what we hear about the cloud, differentiating a public IoT from a private IoT. In the latter case, instead of sacrificing our privacy for the advantages of connected devices, there’s no reason our devices can’t connect to a private network instead of the public internet. Other survey respondents believe that IoT has been overhyped as the next big thing and while it will be useful in some areas (e.g., the military, hospitals, and prisons) it will not pervade mainstream culture and therefore will not invade our privacy as many fear.

What Say You about IoT?

Please share your viewpoint about IoT by leaving a comment below.


Tags: , , ,
Category: Data Quality, Information Development
No Comments »

Collapse Expand Close
TODAY: Wed, March 20, 2019
Collapse Expand Close
Recent Comments
Collapse Expand Close