Archive for the ‘Information Management’ Category
SanDisk’s new wireless flash drive hit the scene with mixed opinions. Some argue that it serves as a solution to transferring documents, photos, videos, and music from one mobile device to another, or to a PC, in a way that is not complicated or unreliable. Others argue that this wireless flash drive is only one of many storage options, and not very spectacular at that.
Let’s take a moment to evaluate what this new wireless flash drive can do, and how it could indeed make an impact on the storage industry, mobile devices, and even data protection.
What It Can Do
The new SanDisk wireless flash drive has the ability to store data such as photos, videos, documents, and music from any device, boosting the storage capacity of mobile devices by up to 64 GB. With the use of SanDisk’s app, all your data will be accessible and manageable remotely. Its most appealing trait is its ability to connect to a range of devices, including the iPhone, Android, iPad, Kindle Fire, and even your PC.
How This Will Affect the Storage Industry
The cloud offers many large storage options, but the downside is its lack of accessibility without an internet connection. This device is a nice companion for users who don’t have enough room on their phones and tablets, and who don’t want to delete data to make room. With this wireless flash drive, you don’t have to spend money upgrading to the latest iPad with its increased storage.
How can this make a difference in the industry? This drive offers privacy by excluding the internet and simplicity with its easily manageable app; why struggle with the security hazards and the price updates of online storage when you can have all your data in your pocket? This makes it an appealing option over the competition, which could change how people view internet storage.
How This Will Affect Mobile Devices
Most of us have experienced the headache of trying to transfer media from our phone to our computer; this wireless flash drive eliminates this issue. Simply transfer your data onto the flash drive, plug it into your computer via USB, and move your media with ease.
Previously, if you had multiple devices from many brands, it was difficult to share data across them without struggling to find the right software to bridge the gap. This SanDisk flash drive removes that obstacle, and could change how we approach the barriers between mobile devices. It could encourage users to use many different brands rather than one, with the SanDisk flash drive serving as the bridge.
How This Will Affect Data Protection
Its remote access allows up to eight different devices to access its data, with up to three connected at the same time. You can share storage and transfer data across your many devices or with your friends – all with no internet connection. Because the drive allows remote access by many users at many locations, people may be suspicious of how well protected their data actually is. SanDisk combats this by allowing users to set up a personal password, as you might do with your WiFi, so only you or those you’ve shared the password with have access to the data.
As a smaller, private storage device, it’s less likely to become a target of cyber terrorism as we see often with online cloud storage. Why would a hacker want your family photos or your few downloaded movies? Since you can personally store only your data on the drive, rather than sharing a storage device with many, it gives you a lower profile. This peace of mind could sway the opinions of the public from online storage.
As one of many flash storage options, this flash drive cannot necessarily be called revolutionary, but it offers the public many useful features and advantages that are not often found in our internet dependent age. This is likely to encourage other companies to look more closely at the unique needs of users and how to bridge the gaps, sparking the development of new wireless flash drives that are more secure, more remote, and able to store more without the use of internet.
Anthropologist Robin Dunbar has used his research in primates over recent decades to argue that there is a cognitive limit to the number of social relationships that an individual can maintain and hence a natural limit to the breadth of their social group. In humans, he has proposed that this number is 150, the so-called “Dunbar’s number”.
In the modern organisation, relationships are maintained using data. It doesn’t matter whether it is the relationship between staff and their customers, tracking vendor contracts, the allocation of products to sales teams or any other of the literally thousands of relationships that exist, they are all recorded centrally and tracked through the data that they throw off.
Social structures have evolved over thousands of years using data to deal with the inability of groups of more than 150 to effectively align. One of the best examples of this is the 11th century Domesday Book ordered by William the Conqueror. Fast forward to the 21st century and technology has allowed the alignment of businesses and even whole societies in ways that were unimaginable 50 years ago.
Just as a leadership team needs to have a group of people that they relate to that falls within the 150 of Dunbar’s number, they also need to rely on information which allows the management system to extend that span of control. For the average executive, and ultimately for the average executive leadership team, this means that they can really only keep a handle on 150 “aspects” of their business, reflected in 150 “key data elements”. These elements anchor data sets that define the organisation.
Key Data Elements
To overcome the constraints of Dunbar’s number, mid-twentieth century conglomerates relied on a hierarchy with delegated management decisions whereas most companies today have heavily centralised decision making which (mostly) delivers a substantial gain in productivity and more efficient allocation of capital. They can only do this because of the ability to share information efficiently through the introduction of information technology across all layers of the enterprise.
This sharing, though, is dependent on the ability of an executive to remember what data is important. The same constraint of the human brain to know more than 150 people also applies to the use of that information. It is reasonable to argue that the information flows have the same constraint as social relationships.
Across hundreds of organisations observed over many years, the variety of key data elements is wide but their number is consistently in the range of one hundred to a few hundred. Perhaps topping out at 500, the majority of well-run organisations have nearer to 150 elements dimensioning their most important data sets.
While decisions are made through metrics, it is the most important key data elements that make up the measures and allow them to be dimensioned.
Although organisations have literally hundreds of thousands of different data elements they record, only a very small number are central to the running of the enterprise. Arguably, the centre can only keep track of about 150 and use them as a core of managing the business.
Another way of looking at this is that the leadership team (or even the CEO) can really only have 150 close relationships. If each relationship has one assigned data set or key data element they are responsible for then the overall organisation will have 150.
Choosing the right 150
While most organisations have around 150 key data elements that anchor their most important information, few actually know what they are. That’s a pity because the choice of 150 tells you a lot about the organisation. If the 150 don’t encompass the breadth of the enterprise then you can gain insight into what’s really important to the management team. If there is little to differentiate the key data elements from those that a competitor might choose then the company may lack a clear point of difference and be overly dependent on operational excellence or cost to gain an advantage.
Any information management initiative should start by identifying the 150 most important elements. If they can’t narrow the set down below a few hundred, they should be suspicious they haven’t gotten to the core of what’s really important to their sponsors. They should then look to ask the question of whether these key data elements span the enterprise or pick organisational favourites; whether they offer differentiation or are “me too” and whether they are easy or hard for a competitor to emulate.
The identification of the 150 key data elements provides a powerful foundation for any information and business strategy, enabling a discussion on how the organisation is led and managed. While processes evolve quickly, the information flows persist. Understanding the 150 allows a strategist to determine whether the business is living up to its strategy or if its strategy needs to be adjusted to reflect the business’s strengths.
Data profiling is an excellent diagnostic method for gaining additional understanding of the data. Profiling the source data helps inform both business requirements definition and detailed solution designs for data-related projects, as well as enabling data issues to be managed ahead of project implementation.
Profiling of a data set will be measured with reference to an agreed set of Data Quality Dimensions (e.g. per those proposed in the recent DAMA white paper).
Profiling may be required at several levels:
• Simple profiling with a single table (e.g. Primary Key constraint violations)
• Medium complexity profiling across two or more interdependent tables (e.g. Foreign Key violations)
• Complex profiling across two or more data sets, with applied business logic (e.g. reconciliation checks)
Note that field-by-field analysis is required to truly understand the data gaps.
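To make the three levels a little more concrete, here is a minimal sketch in plain Python. The tables, field names and sample values are all invented for illustration, and a real profiling tool would do far more than this; treat it as a way of seeing the difference between the levels rather than as a recommended implementation.

```python
from collections import Counter

# Hypothetical sample data; table and field names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Bravo"},
    {"customer_id": 2, "name": "Bravo Ltd"},   # duplicated key
    {"customer_id": None, "name": "Unknown"},  # missing key
]
orders = [
    {"order_id": 10, "customer_id": 1, "value": 100.0},
    {"order_id": 11, "customer_id": 2, "value": 50.0},
    {"order_id": 12, "customer_id": 9, "value": 75.0},  # no such customer
]
# Per-customer totals taken from a second (assumed) data set, e.g. a ledger.
ledger_totals = {1: 100.0, 2: 60.0}


def primary_key_violations(rows, key):
    """Simple: null or duplicated key values within a single table."""
    counts = Counter(row[key] for row in rows)
    nulls = counts.pop(None, 0)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return {"nulls": nulls, "duplicates": duplicates}


def foreign_key_violations(child_rows, fk, parent_rows, pk):
    """Medium: child rows whose foreign key has no matching parent row."""
    parent_keys = {row[pk] for row in parent_rows if row[pk] is not None}
    return [row for row in child_rows if row[fk] not in parent_keys]


def reconciliation_breaks(order_rows, totals, tolerance=0.01):
    """Complex: summed order values that disagree with the second data set."""
    summed = Counter()
    for row in order_rows:
        summed[row["customer_id"]] += row["value"]
    breaks = {}
    for customer in sorted(set(summed) | set(totals)):
        diff = summed.get(customer, 0.0) - totals.get(customer, 0.0)
        if abs(diff) > tolerance:
            breaks[customer] = round(diff, 2)
    return breaks
```

Each function returns the offending records or keys rather than a simple pass/fail, which is what you need for the field-by-field follow-up and root-cause work.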
Any data profiling analysis must not only identify the issues and underlying root causes, but must also identify the business impact of the data quality problem (measured by effectiveness, efficiency, risk inhibitors). This will help identify any value in remediating the data – great for your data quality Business Case. Root cause analysis also helps identify any process outliers and drives out requirements for remedial action on managing any identified exceptions.
Be sure to profile your data and take baseline measures before applying any remedial actions – this will enable you to measure the impact of any changes.
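As a small illustration of baselining, the sketch below measures one data quality dimension (completeness of a single field) before and after a fix. The field name, the sample records and the choice of completeness as the measure are my assumptions for the example; use whichever dimensions you have agreed for your own data.

```python
def completeness(rows, field):
    """Share of rows where the field is populated (not None or blank)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


# Hypothetical records, profiled before any remedial action is applied.
before = [
    {"postcode": "AB1 2CD"},
    {"postcode": ""},
    {"postcode": None},
    {"postcode": "EF3 4GH"},
]
baseline = completeness(before, "postcode")

# The same records after an (assumed) remediation exercise.
after = [
    {"postcode": "AB1 2CD"},
    {"postcode": "IJ5 6KL"},
    {"postcode": None},
    {"postcode": "EF3 4GH"},
]
improvement = completeness(after, "postcode") - baseline
```

Without the stored baseline there is nothing to subtract from, and no way to show what the remediation actually achieved.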
I strongly recommend that Data Quality Profiling and root-cause analysis be undertaken as an initiation activity as part of all data warehouse, master data and application migration project phases.
Over the years, I’ve tended to find that asking any individual or group the question “What data/information do you want?” gets one of two responses:
“I don’t know.” Or;
“I don’t know what you mean by that.”
End of discussion, meeting over, pack up go home, nobody is any the wiser. Result? IT makes up the requirements based on what they think the business should want, the business gets all huffy because IT doesn’t understand what they need, and general disappointment and resentment ensues.
Clearly for Information Management & Business Intelligence solutions, this is not a good thing.
So I’ve stopped asking the question. Instead, when doing requirements gathering for an information project, I go through a workshop process that follows the following outline agenda:
Context setting: Why information management / Business Intelligence / Analytics / Data Governance* is generally perceived to be a “good thing”. This is essentially a very quick précis of the BI project mandate, and should aim at putting people at ease by answering the question “What exactly are we all doing here?”
(*Delete as appropriate).
Business Function & Process discovery: What do people do in their jobs – functions & tasks? If you can get them to explain why they do those things – i.e. to what end purpose or outcome – so much the better (though this can be a stretch for many.)
Challenges: what problems or issues do they currently face in their endeavours? What prevents them from succeeding in their jobs? What would they do differently if they had the opportunity to do so?
Opportunities: What is currently good? What existing capabilities (systems, processes, resources) are in place that could be developed further or re-used/re-purposed to help achieve the desired outcomes?
Desired Actions: What should happen next?
As a consultant, I see it as part of my role to inject ideas into the workshop dialogue too, using a couple of question forms specifically designed to provoke a response:
“What would happen if…X”
“Have you thought about…Y”
“Why do you do/want…Z”.
Notice that as the workshop discussion proceeds, the participants will naturally start to explore aspects that relate to later parts of the agenda – this is entirely ok. The agenda is there to provide a framework for the discussion, not a constraint. We want people to open up and spill their guts, not clam up. (Although beware of the “rambler” who just won’t shut up but never gets to the point…)
Notice also that not once have we actively explored the “D” or “I” words. That’s because as you explore the agenda, any information requirements will either naturally fall out of the discussion as it proceeds, or else you can infer the information requirements based on the other aspects of the discussion.
As the workshop attendees explore the different aspects of the session, you will find that the discussion will touch upon a number of different themes, which you can categorise and capture on-the-fly (I tend to do this on sheets of butchers paper tacked to the walls, so that the findings are shared and visible to all participants.). Comments will typically fall into the following broad categories:
* Functions: Things that people do as part of doing business.
* Stakeholders: people who are involved (including helpful people elsewhere in the organisation – follow up with them!)
* Inhibitors: Things that currently prevent progress (these either become immediate scope-change items if they are show-stoppers for the current initiative, or else they form additional future project opportunities to raise with management)
* Enablers: Resources to make use of (e.g. data sets that another team hold, which aren’t currently shared)
* Constraints: “non-negotiable” aspects that must be taken into account. (Note: I tend to find that all constraints are actually negotiable and can be overcome if there is enough desire, money and political will.)
* Considerations: Things to be aware of that may have an influence somewhere along the line.
* Source systems: places where data comes from
* Information requirements: Outputs that people want
Here’s a (semi) fictitious example:
e.g. ADD: “What does your team do?”
Workshop Victim Participant #1: “Well, we’re trying to reconcile the customer account balances with the individual transactions.”
ADD: “And why do you want to do that?”
Workshop Victim Participant #2: “We think there’s a discrepancy in the warehouse stock balances, compared with what’s been shipped to customers. The sales guys keep their own database of customer contracts and orders and Jim’s already given us a dump of the data, while finance run the accounts receivables process. But Sally the Accounts Clerk doesn’t let the numbers out under any circumstances, so basically we’re screwed.”
Functions: Sales Processing, Contract Management, Order Fulfilment, Stock Management, Accounts Receivable.
Stakeholders: Warehouse team, Sales team (Jim), Finance team.
Inhibitors: Finance don’t collaborate.
Enablers: Jim is helpful.
Source Systems: Stock System, Customer Database, Order Management, Finance System.
Information Requirements: Orders (Quantity & Price by Customer, by Salesman, by Stock Item), Dispatches (Quantity & Price by Customer, by Salesman, by Warehouse Clerk, by Stock Item), Financial Transactions (Value by Customer, by Order Ref)
You will also probably end up with the attendees identifying a number of immediate self-assigned actions arising from the discussion – good ideas that either haven’t occurred to them before or have sat on the “To-Do” list. That’s your workshop “value add” right there….
Workshop Victim Participant #1: “I could go and speak to the Financial Controller about getting access to the finance data. He’s more amenable to working together than Sally, who just does what she’s told.”
Happy information requirements gathering!
For years now the physics community has been taking the leap into computer science through the pursuit of the quantum computer. As weird as the concepts underpinning the idea of such a device are, even weirder is the threat that this machine of the future could pose to business and government today.
There are many excellent primers on quantum computing but in summary physicists hope to be able to use the concept of superposition to allow one quantum computer bit (called a “qubit”) to carry the value of both zero and one at the same time and also to interact with other qubits which also have two simultaneous values.
The hope is that a quantum computer would come up with answers to useful questions in far fewer processing steps than a conventional computer, as many different combinations would be evaluated at the same time. Algorithms that use this approach are generally in the category of solution finding (best paths, factors and other similar complex problems).
As exciting as the concept of a quantum computer sounds, one of the applications of this approach would be a direct threat to many aspects of modern society. Shor’s algorithm provides an approach to integer factorisation using a quantum computer which is like a passkey to the encryption used across our digital world.
The cryptography techniques that dominate the internet are based on the principle that it is computationally infeasible to find the factors of a large number. However, Shor’s algorithm provides an approach that would crack the code if a quantum computer could actually be built.
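To see why factoring is the passkey, here is a toy sketch. I am assuming an RSA-style scheme as the example of factoring-based cryptography, and the numbers are deliberately tiny so that ordinary trial division can stand in for the hard step. This is not Shor’s algorithm; it only shows that once the factors of the public modulus are known, the private key follows with trivial arithmetic.

```python
def trial_division(n):
    """Return a factor pair (p, q) of a small composite n, smallest factor first."""
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            return factor, n // factor
        factor += 1
    raise ValueError("n has no non-trivial factors")


def recover_private_exponent(n, e):
    """Derive the private exponent from the public key by factoring n."""
    p, q = trial_division(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse; requires Python 3.8+


# Toy public key: n = 53 * 61. Real moduli are hundreds of digits long.
n, e = 3233, 17
message = 65

ciphertext = pow(message, e, n)        # anyone can encrypt with the public key
d = recover_private_exponent(n, e)     # the attacker's step: factor, then invert
recovered = pow(ciphertext, d, n)      # decrypts without ever being given d
```

With a modulus of realistic size the `trial_division` step is the one that is computationally infeasible on a classical machine, and it is exactly that step which Shor’s algorithm would make tractable.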
Does it matter today?
We’re familiar with businesses of today being disrupted by new technology tomorrow. But just as weird as the concept of quantum superposition is the possibility that the computing of tomorrow could disrupt the business of today!
We are passing vast quantities of data across the internet. Much of it is confidential and encrypted: messages that we are confident will remain between the sender and receiver. These include payments, conversations and, through the use of virtual private networks, much of the internal content of both companies and government.
It is possible that parties hoping to crack this content in the future are taking the opportunity to store it today. Due to the architecture of the internet, there is little to stop anyone from intercepting much of this data and storing it without anyone having any hint of its capture.
In the event that a quantum computer capable of running Shor’s algorithm is built, the first thought will need to be to ask what content could have been intercepted and what secrets might be open to being exposed. The extent of the exposure could be much greater than might appear at first glance.
How likely is a quantum computer to be built?
There is one commercially available device marketed as a quantum computer, called the D-Wave (from D-Wave Systems). Sceptics, however, have published doubts that it is really operating based on the principles of Quantum Computing. Even more importantly, there is no suggestion that it is capable of running Shor’s algorithm or that it is a universal quantum computer.
There is a great deal of evidence that the principles of quantum computing are consistent with the laws of physics as they have been uncovered over the past century. At the same time as physics is branching into computing, the information theory branch of computing is expanding into physics. Many recent developments in physics are borrowing directly from the information discipline.
It is possible, though, that information theory as applied to information management problems could provide confidence that a universal quantum computer is not going to be built.
Information entropy was initially constructed by Claude Shannon to provide a tool for quantifying information. While the principles were deliberately analogous to thermal entropy, it has subsequently become clear that the information associated with particles is as important as the particles themselves. Chapter 6 of my book, Information-Driven Business, explains these principles in detail.
It turns out that systems can be modelled on information or thermal entropy interchangeably. As a result, a quantum computer that needs to obey the rules of information theory also needs to obey the laws of thermal entropy.
The first law of thermodynamics was first written by Rudolf Clausius in 1850 as: “In all cases in which work is produced by the agency of heat, a quantity of heat is consumed which is proportional to the work done; and conversely, by the expenditure of an equal quantity of work an equal quantity of heat is produced”.
Rewording over time has added sophistication but fundamentally, the law is a restatement of the conservation of energy. Any given system cannot increase the quantity of energy or, as a consequence of the connection between thermal and information entropy, the information that it contains.
Any computing device, regardless of whether it is classical or quantum in nature, consumes energy based on the amount of information that is being derived, as determined by the information entropy of the device. While it is entirely possible that massive quantities of information could be processed in parallel, there is no escaping this requirement: a quantum computer truly delivering this level of computing would need the same order of energy as the thousands or even millions of classical computers required to deliver the same result.
I anticipate that developers of quantum computers will either find that the quantity of energy required to process is prohibitive or that their qubits will constantly frustrate their every effort to maintain coherence for long enough to complete useful algorithms.
Could I be wrong?
Definitely! In a future post I propose to create a scorecard tracking the predictions I’ve made over the years.
However, anyone who claims to really understand quantum mechanics is lying. Faced with the unbelievably complex wave functions required for quantum mechanics which seem to defy any real world understanding, physicist David Mermin famously advised his colleagues to just “Shut up and calculate!”.
Because of the impact of a future quantum computer on today’s business, the question is far from academic and deserves almost as much investment as the exploration of these quantum phenomena do in their own right.
At the same time, the investments in quantum computing are far from wasted. Even if no universal quantum computer is possible, the specialised devices that are likely to follow the D-Wave machine are going to prove extremely useful in their own right.
Ultimately, the convergence of physics and computer science can only benefit both fields as well as the business and government organisations that depend on both.
Why estimating Data Quality profiling doesn’t have to be guess-work
Data Management lore would have us believe that estimating the amount of work involved in Data Quality analysis is a bit of a “Dark Art,” and to get a close enough approximation for quoting purposes requires much scrying, haruspicy and wet-finger-waving, as well as plenty of general wailing and gnashing of teeth. (Those of you with a background in Project Management could probably argue that any type of work estimation is just as problematic, and that in any event work will expand to more than fill the time available…).
However, you may no longer need to call on the services of Severus Snape or Mystic Meg to get a workable estimate for data quality profiling. My colleague from QFire Software, Neil Currie, recently put me onto a post by David Loshin on SearchDataManagement.com, which proposes a more structured and rational approach to estimating data quality work effort.
At first glance, the overall methodology that David proposes is reasonable in terms of estimating effort for a pure profiling exercise – at least in principle. (It’s analogous to similar “bottom/up” calculations that I’ve used in the past to estimate ETL development on a job-by-job basis, or creation of standard Business Intelligence reports on a report-by-report basis).
I would observe that David’s approach is predicated on the (big and probably optimistic) assumption that we’re only doing the profiling step. The follow-on stages of analysis, remediation and prevention are excluded – and in my experience, that’s where the real work most often lies! There is also the assumption that a pre-existing checklist of assessment criteria exists – and developing the library of quality check criteria can be a significant exercise in its own right.
However, even accepting the “profiling only” principle, I’d also offer a couple of additional enhancements to the overall approach.
Firstly, even with profiling tools, the inspection and analysis process for any “wrong” elements can go a lot further than just a 10-minute-per-item-compare-with-the-checklist, particularly in data sets with a large number of records. Also, there’s the question of root-cause diagnosis (And good DQ methods WILL go into inspecting the actual member records themselves). So for contra-indicated attributes, I’d suggest a slightly extended estimation model:
* 10 mins: for each “Simple” item (standard format, no applied business rules, fewer than 100 member records)
* 30 mins: for each “Medium” complexity item (unusual formats, some embedded business logic, data sets up to 1000 member records)
* 60 mins: for any “Hard” high-complexity items (significant, complex business logic, data sets over 1000 member records)
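The three tiers above turn into a simple bottom-up sum. Here is a minimal sketch; the function name and the example counts are made up for illustration, and only the minutes per tier come from the list above.

```python
# Minutes of inspection per contra-indicated attribute, by complexity tier.
MINUTES_PER_ITEM = {"simple": 10, "medium": 30, "hard": 60}


def inspection_minutes(item_counts):
    """Total inspection effort for a mapping of tier name to item count."""
    unknown = set(item_counts) - set(MINUTES_PER_ITEM)
    if unknown:
        raise ValueError(f"unknown complexity tier(s): {sorted(unknown)}")
    return sum(MINUTES_PER_ITEM[tier] * count for tier, count in item_counts.items())


# Hypothetical profiling result: 12 simple, 5 medium and 3 hard items flagged.
flagged = {"simple": 12, "medium": 5, "hard": 3}
total_minutes = inspection_minutes(flagged)   # 120 + 150 + 180
total_hours = total_minutes / 60
```

Deciding which tier an item belongs to is still a judgement call, but at least the arithmetic that follows is repeatable.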
Secondly, and more importantly – David doesn’t really allow for the human factor. It’s always people that are bloody hard work! While it’s all very well to do a profiling exercise in-and-of-itself, the results need to be shared with human beings – presented, scrutinised, questioned, validated, evaluated, verified, justified. (Then acted upon, hopefully!) And even allowing for the set-aside of the “Analysis” stages onwards, there will still need to be some form of socialisation within the “Profiling” phase.
That’s not a technical exercise – it’s about communication, collaboration and co-operation. Which means it may take an awful lot longer than just doing the tool-based profiling process!
How much socialisation? That depends on the number of stakeholders, and their nature. As a rule-of-thumb, I’d suggest the following:
* Two hours of preparation per workshop (if the stakeholder group is “tame”. Double it if there are participants who are negatively inclined).
* One hour face-time per workshop (Double it for “negatives”)
* One hour post-workshop write-up time per workshop
* One workshop per 10 stakeholders.
* Two days to prepare any final papers and recommendations, and present to the Steering Group/Project Board.
That’s in addition to David’s formula for estimating the pure data profiling tasks.
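The socialisation rule-of-thumb can be sketched the same way. I am making three assumptions that are not in the list above: a partial group of stakeholders still gets its own workshop (so the count rounds up), a single “negative” flag applies to every workshop, and a working day is eight hours. Adjust all three to taste.

```python
import math


def socialisation_hours(stakeholders, negative=False, hours_per_day=8):
    """Rough socialisation effort in hours for a profiling exercise."""
    workshops = math.ceil(stakeholders / 10)   # one workshop per 10 stakeholders
    factor = 2 if negative else 1              # double prep and face-time for "negatives"
    prep = 2 * factor
    face_time = 1 * factor
    write_up = 1
    final_papers = 2 * hours_per_day           # two days for papers and presentation
    return workshops * (prep + face_time + write_up) + final_papers


tame_estimate = socialisation_hours(25)                    # 3 workshops x 4h + 16h
negative_estimate = socialisation_hours(25, negative=True) # 3 workshops x 7h + 16h
```

For 25 stakeholders that is 28 hours with a tame group and 37 hours with a negative one, before a single minute of tool-based profiling has been counted.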
Detailed root-cause analysis (Validate), remediation (Protect) and ongoing evaluation (Monitor) stages are a whole other ball-game.
Alternatively, just stick with the crystal balls and goats – you might not even need to kill the goat anymore…
A “foreign” colleague of mine once told me a trick his English language teacher taught him to help him remember the “questioning words” in English. (To the British, anyone who is a non-native speaker of English is “foreign.” I should also add that as a Scotsman, English is effectively my second language…).
“Five Whiskies in a Hotel” is the clue – i.e. five questioning words begin with “W” (Who, What, When, Why, Where), with one beginning with “H” (How).
These simple question words give us a great entry point when we are trying to capture the initial set of issues and concerns around data governance – what questions are important/need to be asked.
* What data/information do you want? (What inputs? What outputs? What tests/measures/criteria will be applied to confirm whether the data is fit for purpose or not?)
* Why do you want it? (What outcomes do you hope to achieve? Does the data being requested actually support those questions & outcome? Consider Efficiency/Effectiveness/Risk Mitigation drivers for benefit.)
* When is the information required? (When is it first required? How frequently? Particular events?)
* Who is involved? (Who is the information for? Who has rights to see the data? Who is it being provided by? Who is ultimately accountable for the data – both contents and definitions? Consider multiple stakeholder groups in both recipients and providers)
* Where is the data to reside? (Where is it originating from? Where is it going to?)
* How will it be shared? (How will the mechanisms/methods work to collect/collate/integrate/store/disseminate/access/archive the data? How should it be structured & formatted? Consider Systems, Processes and Human methods.)
Clearly, each question can generate multiple answers!
Aside: in the Doric dialect of North-East of Scotland where I originally hail from, all the “question” words begin with “F”:
Fit…? (What?) e.g. “Fit dis yon feel loon wint?” (What does that silly chap want?)
Fit wye…? (Why?) e.g. “Fit wye div ye wint a’thin’?” (Why do you want everything?)
Fan…? (When?) e.g. “Fan div ye wint it?” (When do you want it?)
Fa…? (Who?) e.g. “Fa div I gie ‘is tae?” (Who do I give this to?)
Far…? (Where?) e.g. “Far aboots dis yon thingumyjig ging?” (Where exactly does that item go?)
Foo…? (How?) e.g. “Foo div ye expect me tae dae it by ‘e morn?” (How do you expect me to do it by tomorrow?)
Whatever your native language, these key questions should get the conversation started…
Remember too, the homily by Rudyard Kipling:
Is this the kind of response you get when you mention to people that you work in Data Quality?!
Let’s be honest here. Data Quality is good and worthy, but it can be a pretty dull affair at times. Information Management is something that “just happens”, and folks would rather not know the ins-and-outs of how the monthly Management Pack gets created.
Yet I’ll bet that they’ll be right on your case when the numbers are “wrong”.
So here’s an idea. The next time you want to engage someone in a discussion about data quality, don’t start by discussing data quality. Don’t mention the processes of profiling, validating or cleansing data. Don’t talk about integration, storage or reporting. And don’t even think about metadata, lineage or auditability. Yaaaaaaaaawn!!!!
Instead of concentrating on telling people about the practitioner processes (which of course are vital, and fascinating no doubt if you happen to be a practitioner), think about engaging in a manner that is relevant to the business community, using language and examples that are business-oriented. Make it fun!
Once you’ve got the discussion flowing in terms of the impacts, challenges and inhibitors that get in the way of successful business operations, then you can start to drill into the underlying data issues and their root causes. More often than not, a data quality issue is symptomatic of a business process failure rather than being an end in itself. By fixing the process problem, the business user gains a benefit, and the data is enhanced as a by-product. Everyone wins (and you didn’t even have to mention the dreaded DQ phrase!)
Data Quality is a human thing – that’s why it’s hard. As practitioners, we need to be communicators. Lead the thinking, identify the impact and deliver the value.
Now, that’s interesting!
Just recently, Gary Allemann posted a guest article on Nicola Askham’s Blog, which made an analogy between Data Governance and the London Tube map. (Nicola is also on Twitter. See also Gary Allemann’s blog, Data Quality Matters.)
Up until now, I’ve always struggled to think of a way to represent all of the different aspects of Information Management/Data Governance; the environment is multi-faceted, with the interconnections between the component capabilities being complex and not hierarchical. I’ve sometimes alluded to there being a network of relationships between elements, but this has been a fairly abstract concept that I’ve never been able to adequately illustrate.
And in a moment of perspiration, I came up with this…
I’ll be developing this further as I go but in the meantime, please let me know what you think.
(NOTE: following on from Seth Godin’s plea for more sharing of ideas, I am publishing the Information Management Tube Map under Creative Commons License Attribution Share-Alike V4.0 International. Please credit me where you use the concept, and I would appreciate it if you could reference back to me with any changes, suggestions or feedback. Thanks in advance.)
When I was a kid growing up in the UK, Paul Daniels was THE television magician. With a combination of slick high drama illusions, close-up trickery and cheeky end-of-the-pier humour, (plus a touch of glamour courtesy of The Lovely Debbie McGee TM), Paul had millions of viewers captivated on a weekly basis and his cheeky catch-phrases are still recognised to this day.
Of course, part of the fascination of watching a magician perform is to wonder how the trick works. “How the bloody hell did he do that?” my dad would splutter as Paul Daniels performed yet another goofy gag or hair-raising stunt (no mean feat, when you’re as bald as a coot…) But most people don’t REALLY want to know the inner secrets, and even fewer of us are inspired to spray a riffle-shuffled pack of cards all over granny’s lunch, stick a coin up our nose or grab the family goldfish from its bowl and hide it in the folds of our nether-garments. (Um, yeah. Let’s not go there…)
Penn and Teller are great of course, because they expose the basic techniques of really old, hackneyed tricks and force more innovation within the magician community. They’re at their most engaging when they actually do something that you don’t get to see the workings of. Illusion maintained, audience entertained.
As data practitioners, I think we can learn a few of these tricks. I often see us getting too hot-and-bothered about differentiating data, master data, reference data, metadata, classification scheme, taxonomy, dimensional vs relational vs data vault modelling etc. These concepts are certainly relevant to our practitioner world, but I don’t necessarily believe they need to be exposed at the business-user level.
For example, I often hear business users talking about “creating the metadata” for an event or transaction, when they’re talking about compiling the picklist of valid descriptive values and mapping these to the contextualising descriptive information for that event (which by my reckoning, really means compiling the reference data!). But I’ve found that business people really aren’t all that bothered about the underlying structure or rigour of the modelling process.
That’s our job.
There will always be exceptions. My good friend and colleague Ben Bor is something of a special case and has the talent to combine data management and magic.
But for the rest of us mere mortals, I suggest that we keep the deep discussion of data techniques for the Data Magic Circle, and just let the paying customers enjoy the show….