30 Apr 2014
While data visualization often produces pretty pictures, as Phil Simon explained in his new book The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions, “data visualization should not be confused with art. Clarity, utility, and user-friendliness are paramount to any design aesthetic.” Bad data visualizations are even worse than bad art since, as Simon says, “they confuse people more than they convey information.”
Simon explained how data scientist Melinda Thielbar recommends using data visualization to help an analyst communicate with a nontechnical audience, as well as help the data communicate with the analyst.
“Visualization is a great way to let the data tell a story,” Thielbar explained. “It’s also a great way for analysts to fool themselves into believing the story they want to believe.” This is why she recommends developing the visualizations at the beginning of the analysis to allow the visualizations that really illustrate the story behind the data to stand out, a process she calls “building windows into the data.” When you look through a window, you may not like what you see.
“Data visualizations may include bad, suspect, duplicate, or incomplete data,” Simon explained. This can be a good thing, however, since data visualizations “can help users identify fishy information and purify data faster than manual hunting and pecking. Data quality is a continuum, not a binary. Use data visualization to improve data quality.” Even when you are looking at what appears to be the pretty end of the continuum, Simon cautioned that “just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action.”
Especially when dealing with volume aspect of big data, data visualization can help find outliers faster. While detailed analysis is needed to determine whether the outlier is a business insight or a data quality issue, data visualization can help you shake those needles out of the haystack and into a clear field of vision.
Among its many other uses, which Simon illustrates well in his book, finding ugly data with pretty pictures is one way data visualization can be used for improving data quality.