Calls for increased transparency and accountability have led government agencies around the world to make more information available to the public as open data. As more people accessed this information, it quickly became apparent that data quality and data governance issues complicate putting open data to use.
“It’s an open secret,” Joel Gurin wrote, “that a lot of government data is incomplete, inaccurate, or almost unusable. Some agencies, for instance, have pervasive problems in the geographic data they collect: if you try to map the factories the EPA regulates, you’ll see several pop up in China, the Pacific Ocean, or the middle of Boston Harbor.”
A common reason for such data quality issues in United States government data is suggested by what David Weinberger wrote about Data.gov. “The keepers of the site did not commit themselves to carefully checking all the data before it went live. Nor did they require agencies to come up with well-formulated standards for expressing that data. Instead, it was all just shoveled into the site. Had the site keepers insisted on curating the data, deleting that which was unreliable or judged to be of little value, Data.gov would have become one of those projects that each administration kicks further down the road and never gets done.”
Of course, the United States is not alone in either making government data open (about 60 countries have joined the Open Government Partnership) or having it reveal data quality issues. Victoria Lemieux recently blogged about data issues hindering the United Kingdom government’s Open Data program in her post Why we’re failing to get the most out of open data.
One of the data governance issues Lemieux highlighted was data provenance. “Knowing where data originates and by what means it has been disclosed,” Lemieux explained, “is key to being able to trust data. If end users do not trust data, they are unlikely to believe they can rely upon the information for accountability purposes.” Lemieux explained that determining data provenance can be difficult since “it entails a good deal of effort undertaking such activities as enriching data with metadata, such as the date of creation, the creator of the data, who has had access to the data over time. Full comprehension of data relies on the ability to trace its origins. Without knowledge of data provenance, it can be difficult to interpret the meaning of terms, acronyms, and measures that data creators may have taken for granted, but are much more difficult to decipher over time.”
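As a rough illustration of the kind of metadata Lemieux describes, a provenance record might look like the following sketch. The field names here are illustrative assumptions, not a formal standard such as W3C PROV:

```python
# A minimal, illustrative provenance record for a published dataset.
# Field names are hypothetical -- real systems would follow a standard
# such as W3C PROV.
provenance = {
    "dataset": "facility_locations.csv",
    "created": "2014-03-01",
    "creator": "Environmental Data Office",
    "source_system": "internal facility registry",
    "access_log": [
        {"who": "data steward", "when": "2014-03-02", "action": "validated"},
        {"who": "web team", "when": "2014-03-05", "action": "published"},
    ],
}

def latest_action(record):
    """Return the most recent access-log entry, by ISO date."""
    return max(record["access_log"], key=lambda e: e["when"])

print(latest_action(provenance)["action"])  # published
```

Even a record this small answers the questions Lemieux raises: where the data originated, who created it, and who has touched it over time.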
I think the bad press about open data is a good thing because open data is opening eyes to two basic facts about all data. One, whenever data is made available for review, you will discover data quality issues. Two, whenever data quality issues are discovered, you will need data governance to resolve them. Therefore, the reason we’re failing to get the most out of open data is the same reason we fail to get the most out of any data.
No doubt you’ve heard of bring your own device (BYOD) already. It’s been nearly impossible to avoid the hype surrounding BYOD and all the benefits supporters say it offers. While companies may be focused on BYOD, another trend has slowly but steadily been catching on. It’s called bring your own network (BYON), and while it might sound similar to BYOD, it’s actually creating even more headaches for IT departments. While BYOD is something businesses can adopt and regulate, BYON mostly operates in the shadows. Bring your own network essentially means employees are using their own mobile devices’ 3G and 4G capabilities to create or access wireless hotspots. This is often done when workers determine that the current business network does not meet the demands of their jobs. As you can imagine, this trend is causing more than its fair share of problems, particularly when it comes to security.
One of the main points of contention with BYON is how it allows employees to completely avoid the corporate network. Perhaps they do it because they think the company’s network is too slow for what they need. Or maybe they use a different network to gain access to sites that have been blocked. It’s easy to see that avoiding network filters and security measures can lead to significant problems for businesses. Security measures, such as firewalls and antivirus protection, are put in place to protect the network and the devices that have been granted access to it. Employees that use their devices to avoid the corporate network may also represent a weak point that hackers can attack and exploit, gaining access to data and business systems should the workers ever connect to the regular network at any point.
When a business enacts a BYOD policy, it can carefully monitor and create controls for all devices being used for work. However, by using BYON, many employees choose to go behind the IT department’s back, using devices that haven’t been outfitted with the sometimes necessary controls that can improve BYOD security. Without these controls, IT workers have no way to monitor each employee’s device, nor can they install the protective measures that deter security threats. This is especially important because when employees use their mobile devices for work purposes, they also pose the risk of accidentally accessing unauthorized content or downloading malicious apps and malware. Without security controls, IT has no way to detect malware and no way to wipe a device that gets lost or stolen. Perhaps some employees see this as a benefit, but the fact remains that a device without controls is a bigger risk than one with them.
BYON also increases the risks of data leakage. With insecure access points in play, hackers will likely have an easier time infiltrating a mobile device and perusing its contents. If an employee uses the device for work, it may contain company data and other sensitive information that can be used by a hacker to spread further damage. IT departments are normally able to monitor data within the company, but when it comes to devices connected to other networks, IT has no way to ensure data security. Devices that utilize BYON are essentially outside the IT department’s jurisdiction, and that can lead to numerous problems usually not foreseen by the employee.
These security challenges that stem from bring your own network are certainly troubling, but there are solutions that companies can put in place. Many businesses may choose to fully and unequivocally embrace BYOD by establishing clear and precise guidelines over what is permitted while also communicating these policies to employees. Many workers use BYON not knowing it is against company policy, so clear communication can help avoid these problems. Companies should also run business risk assessments to more accurately identify where the weak points are in their network and what data might be in danger of leaking or getting stolen. An outright ban on Wi-Fi hot spots may also be necessary, but that’s for the most extreme cases.
Bring your own network is usually a response to restrictive network policies. Employees want to use their own devices at peak performance outside network restrictions, but the consequences of doing this usually lead to more security problems. Activities outside the network can actually create bigger security threats than what companies see with BYOD, so it’s important for businesses to address BYON problems before they become damaging. An early response can help a company direct its focus to other important matters while keeping networks and systems safe.
Hadoop is an excellent tool for collecting and sorting massive volumes of data, but businesses must also use analytics and visualization tools on top of Hadoop in order to reap the full benefits of big data. Here’s a quick list of apps for successfully managing and leveraging the massive amounts of information generated as organizations grow.
1. Roambi: With this application, mobile workers can access and analyze the same business data they use in the office in order to make smart decisions quickly. Mobile workers need more ways to manipulate data and should not be limited by business tools, which are often stripped down for mobile use. Enterprise-level mobile workers cannot afford to lose any capabilities if they are expected to accomplish business objectives in a timely fashion.
Roambi’s goal is to change the mobile business app landscape by improving the productivity and decision-making of the mobile workforce. The app changes the way people share, interact with and present data from the mobile perspective.
The Phoenix Suns, a professional basketball team, are one organization that uses analytics for both on-the-court and business decisions. Although skeptical at first, the Suns have found Roambi to be both valuable and easy to use in their business decisions. Utilizing this big data app has boosted sales and marketing while enabling the team to make the best decisions possible on the fly.
2. Datameer: Although it seems basic on the surface, Datameer surpasses Excel and other spreadsheet programs by allowing the user to link to active data sources, import flat files, and join two tabs together into a third, much like joining tables in a database.
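The tab-joining behaviour described here is analogous to a relational join. As a hedged sketch of the idea (using Python’s built-in SQLite rather than Datameer itself, with made-up table names):

```python
import sqlite3

# Two "tabs" of data joined into a third -- analogous to the
# spreadsheet-style joins described above. SQLite stands in for
# Datameer; the tables and figures are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 250.0), (1, 120.0), (2, 80.0)])

# The "third tab": each customer joined to its order totals.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 370.0), ('Globex', 80.0)]
```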
Datameer makes it possible to integrate, analyze, and visualize all of an organization’s data, helping it achieve its goals. It is purpose-built for Hadoop, enabling raw data to move quickly to new insights using components of the Hadoop ecosystem such as Oozie.
Big Data integration is made possible with built-in connectors to all common structured and unstructured data sources. Datameer eliminates the need for ETL and pre-defined schemas.
Analyzing big data is made simple because all information is easily integrated into Hadoop with Datameer’s data integration wizard. It helps organizations identify the important questions, understand the effect of every transformation, and make the proper analysis adjustments as more data is processed.
Datameer also makes it possible for infographics to be produced with the WYSIWYG Business Infographic Designer. Images can be imported and videos can be embedded while free-form text can be written to organize big data in an aesthetic way.
3. SAS Visual Analytics: SAS seeks to help organizations find answers to questions quickly and then continue asking more questions, helping companies achieve their big data goals. The application uses guided exploration, in-memory processing, and advanced data visualization to make data clear. SAS aims to be a scalable solution for any organization handling data of any size.
With mobile tethering, reports can be explored without internet connectivity. Mobile workers and executives can easily access and explore dashboards anytime regardless of location. SAS mobile apps are available for iPad and Android.
Social media analytics are made easier with Visual Analytics. It can be used to tap into the millions of tweets sent each day to track customer comments in call logs and identify the “hot topics” of the day. SAS makes it possible to pick up on the buzzwords that could lead to the next round of sales or mitigate a current branding issue.
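At its simplest, spotting “hot topics” amounts to counting term frequencies across a stream of comments. The following toy sketch illustrates the idea only; the data is invented and this is in no way how SAS Visual Analytics is implemented:

```python
from collections import Counter

# Toy sketch of "hot topic" detection: count word frequencies across
# customer comments. Illustrative data, illustrative stopword list.
comments = [
    "love the new battery life",
    "battery drains too fast",
    "screen is great but battery disappoints",
]

STOPWORDS = {"the", "is", "but", "too", "new"}
words = [w for c in comments for w in c.split() if w not in STOPWORDS]
top = Counter(words).most_common(1)[0]
print(top)  # ('battery', 3)
```

A real deployment would add tokenization, stemming, and time windows, but the core buzzword-spotting idea is just this kind of counting at scale.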
4. Esri ArcGIS: The GIS stands for Geographic Information System, making it easy to create data-driven maps and visualizations. Esri ArcGIS sets out to enable organizations to visualize and analyze big data in a way that reveals patterns, trends, and relationships that reports don’t. Esri technology can pull data from disparate places, streams, or even web logs.
Esri seeks to expose patterns through the use of maps, which can prove beneficial to organizations regardless of industry or focus. Retailers can identify where the competition is and where promotions are most effective. Banks can understand why loans are defaulting, and climate change scientists can see the impact of shifting weather patterns. Esri’s analytics also perform predictive modeling using spatially enabled big data to help organizations develop strategies from if/then scenarios.
This is just a quick overview of some of the big data apps available. As Hadoop and big data adoption becomes more mainstream, more tools for analytics and visualization will likely surface. What big data app have you found effective?
The way networks have been built and managed for years may be about to change. That may not come as a big surprise considering how quickly technology evolves from year to year, but the fact remains that networks have been done a certain way for a long time and it may not be long before things are done differently. One of the more popular topics being discussed of late is that of software-defined networking (SDN). The discussions largely center on the benefits SDN can bring to new networking strategies, but any talk of networks will naturally flow into the issue of security. SDN may be a new approach to building, designing, and managing vast networks, but before it’s implemented on a larger scale, its impact on network security will have to be examined as its benefits and drawbacks are properly analyzed.
A Look at SDN
To better understand what software-defined networking is, it’s best to compare it with traditional networking practices. All traditional networks are composed of a controller, or control plane, and the physical network itself, or the data plane. At the heart of SDN is the separation of these two planes, which allows administrators to better optimize each one. Supporters of SDN say the main reason to do this is to simplify networking, making it more flexible and agile when dealing with different network flows. Management tasks are simplified, a benefit that can extend to security as well. This is mostly done by using the same kind of architectures found in cloud computing along with more reactive resource allocation.
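The control-plane/data-plane split can be sketched in miniature. The class and method names below are illustrative assumptions, not a real SDN API such as OpenFlow:

```python
# Toy illustration of the control/data-plane split described above.
# Names and behaviour are invented for illustration -- not a real SDN API.

class Controller:
    """Control plane: decides, in one central place, how traffic is handled."""
    def __init__(self):
        self.flow_table = {}

    def install_rule(self, match, action):
        self.flow_table[match] = action


class Switch:
    """Data plane: simply forwards packets according to the controller's rules."""
    def __init__(self, controller):
        self.controller = controller

    def handle(self, dst):
        # Unknown flows default to "drop" -- all policy lives in the controller.
        return self.controller.flow_table.get(dst, "drop")


ctrl = Controller()
ctrl.install_rule("10.0.0.5", "forward:port2")
sw = Switch(ctrl)
print(sw.handle("10.0.0.5"))   # forward:port2
print(sw.handle("10.0.0.99"))  # drop
```

The point of the sketch is the separation itself: the switch holds no policy of its own, which is what lets administrators optimize (and secure) each plane independently.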
A New Approach
This new design requires a different approach from those adopting the SDN model. The traditional method had builders designing the network first, then adding in the proper security measures later. With SDN, however, security must be thought of as a major component of the network and designed in from the very beginning. Security measures then become a foundational element of the network, designed directly into the workloads and communications systems. Security isn’t looked at as just another aspect of the network to be dealt with later, but rather as the component the rest of the network is built around.
Benefits of SDN
On the surface, this sounds like a refreshing and effective new approach to addressing network security, and there are certainly benefits that come with it. With the traditional network, firewalls were often difficult to place since network boundaries were ill-defined. Software-defined networking can address this frustrating quirk by actually routing all network traffic through a central firewall. This re-routing also makes data analysis from network traffic much easier, which in turn can be used to detect security threats. SDN also allows for stronger encryption to be used within the designed framework of the network, which can increase the chances of valuable data remaining secure.
There are other ways in which SDN can improve on network security. As mentioned above, an SDN allows for a more dynamic network, which can respond to threats quickly through easy-to-manage network restructuring. SDN also provides for some handy security tools and capabilities, such as instantly enacting a quarantine around networks and endpoints that have been infiltrated by outside attackers. Software-defined networking also makes a larger number of security responses available, like emergency broadcasts, tarpits, and reflector nets.
Weaknesses of SDN
All of these benefits may make implementing SDN sound like a slam dunk, but it does come with some drawbacks that are worth considering. SDN is still a new and immature technology, which means developers are still hard at work figuring out how best to utilize it. That also means more security vulnerabilities may become evident as time moves on, and since additional security measures can’t simply be added on like in the traditional model, some of those vulnerabilities may not be addressed. It may also be easier for hackers to launch a distributed denial-of-service (DDoS) attack, since attackers only need to infiltrate a single device on the network. And due to the nature of SDN, if one part of the network goes down, the entire network goes down with it.
This of course doesn’t mean that all companies should shy away from SDN permanently. More advances will be made with software-defined networking that maximize its benefits while minimizing or even eliminating its weaknesses. As a new technology, there is still a lot of work to be done in optimizing it. Many businesses are pursuing the goal of better protection for their network, and SDN is just one way this goal may be achieved. Time will tell if the reality of SDN will live up to its potential.
The second of the five biggest data myths debunked by Gartner is that many IT leaders believe the huge volume of data that organizations now manage makes individual data quality flaws insignificant due to the law of large numbers.
Their view is that individual data quality flaws don’t influence the overall outcome when the data is analyzed because each flaw is only a tiny part of the mass of big data. “In reality,” as Gartner’s Ted Friedman explained, “although each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there are more flaws than before because there is more data. Therefore, the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organizations use in a big data context comes from outside, or is of unknown structure and origin. This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data.”
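Friedman’s point is simple arithmetic: if the flaw rate stays constant, a bigger dataset has proportionally more flaws, so the overall impact is unchanged. A quick sketch with assumed numbers:

```python
# Assume a constant 1% flaw rate. A 10x bigger dataset has 10x more
# flawed records: each flaw matters less individually, but the overall
# proportion of bad data -- and its impact -- is unchanged.
flaw_rate = 0.01

for records in (100_000, 1_000_000):
    flaws = int(records * flaw_rate)
    print(records, flaws, flaws / records)
# 100000 1000 0.01
# 1000000 10000 0.01
```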
“Convergence of social, mobile, cloud, and big data,” Gartner’s Svetlana Sicular blogged, “presents new requirements: getting the right information to the consumer quickly, ensuring reliability of external data you don’t have control over, validating the relationships among data elements, looking for data synergies and gaps, creating provenance of the data you provide to others, spotting skewed and biased data. In reality, a data scientist job is 80% of a data quality engineer, and just 20% of a researcher, dreamer, and scientist.”
This aligns with Steve Lohr of The New York Times reporting that data scientists are more often data janitors since they spend from 50 percent to 80 percent of their time mired in the more mundane labor of collecting and preparing unruly big data before it can be mined to discover the useful nuggets that provide business insights.
“As the amount and type of raw data sources increases exponentially,” Stefan Groschupf blogged, “data quality issues can wreak havoc on an organization. Data quality has become an important, if sometimes overlooked, piece of the big data equation. Until companies rethink their big data analytics workflow and ensure that data quality is considered at every step of the process—from integration all the way through to the final visualization—the benefits of big data will only be partly realized.”
So no matter what you heard or hoped, the truth is big data needs data quality too.
In organisations around the world, employees are accidentally merging their personal and professional cloud applications with dire results. Some of the issues include the routing of sensitive text messages to family members and the replication of confidential documents onto the servers of competitors.
Our personal and work lives are merging. When done well, this can be a huge boost to our productivity and personal satisfaction. The rise of the smart mobile device has been a major part of this merge, driving Chief Information Officers towards various flavours of Bring Your Own Device (BYOD) policies.
With the advent of a myriad of cloud services delivered over the internet, our individual cloud architectures are becoming far more complex. The increasing trend to move beyond BYOD to Bring Your Own Application (BYOA) means that the way individuals configure their own personal technology has the potential to directly impact their employer.
Bringing applications and data to work
Many of our assumptions about workplace technology are based on the world of the past where knowledge workers were rare, staff committed to one employer at a time and IT was focused on automating back office processes.
In this environment, one employee could be swapped out for another easily and all were prepared to conform to a defined way of working. Our offices were perhaps best compared to Henry Ford’s production line of the early twentieth century where productivity required conformity.
It’s no wonder that IT has put such a high value on a Standard Operating Environment.
However, new styles of working have taken hold (see Your insight might protect your job). Businesses and government are employing more experts on a shared basis. An increasing proportion of the workforce is best described as being made up of “knowledge workers” and a sizeable number of these people are choosing to work in new ways, including spreading their week over several organisations. Even full-time staff find that their employers have entered into complex joint ventures, meaning that the definition of the enterprise is constantly shifting.
Personal productivity is complex and highly dependent on the individual. The approach that works for one person is anathema to another. Telling people to work in a standard way is like adding a straitjacket to their productivity. Employees should be allowed to use their favourite tools to schedule their time, author documents, create presentations and take notes.
Not only is the workplace moving away from lowest-common-denominator software, but the increasing integration between mobile, tablet and personal computers means that the boundaries between them are becoming less clear. It all adds up to the end of the Standard Operating Environment.
CIOs are right to worry that BYOD and BYOA will result in their data being spread out over an unmanageable number of systems and platforms. It is no longer workable to simply demand that all information be kept in a single source of truth physical database or repository. I’ve previously argued that the foundation of a proper strategy is to separate the information from the business logic or processes that create it (see Your data is in the cloud when…).
Some of the simplest approaches that CIOs have taken to manage their BYOD environment have involved virtualisation solutions where a virtual desktop with the Standard Operating Environment is run over a client which is available on many devices.
While this is progress, it barely touches the productivity question. While workers can choose the form factor of the device that suits them, they are still being forced into the straitjacket of lowest-common-denominator business applications.
The vendors are going to continue to provide better solutions which put the data in a standard form while allowing access to many (even competing) applications. It’s not about just one copy of the data on a database, but rather allowing controlled replication and digital watermarks that track the movement of this data including loss prevention.
While CIOs may see many downsides, the upsides go beyond the productivity of individual workers.
For example, organisations constantly struggle with managing their staff master data, but in a world of personal social media employees will effectively identify themselves (see Login with social media).
Managing software licenses, even in the most efficient organisation, is still an imperfect science at best with little motivation for users to optimise their portfolio. When employees can bring their own cloud subscriptions to work, with an agreed allowance paid to them, the choices that they make are so much more tailored to their actual needs today rather than what they might need in months or even years to come.
Organisations that have grappled with provisioning new PCs are seeing the advantages of the consumer app stores with employees self-administering deployment between devices. Cloud and hardware providers are increasingly recognising the complex nature of families and are enabling security and deployment of content between members. The good news is that even the simplest family structure is more complicated than almost any enterprise organisation chart!
We see a hint of bad architecture when staff misconfigure their iPhones and end up with their (potentially sensitive) text messages being shared with their spouse or wider family, or when contractors use their personal cloud drive while working across more than one organisation. Even worse is when it goes really wrong and a ransomware breach on a personal computer infects all of these shared resources!
The breadth of services that the personal cloud covers is constantly growing. For many, it already includes their telecommunications, voicemail, data storage, diary, expense management, timesheets, authoring of office documents, messaging (email and texts), professional library, project management, diagram tools and analytics. Architecture is even beginning to matter in social media with the convergence of the tools most of us use (see The future of social networks).
Some employers fear the trend of cloud, BYOD and BYOA will lead to the loss of their organisation’s IP. The smart enterprise, however, is realising that well-managed cloud architectures and appropriate taxonomies can help rather than hinder employees to know what’s theirs and what’s yours.
In the near future you may even start choosing staff based on the quality of their personal cloud architecture!
Missed what’s been happening in the MIKE2.0 information management community? Check out our bi-weekly update:
Getting Started with the Five Phases of MIKE2.0
The MIKE2.0 Methodology has abandoned the traditional linear or waterfall approach to systems development in favor of an iterative, agile approach called continuous implementation. This approach divides the development and rollout of an entire system into a series of implementation cycles. These cycles identify and prioritize the portions of the system that can be constructed and rolled out before the entire system is complete. Each cycle also includes
- A feedback step to evaluate and prioritize the implementation results
- Strategy changes
- Improvement requests for future implementation cycles
Following this approach, there are five phases to the MIKE2.0 Methodology:
Feel free to check them out when you have a moment to learn how they can help improve your enterprise information management program.
This Week’s Food for Thought:
5 of the Most Common IT Security Mistakes to Watch Out For
Securing the enterprise is no easy task. Every day it seems like there are dozens of new security risks out there, threatening to shut down your company’s systems and steal valuable data. Stories of large corporations suffering from enormous data breaches probably don’t help calm those fears, so it’s important to know the risks are real and businesses must be able to respond to them. Even though enhancing security is crucial, enterprises still make a lot of mistakes while trying to shore up their systems. Here’s a look at some of the most common IT security mistakes so you’ll be better aware of what to watch out for. Read more.
Data Integration is the Schema in Between
The third of the five biggest data myths debunked by Gartner is that big data technology will eliminate the need for data integration. The truth is that big data technology excels at data acquisition, not data integration. This myth is rooted in what Gartner referred to as the schema-on-read approach used by big data technology to quickly acquire a variety of data from sources with multiple data formats. This is best exemplified by the Hadoop Distributed File System (HDFS). Unlike the predefined, and therefore predictably structured, data formats required by relational databases, HDFS is schema-less.
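Schema-on-read can be illustrated in miniature: raw records are stored as-is (much as HDFS stores files), and structure is imposed only at query time. A toy sketch with made-up records:

```python
import json

# Schema-on-read in miniature: store raw records untouched (as HDFS
# would), and apply structure only when reading. Records are invented.
raw_lines = [
    '{"name": "Plant A", "lat": 42.35, "lon": -71.05}',
    '{"name": "Plant B", "lat": 47.61}',  # field missing
    'not even json',                      # malformed record
]

def read_with_schema(lines):
    """Impose the schema at read time, skipping records that don't fit."""
    for line in lines:
        try:
            rec = json.loads(line)
            yield rec["name"], rec["lat"], rec["lon"]
        except (ValueError, KeyError):
            continue  # bad records simply drop out at read time

usable = list(read_with_schema(raw_lines))
print(usable)  # [('Plant A', 42.35, -71.05)]
```

This is why acquisition is easy but integration is not: nothing stopped the mismatched records from being stored, and reconciling them is left entirely to the reader.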
NoSQL vs SQL: An Overview
With the increase of big data in industries across the world through Hadoop and Hadoop Hive, numerous changes in how big data is stored and analyzed have occurred. It used to be that Structured Query Language (SQL) was the main method companies used to handle data stored in relational database management systems (RDBMS). This technology was first introduced in the 1970s and was extremely productive for its time. However, since 1970, the amount and types of information available have risen and changed dramatically.
Virtualization can do a lot for a company. It can increase a business’s efficiency, doing more work with less equipment. Virtualization can also save on costs, particularly when it comes to cooling down servers and getting things back up and running after a technical disaster. That’s just scratching the surface of all the benefits virtualization technology has to offer, so it may come as a surprise that some business leaders are still hesitant to make virtualization a part of their companies. The main concern they have usually has to do with security. Moving sensitive data and programs to virtual machines can sound like a risky strategy, no matter what benefits are provided. When utilized properly, however, virtualization may actually end up improving security, alleviating doubts about using the technology.
There are, of course, many ways to implement virtualization in an organization. Some of those ways include server virtualization, network virtualization, storage virtualization, and desktop virtualization. Many companies choose to use one or more of these methods to bring their businesses up to date with the latest technology, but each type presents challenges when confronting security risks. That’s why there are security solutions for each virtualization strategy. It’s important to note that while virtualization can improve security, it does not have the capability to stop all attacks. Threats that appear on physical machines can still pop up from time to time on virtual machines. With that said, here are just a few ways virtualization types can minimize risks and improve security.
For server virtualization, it becomes even more necessary to provide adequate security. According to one report, more than 90% of records stolen by attackers come from servers, and it’s a number that’s only expected to rise over the coming years. Servers that are virtualized have a number of advantages when it comes to security. For one thing, virtualized servers are able to identify and isolate applications that are compromised or unstable. This means that applications that may have been infected with malware are more likely to be identified and separated from other applications, preventing the spread of any malicious code. In addition, virtualized servers can make it easier to create more cost-effective intrusion detection, protecting not just the server and the virtual machines themselves but the entire network. Virtualization with servers also allows easier monitoring by administrators. By deploying monitoring agents in one virtual location, administrators can more easily view traffic and deny access to suspicious users. Server virtualization also allows a master image of the server to be created, making it easy to determine if the server is acting abnormally or against set parameters.
Many of the security advantages that come from network virtualization are similar in nature to those found in server virtualization. One example of this is isolation. With network virtualization, virtual networks are separated from one another, which greatly minimizes the impact malware could have when infecting the system. The same philosophy applies when looking at another main feature of network virtualization–segmentation, where a virtual network is composed of multiple tiers. The entire network, and in turn each tier, can be protected through the distribution of firewalls. This makes for more effective security measures while employing consistent security models across all networks and software.
Though perhaps not as common as other forms of virtualization, desktop virtualization is still more than capable of making a business more productive while addressing security issues. IT departments can better secure virtualized desktops by controlling what users are able to do from a central location. Desktop virtualization also allows security settings to be customized and changed to meet any new demands. In this way, not only are desktop computers more secure, but the IT department’s job becomes a lot easier.
Whether going the desktop, network, or server virtualization route, IT security will always be high on the list of priorities. While virtualization was at first seen as a potential security liability, it can now be seen as a security enhancement. In the capable hands of the right experts, businesses can prepare virtualized systems that meet any security threat with a rapid and decisive response, thereby keeping valuable company data safe.
With the rise of big data across industries worldwide through Hadoop and Hadoop Hive, numerous changes have occurred in how big data is stored and analyzed. Structured Query Language (SQL) was long the main method companies used to handle data stored in relational database management systems (RDBMS). The technology was first introduced in the 1970s and was extremely productive for its time. Over its more than four decades, SQL has proven very efficient at managing structured, predictable data. Using columns and rows with preselected schemas, an SQL database can gather and process data to make it usable and understandable to the end user, and it has proved very effective.
However, since the 1970s, the amount and types of information available have risen and changed dramatically. The prevalence of big data has drastically increased the amount of information available to companies and changed what types of information are available. Much of the data available today is unstructured and unpredictable, which is very difficult for traditional SQL databases to handle. These changes have created increasing demand for systems capable of both gathering and analyzing huge amounts of unstructured and unpredictable data.
Not only is it difficult for SQL to process unstructured and unpredictable information, it is also more costly, and processing very large batches of data is harder still, because SQL isn’t very flexible or scalable. NoSQL was developed to solve these difficulties and do what SQL couldn’t. NoSQL is short for “Not Only Structured Query Language,” and in the age of big data it is making data gathering and processing much easier for companies and businesses.
There are numerous differences between the two. I’ll mention a few of the advantages NoSQL has over SQL here.
NoSQL doesn’t require schemas the way SQL does, which means it can process information much more quickly. With SQL, schemas (the predefined structure of the data) had to be determined before any information was entered. That made dealing with unstructured information extremely difficult, because companies never knew just what categories of information they would be dealing with. Because NoSQL doesn’t require schemas, it can handle unstructured information more easily and much more quickly. NoSQL can also handle and process data in real time, something SQL doesn’t do.
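The contrast can be shown in a few lines. A minimal sketch in Python, using the built-in sqlite3 module as a stand-in for a relational database and a list of dictionaries as a stand-in for a NoSQL document store; the table, records, and field names are hypothetical:

```python
import json
import sqlite3

# SQL: the schema must be declared before any data can be inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
# A record with an unanticipated field (say, a Twitter handle) would
# first require an ALTER TABLE to change the schema.

# NoSQL-style document store: records are self-describing documents,
# so each one can carry whatever fields it happens to have.
documents = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "twitter": "@grace"},  # new field, no schema change
]
for doc in documents:
    print(json.dumps(doc))
```

The second document carries a field the first one lacks, and nothing had to be redefined to accept it; that is the flexibility schema-less storage buys.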
Another advantage of NoSQL is the scalability it provides. Unlike SQL, which tends to be very costly and inflexible when scaling, NoSQL makes scaling a breeze. Not only is it cheaper and easier, it also promotes increased data gathering. With SQL, companies had to be very selective about what information they gathered and how much of it, which placed restrictions on growth and revenue possibilities. Because of NoSQL’s flexibility and scalability, it promotes data growth. That’s good for businesses and good for consumers.
NoSQL is also extremely valuable for cloud computing. One of the main reasons we’ve seen such a rise in big data’s mainstream prominence is cloud computing, which has drastically reduced the startup costs of big data by eliminating the need for costly infrastructure, increasing its availability to both big and small businesses. Cloud computing has also made the entire big data process, from gathering through analysis and implementation, easier for companies, since much of the process is now handled and monitored by the service providers. The increased availability of big data means that companies can better serve the general public.
So while SQL still has a future and won’t be going away anytime soon, NoSQL is really the key to future success with big data and cloud computing. Its flexibility, scalability, and low cost make it a very attractive option, and its ability to gather and analyze unstructured and unpredictable data quickly and efficiently makes it a great option for companies with those needs.
The third of the five biggest data myths debunked by Gartner is that big data technology will eliminate the need for data integration. The truth is that big data technology excels at data acquisition, not data integration.
This myth is rooted in what Gartner referred to as the schema on read approach used by big data technology to quickly acquire a variety of data from sources with multiple data formats.
This is best exemplified by the Hadoop Distributed File System (HDFS). Unlike the predefined, and therefore predictably structured, data formats required by relational databases, HDFS is schema-less. It just stores data files, and those data files can be in just about any format. Gartner explained that “many people believe this flexibility will enable end users to determine how to interpret any data asset on demand. It will also, they believe, provide data access tailored to individual users.”
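The schema-on-read idea can be sketched briefly: the stored data is just raw text, and each consumer imposes its own structure only at read time. A minimal illustration in Python, where the records and field meanings are hypothetical:

```python
# Schema on read: storage holds raw, uninterpreted lines (as HDFS holds
# raw files); structure is imposed only when a reader interprets them.
raw_lines = [
    "2015-07-01,EPA,42.36,-71.06",
    "2015-07-02,FDA,38.90,-77.04",
]

# One reader decides each line is (date, agency, latitude, longitude).
def read_as_locations(lines):
    return [
        {"date": d, "agency": a, "lat": float(lat), "lon": float(lon)}
        for d, a, lat, lon in (line.split(",") for line in lines)
    ]

# Another reader only needs dates and agencies; same bytes, different schema.
def read_as_events(lines):
    return [tuple(line.split(",")[:2]) for line in lines]

print(read_as_locations(raw_lines)[0]["lat"])   # 42.36
print(read_as_events(raw_lines)[1])             # ('2015-07-02', 'FDA')
```

Two readers interpret the same bytes through two different schemas, which is exactly the on-demand, per-user flexibility the myth assumes will come for free.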
While it was a great innovation to make data acquisition schema-less, more work has to be done to develop information because, as Gartner explained, “most information users rely significantly on schema on write scenarios in which data is described, content is prescribed, and there is agreement about the integrity of data and how it relates to the scenarios.”
It has always been true that whenever you acquire data in various formats, it has to be transformed into a common format before it can be further processed and put to use. After schema on read and before schema on write is the schema in between.
Data integration is the schema in between. It always has been. Big data technology has not changed this because, as I have previously blogged, data stored in HDFS is not automatically integrated. And it’s not just Hadoop. Data integration is not a natural by-product of any big data technology, which is one of the reasons why technology is only one aspect of a big data solution.
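That transformation into a common format can be sketched concretely. A minimal Python illustration of the "schema in between," where the two source feeds, their field names, and their date formats are all hypothetical:

```python
from datetime import datetime

# Two hypothetical source feeds describing the same kind of record in
# different formats: different field names and different date formats.
source_a = [{"cust_id": "1", "signup": "07/01/2015"}]
source_b = [{"customerId": 2, "signupDate": "2015-07-02"}]

# The "schema in between": one agreed target format that every source
# is transformed into before further processing.
def from_a(rec):
    return {"customer_id": int(rec["cust_id"]),
            "signup_date": datetime.strptime(rec["signup"], "%m/%d/%Y").date()}

def from_b(rec):
    return {"customer_id": rec["customerId"],
            "signup_date": datetime.strptime(rec["signupDate"], "%Y-%m-%d").date()}

integrated = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
print(integrated)
```

Nothing about acquiring the two feeds produced this common format; someone had to write the per-source transformations, which is the integration work that big data technology does not eliminate.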
Just as it has always been, in between data acquisition and data usage there’s a lot that has to happen. Not just data integration, but data quality and data governance too. Big data technology doesn’t magically make any of these things happen. In fact, big data just makes us even more painfully aware there’s no magic behind data management’s curtain, just a lot of hard work.