Posts Tagged ‘SQL’
If your company has large, complicated sets of data to analyze, and that data isn’t simple, structured or predictable, then SQL is not going to meet your needs. SQL does many things well, but handling large amounts of unstructured data is not one of them. There are other methods for gathering and analyzing your data that will be much more effective and efficient, and that will probably cost you less too.
NoSQL, which is usually read as “Not Only SQL” (SQL itself stands for Structured Query Language), is what your company needs. It’s perfect for dealing with huge data sets that aren’t always going to be structured, meaning every batch that’s gathered and analyzed could potentially be different. It’s a relatively new approach, but it’s becoming increasingly important in the big data world because of its immense capabilities.
NoSQL is known for its flexibility, scalability and relatively low cost, all of which SQL doesn’t offer. SQL relies on traditional column and row processing with predefined schemas. When you’re dealing with unstructured information, it’s nearly impossible to work with schemas, because there’s no way to know what type of information you’ll be getting each time. Also, with SQL it can be costly to scale data and the actual processing can be very slow. NoSQL, on the other hand, makes the process cheaper, and it can also do real-time analysis with extremely large data sets. It gives companies the flexibility to gather as much or as little information as they need to be successful.
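To make the schema point concrete, here is a minimal sketch in Python. It is only an illustration: the `events` table, the sample records, and both helper functions are made up for this example, and a plain list of dictionaries stands in for a document store rather than any particular NoSQL product.

```python
import sqlite3

# Relational side: the table's columns are fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, action TEXT)")
conn.execute("INSERT INTO events (user, action) VALUES (?, ?)", ("ana", "login"))


def insert_fails_for_unknown_column():
    """Return True if the database rejects a field the schema does not define."""
    try:
        conn.execute(
            "INSERT INTO events (user, action, device) VALUES (?, ?, ?)",
            ("ben", "click", "phone"),
        )
    except sqlite3.OperationalError:
        return True
    return False


# Document side (hypothetical stand-in): each record carries its own fields,
# so a new or missing field does not require a schema change first.
documents = [
    {"user": "ana", "action": "login"},
    {"user": "ben", "action": "click", "device": "phone"},
    {"user": "cy", "tags": ["promo", "mobile"]},
]


def field_names(docs):
    """Collect every field name that appears in at least one document."""
    return sorted({key for doc in docs for key in doc})
```

The trade-off is that the code reading those documents has to cope with fields that may or may not be present, which is work a fixed schema would otherwise do for you.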
NoSQL really takes information scalability to the next level. In the past, companies have too often been hindered in their data gathering ability because of the limited capabilities of then-current technologies to handle large sets of information. Part of NoSQL’s strength comes not only from its ability to scale information much better than SQL, but also from how well it works in the cloud. With cloud services, companies have nearly unlimited scalability. Many NoSQL databases also offer integration with other big data tools, such as Apache Spark offered as a service.
Along the same lines as scalability, companies can now gather more information than was ever possible with SQL. That ability means companies can be much more flexible in the information that they gather and process. In the past, companies also had to be selective of the type and amount of information gathered for fear of reaching storage and processing limits. NoSQL eliminates that difficulty. It’s important that companies have more information, not for information’s sake, but to benefit the consumer. Data sets that companies were previously hindered from utilizing can now be of great benefit.
NoSQL makes scaling and gathering information much more cost efficient than SQL for three reasons. First, its increased ability to process large amounts of information quickly and effectively means companies can make changes faster, which positively affects the bottom line. Second, it’s much cheaper to scale information with NoSQL than it is with SQL, which also saves money. Last, because of NoSQL and big data in the cloud, many companies now have access to big data without having to pay the enormous startup costs that come with an onsite infrastructure, something that used to be a necessity for big data.
Not only does NoSQL offer increased scalability and flexibility at a lower price, but it also provides increased analytical speed. Companies can gather and analyze large data sets, not just quickly, but in real time. No longer are companies forced to wait hours, days or even weeks to get the information back. The uses for this type of quick analysis are nearly endless. It’s especially important in today’s social media-filled world. Companies can instantly see what is being said about them, how their products are being received, what competitors are doing and what consumers want to see in the future.
NoSQL isn’t perfect and still lacks some of the capabilities of SQL, but it has many strengths that make it a great option for companies looking to gather and analyze large data sets that are unstructured and unpredictable.
I love Google, and in a pretty unhealthy way. In my third book, The New Small, there are oodles of references to Google products and services. I use Google on a daily basis for all sorts of different things, including e-mail, document sharing, phone calls, calendars, and Hangouts.
And one more little thing: search. I can’t imagine ever “Binging” something and, at least in the U.S., most people don’t either.
Yet, there are limitations to Google and, in this post, I am going to discuss one of the main ones.
A few years ago, I worked on a project doing some data migration. I supported one line of business (LOB) for my client while another consultant (let’s call him Mark) supported a separate LOB. Mark and I worked primarily with Microsoft Access. The organization ultimately wanted to move toward an enterprise-grade database, in all likelihood SQL Server.
Relying Too Much on Google
Mark was a nice guy. At the risk of being immodest, though, his Access and data chops weren’t quite on my level. He’d sometimes ask me questions about how to do some relatively basic things, such as removing duplicates. (Answer: SELECT DISTINCT.) When he had more difficult questions, I would look at his queries and see things that just didn’t make a whole lot of sense. For example, he’d try to write one massive query that did everything, rather than breaking it up into individual parts.
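For anyone who wants to see both ideas in one place, here is a small sketch using Python’s built-in sqlite3 module rather than Access or SQL Server. The `customers` table, its rows, and the `unique_customers` view are invented for illustration. Step one removes duplicate rows with SELECT DISTINCT; step two builds on that result instead of cramming everything into one statement. In Access, a saved query would play roughly the role the view plays here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT);
    INSERT INTO customers VALUES
        ('Acme', 'Boston'),
        ('Acme', 'Boston'),
        ('Bolt', 'Denver'),
        ('Cogs', 'Boston');
""")

# Step 1: remove exact duplicate rows and give the result a name.
conn.execute("""
    CREATE TEMP VIEW unique_customers AS
    SELECT DISTINCT name, city FROM customers
""")

# Step 2: a second, simpler query that builds on the de-duplicated rows.
rows = conn.execute("""
    SELECT city, COUNT(*) AS customer_count
    FROM unique_customers
    GROUP BY city
    ORDER BY city
""").fetchall()

print(rows)  # [('Boston', 2), ('Denver', 1)]
```

Each step can be run and checked on its own, which is most of the reason to split a big query up in the first place.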
Now, I am very aware that development methodologies vary and there’s no “right” one. Potato/pot-ah-to, right? Also, I didn’t mind helping Mark–not at all. I’ll happily share knowledge, especially when I’m not pressed with something urgent.
Mark did worry me, though, when I asked him if he knew SQL Server better than MS Access. “No,” he replied. “I’ll just Google whatever I need.”
For doing research and looking up individual facts, Google rocks. Finding examples of formulas or SQL statements isn’t terribly difficult either. But one does not learn to use a robust tool like SQL Server or even Access by merely using a search engine. You don’t design an enterprise system via Google search results. You don’t build a data model, one search at a time. These things require a much more profound understanding of the process.
In other words, there’s just no replacement for reading books, playing with applications, taking courses, understanding higher-level concepts, rather than just workarounds, and overall experience.
You don’t figure out how to play golf while on the course. You go to the practice range. I’d hate to go to a foreign country without being able to speak the language–or accompanied by someone who can. Yes, I could order dinner with a dictionary, but what if a doctor asked me in Italian where the pain was coming from?
What say you?
Over the course of my career, I have written more reports than I can count. I’ve created myriad dashboards, databases, SQL queries, ETL tools, neat Microsoft Excel VBA, scripts, routines, and other ways to pull and massage data.
In a way, I am Big Data.
This doesn’t make me special. It just makes me a seasoned data-management professional. If you’re reading this post, odds are that the list above resonates with you.
Three Problems with Creating Excessive Reports
As an experienced report writer, it’s not terribly hard to pull data from database tables, external sources, and the web. There’s no shortage of forums, bulletin boards, wikis, websites, and communities devoted to the most esoteric of data- and report-related concerns. Google is a wonderful thing.
I’ve made a great deal of money in my career by doing as I was told. That is, a client would need me to create ten reports and I would dutifully create them. Sometimes, though, I would sense that ten weren’t really needed. I would then ask if any reports could be combined. What if I could build only six or eight reports to give that client the same information? What if I could write a single report with multiple output options?
There are three main problems with creating an excessive number of discrete reports. First, it encourages a rigid mode of thinking, as in: “I’ll only see it if it’s on the XYZ report.” For instance, Betty in Accounts Receivable runs a past-due report to find vendors who are more than 60 days late with their payments. While this report may be helpful, it will fail to include any data that does not meet its predefined criteria. Perhaps her employer is especially concerned about invoices from particularly shady vendors that are only 30 days past due.
Second, there’s usually a great deal of overlap. Organizations with hundreds of standard reports typically use multiple versions of the same report. If you ran a “metareport”, I’d bet that some duplicates would appear. In and of itself, this isn’t a huge problem. But database changes often mean modifying what is effectively the same report multiple times.
Third, and most important these days, the reliance upon standard reports inhibits data discovery.
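One way around the rigid-criteria problem in the past-due example is a single report that takes its thresholds as parameters. The sketch below is hypothetical from top to bottom: the invoice records, the `watchlist` flag, and the `past_due` function are made up for illustration, and a real report would read from a database rather than a list.

```python
from datetime import date

# Hypothetical invoice records standing in for a database table.
INVOICES = [
    {"vendor": "Northwind", "due": date(2017, 1, 10), "watchlist": False},
    {"vendor": "Shadyco", "due": date(2017, 2, 20), "watchlist": True},
    {"vendor": "Contoso", "due": date(2017, 3, 20), "watchlist": False},
]


def past_due(invoices, as_of, min_days=60, watchlist_min_days=None):
    """Return (vendor, days_late) pairs that are later than the threshold.

    Vendors flagged as watchlist use watchlist_min_days when it is given;
    everyone else uses min_days.
    """
    results = []
    for inv in invoices:
        days_late = (as_of - inv["due"]).days
        threshold = min_days
        if inv["watchlist"] and watchlist_min_days is not None:
            threshold = watchlist_min_days
        if days_late > threshold:
            results.append((inv["vendor"], days_late))
    return results


as_of = date(2017, 3, 28)
standard = past_due(INVOICES, as_of)                         # the usual 60-day report
stricter = past_due(INVOICES, as_of, watchlist_min_days=30)  # same report, tighter rule
```

Here `standard` lists only the vendor that is more than 60 days late, while `stricter` also picks up the watchlist vendor that is just over 30 days late. It is one report with two outputs, not two reports to maintain.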
Look, standard reports aren’t going anywhere. Simple lists and financial statements are invaluable for millions of organizations.
At the same time, though, one massive report for everything is less than ideal. Ditto for a “master” set of reports. These days, true data discovery tools like Tableau increase the odds of finding needles in haystacks.
Why not add interactivity to basic reports to allow non-technical personnel to do more with the same tools?
What say you?
“I saw the angel in the marble and carved until I set him free.”
The era of Big Data has arrived, yet relatively few organizations seem to recognize it. Platitudes from CXOs are all fine and dandy, but how many have invested in Hadoop or hired a data scientist? Not too many, in my view. (See “Much Hadoop About Nothing.”)
Brass tacks: The hype around Big Data today is much greater than the reality–and it probably will be for some time.
This is unfortunate, as many organizations already have within their walls very valuable data that could be turned into information and knowledge with the right tools. Because of their unwillingness to adopt more contemporary Big Data and dataviz applications, though, that knowledge effectively hides in plain sight. The ROI question still paralyzes many CXOs afraid to jump into the abyss.
I know something about the notion of hiding in plain sight. It is one of the major themes of my favorite TV show, Breaking Bad.
To some extent, I understand the reluctance surrounding Hadoop. After all, it’s a fundamentally different way of thinking about data, modeling, and schema. Most IT professionals are used to thinking about data in orderly and relational terms, with tables and JOIN statements. Those with the skills to work with this type of data are in short supply, at least for the time being. “Growing” data scientists and a new breed of IT professionals doesn’t happen overnight. The same is true of lawyers and doctors.
Overcoming stasis isn’t easy, especially in budget-conscious, risk-averse organizations. To that end, here are a few tips on getting started with Big Data:
- Don’t try to boil the ocean. Small wins can be huge, to paraphrase from the excellent book The Power of Habit: Why We Do What We Do in Life and Business.
- Communicate successes. Getting people to come to you is much easier than forcing them. The carrot is more effective than the stick.
- Under-promise and over-deliver.
Compared to a year ago, I have seen progress with respect to Big Data adoption. Increasingly, intelligent people and companies are doing more with new forms of data—and getting more out of it. As a result, data visualization has become a big deal. To paraphrase Michelangelo, they are starting to set the data free.
What say you?