Archive for the ‘Open Source’ Category
One of the most exciting features of the Internet is the ability to get the voice of the crowd almost instantly. Polling of our organisations and society that would have taken weeks in the past can be done in hours or even minutes. Ideas are tested in the court of public opinion in real time. This is potentially a huge boost for participation in democracy and the running of our businesses. Or is it?
Our governments and businesses have long worked with a simple leadership model. We select our leaders through some sort of process and then give them the authority to act on our behalf. In the past, we asked our leaders to report back on a regular basis and, most of the time, we left them to it. In democracies we used elections to score the success or failure of our leaders. If they did well, we gave them another term. In business, we relied on a board selected by shareholders.
This really started to change with the advent of the 24-hour news cycle. Rather than curate 30 minutes of news once a day, television needed to find stories to fill the whole day. Unlike newsprint, which had time for analysis, speed to air became a key performance metric for reporters, and an initial, even if uninformed, sound bite was enough to get something to the public.
There is a popular movement to open up government even further with regular electronic plebiscites and a default to open data. At its core is the desire to make the machinery of government transparent to all citizens. While transparency is good, it is the consequence of having “too many cooks in the kitchen” that leads to problems. Having everyone have their say, either through direct contributions or through endless polling, means that the fundamental approach to decision making has to change. While full-time politicians have the time to get underneath the complexity of a problem, the mass of voters don’t. The result is that complex arguments get lost in one- or two-sentence explanations.
This is happening at exactly the time that our global societies are becoming more complex and need sophisticated responses. Issues such as migration, debt and global taxation are too complex to be boiled down to a sound bite. It is telling that few have suggested turning our judiciary over to the court of public opinion!
H. L. Mencken, a well-known journalist from the first half of the 20th century who wrote extensively on society and democracy, once said “For every complex problem there is a solution that is concise, clear, simple, and wrong.” An overly crowd oriented approach to democracy results in these simple answers which are dumbing down our decision makers.
The danger doesn’t stop at our leaders, it also extends to the running of our organisations. We work more collaboratively than ever before. Technology is enabling us to source solutions from the crowd to almost any problem. This can work brilliantly for many problems such as getting a quick view on whether a brand message is going to resonate, or if a product would appeal to a particular demographic.
Where it can let us down is when we start trying to ask too many people to provide input to complex problems. Great design, sophisticated modelling and radical new product offerings don’t lend themselves well to having a large number of people collaborate to find the answer.
Collaboration and the use of the crowd need to be targeted to the places where they work best. This is going to be more important than ever as more people move to the “gig economy”, where they use platforms like 99designs, Expert360, Topcoder or 10EQS to manage their work. The most successful organisations are going to learn what questions and problems the crowd can solve for them.
Questions that require a simple, technical answer seem to suit this form of working well. Similarly, problems that can be solved with well-defined technical solutions are well suited to handing out to a group of strangers.
The crowd either completely rejects the status quo (the crowd as a protest movement), with little to offer in terms of alternative approaches, or it slightly tweaks current solutions (the crowd without depth). Even individuals sourced through the crowd seem unlikely to rock the boat, given their lack of context for the problem they’re trying to solve.
The way we work and solve problems as a society and in our organisations is changing. The one thing we know for sure is that we don’t yet know how this will really work!
There’s nothing more punk-rock than the sort of DIY ethics currently fueling open-source communities. The general subversiveness combined with an apparent twice-a-week minimum black t-shirt rule among developers may make the open source scene look kind of like a cool-guy/girl clique, at least from an outsider’s perspective.
Everybody is rebelling against something, right?
In the cloud computing ecosystem the basic theme is rebellion against failure, whatever that means to whoever is considering the question. And within that question is the other major decision: whether the given needs call for an open-source or a proprietary architecture. So let’s take a closer look at what the major differences between those two models mean for businesses.
Charging for Software
Generally, open source models are free and won’t charge for the use of software. Proprietary models may offer free packages at first, but ultimately always end up costing the customer. Many updates to proprietary software are free, but significant upgrades and the ability to add new packages often come with a fee. Charges can also come in the form of a per-user fee. Open source options are based more on the development of a community. They take direction from the demands of the market and tend to start with a small collection of developers and users. Successful projects are quickly picked up, while others are left to languish in obscurity.
Vendor lock-ins occur with proprietary software. This means that the website and software used with a proprietary vendor can’t be taken to another provider. It also limits the ability to use other providers with the knowledge to use a particular product. In contrast, open source products are more flexible and allow users to move between different systems freely. Open source cloud computing offers a greater range of compatibility between several different products. Typically, if a proprietary solution goes out of business the end-user is left with an unusable product. With open source projects, there is usually another project or fork that can take off where the old one left off.
Modifying System Code
Proprietary software doesn’t allow the manipulation of the source code. Even simple modifications to change styling or add features are not permitted with proprietary software. This can be beneficial for users who are happy with a set of features that is completely managed by one company. For those who like to tinker and adjust software to their needs, it may not be an ideal solution. Open source options allow for modifications, and a company can even create an entire fork based on the existing software. When a feature doesn’t exist within an open source application, a developer can be hired to incorporate the feature into the product.
Licensing and Hosting Costs
Using proprietary software isn’t for the faint of heart or light of wallet. Licensing and hosting fees are often higher with proprietary software. By using open source options, users can avoid having to pay operating system costs, and per-product fees to use the software. This provides more flexibility to those who run open source platforms. A new software package or feature can be quickly added on to an existing installation without the need to purchase a license. Additionally, proprietary software often requires the use of commercial databases, which further adds to the total cost of operation.
Documentation and Support
Product documentation is often more involved and useful with open source software. The reason for this is the large communities that often follow and support open source projects. Help documentation for proprietary software is often only surface level. This is partially due to the service-based nature of proprietary software. It’s more profitable when consumers have to rely on the company for support and technical services. However, this can negatively impact business if an update goes wrong and technical support can’t immediately correct the issue. Open source applications come with substantial documentation that is typically updated with each product release and freely available online.
Security and Performance Considerations
When you have an entire community of developers poking and prodding at an application, you tend to have better security. Many of the features that are put into proprietary software are designed to keep the software from being modified. This adds bloat to the code and prevents the option for a light and lean product. Additionally, excess code leaves more room for security and stability flaws. With open source software, there are many more eyes looking at the code and fixes tend to come in much more quickly than with proprietary software. Stability and advanced threat defense tend to be tighter with open source applications, as long as users keep their software updated. Out-of-date applications are just as vulnerable to hacking and infiltration as proprietary systems.
Open source and proprietary cloud services both aim to provide end-users with reliable software. Some users prefer the backing of a large company like Amazon or Microsoft, with a tailored list of compatible programs and services. Others prefer the interoperability and flexibility of open source alternatives like OpenStack or Eucalyptus. It’s not necessarily an issue of right or wrong per se. It just depends what the user’s specific needs are. For some open source software is the obvious choice, while those who want more predictably managed solutions may find proprietary solutions the ideal choice.
Big Data requires million-dollar investments.
Nonsense. That notion is just plain wrong. Long gone are the days in which organizations need to purchase expensive hardware and software, hire consultants, and then three years later start to use it. Sure, you can still go on-premise, but for many companies cloud computing, open source tools like Hadoop, and SaaS have changed the game.
But let’s drill down a bit. How can an organization get going with Big Data quickly and inexpensively? The short answer is, of course, that it depends. But here are three trends and technologies driving the diverse state of Big Data adoption.
Crowdsourcing and Gamification
Consider Kaggle. Founded in April 2010 by Anthony Goldbloom and Jeremy Howard, the company seeks to make data science a sport, and an affordable one at that. Kaggle is equal parts crowdsourcing company, social network, wiki, gamification site, and job board (like Monster or Dice).
Kaggle is a mesmerizing amalgam of a company, one that in many ways defies business convention. Anyone can post a data project by selecting an industry, type (public or private), type of participation (team or individual), reward amount, and timetable. Kaggle lets you easily put data scientists to work for you, and renting them is much less expensive than buying them.
Open Source Applications
But that’s just one way to do Big Data in a relatively inexpensive manner–at least compared to building everything from scratch and hiring a slew of data scientists. As I wrote in Too Big to Ignore, digital advertising company Quantcast attacked Big Data in a very different way, forking the Hadoop file system. This required a much larger financial commitment than just running contests on Kaggle.
The common thread: Quantcast’s valuation is nowhere near that of Facebook, Twitter, et al. The company employs dozens of people–not thousands.
Finally, even large organizations with billion-dollar budgets can save a great deal of money on the Big Data front. Consider NASA, nowhere close to anyone’s definition of small. NASA embraces open innovation, running contests on Innocentive to find low-cost solutions to thorny data issues. NASA often offers prizes in the thousands of dollars, receiving suggestions and solutions from all over the globe.
I’ve said this many times. There’s no one “right” way to do Big Data. Budgets, current employee skills, timeframes, privacy and regulatory concerns, and other factors should drive an organization’s direction and choice of technologies.
What say you?
Studies consistently rank DBpedia as a crucial repository in the semantic web; its data is extracted from Wikipedia and then structured according to DBpedia’s own ontology. Available under Creative Commons and GNU licenses, the repository can be queried directly on the DBpedia site and it can be downloaded by the public for use within other semantic tool environments.
This is a truly AMAZING resource! The English version of the DBpedia knowledge base, for instance, now has over two billion ‘triples’ describing 4,000,000+ topics — 20% are persons, 16% places, 5% organizations including 95,000 companies and educational institutions, plus creative works, species, diseases, and so on — with equally impressive statistics for the knowledge bases in over one hundred other languages. And the DBpedia Ontology itself has over 3,000,000 classes, properties, and instances. What a breathtaking undertaking in the public sphere!
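As a concrete illustration of querying the repository directly, here is a minimal sketch in Python. It targets DBpedia's public SPARQL endpoint and parses the standard SPARQL JSON results format; the endpoint URL is real, but its availability and the exact results returned at any moment are not guaranteed, so the network call itself is left commented out and a sample response is parsed instead.

```python
# Sketch: querying DBpedia's public SPARQL endpoint and reading the
# standard SPARQL 1.1 JSON results format. Endpoint availability and
# live result contents are assumptions, so the parsing is demonstrated
# against a sample response.
import json
import urllib.parse

ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(sparql):
    """Encode a SPARQL query as a GET request against the endpoint."""
    params = urllib.parse.urlencode(
        {"query": sparql, "format": "application/sparql-results+json"}
    )
    return ENDPOINT + "?" + params


def extract_bindings(results_json, var):
    """Pull one variable's values out of a SPARQL JSON result set."""
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"]]


# Example: ask for a handful of entities typed as dbo:Company.
QUERY = """
SELECT ?company WHERE {
  ?company a <http://dbpedia.org/ontology/Company> .
} LIMIT 5
"""

url = build_query_url(QUERY)
# urllib.request.urlopen(url).read() would fetch live JSON; a sample
# response is parsed here to show the shape of the result:
sample = (
    '{"results": {"bindings": ['
    '{"company": {"type": "uri", "value": "http://dbpedia.org/resource/IBM"}}'
    ']}}'
)
print(extract_bindings(sample, "company"))
```

The same pattern works for any of DBpedia's language-specific endpoints; only the endpoint URL and the query text change.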
Recently I had a wonderful opportunity to hear about DBpedia’s latest projects for their repository; here are the slides. DBpedia is now moving towards adoption of an important tool — Wikidata — in order to aggregate DBpedia’s 120 language-specific databases into one single, multilingual repository.
Wikidata’s own project requirements are interesting to the MIKE2.0 community, as they parallel significant challenges common to most enterprises in the areas of data provenance and data governance. Perhaps in response to various public criticisms about the contents of Wikipedia, Wikidata repositories support source citations for every “fact” the repository contains.
The Wikidata perspective is that it is a repository of “claims” as distinguished from “facts”. Say, for example, that an estimate of a country’s Gross National Product is recorded. This estimate is a claim that will often change over time and will often be confronted by counter-claims from different sources. What Wikidata does is provide a data model that keeps track of all claimed values asserted about something, with the expectation that this kind of detailed information will lead to mechanisms directly relevant to the level of “trust” that may be confidently associated with any particular ‘statement of fact’.
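The essence of that claims-based model can be sketched in a few lines of Python. This is a toy illustration, not Wikidata's actual statement/qualifier/reference schema; the class and field names are assumptions made for clarity. The key point it demonstrates is that competing sourced claims are kept side by side rather than overwritten.

```python
# Toy sketch (NOT Wikidata's real data model) of recording competing
# "claims" with source citations. Names are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Claim:
    property_name: str  # e.g. "GNP estimate"
    value: float
    source: str         # the citation backing this claim
    year: int


@dataclass
class Item:
    label: str
    claims: list = field(default_factory=list)

    def assert_claim(self, claim):
        # Competing claims are appended, never overwritten, so the full
        # history of who asserted what remains available.
        self.claims.append(claim)

    def claims_for(self, property_name):
        return [c for c in self.claims if c.property_name == property_name]


# Two different sources assert different GNP estimates for the same year;
# both survive, and a consumer can weigh trust in each source itself.
country = Item("Examplestan")
country.assert_claim(Claim("GNP estimate", 1.2e12, "IMF estimate", 2013))
country.assert_claim(Claim("GNP estimate", 1.1e12, "World Bank estimate", 2013))
print(len(country.claims_for("GNP estimate")))
```

A trust mechanism of the kind described above would then rank or filter these claims by source, recency, or agreement, rather than forcing a single "fact" into the repository.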
The importance of source citations is not restricted to the credibility of Wikipedia itself and its derivative repositories; rather this is a universal requirement common to all enterprises whether they be semantics-oriented or not. A simple proposition born of science — to distinguish one’s original creative and derived works from those learned from others — is now codified and freighted with intellectual property laws (copyrights, patents), subjects of complex international trade treaties.
Equally faced by most enterprises is a workforce located around the globe, each with varying strengths in the English language. By using a semantic repository deeply respectful of multilingual requirements — such as Wikidata is — enterprises can deploy ontologies and applications that improve worker productivity across-the-board, regardless of language.
Wikidata is a project funded through Wikimedia Deutschland (Wikimedia Germany). Enterprises might consider helping to fund open-source projects of this nature, as these are most certainly investments whose value cannot be overestimated.
Visit here for more Semantic MediaWiki conferences and slides. Ciao!
When it comes to building an online community, there’s no shortage of platforms to choose from. We all know them (Facebook, G+, Twitter, LinkedIn) and most of us already have them.
The rise of free digital media has removed almost every barrier to entry for people to build the right stage for their content. The tools are there and the effort is minimal. A few clicks, a name, a logo, and it’s done, right?
I often wonder though, are we too focused on building our own communities and not enough on participating in others? While the creation of a Facebook, G+, Twitter, LinkedIn, and all the other “must have” network profiles is relatively easy, the building and maintenance portion of the program surely is not. It can take months to build an audience from scratch and many of us don’t have that luxury of time. When the message is ready, wouldn’t you rather share it in a room with people already in it?
Sure, there are definite long-term benefits of creating your own community. Content moderation control, for instance. Controlling the dialogue. The guarantee of always being heard. I don’t advocate we step away from building these communities from scratch, but when time and resources are of the essence, why not focus more on the communities in your niche that already exist?
MIKE2.0 has a robust open source platform for data and information management professionals to gather and share best practices, experiences and seek advice on a number of topics that advocate better enterprise information management. We have a forum, a wiki, a blog, and a community of data experts to help answer your toughest data related questions. With currently 867 wiki articles and thousands of people signed-up to contribute their knowledge and experience, why not join the conversation?
When time and resources are limited, my take-home advice is don’t waste them building a community you can’t sustain. Spend your time reaching people who are already sitting in the room waiting to hear what you have to say. If you’re a data pro, then MIKE2.0 is your place.
We’ve just released the seventh episode of our Open MIKE Podcast series!
Episode 07: “Guiding Principles for the Open Semantic Enterprise” features key aspects of the following MIKE2.0 solution offerings:
Semantic Enterprise Guiding Principles: openmethodology.org/wiki/Guiding_Principles_for_the_Open_Semantic_Enterprise
Semantic Enterprise Composite Offering: openmethodology.org/wiki/Semantic_Enterprise_Composite_Offering
Semantic Enterprise Wiki Category: openmethodology.org/wiki/Category:Semantic_Enterprise
Check it out:
Open MIKE Podcast – Episode 07 from Jim Harris on Vimeo.
Want to get involved? Step up to the “MIKE”
We kindly invite any existing MIKE contributors to contact us if they’d like to provide audio or video segments for future episodes.
On Twitter? Contribute and follow the discussion via the #MIKEPodcast hashtag.
In “Can You Use Big Data? The Litmus Test“, Venkatesh Rao writes about the impact of Big Data on corporate strategy and structure. Rao quotes Alfred Chandler’s famous line, “structure follows strategy.” He goes on to claim that, “when the expressivity of a technology domain lags the creativity of the strategic thinking, strategy gets structurally constrained by technology.”
It’s an interesting article and, while reading it, I couldn’t help but think of some thought-provoking questions around implementing new technologies. That is, today’s post isn’t about Big Data per se. It’s about the different things to consider when deploying any new information management (IM) application.
Let’s first look at the converse of Rao’s claim. Specifically, doesn’t the opposite tend to happen (read: technology is constrained by strategy)? How many organizations do not embrace powerful technologies like Big Data because they don’t fit within their overall strategies?
For instance, Microsoft could have very easily embraced cloud computing much earlier than it did. Why did it drag its feet and allow other companies to beat it to the punch? Did it not have the financial means? Of course not. I would argue that this was all about strategy. Microsoft had for years monopolized the desktop market with on-premise products like Windows and Office.
To that end, cloud computing represented a threat to Microsoft’s multi-billion dollar revenue stream. Along with open source software and the rise of mobility, in 2007 one could start to imagine a world in which Microsoft would be less relevant than it was in 2005. The move towards cloud computing would have happened with or without Microsoft’s blessing, and no doubt many within the company thought it wise to maximize revenues while it could. (This isn’t inherently good or bad. It just supports the notion that strategy constrains technology as well.)
The Control Factor
Next up, what about control? What role does control play in structure and strategy? How many organizations and their employees have historically resisted implementing new technologies because key players simply refused to relinquish control over key data, processes, and outcomes? In my experience, quite a few.
I think here about my days working on large ERP projects. In the mid-2000s, employee and vendor self-service became more feasible but many organizations hemmed and hawed. They chose not to deploy these time-saving and ultimately more efficient applications because internal resistance proved far too great to overcome. In the end, quite a few directors and middle managers did not want to cede control of “their” business processes and ownership of “their” data because it would make them less, well, essential.
Simon Says: It’s Never Just about One Thing
Strategy, culture, structure, and myriad other factors all play significant roles in any organization’s decision to deploy–or not to deploy–any given technology. In an ideal organization, all of these essential components support each other. That is, a solid strategy is buttressed by a healthy and change-tolerant structure and organizational culture. One without the other two is unlikely to result in the effective implementation of any technology, whether it’s mature or cutting-edge.
What say you?
If you’re in the technology or information management spaces, you’ve probably heard the axiom “Information wants to be free.” While accounts vary, many attribute the quote to Stewart Brand, who voiced it at the first Hackers Conference in 1984.
Today, information is becoming increasingly more prevalent and less expensive, spawning concepts such as Open Data, defined as:
the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. While not identical, open data has a similar ethos to those of other “Open” movements such as open source, open content, and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term “open data” itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.
Again, you may already know this. But consider what companies such as Facebook are doing with their infrastructure and server specs–the very definition of traditionally closed systems. The company has launched a project, the Open Compute Project, that turns the proprietary data storage models of Amazon and Google on their heads. From the site:
We started a project at Facebook a little over a year ago with a pretty big goal: to build one of the most efficient computing infrastructures at the lowest possible cost.
We decided to honor our hacker roots and challenge convention by custom designing and building our software, servers and data centers from the ground up.
The result is a data center full of vanity-free servers which is 38% more efficient and 24% less expensive to build and run than other state-of-the-art data centers.
But we didn’t want to keep it all for ourselves. Instead, we decided to collaborate with the entire industry and create the Open Compute Project, to share these technologies as they evolve.
In a word, wow.
Both of these projects reveal complex dynamics at play. On one hand, there’s no doubt that more, better, and quicker innovation results from open source endeavors. Crowdsourced projects benefit greatly from vibrant developer communities. In turn, these volunteer armies create fascinating extensions, complementary projects, and new directions for existing applications and services.
I could cite many examples, but perhaps the most interesting is WordPress. Its community is nothing less than amazing–and the number of themes, extensions, and plug-ins grows daily, if not hourly. The vast majority of its useful tools are free or nearly free. And development begets more development, creating a network effect. Millions of small businesses and solopreneurs make their livings via WordPress in one form or fashion.
On the other hand, there is such a thing as too open–and WordPress may be an equally apropos example here. Because the software is available to all of the world, it’s easier for hackers to launch Trojan horses, worms, DoS attacks, and malware aimed at popular WordPress sites. To be sure, determined hackers can bring down just about any site (WordPress or not), but when they have the keys to the castle, it’s not exactly hard for them to wreak havoc.
Does Facebook potentially gain by publishing the design of its data centers and servers for all to see? Of course. But the risks are substantial. I can’t tell you that those risks are or are not worth the rewards. I just don’t have access to all of the data.
But I certainly wouldn’t feel comfortable doing as much if I ran IT for a healthcare organization or an e-commerce company. Imagine a cauldron of hackers licking their lips at a trove of highly personal and valuable information, stolen and surely put to unsavory purposes.
When considering how open to be, look at the finances of your organization. Non-profits and startups might find that the juice of erring on the side of openness is worth the squeeze. For established companies with very sensitive data and a great deal to lose, however, it’s probably not wise to be too open.
What say you?
The most important thing about the evolution of cloud computing is the ability to setup a business quickly without the delays and cost associated with establishing dedicated infrastructure. Phil Simon has written a book, The New Small, which demonstrates that this approach is ready for prime time.
Simon goes further and argues that the very characteristics that lead people to use the cloud are also the characteristics that make businesses agile and successful. While his book provides case studies from what he calls “New Small” companies, the approach is just as applicable to groups within big organisations that seek to unburden themselves from big overheads and delays.
Simon is a fellow blogger on this site and we share many of the same philosophies with regards to the information technology industry. He is someone who “gets” the subtle issues that organisations face managing complex technology in an era of information overload.
The cost of setting up the systems for any new business or function within a larger organisation is continually going up, despite Moore’s law driving down the cost of the computers themselves. This increase in cost is due to the complexity inherent in the increasing amount of information and the processes that handle it. Anyone worried about these costs could do worse than to think in terms of “New Small” and ask whether there is another way to deploy nimble solutions.
Organizations want to find a way to bring the cost of technology down. There is a growing sense of frustration that it is too hard to make even small changes to the way a business is run without incurring huge expense. Many are arguing that cloud computing and software as a service (SaaS) are offering a way of achieving these sorts of gains.
Another way of looking at things is to consider the last decade as one in which very little actually changed in core computing models. The industry got a lot better at applying techniques developed in the 1990s. The next decade is unlikely to be so comfortable, with much more radical approaches appearing, including the move from a web-based architecture to one that utilises small but functionally rich “apps”. It is very likely that the IT department of the near future will look much like an “enterprise app store”!
All this has to happen in an environment where the most important resource available to any business is the data that it holds. Any new approach to implementing applications cannot swap system complexity for information fragmentation as this will put at risk regulatory obligations, shareholder value and potential future business opportunities.
In April of this year, I will be speaking at the annual Grow Your Company Conference. GrowCo is a three-day event created for business leaders who want to achieve sizable growth within their organization. I’ll be talking in part about some of the management principles in The New Small. I have plenty of time to figure out exactly what I’ll say, but I’ll be shocked if the subject of data management is not broached at some point.
So, in this post, I’ll describe some of the data management best practices for those looking to grow their businesses.
Data management and quality issues are like termites: once you have them, it’s tough to completely eliminate them. What’s more, they’re going to get worse. Much worse. The best approach in combatting them is to be vigilant from the beginning.
For example, employees who aren’t exactly diligent about data entry need to be straightened out immediately. Sure, people make mistakes. No one needs to be crucified for making a legitimate mistake or not fully understanding something. I’m not talking about that.
I’m talking about people who are oblivious to the consequences of their actions. Remember, decisions based on inaccurate or incomplete information are less likely to be the right decisions. No one needs more reasons to distrust the data.
Also, remember that many growing organizations become willing–or unwilling–acquisition targets. Inaccurate financial, customer, or other information will impede that organization’s ability to make the right M&A move.
Bigger data is harder to manage
Companies with three or so folks may be able to sit down and make decisions based on gut feel, instinct, and simple common sense. When companies grow to, say, 30 employees, it’s reasonable to assume that the amount of data under the company’s control grows as well–although it certainly may not be tenfold. This increase in the amount of data (structured and otherwise) means that its management is no longer quite as simple, nor are decisions based on it.
Consider nascent trends that may be obscured in reports.
Old tools may have to go by the wayside
Because the data becomes harder to manage, it’s quite possible that time-tested tools may no longer do the trick. Many small businesses use Microsoft Excel or a simple Google Spreadsheet for basic customer relationship management (CRM). If the company adds concurrent users working remotely, it may be time for the organization to consider a tool like Zoho or SugarCRM, to name just a few of the powerful, web-based applications available these days.
These three tips hardly represent a comprehensive list of the things to consider as your organization grows. Excellent data management is no substitute for:
- solid people management practices
- a product or service that people actually want
- a sound business model
Nor can it override factors such as:
- regulatory requirements
- the company’s industry
- company politics
- profit margins
These challenges should not be underestimated, and truth be told, many lie outside of the purview of the organization itself. By the same token, however, organizations can control their own internal data management. Don’t make things harder by failing to take care of the things that you can control.
What say you?