Open Framework, Information Management Strategy & Collaborative Governance | Data & Social Methodology - MIKE2.0 Methodology

Graph databases

From MIKE2.0 Methodology


Graph Databases: Emphasizing Relationships as Primary Data

Over in a corner of the NoSQL world, hidden among the key-value and document stores such as Couchbase and MongoDB and the column store Cassandra, lie the graph databases.

Before relational databases, there were network databases, which are actually quite similar in concept to graph databases. SQL and the relational world came along and were clearly a better fit for the workload of the time, which was largely oriented to working with numbers. Now, what matters has expanded, and graph databases make a strong value proposition for their intended workload. That workload is highly connected data and includes navigating social networks, configurations and recommendations. With the high interest in those applications, it’s a workload that is poised to expand tremendously.

For example, if DirecTV wants to know whether I have the NFL Package (I do!), it looks for the existence of a connection between my record and the record for the NFL Package in a graph database. In a graph whose nodes represent only “friends,” by contrast, the result is one large cluster of people connected to all of their friends (who are connected to their friends, and so on). Because it is rarely possible to draw boundaries that very few connections cross (think of the Ashton Kutcher effect on the Twitter graph), sharding is difficult or impossible.
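The subscription check described above amounts to asking whether a connection exists between two nodes. The following is a minimal in-memory sketch of that idea, not how any particular graph database implements it; the `edges` structure, the `connect` and `is_connected` helpers, and the node identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical adjacency structure: node id -> set of directly connected node ids.
edges = defaultdict(set)


def connect(a, b):
    """Record an undirected connection between two nodes."""
    edges[a].add(b)
    edges[b].add(a)


def is_connected(a, b):
    """Return True if a direct connection exists between a and b."""
    # .get() avoids creating an empty entry for unknown nodes.
    return b in edges.get(a, set())


# Illustrative identifiers only; they do not come from any real system.
connect("customer:42", "package:nfl")

has_nfl = is_connected("customer:42", "package:nfl")
has_movies = is_connected("customer:42", "package:movies")
```

The lookup never scans unrelated records: it only inspects the connections of the starting node, which is the property that makes this kind of question cheap in a graph model.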

Instead of tables, graph databases store three attributes per value. This is similar to a key-value store with its two columns, but with an additional column, so that the three together represent the node-relationship-property structure of a “graph.” Hence the similarity to the triplestore, though without the same language capabilities. Like nodes, relationships can have properties, such as the age or type (public, private) of the relationship represented.
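To make the node-relationship-property idea concrete, here is a small sketch of a property graph in plain Python. The `Node` and `Relationship` classes, the `as_triple` helper, and the property names (`since`, `visibility`) are assumptions chosen for illustration; real graph databases use their own storage formats.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str      # id of the node the relationship starts from
    rel_type: str   # label describing the relationship
    end: str        # id of the node the relationship points to
    properties: dict = field(default_factory=dict)


def as_triple(rel):
    """View a relationship as a (start, type, end) triple."""
    return (rel.start, rel.rel_type, rel.end)


# Nodes carry their own properties.
alice = Node("alice", {"name": "Alice"})
bob = Node("bob", {"name": "Bob"})

# The relationship carries properties too, here its start year and visibility.
friendship = Relationship(
    "alice", "FRIEND_OF", "bob",
    {"since": 2009, "visibility": "public"},
)

triple = as_triple(friendship)
```

Viewing the relationship as a triple shows the resemblance to a triplestore, while the `properties` dictionary on the relationship is what the plain triple form lacks.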

Graph databases do not accept SQL and typically use their own query languages. Cypher, for example, is the language used with Neo4j. It contains the commands necessary to get nodes, traverse nodes and return values, and it is simpler than SQL for traversing relationships to find values or the existence of values. Gremlin is another project for accessing graph databases. One very cool feature is the ability to limit the “degrees” that are searched in a query.
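Limiting the “degrees” searched can be pictured as a breadth-first traversal that stops expanding once it reaches a maximum number of hops. The sketch below is plain Python rather than Cypher or Gremlin, and the `within_degrees` function and the sample `friends` graph are hypothetical.

```python
from collections import deque


def within_degrees(graph, start, max_degrees):
    """Return {node: degree} for nodes reachable from start in at most max_degrees hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_degrees:
            continue  # do not expand beyond the requested number of degrees
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    del seen[start]  # the starting node is not its own friend
    return seen


# Illustrative friendship chain: ann - bob - cara - dev
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cara"],
    "cara": ["bob", "dev"],
    "dev": ["cara"],
}

two_degrees = within_degrees(friends, "ann", 2)
```

With a limit of two degrees, the traversal from `ann` reaches `bob` and `cara` but never visits `dev`, so the cost of the query is bounded by the neighborhood size rather than by the size of the whole graph.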

Neo Technology, a Swedish company, is the commercial sponsor of Neo4j, a leading graph database. ACID-compliant Neo4j can hold up to 32 billion nodes, 32 billion relationships and 64 billion properties. Andreas Kollegger of Neo Technology noted there were 1,000 people participating in the community, with thousands of databases deployed at customers of all sizes. Other graph databases include STIG from Tagged and AllegroGraph from Franz. Objectivity’s InfiniteGraph is an object-oriented graph database.

“Graph” is probably not the most apt term for this field, as it conjures up an image of something with fixed rows and columns and predictable connections (i.e., graph paper). The old term “network” may be better, but regardless, graph databases are a fit for complex big data when relationships matter most.
