A graph database  is a database, which allows to store data in the form of graphs and query it using graph traversal. One of the major advantages of graph databases  is the natural way in which data can be represented and that queries do not need joins, as the data is connected via edges. They also benefit from the rising popularity of NoSql  databases in general. Among the more well known graph databases are Neo4j  and OrientDb  , which will also be used in the example code of this blog post.
The mathematical background  of graph databases is actually very interesting and I recommend reading up some of the standard algorithms on graphs, e.g. Breadth-first search  , Depth-first search  , Shortest path  , Travelling salesman problem  , Prim’s algorithm  (which finds a Minimum spanning tree  ).
For a more developer oriented introduction I recommend this video  from the Neo4j guys (I especially found the example helpful, which starts after the first 18 minutes). Another resource, which also targets Neo4j users is O'Reilly's Graph Databases book, which can be downloaded for free here  .
Gremlin  is a domain specific language for traversing property graphs. It has implementations to access most graph databases, especially graph databases supporting the property-graph model  . Gremlin-Scala  , which is a thin Scala  wrapper for Gremlin, aims to make Gremlin available to Scala users. It is now maintained by Michael Pollmeier  and the more recent releases aim to make the interface feel more native to Scala developers. The current focus of his efforts is on a more fluent Scala Api and type safety. A good start to Gremlin and Gremlin-Scala are the Gremlin documentation  , the Gremlin wiki  , Gremlin steps  and the Gremlin-Scala  page on Github.
Gremlin is developed by Tinkerpop  , which offers a stack of frameworks and libraries intended to make the use of graph databases easier and unify the access to the various graph databases available. The bottom layer of those frameworks and libraries is Blueprints  . For example, developing in Gremlin-Scala means also making use of the Pipes framework  . For real world applications, Rexster  is a very interesting choice to abstract the low level database handling and leaving it to the framework. It is a graph server, which allows to use predefined REST  queries targeting the underlying graph database. The Gremlin-Scala page on Github has a short paragraph  on how to use the project together with Rexster.
In the following, I give an overview of the example code for this blog post. The complete example project  is available on Github. It uses Gremlin-scala 2.4.1 together with Scala 2.10.3. To run it, just start up Sbt  and run the tests (Warning: By default, it populates /tmp/neo4j).
The base class is an abstract class containing all the tests, which will be described in the following sections. It has an abstract method to obtain the Scalagraph, which is the access to the graph database. This abstract method is then implemented for each of the used graph databases. The base class also makes sure in the after method, that the graph database will be cleaned up and is shut down properly. Code snippet 1 shows the outline of the base class.
The implementations of the base class for each of the databases are quite simple, as they only need to override the getGraph method and make sure to run the tests. Thanks to the interface of Gremlin, this is very easy to achieve. Code snippet 2 shows the code for the Neo4j database, which will store the data on the disk (make sure to change the path to one suiting your machine) and Code snippet 3 shows the code for the OrientDB database, which will store the data in memory for this test.
The first test is very simple and just shows how to store primitives in a vertex. The code can be seen in Code snippet 4.
The second test, which has been copied over from the Gremlin-Scala test suite shows how to add an edge between two vertices and query for it afterwards. Code snippet 5 shows the code.
The last test is a bit more complex and gives a glimpse of how useful graph databases can be for certain scenarios. The example is inspired by an upcoming trip of mine, in which I am going to explore the Far North of New Zealand with friends. We will start from Auckland and our final goal is Cape Reinga, which is the most northern point of the northern island. In this example, I have added some cities between Auckland and Cape Reinga as nodes of the graph. The roads between those cities are modelled as weighted edges containing the information of the distance and the time how long the drive takes. The code for the setup can be seen in Code snippet 6.
The goal is now, to find the shortest path  between Auckland and Cape Reinga. To do this, the loop function of Gremlin-Scala is being used. In the following code seen in Code snippet 7, a loop function is defined, which takes a startstep and repeats the steps, until it finds the Cape Reinga vertex or has looped for 7 times (first function inside the loop call). It keeps all paths, in which the last step ends on the vertex Cape Reinga (second function inside the loop call). This function is then used to go from the vertex Auckland to the connected vertices and loop over this step. The result are 29 different variants showing on how to get from Auckland to Cape Reinga, jumping from vertex to vertex, with a maximum of 7 steps.
But that is not enough yet, as the goal was not just to find all the paths, but to find the shortest path. For this, the edges need to be added to the hops between the nodes, as seen in Code snippet 8. After adding the edges and with them the information on the travel time and the distance, the shortest path can be found. To facilitate this, I am parsing the path into a tuple, containing its total time to travel (using the edges in the path) and the visited vertices in the path. Now the shortest path is just the one with the smallest travel time. For visualisation purposes, I have added a println statement of the ordered paths with their travel time, which can be seen in Code snippet 9.