Understanding graph databases: How they work and examples in action

A graph database, a type of NoSQL database, owes its origins to graph theory, where interconnected objects are represented using vertices/nodes and edges.

Graph databases consider relationships between data entities as first-class citizens. As such, a graph database does more than store and show what data you have; it explicitly links the entities in your data to let you see at a glance how those entities are related.

In contrast to relational databases, where data is stored in rigid tables, rows, and columns, data in a graph database is stored as a network of entities (nodes) and relationships (edges).
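
To make the contrast concrete, here is a minimal Python sketch that holds the same data first as relational-style rows and then as a graph-style adjacency structure. It is not tied to any particular database, and the names `people`, `follows_rows`, and `follows_graph` are made up for illustration.

```python
# Illustrative data only; names and values are assumptions for this sketch.

# Relational style: entities and relationships live in separate "tables",
# and relationships are recovered by matching keys at query time (a JOIN).
people = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cy"},
]
follows_rows = [
    {"follower_id": 1, "followed_id": 2},
    {"follower_id": 2, "followed_id": 3},
]


def followed_by_join(person_id):
    """Scan the relationship table, then look up each matching person row."""
    names_by_id = {p["id"]: p["name"] for p in people}
    return [
        names_by_id[row["followed_id"]]
        for row in follows_rows
        if row["follower_id"] == person_id
    ]


# Graph style: each node keeps direct references to its outgoing edges,
# so following a relationship is a lookup rather than a table scan.
follows_graph = {
    "Ada": ["Ben"],
    "Ben": ["Cy"],
    "Cy": [],
}


def followed_by_traversal(name):
    """Return the neighbors reached by following outgoing edges."""
    return follows_graph[name]
```

Both functions answer "whom does this person follow?", but the graph version reads the answer directly from the node instead of reconstructing it from keys.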

This post explores the different types of graph databases, how they work using real-world examples, and how to solve challenges associated with them.

Open-source graph databases

Among the numerous open-source graph databases available are three top options we will focus on here: Neo4j, ArangoDB, and OrientDB.

Neo4j

Neo4j is a versatile database with the ability to traverse billions of relationships in seconds. It supports the declarative and human-readable Cypher query language, allowing users to run complex queries more easily and traverse graphs faster to uncover hidden entity relationships.

Neo4j is also ACID-compliant, ensuring that the relationships it reveals are accurate and consistent.

ArangoDB

This multi-model database boasts some 14,000 GitHub stars. It supports key-value, document, and graph data models, all of which can be queried using a single unified query language, the ArangoDB Query Language (AQL).

Given its flexibility, ArangoDB performs superbly in complex applications that demand multiple data representations and querying formats. It also supports ACID transactions and low-latency querying.

OrientDB

Also a multi-model database, OrientDB is known for being high-performance and having rich community support. It offers a distributed graph database featuring SQL and ACID support, fast and flexible querying, and large-scale data storage.

Because OrientDB connects records directly through graph relationships and document links, its users can avoid resource-intensive runtime JOINs, which helps reduce costs.

Graph databases in the cloud

Let’s take a quick look at the graph database offerings from some major cloud providers, specifically Amazon Web Services (AWS) and Microsoft Azure.

AWS graph database (Amazon Neptune)

Amazon Neptune supports multiple graph models, such as property graph and RDF. It also integrates with other AWS services, which makes it a strong choice for knowledge graphs, as well as fraud detection and social networking.

Azure graph database (Azure Cosmos DB for Apache Gremlin)

Azure Cosmos DB provides graph capabilities using the Gremlin API. It also offers APIs for various other NoSQL data models and is known for its high availability, millisecond latency, and global distribution. These attributes make it a strong choice for applications that need multi-model support, fast querying, and seamless scalability.

Features and benefits of cloud-based graph databases

Some unique features and benefits cloud-based graph databases have to offer include:

  • Scalability: The cloud’s elastic nature extends to graph databases; enterprises can effortlessly scale up or down as workload demands change.
  • Managed infrastructure: Unlike self-hosted graph databases, they have much lower operational complexity, as providers handle maintenance, updates, and hardware management.
  • Security and disaster recovery: Cloud features such as encryption, identity and access management, and automatic data backup are baked into cloud-based graph databases, ensuring secure data storage/access and facilitating seamless recovery.
  • Integration with cloud ecosystem: Cloud-based graph databases natively integrate with other cloud services, e.g., relational databases and compute services, enhancing usability.
  • Cross-region replication: Data is replicated across multiple regions. For global organizations, this brings data access from anywhere, lower latency for users near a replica, and regional failover.

Getting started with cloud graph databases

The fastest route to set up and work with cloud-based graph databases entails three steps:

  • Step 1: Account setup and access: Create your AWS or Azure account via the management console, then install the AWS CLI or Azure CLI. If using a Docker container or Gremlin Server, pull the relevant image from Docker Hub to get started.
  • Step 2: Provisioning and configuration: Provision a compute or container instance (EC2 or ECS for Amazon) to host the application. Configure networking so that data can flow in and out via the appropriate ports.
  • Step 3: Basic querying and integration: Manage your graph database using the management console or CLI. Connect to Neptune or Azure Cosmos DB, directly or through a proxy, and query the data using Gremlin, openCypher, or another compatible query language.

Note: Visit AWS or Azure for more detailed tutorials.

GraphDB: A quick overview

GraphDB is a powerful and versatile graph database with open-source and enterprise-grade versions. It supports RDF and SPARQL standards, and offers a range of features, including large-scale semantic inferencing, real-time data synchronization with Kafka, and rich integration with high-performance search engines like Lucene and OpenSearch.

GraphDB is available as a cloud deployment via the AWS or Azure marketplace. Its popularity stems from its compatibility with industry standards and strong support at both the community and commercial levels.

Key GraphDB features and use cases

Let’s explore some GraphDB offerings and their applications:

  • High-performance query engine: GraphDB’s query optimizer ensures efficient execution of complex queries, making it ideal for large organizations with huge and diverse data.
  • Advanced reasoning capabilities: The Triple Reasoning and Rule Entailment Engine (TRREE) automates sophisticated reasoning over RDF data, supporting rule-based inference and entailment. This feature has powerful applications in:
    • Finance, where traversing customer transactions and locations can reveal fraudulent patterns
    • Healthcare, where mapping patients’ histories to administered drugs or vaccines can prevent potentially fatal drug prescriptions
  • Robust storage and indexing: GraphDB employs efficient storage strategies, including entity pooling and page caching, to optimize performance and minimize storage overhead.
  • User-friendly web interface: Workbench provides a web-based interface for managing and administering GraphDB instances, requiring limited technical expertise.
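
To give a feel for what rule-based inference over triples means, here is a small Python sketch that applies a single transitive rule until no new facts appear. It is a toy illustration, not how TRREE is implemented; the facts, the `subClassOf` predicate, and the function name `infer_transitive` are all assumptions.

```python
# Hypothetical facts and a single hand-written rule, for illustration only.
facts = {
    ("aspirin", "subClassOf", "nsaid"),
    ("nsaid", "subClassOf", "analgesic"),
    ("analgesic", "subClassOf", "drug"),
}


def infer_transitive(triples, predicate):
    """Apply 'if (a p b) and (b p c) then (a p c)' until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {
            (a, predicate, c)
            for (a, p1, b) in inferred
            for (b2, p2, c) in inferred
            if p1 == predicate and p2 == predicate and b == b2
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


closure = infer_transitive(facts, "subClassOf")
```

After the loop finishes, `closure` contains facts that were never stated explicitly, such as aspirin being a kind of drug. This is the sort of derived knowledge an inference engine can expose to queries.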

Graph database models

The two primary graph database models are the Resource Description Framework (RDF) and property graphs.

RDF graphs

An RDF graph represents data as triples: subject-predicate-object. Databases that store these triples are known as triple stores. However, as it wasn’t exactly designed for querying knowledge graphs, RDF falls short where data has a complicated many-to-many rather than a direct subject-to-object relationship. Graph databases based only on RDF are mostly queried using SPARQL.
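
As a rough illustration of the triple idea, the Python sketch below stores a few subject-predicate-object triples and matches them against a pattern, loosely mimicking what a SPARQL triple pattern does. The data and the helper name `match` are hypothetical, and real triple stores use IRIs and indexes rather than plain strings in a set.

```python
# A toy in-memory triple store; the data and helper name are hypothetical.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
}


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )
```

Calling `match(subject="alice")` returns every statement about alice, while `match(predicate="knows")` returns every "knows" relationship regardless of who is involved.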

Property graphs

Designed specifically for graph databases, the property graph optimizes data storage, query execution, and query speeds. It consists of three elements:

  • Nodes: Entities or objects usually represented via nouns in a circle
  • Edges: Relationships between nodes represented by an arrow
  • Properties: Attributes associated with nodes and edges, such as age, location, etc.
Fig. 1: Nodes and edges in a graph database (Source: TerminusDB Community)
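
The three elements above can be sketched in a few lines of Python. This is only an illustrative model, assuming simple `Node` and `Edge` classes with made-up field names; it also lets edges carry properties, which many property graph systems allow.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # the kind of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str  # node_id of the start node
    target: str  # node_id of the end node
    rel_type: str  # the kind of relationship, e.g. "LIVES_IN"
    properties: dict = field(default_factory=dict)


# Hypothetical sample data.
nodes = {
    "p1": Node("p1", "Person", {"name": "Ada", "age": 36}),
    "c1": Node("c1", "City", {"name": "London"}),
}
edges = [Edge("p1", "c1", "LIVES_IN", {"since": 2015})]


def neighbors(node_id, rel_type):
    """Follow outgoing edges of one type and return the target nodes."""
    return [
        nodes[e.target]
        for e in edges
        if e.source == node_id and e.rel_type == rel_type
    ]
```

For example, `neighbors("p1", "LIVES_IN")` returns the London node, and its attributes are available through `properties`.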

Unlike RDF graphs, property graphs support myriad languages, including Graph SQL and Graph Query Language (GQL).

Note: GQL isn’t to be confused with GraphQL, an API query language unrelated to graph databases.

GraphSQL vs. GQL and GraphQL

Although Graph SQL and GQL are both powerful, they are distinctly different:

  • Graph SQL enables querying graph databases in SQL-like syntax, facilitating an easy transition for users better acquainted with relational databases, and seamless migration from relational to graph databases or vice versa.
  • GQL, published as the ISO/IEC 39075:2024 standard, is an emerging framework aimed at standardizing the multiple graph database query languages available. The goal is similar to what SQL achieved for relational databases: better cross-platform compatibility and interoperability.

GraphSQL and GraphQL are also distinct technologies serving different purposes:

  • GraphQL is a query language focused on API communication that allows clients to request specific data structures from a server via a single endpoint.
  • GraphSQL refers to SQL-like extensions or tools designed for querying graph databases (e.g., Apache AGE) using familiar SQL syntax to traverse nodes and edges in graph-structured data.

Both address data efficiency but in separate domains: APIs vs. graph databases.

Visualization with database graphics

Data stored in a graph database can be visually represented in a chart or diagram. This visualization can be enhanced by varying the colors and fonts of nodes and edges to make traversing the graph easier.

Visualizing graph data can help users:

  • Identify patterns, trends, and anomalies
  • Understand how different entities are interconnected
  • Make more informed decisions
  • Explore and modify stored data
  • Create high-quality visualizations for end-user applications

Graph database use cases

Given their ability to model interconnected data, graph databases are uniquely suited to use cases where unearthing implicit relationships is critical and would otherwise be impossible or financially impractical.

Social network analysis

Social media platforms leverage graph databases to represent friends, followers, and connections. Companies such as LinkedIn and Facebook also use them for targeted ads and “People You May Know” suggestions.

Fig. 2: Social network relationships in a graph database (Source: The Andela Way)
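
A "People You May Know" style suggestion is essentially a two-hop traversal: look at your friends' friends and rank them by how many friends you share. The Python sketch below shows that idea on a made-up friendship graph; it is an assumption about the general technique, not a description of how LinkedIn or Facebook implement it.

```python
# Hypothetical undirected friendship graph.
friends = {
    "ana": {"bo", "cat"},
    "bo": {"ana", "dev"},
    "cat": {"ana", "dev", "eli"},
    "dev": {"bo", "cat"},
    "eli": {"cat"},
}


def people_you_may_know(user):
    """Rank non-friends by how many mutual friends they share with user."""
    mutual_counts = {}
    for friend in friends[user]:
        for candidate in friends[friend]:
            if candidate != user and candidate not in friends[user]:
                mutual_counts[candidate] = mutual_counts.get(candidate, 0) + 1
    # Most mutual friends first; ties broken alphabetically for stable output.
    return sorted(mutual_counts, key=lambda c: (-mutual_counts[c], c))
```

Here `people_you_may_know("ana")` ranks dev ahead of eli, because dev shares two friends with ana and eli shares one.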

Fraud detection

Financial, investigative, and e-commerce institutions deploy graph databases to examine the links between individual account holders and various transactions. This enables them to uncover fraudulent activities. For example, the International Consortium of Investigative Journalists (ICIJ) used Neo4j’s graph database to find the millions of hidden connections in the landmark Panama Papers investigation, recognized as the world's largest financial leak story.

Fig. 3: Mapping fraudulent transaction pathways with graph databases (Source: Subhashish Bose on Medium)
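
One common way to surface suspicious links is to group accounts that share identifiers such as a phone number or device. The Python sketch below does this with a simple depth-first walk over a made-up account-to-identifier graph. The data, the function name `linked_account_rings`, and the `min_size` threshold are assumptions for illustration; real fraud pipelines involve far more signals.

```python
from collections import defaultdict

# Hypothetical account -> identifiers (phone numbers, devices) edges.
account_links = {
    "acct_1": {"phone_A", "device_X"},
    "acct_2": {"phone_A"},
    "acct_3": {"device_X", "phone_B"},
    "acct_4": {"phone_C"},
}


def linked_account_rings(links, min_size=2):
    """Group accounts that are connected through any shared identifier."""
    accounts_by_identifier = defaultdict(set)
    for account, identifiers in links.items():
        for identifier in identifiers:
            accounts_by_identifier[identifier].add(account)

    seen, rings = set(), []
    for start in sorted(links):
        if start in seen:
            continue
        # Depth-first walk: account -> identifier -> other accounts.
        ring, stack = set(), [start]
        while stack:
            account = stack.pop()
            if account in ring:
                continue
            ring.add(account)
            for identifier in links[account]:
                stack.extend(accounts_by_identifier[identifier] - ring)
        seen |= ring
        if len(ring) >= min_size:
            rings.append(sorted(ring))
    return rings
```

In this sample, acct_2 and acct_3 never share an identifier directly, yet they end up in the same group because both are linked to acct_1. Indirect connections like this are hard to spot in flat tables.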

Recommendation systems

E-commerce websites use graph databases to map information collected about you, such as your purchase patterns, occupation, and age. This allows them to then offer personalized recommendations when you open their webpages or apps.

Fig. 4: Making recommendations to customers using a graph database (Source: Analytics Vidhya)
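
A simple version of this is co-purchase scoring: walk from a customer to the products they bought, then to other customers who bought the same products, then to what else those customers bought. The Python sketch below illustrates the idea with made-up purchase data; the `recommend` function and its `top_n` parameter are hypothetical, and production recommenders weigh many more factors.

```python
from collections import Counter

# Hypothetical customer -> purchased products edges.
purchases = {
    "u1": {"laptop", "mouse"},
    "u2": {"laptop", "mouse", "dock"},
    "u3": {"laptop", "dock", "bag"},
}


def recommend(user, top_n=2):
    """Suggest products bought by customers who share a purchase with user."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other != user and owned & items:
            # Count each product the similar customer has that user lacks.
            scores.update(items - owned)
    # Highest co-purchase count first; ties broken alphabetically.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]
```

For u1, the dock scores highest because two similar customers bought it, followed by the bag.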

Pros and cons of graph databases

Graph databases come with both benefits and specific concerns.

Advantages of graph databases

There are a few key reasons to use graph databases, especially when compared with relational databases:

  • Efficient relationship handling: They are faster, cheaper, and more efficient for relationship-based queries than relational databases, where complex and expensive JOINs are required to traverse interconnected data.
  • Flexible schema: Graph databases’ schema-less or schema-flexible design allows for easy modifications, e.g., adding a “Someone-You-Know-Also-Bought-This-Product” schema in a recommendation system graph.
  • Less downtime: They can adapt to rapidly shifting data structures without downtime; in contrast, relational databases often demand new schemas and downtime to accommodate changes.
  • Natural representation of networks: Graph databases are highly intuitive for representing networks, social graphs, and all interconnected data, making them easy to understand and use.
  • Advanced analytics capabilities: Built-in algorithms, e.g., shortest path and centrality, support deeper insights and advanced queries; this is ideal for fraud detection, social network analysis, and supply chain optimization.
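
To show what the shortest path and centrality algorithms mentioned above compute, here is a plain-Python sketch on a small made-up graph. It uses breadth-first search for unweighted shortest paths and degree centrality as the simplest centrality measure. This is an illustration of the concepts, not the built-in implementation of any graph database.

```python
from collections import deque

# Hypothetical undirected graph stored as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_path(start, goal):
    """Breadth-first search; returns one shortest path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality():
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(adj) / (n - 1) for node, adj in graph.items()}
```

In this graph, node D has the highest degree centrality because it connects to three of the four other nodes, which is the kind of signal used to find hubs in social or supply chain networks.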

Challenges of graph databases

The disadvantages of implementing these databases boil down to three things:

  • Complexity in scaling: While many graph databases scale horizontally, large and highly connected graphs can become difficult to scale efficiently, especially in distributed setups.
  • Lack of standardization: Graph databases have no universally adopted query language comparable to SQL for relational databases; the use of different query languages (e.g., Cypher, Gremlin, SPARQL) can lead to compatibility issues and a steep learning curve.
  • Limited tooling and ecosystem: Compared to relational databases, the graph database ecosystem has fewer tools for ETL, monitoring, and third-party integrations, limiting operational efficiency.

The future of graph databases

The graph database landscape is changing, with new trends emerging to solve existing challenges. From all indications, graph databases are becoming a bigger part of next-generation data solutions. For example, they increasingly support machine learning algorithms, particularly in areas like predictive analytics, where relationship data enhances model accuracy.

ISO’s work on standardizing GQL is another notable development, as are efforts to improve the horizontal scalability and performance of graph databases; the latter would make it feasible to manage increasingly large, distributed, and complex datasets.

Streaming and real-time graph processing are advancing as well, allowing organizations to derive insights from live, dynamic data for applications such as fraud detection, GenAI use cases, recommendation engines, and personalized content delivery.

Conclusion

Graph databases are ideal for modeling and querying complex relationships, making them a valuable asset in sectors like finance, AI, e-commerce, and social media. From powering fraud detection to driving personalized recommendations, they reveal connections that traditional databases often miss.

With Site24x7’s database monitoring, you get end-to-end visibility into performance, query execution, resource usage, and more, whether your graph database is self-hosted or cloud-based.

As graph databases evolve with better scalability and machine learning integration, having the right monitoring in place will be key to maximizing their potential.
