Skip to main content
  1. Database Landing Page/

What is an Embedded Database? A brief introduction to RocksDB

·3 mins· ·
Blog En Database
Liu Zhe You
Author
Liu Zhe You
Skilled in full-stack development and DevOps, currently focusing on Backend.
Table of Contents

The cover is from the official rocksdb documentation.

The blog is talking about Embedded Database, not Embedding Database ( Vector Database ).

I came across the term Embedded Database recently, and I found them quite interesting. So I decided to write a blog post to share what I learned about them.

What is an Embedded Database?
#

An embedded database is a DBMS that is tightly integrated into an application, running within the same process. Unlike traditional client-server databases, embedded databases do not require a separate server to manage queries and transactions. This makes them highly efficient for applications that need fast, local data storage without the overhead of network communication.

Application with Traditional Database

Application with Embedded Database

The above images is referenced from rocksdb-lab by @erhwenkuo

Introduction to RocksDB
#

One of the most popular embedded databases today is RocksDB, an open-source, high-performance key-value store developed by Facebook. RocksDB is based on LevelDB but has been optimized for better write performance, lower latency, and high scalability.

Key Features of RocksDB
#

  • Log-Structured Merge (LSM) Trees: RocksDB uses LSM trees to optimize write operations and enable fast data ingestion.
  • Optimized for SSDs: It takes advantage of modern SSD storage for efficient read/write operations.
  • Configurable Compression: Supports multiple compression algorithms like Zstandard and Snappy to optimize storage.
  • High Concurrency: Supports multiple read and write operations simultaneously without performance degradation.
  • Point Lookups and Range Queries: Allows both individual key-value retrieval and efficient range scans.

Fault Tolerance in RocksDB
#

RocksDB is designed to be resilient to failures and provides several features to ensure data durability and recovery:

  • Write-Ahead Logging (WAL): Ensures data integrity by logging changes before they are committed.
  • Snapshots: Enables point-in-time recovery by capturing the state of the database at a specific moment.
  • Automatic Recovery: Upon restart, RocksDB can detect inconsistencies and recover data from the WAL.
  • Replication: RocksDB can be configured for replication to provide higher availability in distributed systems.

Where is RocksDB Used?
#

RocksDB is widely used in applications that require high-speed data processing and efficient storage solutions, most of them take RocksDB as a storage engine:

  • Real-time streaming: Integrated into systems like Apache Kafka Streams for state management.
  • Real-time stateful stream processing: Used in systems like Apache Flink for stateful stream processing.
  • Distributed object storage: Used in systems like Apache Ozone for metadata storage in distributed object stores.

RocksDB Users and Their Use Cases

Example: Using RocksDB in Python
#

Let’s look at a simple example of using RocksDB in Python to store and retrieve key-value pairs.

Note:
For RocksDB itself, it provides C++ and Java APIs, but for Python, you can use the rocksdict package, which provides a Pythonic interface to RocksDB.

Installation
#

You can install Python bindings using uv:

uv pip install rocksdict

RocksDict

Sample Code
#

import rocksdict

# Open a RocksDB database
db = rocksdict.Rdict("./testdb")

# Insert data
db[b"name"] = b"Alice"
db[b"age"] = b"25"

# Retrieve data
print(db[b"name"].decode())  # Output: Alice
print(db[b"age"].decode())   # Output: 25

# Close the database
db.close()

More Operations with RocksDB
#

RocksDB provides a rich set of operations for managing data efficiently:

  • Snapshotting: Create a snapshot of the database to capture its state at a specific moment.
  • Batch Writes: Perform multiple write operations atomically for better performance.
  • Iterators: Traverse the key-value pairs in the database efficiently.
  • Transactions: Ensure atomicity and consistency for multiple read and write operations.
  • Column Families: Organize data into multiple column families for better data management.
  • Statistics: Monitor the performance of the database using built-in statistics.
  • Observability: Monitor the database using tools like Prometheus and Grafana for better insights.

For the above operations, you can refer to the rocksdb-lab for more details with hands-on examples in Java.

References
#

Related

Redis Persistence: RDB and AOF
·5 mins
Blog En Database Redis
Detailed explanation of Redis persistence settings: RDB and AOF. Pros and cons of RDB and AOF. Detailed implementation with Redis source code.
PgBouncer: Lightweight Postgres Connection Pool
·2 mins
Blog Database En Postgresql
Solving Django backend DB connection overload with PgBouncer
PGMQ(PostgreSQL Message Queue) Setup
·2 mins
Blog En Backend Postgresql
PGMQ is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. The article is about how to setup PGMQ with Docker Compose to connect with official Python client
Cloudflare Tunnel
·3 mins
Blog En
Setup Cloudflare Tunnel for NAT, an alternative to Ngrok
Life Is Short Use 'uv'
·1 min
Blog En Python
How to use uv to manage Python packages and projects
2024 Appier Summer Internship Reflection
·12 mins
Blog En Intern
Reflection on my backend summer internship in the Data Platform department at Appier