Redis Persistence: RDB and AOF
This information might be wrong if you are using a different Redis version.
Why does Redis need persistence?
Redis is an in-memory database, so when the Redis service restarts, all data disappears.
However, with persistence enabled, Redis can recover its state when it restarts!
Redis persistence settings
Redis provides two persistence settings: RDB and AOF.
- RDB (Redis Database): RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
- AOF (Append Only File): AOF persistence logs every write operation received by the server. These operations can then be replayed at server startup, reconstructing the original dataset.
- RDB + AOF: You can enable both RDB and AOF persistence to provide a more robust data protection mechanism.
RDB: Redis Database
In short, RDB needs to fork() a child process to save the point-in-time snapshot of your dataset to disk.
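The point-in-time behavior of fork() can be sketched with a small C program. This is only an illustration, not Redis code: the function name fork_snapshot_demo and the two pipes are hypothetical helpers, and a single int stands in for the dataset. The parent changes its copy after the fork, yet the child still reports the value as it was when fork() was called.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the value the child observed, or -1 on error.
 * The parent overwrites its copy after fork(); the child still sees the
 * value as it was at the moment of the fork. */
int fork_snapshot_demo(int initial, int updated) {
    int value = initial;   /* stands in for the in-memory dataset */
    int go[2], result[2];  /* go: parent -> child, result: child -> parent */

    if (pipe(go) == -1) return -1;
    if (pipe(result) == -1) { close(go[0]); close(go[1]); return -1; }

    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]); close(go[1]); close(result[0]); close(result[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait until the parent has changed its copy, then report
         * the value this process still sees. */
        char token;
        close(go[1]);
        close(result[0]);
        if (read(go[0], &token, 1) != 1) _exit(1);
        if (write(result[1], &value, sizeof(value)) != (ssize_t)sizeof(value))
            _exit(1);
        _exit(0);
    }

    /* Parent: keep "serving writes" after the fork. */
    close(go[0]);
    close(result[1]);
    value = updated;
    int seen = -1;
    int ok = write(go[1], "x", 1) == 1 &&
             read(result[0], &seen, sizeof(seen)) == (ssize_t)sizeof(seen);
    close(go[1]);
    close(result[0]);
    waitpid(pid, NULL, 0);
    return ok ? seen : -1;
}
```

Calling fork_snapshot_demo(1, 2) returns 1: the child works on a snapshot of memory taken at fork time, which is the property an RDB save depends on.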
Advantages of RDB
RDB is a very compact single-file point-in-time representation of your Redis data.
- Space-efficient: RDB is more space-efficient than AOF.
- Easier to back up: RDB is easier to back up and restore. Since it is a single file, you can just copy it to any storage such as S3 or Google Cloud Storage.
- Faster restart with large datasets: RDB is faster to recover than AOF when you have a large dataset.
Redis relies on the operating system's CoW (copy-on-write) to keep the fork() process cheap.
Disadvantages of RDB
- Data loss: RDB is not good if you need to minimize data loss, because it only saves the dataset at specified intervals. Writes made after the last snapshot are lost if Redis stops unexpectedly.
- Expensive fork() with large datasets: RDB needs to fork() a child process to save the dataset to disk. fork() can be slow if you have a large dataset, and Redis may stop serving clients for some milliseconds or even seconds!
Detailed Implementation of RDB
We have mentioned that RDB needs to fork() a child process to save the dataset to disk.
Let's take a look at the implementation of RDB in the Redis source code.
From redis/src/rdb.c: bgsaveCommand is the entry point function of the BGSAVE command.
```c
/* BGSAVE [SCHEDULE] */
void bgsaveCommand(client *c) {
    int schedule = 0;

    /* The SCHEDULE option changes the behavior of BGSAVE when an AOF rewrite
     * is in progress. Instead of returning an error a BGSAVE gets scheduled. */
    if (c->argc > 1) {
        if (c->argc == 2 && !strcasecmp(c->argv[1]->ptr,"schedule")) {
            schedule = 1;
        } else {
            addReplyErrorObject(c,shared.syntaxerr);
            return;
        }
    }

    rdbSaveInfo rsi, *rsiptr;
    rsiptr = rdbPopulateSaveInfo(&rsi);

    if (server.child_type == CHILD_TYPE_RDB) {
        addReplyError(c,"Background save already in progress");
    } else if (hasActiveChildProcess() || server.in_exec) {
        if (schedule || server.in_exec) {
            server.rdb_bgsave_scheduled = 1;
            addReplyStatus(c,"Background saving scheduled");
        } else {
            addReplyError(c,
            "Another child process is active (AOF?): can't BGSAVE right now. "
            "Use BGSAVE SCHEDULE in order to schedule a BGSAVE whenever "
            "possible.");
        }
    } else if (rdbSaveBackground(SLAVE_REQ_NONE,server.rdb_filename,rsiptr,RDBFLAGS_NONE) == C_OK) {
        addReplyStatus(c,"Background saving started");
    } else {
        addReplyErrorObject(c,shared.err);
    }
}
```
bgsaveCommand then calls rdbSaveBackground. rdbSaveBackground checks whether a child process is already running; if not, it calls redisFork to fork a child process that saves the dataset to disk.
```c
int rdbSaveBackground(int req, char *filename, rdbSaveInfo *rsi, int rdbflags) {
    pid_t childpid;

    if (hasActiveChildProcess()) return C_ERR;
    server.stat_rdb_saves++;

    server.dirty_before_bgsave = server.dirty;
    server.lastbgsave_try = time(NULL);

    if ((childpid = redisFork(CHILD_TYPE_RDB)) == 0) {
        int retval;

        /* Child */
        redisSetProcTitle("redis-rdb-bgsave");
        redisSetCpuAffinity(server.bgsave_cpulist);
        retval = rdbSave(req, filename,rsi,rdbflags);
        if (retval == C_OK) {
            sendChildCowInfo(CHILD_INFO_TYPE_RDB_COW_SIZE, "RDB");
        }
        exitFromChild((retval == C_OK) ? 0 : 1);
    } else {
        /* Parent */
        if (childpid == -1) {
            server.lastbgsave_status = C_ERR;
            serverLog(LL_WARNING,"Can't save in background: fork: %s",
                strerror(errno));
            return C_ERR;
        }
        serverLog(LL_NOTICE,"Background saving started by pid %ld",(long) childpid);
        server.rdb_save_time_start = time(NULL);
        server.rdb_child_type = RDB_CHILD_TYPE_DISK;
        return C_OK;
    }
    return C_OK; /* unreached */
}
```
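fork() is a system call that gives the child process a copy of the parent's address space. With copy-on-write the memory pages themselves are shared at first, but the kernel still has to copy the parent's page tables, and their size grows with the dataset. This is why fork() can be slow if you have a large dataset!
Additionally, every page the parent modifies while the child is running gets duplicated, so it might result in OOM (Out of Memory) if you have a large dataset, a heavy write load, and limited remaining memory.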
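To get a feel for the fork() cost, here is a rough back-of-the-envelope estimate in C. It is a sketch under stated assumptions, not a measurement: page_table_bytes is a hypothetical helper, and the 4 KiB page size and 8-byte page table entry are assumptions that match a typical x86-64 Linux setup without huge pages. Only the last-level entries are counted.

```c
#include <stdint.h>

/* Rough estimate of the last-level page-table bytes the kernel must copy
 * on fork(): one entry per mapped page. Assumed, not measured. */
uint64_t page_table_bytes(uint64_t dataset_bytes, uint64_t page_size,
                          uint64_t entry_size) {
    if (page_size == 0) return 0;
    uint64_t pages = (dataset_bytes + page_size - 1) / page_size; /* round up */
    return pages * entry_size;
}
```

Under these assumptions, a 24 GiB dataset needs 24 GiB / 4 KiB * 8 bytes = 48 MiB of page table entries, all of which have to be copied before fork() returns.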
AOF: Append Only File
AOF appends every write operation to a log file and uses the fsync() system call to flush that log to disk.
It is similar to WAL (Write-Ahead Logging) in a relational database.
With the default everysec policy, fsync is performed in a background thread to avoid blocking the main Redis event loop.
AOF has 3 fsync policies:
- always: fsync() after every write operation. The slowest and safest method.
- everysec: fsync() every second (default).
- no: never call fsync() explicitly. The fastest and least safe method. Just put your data in the hands of the operating system. Normally Linux will flush data every 30 seconds with this configuration, but it's up to the kernel's exact tuning.
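The three policies can be sketched in C as a single append function. This is a simplified illustration rather than the Redis implementation: aof_append and the fsync_policy enum are hypothetical names, and the everysec case is handled synchronously here, whereas Redis hands that fsync to a background thread.

```c
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names mirroring the three appendfsync values. */
typedef enum { FSYNC_ALWAYS, FSYNC_EVERYSEC, FSYNC_NO } fsync_policy;

/* Append one record to fd and apply the policy.
 * Returns 1 if fsync() ran, 0 if it was skipped, -1 on error. */
int aof_append(int fd, const char *rec, size_t len,
               fsync_policy policy, time_t now, time_t *last_sync) {
    /* write() may be partial, so loop until the whole record has been
     * handed to the kernel. */
    while (len > 0) {
        ssize_t n = write(fd, rec, len);
        if (n <= 0) return -1;
        rec += n;
        len -= (size_t)n;
    }

    int due = policy == FSYNC_ALWAYS ||
              (policy == FSYNC_EVERYSEC && now - *last_sync >= 1);
    if (!due) return 0; /* data stays in the OS page cache for now */

    if (fsync(fd) == -1) return -1;
    *last_sync = now;
    return 1;
}
```

The trade-off is visible in the return value: with always every call pays for an fsync(), with everysec at most one call per second does, and with no the function never forces data to disk.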
AOF rewrite:
- Redis rewrites the AOF file in the background to avoid the file becoming too large.
- serverCron checks whether the AOF file is too large; if so, it calls rewriteAppendOnlyFileBackground to rewrite the AOF file in the background.
Similar to RDB, AOF rewrite also needs to fork() a child process to save the dataset to disk, so it may encounter the same problems as RDB when you have a large dataset.
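The "too large" check can be sketched as follows. This is a simplified illustration and not the actual serverCron code: should_rewrite_aof and its parameter names are hypothetical, and the logic assumes the behavior described for the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size settings in redis.conf, where a rewrite is triggered once the file exceeds a minimum size and has grown by a given percentage since the last rewrite.

```c
/* Returns 1 when an automatic rewrite should be triggered, 0 otherwise.
 * current_size: size of the AOF now
 * base_size:    size of the AOF right after the last rewrite
 * min_size:     assumed to correspond to auto-aof-rewrite-min-size
 * rewrite_perc: assumed to correspond to auto-aof-rewrite-percentage */
int should_rewrite_aof(long long current_size, long long base_size,
                       long long min_size, int rewrite_perc) {
    if (rewrite_perc <= 0) return 0;        /* 0 disables automatic rewrites */
    if (current_size <= min_size) return 0; /* file is still small */
    long long base = base_size > 0 ? base_size : 1; /* avoid division by zero */
    long long growth = (current_size * 100 / base) - 100;
    return growth >= rewrite_perc;
}
```

For example, with a 64 MB minimum size and a percentage of 100, an AOF that was 64 MB after the last rewrite qualifies for a new rewrite once it reaches 128 MB.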
Advantages of AOF
- Less data loss: AOF is better if you need to minimize data loss, since it saves every write operation to disk.
Disadvantages of AOF
- Larger file size: AOF files are usually larger than the equivalent RDB files.
- Slower than RDB (depends on fsync() policy):
  - The default everysec policy is still very efficient!
  - But if you set the always policy, it will be slower than RDB, since it needs to fsync() after every write operation.