Redis Persistence: RDB and AOF

Liu Zhe You
Skilled in full-stack development and DevOps, currently focusing on Backend.

This article is based on Redis 7.4.2.
Some details may differ if you are using a different version.

Why does Redis need persistence?
#

Redis is an in-memory database, so when the Redis service restarts, all data disappears.

However, with persistence enabled, Redis can recover its state when it restarts!

Redis persistence settings
#

Redis provides two persistence mechanisms, RDB and AOF, which can also be combined:

  • RDB: Redis Database
    RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
  • AOF: Append Only File
    AOF persistence logs every write operation received by the server. These operations can then be replayed again at server startup, reconstructing the original dataset.
  • RDB+AOF: RDB and AOF
    You can enable both RDB and AOF persistence to provide a more robust data protection mechanism.
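
To make the "replay at startup" idea concrete, here is a minimal sketch in C. It is not how Redis stores or parses its AOF: the `SET n` / `INCRBY n` text format and the `replay_log` function are hypothetical, and the "dataset" is just one integer. The point is only that replaying the logged writes in order rebuilds the final state.

```c
#include <stdio.h>
#include <string.h>

/* Replays a newline-separated command log against a single integer value.
 * The commands (SET n, INCRBY n) are a made-up mini-format for illustration,
 * not the real AOF encoding. Lines that match neither command are skipped. */
long replay_log(const char *log) {
    long value = 0;
    const char *line = log;

    while (*line != '\0') {
        long arg;
        if (sscanf(line, "SET %ld", &arg) == 1) {
            value = arg;              /* overwrite the current state */
        } else if (sscanf(line, "INCRBY %ld", &arg) == 1) {
            value += arg;             /* apply the delta on top of it */
        }

        /* Move to the start of the next line, if there is one. */
        const char *eol = strchr(line, '\n');
        if (eol == NULL) break;
        line = eol + 1;
    }
    return value;
}
```

Calling `replay_log("SET 10\nINCRBY 5\nINCRBY -3\n")` walks the three logged writes and ends at 12, the same value the "server" held before it stopped. An RDB snapshot would instead store only that final 12.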

RDB: Redis Database
#

In short, a background RDB save needs to fork() a child process, and the child writes a point-in-time snapshot of your dataset to disk.

Advantages of RDB
#

RDB is a very compact single-file point-in-time representation of your Redis data.

  • Space-efficient: RDB is more space-efficient than AOF
  • Easier to back up: RDB is easier to back up and restore

    Since it is a single file, you can just copy it to any storage such as S3, Google Cloud Storage, etc.

  • Faster restart with large datasets: RDB is faster to recover when you have a large dataset compared to AOF

Redis relies on the operating system's CoW (copy-on-write) behavior so that fork() does not have to copy the whole dataset up front.
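
The snapshot effect of fork() can be seen in a few lines of C. This is a minimal sketch, not Redis code: `dataset` and `value_seen_by_child` are hypothetical names, and a single integer stands in for the whole keyspace. The parent changes the value after forking, and the child still reports the value from the moment of the fork.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the Redis dataset: one integer in process memory. */
static int dataset;

/* Forks a child, lets the parent modify the dataset, and only then asks the
 * child what it sees. Returns the child's view, or -1 on error. */
int value_seen_by_child(void) {
    int go_pipe[2];      /* parent -> child: "I have written, read now" */
    int result_pipe[2];  /* child -> parent: the value the child observed */

    dataset = 1;
    if (pipe(go_pipe) == -1) return -1;
    if (pipe(result_pipe) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;

    if (pid == 0) {
        /* Child: wait for the parent's write, then report our own copy. */
        char go;
        close(go_pipe[1]);
        close(result_pipe[0]);
        if (read(go_pipe[0], &go, 1) != 1) _exit(1);
        if (write(result_pipe[1], &dataset, sizeof dataset)
                != (ssize_t)sizeof dataset) _exit(1);
        _exit(0);
    }

    /* Parent: this write lands on a private copy of the page, so the
     * child's view of the dataset is not affected. */
    close(go_pipe[0]);
    close(result_pipe[1]);
    dataset = 2;

    int seen = -1;
    if (write(go_pipe[1], "g", 1) != 1 ||
        read(result_pipe[0], &seen, sizeof seen) != (ssize_t)sizeof seen) {
        seen = -1;
    }
    close(go_pipe[1]);
    close(result_pipe[0]);
    waitpid(pid, NULL, 0);
    return seen;
}
```

`value_seen_by_child()` returns 1 even though the parent already holds 2, which is exactly the property a point-in-time snapshot needs.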

Disadvantages of RDB
#

  • Data loss: RDB is not a good fit if you need to minimize data loss

    because it only saves the dataset at specified intervals, so every write since the last snapshot is lost if Redis stops unexpectedly

  • fork() cost with large datasets: RDB saves can hurt latency when the dataset is large

    since it needs to fork() a child process to save the dataset to disk,
    fork() can be slow if you have a large dataset,
    and Redis may stop serving clients for some milliseconds or even seconds!
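
The "specified intervals" are expressed as save points: "snapshot if at least N keys changed and more than M seconds have passed". Below is a minimal sketch of that check in C. The struct and function names are hypothetical, and the three save points are what I understand to be the stock `save 3600 1 300 100 60 10000` setting, so treat those values as an assumption and check your own redis.conf.

```c
#include <stddef.h>

/* One save point: snapshot when `changes` writes have accumulated
 * and more than `seconds` have elapsed since the last save. */
struct save_point {
    long seconds;
    long changes;
};

/* Assumed defaults (see the note above); not read from a real config. */
static const struct save_point default_save_points[] = {
    {3600, 1},
    {300, 100},
    {60, 10000},
};

/* Returns 1 if any save point is satisfied, 0 otherwise.
 * dirty:   number of writes since the last successful save
 * elapsed: seconds since the last successful save */
int should_bgsave(const struct save_point *points, size_t n,
                  long dirty, long elapsed) {
    for (size_t i = 0; i < n; i++) {
        if (dirty >= points[i].changes && elapsed > points[i].seconds) {
            return 1;
        }
    }
    return 0;
}
```

With these save points, 50 changed keys after 301 seconds do not trigger a snapshot, so a crash at that moment would lose all 50 writes. That gap is the data-loss window described above.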

Detailed Implementation of RDB
#

We have mentioned that RDB needs to fork() a child process to save the dataset to disk.

Let’s take a look at the implementation of RDB in Redis source code.

  1. In redis/src/rdb.c, bgsaveCommand is the entry point function of the BGSAVE command.

    /* BGSAVE [SCHEDULE] */
    void bgsaveCommand(client *c) {
        int schedule = 0;
    
        /* The SCHEDULE option changes the behavior of BGSAVE when an AOF rewrite
         * is in progress. Instead of returning an error a BGSAVE gets scheduled. */
        if (c->argc > 1) {
            if (c->argc == 2 && !strcasecmp(c->argv[1]->ptr,"schedule")) {
                schedule = 1;
            } else {
                addReplyErrorObject(c,shared.syntaxerr);
                return;
            }
        }
    
        rdbSaveInfo rsi, *rsiptr;
        rsiptr = rdbPopulateSaveInfo(&rsi);
    
        if (server.child_type == CHILD_TYPE_RDB) {
            addReplyError(c,"Background save already in progress");
        } else if (hasActiveChildProcess() || server.in_exec) {
            if (schedule || server.in_exec) {
                server.rdb_bgsave_scheduled = 1;
                addReplyStatus(c,"Background saving scheduled");
            } else {
                addReplyError(c,
                "Another child process is active (AOF?): can't BGSAVE right now. "
                "Use BGSAVE SCHEDULE in order to schedule a BGSAVE whenever "
                "possible.");
            }
        } else if (rdbSaveBackground(SLAVE_REQ_NONE,server.rdb_filename,rsiptr,RDBFLAGS_NONE) == C_OK) {
            addReplyStatus(c,"Background saving started");
        } else {
            addReplyErrorObject(c,shared.err);
        }
    }
    

  2. bgsaveCommand then calls rdbSaveBackground.
    rdbSaveBackground checks whether a child process is already running. If not, it calls redisFork to fork a child process, and the child saves the dataset to disk.

    int rdbSaveBackground(int req, char *filename, rdbSaveInfo *rsi, int rdbflags) {
        pid_t childpid;
    
        if (hasActiveChildProcess()) return C_ERR;
        server.stat_rdb_saves++;
    
        server.dirty_before_bgsave = server.dirty;
        server.lastbgsave_try = time(NULL);
    
        if ((childpid = redisFork(CHILD_TYPE_RDB)) == 0) {
            int retval;
    
            /* Child */
            redisSetProcTitle("redis-rdb-bgsave");
            redisSetCpuAffinity(server.bgsave_cpulist);
            retval = rdbSave(req, filename,rsi,rdbflags);
            if (retval == C_OK) {
                sendChildCowInfo(CHILD_INFO_TYPE_RDB_COW_SIZE, "RDB");
            }
            exitFromChild((retval == C_OK) ? 0 : 1);
        } else {
            /* Parent */
            if (childpid == -1) {
                server.lastbgsave_status = C_ERR;
                serverLog(LL_WARNING,"Can't save in background: fork: %s",
                    strerror(errno));
                return C_ERR;
            }
            serverLog(LL_NOTICE,"Background saving started by pid %ld",(long) childpid);
            server.rdb_save_time_start = time(NULL);
            server.rdb_child_type = RDB_CHILD_TYPE_DISK;
            return C_OK;
        }
        return C_OK; /* unreached */
    }
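
Stripped of the Redis-specific bookkeeping, the shape of rdbSaveBackground is "fork, let the child write the file and exit with 0 or 1, let the parent carry on". Here is a minimal standalone sketch of that pattern. `write_snapshot` and `background_save` are hypothetical names, the "snapshot" is a plain string, and unlike Redis the parent here simply blocks in waitpid() so the result can be returned directly.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child-side work: write the snapshot to `path`.
 * Returns 0 on success and 1 on failure, so it can be used as an exit code. */
static int write_snapshot(const char *path, const char *data) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return 1;
    int failed = (fputs(data, fp) == EOF);
    if (fclose(fp) == EOF) failed = 1;
    return failed;
}

/* Forks a child that saves `data` to `path`.
 * Returns the child's exit code (0 = saved), or -1 if fork/wait failed. */
int background_save(const char *path, const char *data) {
    pid_t pid = fork();
    if (pid == -1) return -1;

    if (pid == 0) {
        /* Child: do the disk I/O and report the result via the exit code. */
        _exit(write_snapshot(path, data));
    }

    /* Parent: a real server would go back to serving clients and reap the
     * child later; this sketch waits right away to keep the example short. */
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `background_save("/tmp/dump-sketch.rdb", "key=value\n")` returns 0 once the child has written the file, and a path in a directory that does not exist returns 1.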
    

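fork() is a system call that gives the child process a copy of the parent's address space. Thanks to copy-on-write, the data pages themselves are not copied up front, but the page tables are, and their size grows with the dataset. This is why fork() can be slow if you have a large dataset!

Additionally, every page the parent modifies while the child is still saving has to be duplicated, so a write-heavy workload on a large dataset with limited remaining memory might result in OOM (Out of Memory).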

AOF: Append Only File
#

AOF appends every write operation to a log file and uses the fsync() system call to flush that log to disk.

The idea is similar to WAL (Write-Ahead Logging) in a relational database. With the default everysec policy, fsync() is performed in a background thread to avoid blocking the main Redis event loop.

AOF has three fsync policies, selected with the appendfsync setting:

  • always: fsync() after every write operation
  • everysec: fsync() once per second (default)
  • no: Redis never calls fsync() itself

    The fastest and least safe method.
    Just put your data in the hands of the Operating System. Normally Linux will flush data every 30 seconds with this configuration, but it’s up to the kernel’s exact tuning.
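
The three policies differ only in when fsync() is called after the write(). The sketch below shows that decision in C. It is a simplification under stated assumptions: `struct aof_writer`, `aof_append`, and the `fsync_calls` counter are hypothetical, and the fsync() runs inline here, whereas Redis hands the everysec fsync to a background thread.

```c
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

enum fsync_policy { FSYNC_ALWAYS, FSYNC_EVERYSEC, FSYNC_NO };

/* Hypothetical writer state; fsync_calls exists only to make the
 * effect of each policy observable in this example. */
struct aof_writer {
    int fd;
    enum fsync_policy policy;
    time_t last_fsync;
    int fsync_calls;
};

/* Appends one command to the log and fsyncs according to the policy.
 * `now` is passed in so the behavior is easy to test.
 * Returns 0 on success, -1 on a write or fsync error. */
int aof_append(struct aof_writer *w, const char *cmd, time_t now) {
    size_t len = strlen(cmd);
    if (write(w->fd, cmd, len) != (ssize_t)len) return -1;

    int due = w->policy == FSYNC_ALWAYS ||
              (w->policy == FSYNC_EVERYSEC && now - w->last_fsync >= 1);
    if (due) {
        if (fsync(w->fd) == -1) return -1;
        w->last_fsync = now;
        w->fsync_calls++;
    }
    /* FSYNC_NO: the data sits in the OS page cache until the kernel
     * decides to flush it. */
    return 0;
}
```

With FSYNC_EVERYSEC, two appends within the same second cause at most one fsync(), which is why that policy stays cheap while bounding the loss window to roughly one second.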

AOF rewrite:

  • Redis rewrites the AOF file in the background to keep it from growing too large.
  • serverCron checks whether the AOF file has grown too large. If so, it calls rewriteAppendOnlyFileBackground to rewrite the AOF file in the background.

Similar to RDB, AOF rewrite also needs to fork() a child process to save the dataset to disk,
so it may run into the same problems as RDB when you have a large dataset.
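
The "too large" check can be pictured as a growth percentage relative to the file size after the previous rewrite. The function below is a simplified sketch of that idea, not the code from serverCron: `should_rewrite_aof` is a hypothetical name, and its last two parameters loosely correspond to the auto-aof-rewrite-min-size and auto-aof-rewrite-percentage settings.

```c
/* Decides whether the AOF has grown enough to be rewritten.
 * current_size:   size of the AOF now, in bytes
 * base_size:      size of the AOF right after the last rewrite, in bytes
 * min_size:       never rewrite files that are not larger than this
 * growth_percent: required growth over base_size; 0 disables the check
 * Returns 1 if a rewrite should be started, 0 otherwise. */
int should_rewrite_aof(long long current_size, long long base_size,
                       long long min_size, int growth_percent) {
    if (growth_percent <= 0) return 0;       /* automatic rewrite disabled */
    if (current_size <= min_size) return 0;  /* file is still small */

    /* Avoid dividing by zero when no rewrite has happened yet. */
    long long base = base_size > 0 ? base_size : 1;
    long long growth = current_size * 100 / base - 100;
    return growth >= growth_percent;
}
```

With a 64 MB base, a 64 MB minimum, and a 100 percent threshold, a 128 MB file qualifies for a rewrite while a 100 MB file does not.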

Advantages of AOF
#

  • Less data loss: AOF is better if you need to minimize data loss

    since it logs every write operation, and with the default everysec policy a crash loses roughly one second of writes at most

Disadvantages of AOF
#

  • Larger file size: AOF files are usually larger than the equivalent RDB files
  • Slower than RDB (depends on the fsync() policy):
    • The default everysec policy is still very efficient!
    • But if you set the always policy, it will be slower than RDB since it needs to fsync() after every write operation
