Backing up a database depends on how it’s delivered
How you back up your database depends three factors: how the database is delivered to you, the backup logistics of the database, and the recovery time objective (RTO) and the recovery point objective (RPO) you are attempting to meet. This article covers how the database is delivered.
There are three ways databases can be delivered: as software on a server you own; as platform-as-a-service (PaaS); and as a serverless service. Let’s take a look at them.
Traditional database software
Until a few years ago the way all databases were delivered was by buying a license for a product and installing it on a server or VM of your choice. You were responsible for everything, including the security and administration of the server, the storage, the application itself, and (of course) the backup of the database.
This means you have a variety of choices to make, including how to back it up. Some possibilities are completely invalid because databases behave in a particular way that makes them not so easy to backup using methods designed for unstructured data. The following three concepts apply to almost all traditionally delivered databases.
Moving target
The data in a database is generally stored in datafiles that you can see in the filesystem of the server or VM that hosts it. These files are constantly changing as long as something is making updates to the database, which means you cannot just back them up like any other file. The backup would be worthless.
Point-in-time backups and restores
Most supported database-backup methods create a copy of the database at the point in time when the copy was made, for example each night at 10 p.m. That means it will only be able to restore the database to that point in time.
Rolling forward or backward from a point in time
To be able to meet a tighter RPO, most databases have a log of transactions that can be replayed after a point-in-time restore to move the point in time up to a more recent point in time that you specify. This log can also be used to roll back transactions if the database crashes and comes up in an inconsistent state.
Those three general concepts are behind almost all databases running on servers or VMs you administer, although there are exceptions to every rule. Datafiles are sometimes block devices and not files at all, and sometimes they do not change even if the database is changing. The key to getting the backups of any traditionally delivered database right lies in understanding how the database solves the above three challenges.
The most typical method of backing up a traditionally delivered database is a nightly copy that may be full or incremental, followed by continuous backup of the transaction logs. The dump will allow you to restore the entire database, then the logs will allow you to roll transactions forward to the point that things went wrong.
Platform-as-a-Service
A second way databases are delivered is the platform-as-a-service (PaaS) model where you see only the application and have limited to no access to the infrastructure behind it. Amazon Relational Database Service (RDS) is an example of a PaaS offering, and it can be configured to provide Oracle, MySQL, PostgreSQL, and MariaDB, and Aurora databases. Azure also offers SQL Server, MySQL, PostgreSQL, and others in a PaaS configuration.
Backup options for a PaaS database are usually pretty straightforward. Each PaaS offering provides a mechanism that supports backup and recovery. Some come with backups that run automatically every day and typically create a copy in that vendor’s object storage. Others require you to configure the backups for them to run. Therefore, do not assume your PaaS database is automatically backed up.
In fact, you shouldn’t assume any of your infrastructure is backed up. Investigate each PaaS database you are using and see what backup and recovery options it offers. Most default backup methods for PaaS databases copy your backups into the same account and region the database is running in, so another thing to look into is whether or not you can copy those backups to a different account and region. This is a good idea that will protect you against things like the OVH fire that destroyed two data centers.
Serverless databases
Serverless databases take PaaS one step farther, removing even more administrative requirements from the customer and creating an easier-to-use experience. AWS DynamoDB, Aurora Serverless, and Azure Cosmos DB are all examples of these databases.
With a serverless database, you don’t have to configure anything. You literally just start putting in data. The compute and storage resources, as well as database partitioning decisions, will automatically be decided and provisioned for you. It’s so “magic” that many people assume backups are automatically handled, but that’s not always the case.
Like PaaS databases, the backup methods are dictated by the vendor offering the database, so the best practice is the same: Investigate the best backup methods for the database you are using and deploy them. Make sure to figure out how to copy data to another region and account.
The cloud is not magic, but it certainly has made backing up and recovering databases significantly easier. The ease with which you can create a backup of an entire database—even a partitioned one residing on hundreds of nodes—is much easier than what’s required when you are managing everything in a traditional environment. Just don’t let things be so easy that you start assuming things. That’s always a recipe for backup disaster.
Copyright © 2022 IDG Communications, Inc.