Restoring databases from backup requires hands-on practice


It’s important to back up your databases, but it’s even more important to be able to restore it, so once you’ve identified how you’re going to back it up, make sure you test the different recovery scenarios.

Broadly speaking, there are two database types considered here, traditional and modern, and  recovery is different for each. A traditional database in this context is a database that runs in a single server or virtual machine that you manage, and a modern database might run across many nodes or it might even be serverless, where you have no access to the underlying infrastructure.

Recovering traditional databases

Restoring a traditional database is straightforward as long as you have practiced how to handle  different things that could go wrong. You don’t want to test your backup system for the first time during an actual database outage.

First, you need to identify the problem. This may seem obvious, but each database product has a sequence of steps to follow to determine why your database is not running. It could be a double disk failure on your RAID array or accidental deletion of the database. If you’re not sure what the issue is, try starting it up in phases to zero in on the problem. This way, you can save time and effort during the recovery phase.

In most cases database issues can be fixed by restoring the data files and applying media recovery. To do so, first you must restore the data files from the backup system. Depending on the method used to back up the database, you may be able to restore the files directly from backup. If you used the dump-and-sweep method, where you dump to disk and sweep that backup into your backup system, you might have to do a two-phase restore. That entails restoring the files from your backup systems to a staging area, and then restoring from the staging area to the database.

Once that’s accomplished, apply media recovery if your database product supports it, and you have transaction-log backups. This step replays the transactions that have occurred since the backup, bringing the database back to the state it was in before the outage occurred. It’s also necessary if you use hot backup or the snap-and-sweep method in which you take a snapshot and back that up. These will capture an in-progress copy of the database and need media recovery to bring it to a consistent state. In most cases, you will roll the database forward to the last successful transaction. But in other cases, you may want to stop media recovery before you get to some event like a dropped table.

If you’ve followed all the steps correctly, you should be able to start the database at this point. If it won’t start, go back to step one and figure out what went wrong. It’s worth noting that if you’re not restoring data files and applying transaction logs, your recovery process will be very different.

Recovering modern databases

It can be quite challenging to recover Recovering modern databases, especially if the databases are partitioned across many nodes. The recovery process will also largely depend on how you backed up your data, and whether or not it is an immediately consistent or eventually consistent database

Investigate and test various backup and recovery methods of your chosen database before you actually need to use it. The cloud has made testing the recovery of even the largest databases so easy that there is no excuse for being unprepared. Here are some of the available options.

There are backup methods, especially those used for PaaS and serverless databases, that do not provide point-in-time recovery in the same manner as traditional databases. Some newer databases support point-in-time recovery by simply performing more frequent snapshot-based backups. To recover to a particular point in time, select the appropriate snapshot, issue the restore command, and start the database.

Some backup options for modern databases only backup data at the table level, so the restore of these databases will be at the table level as well. In a multi-node, partioned/sharded database, it may be necessary to recover a single node. In some such databases, this can be done without a restore by issuing the commands to create a new replica of data already in the database. (This is because no node should contain data found only on that node.)

If your database is running in a provider’s cloud, it may be possible to perform frequent snapshots of all volumes where the database resides. Recovery here is very similar to point-in-time recovery: Use one of those storage-level snapshots to restore all logical unit numbers (LUN) or filesystems where the database is running, and start the database.

Depending on how data is restored, you may find yourself with part of the database out of sync with the rest of the database. One such scenario is if you restore a single node from backup, in which case you’ll need to force the out-of-sync part to become consistent with the rest of the database. Depending on how out of date it is, how big the cluster is, and the performance of the cluster, this process can take from a few minutes to several weeks.

It’s critical to work with a specialist who understands your database product, learn what ismost likely to damage your database, and practice recovering from those situations so you know exactly what to do when the worst happens. What will it look like if you lose multiple nodes? How will you recover if you lose all members of a replica set? What if something catastrophic, such as a fire or explosion, takes out all nodes in your cluster? What about human mistakes or attacks? How will you respond if someone drops or truncates a table?

Recovering both traditional and modern databases can be a daunting task, but with the right preparation and testing, you can be ready for anything that comes your way. Testing, testing, testing. Make sure you prepare for all scenarios.

Copyright © 2023 IDG Communications, Inc.



Source link