esxsi.com – Disaster Recovery Strategy – The Backup Bible Review


The following post reviews The Backup Bible, a 204-page ebook written by Eric Siron and published by Altaro Software. I’ve broken this down into a short discussion of each part, 3 in total, to help with the flow of the text. The Backup Bible can be downloaded for free here.

Initial thoughts on downloading the ebook and flicking through: clearly a lot of research and work has gone into this, with content that covers business outcomes and objectives as well as low-level technical detail. A substantial number of example process documents are included, which means the reader can apply practical outputs straight away. The graphics, colour scheme, font, and text size all look like they’ll make the ebook easy to follow, so let’s jump in.

Introduction and part 1: creating a backup & disaster recovery strategy

Chapter 1 is short and to the point, providing a checklist for the scope of disaster recovery. The checklist serves as a nice starting point for organisations to expand into sub-categories based on their own goals, requirements, or industry-specific regulations. An interesting observation made during the introduction is that human nature leans towards planning for best outcomes, rather than worst. As a result, organisations design systems with defences for known common failures (think disk, network adapter) but less often design with catastrophic failure in mind. Rather than designing systems on the principle that they will fail, an assumption is made that they will operate as expected. We’ll discuss more points made in the ebook around shifting that mentality later.

A further key takeaway from the introduction is that disaster recovery planning is not a one-time event, but an ongoing process. Plans and requirements will need to adapt as new services are added and older services change. This is why the ebook focuses heavily on implementing a disaster recovery strategy that stays agile, as opposed to a singular process or policy. The disaster recovery strategy will be driven by the overall business strategy.

Some great talking points are introduced in chapter 2 around popular responses as to why disaster recovery fails, or is not properly implemented. In most cases expect a combination of many, if not all, of these reasons. Building disaster recovery into the initial design of a greenfield environment can be relatively easy, in comparison with retrospectively fitting it into brownfield environments. Older systems and infrastructure were generally deployed in silos yet interlinked with dependencies. New services have been deployed over time, bolted onto existing infrastructure, with changing topologies. Disaster recovery plans need to be agile enough to cater for different systems, and keep up with both technical and business changes.

The chapter moves on to talk about common risks, and determining key stakeholders. Again, these are useful examples that can be adapted and extended based on your industry and organisational structure, such as protecting physical assets as well as digital. Gaining insights into business priorities and current risks from different people within the organisation boosts your chances of building successful plans and processes.

Chapter 3 starts out with some easy-to-understand explanations and graphics on Recovery Time Objectives (RTO), the target time within which a system should be restored, and Recovery Point Objectives (RPO), the maximum acceptable data loss measured as time since the last recoverable copy. It’s good to set out why these are important, since different systems will have different RTO and RPO values. Defining and setting these values helps with the trade-off between budget and availability.

The Backup Bible – Recovery Time Objective

Another tip here that really resonates with me is to translate business impact into monetary value. The suggested “How long can we operate without this system?” suffices in most cases, but to create urgency the ebook recommends asking “How much does this system cost us per hour when offline?”. Hearing that the business is losing £x an hour hits harder than saying part of the website is down. Establishing RPOs for different events is also something often overlooked; for example, recovery from a hardware failure will most likely differ from a site outage or ransomware attack.
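To make the “cost per hour when offline” question concrete, here is a minimal sketch of how it could be worked out and used to rank systems. This is my own illustration rather than anything taken from the ebook: the `System` class, the two cost components, and all of the figures are hypothetical, and a real estimate would likely include other factors such as contractual penalties or reputational damage.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    revenue_per_hour: float     # income lost while offline (hypothetical figure)
    staff_cost_per_hour: float  # wages paid to staff who cannot work (hypothetical)


def downtime_cost(system: System, hours_offline: float) -> float:
    """Estimated cost of an outage lasting hours_offline hours."""
    return (system.revenue_per_hour + system.staff_cost_per_hour) * hours_offline


def rank_by_hourly_cost(systems: list[System]) -> list[System]:
    """Most expensive outage first, as a rough guide to where a tight RTO matters."""
    return sorted(systems, key=lambda s: downtime_cost(s, 1.0), reverse=True)


# Illustrative numbers only.
systems = [
    System("intranet wiki", revenue_per_hour=0.0, staff_cost_per_hour=80.0),
    System("web shop", revenue_per_hour=1200.0, staff_cost_per_hour=150.0),
]
for s in rank_by_hourly_cost(systems):
    print(f"{s.name}: £{downtime_cost(s, 1.0):,.2f} per hour offline")
```

Even a rough ranking like this gives stakeholders something tangible to argue over when setting RTO and RPO values.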

The first 3 chapters laid the groundwork for capturing business requirements. Chapter 4 helps break business items down and map them to common technical solutions, applied at both the software and hardware level. Solid explanations of fault tolerance and high availability are followed up with in-depth descriptions of applying these terms to storage, memory, and other systems through clustering, replication, and load-balancing. Although fault tolerance and high availability contribute to IT operations, they absolutely do not replace backups. This point is positioned alongside an extensive list of how to assess potential backup solutions against your individual requirements when evaluating software.

The Backup Bible – Hard Drive Fault Tolerance

Part 2: backup best practices in action

The second part of The Backup Bible begins by setting out some example architectures for backup software placement. Security principles are also now being worked into the disaster recovery strategy, alongside backup software and storage, whilst those not currently in a backup or system administrator role will learn about the differences between crash-consistent and application-consistent backups. Chapter 5 again allows for practical application, in the form of a 2-phase backup software selection and testing plan. The example tracking table provided can be used for feature comparisons, and kept for audit or reporting purposes.

Options for backup storage targets follow in chapter 6, another section of comprehensive explanations detailing the advantages and disadvantages of different storage types. The information allows consumers to apply each scenario to their own requirements and make informed decisions when implementing their disaster recovery strategy. Most examples are naturally data centre storage types, but perhaps where cloud storage targets differ is that they should be factored into an organisation’s overall cloud strategy too. Factors like egress charges, connectivity, and data offshoring governance or regulations for cloud targets will likely need building into the wider cloud computing picture.

Chapter 7 moves on to a really important topic, and I’m glad this section looks pretty weighty. In many industries, securing and protecting backup data can be just as important as the availability of the data itself. The opening pages walk through securing backups using examples like encryption, account management, secret/key vaults, firewalls, network segmentation and air-gapping. At the same time, the advice is to keep to the functionality within your existing backup, storage, network and security tools, so as not to over-engineer a solution that will be difficult to support in the future. The chapter moves on to talk about layering security practices, and how these can be weighed up using risk assessments, and then built into policy if desired. These are very well-made points that are realistic and applicable to real-world scenarios.

An example backup software deployment checklist is covered in chapter 8, before chapter 9 expands more on documentation. This is another step often missed in the rush to get solutions out of the door. In disaster recovery, documentation is especially important so that on-call staff and those not familiar with the setup can carry out any required actions quickly and correctly.

The next stage in the process is to implement your organisation’s data protection needs in the form of backup schedule(s) – usually multiple. Various examples of backup schedules are used in chapter 10, placing the most value on full backups without dependencies on other data sources, and filtering through incremental, differential, or delta backups for convenience. A recommendation is made to time backups with business events, such as month-end processes, when high-value changes take place, or maintenance such as monthly operating system updates. Once again a real focus is placed firstly on building a technical strategy around business requirements, and secondly on communication and documentation to ensure that strategy is as effective as possible.
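The point about full backups having no dependencies is easier to see when you work out what a restore actually needs. The sketch below is my own simplified model, not the ebook’s: it assumes a differential contains every change since the last full backup, that an incremental contains the changes since the most recent backup of any kind, and that at most one backup is taken per day. Real backup products may define these terms differently, and the `Backup` class and `restore_chain` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number the backup was taken
    kind: str  # "full", "differential" or "incremental"


def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Backups needed, in restore order, to recover the state as of target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available on or before the target day")
    base = fulls[-1]
    after_base = [b for b in usable if b.day > base.day]
    # A differential holds every change since the last full, so only the
    # newest one is needed; incrementals taken after it must all be replayed.
    diffs = [b for b in after_base if b.kind == "differential"]
    chain = [base]
    start = base.day
    if diffs:
        chain.append(diffs[-1])
        start = diffs[-1].day
    chain += [b for b in after_base if b.kind == "incremental" and b.day > start]
    return chain


week = [
    Backup(1, "full"),
    Backup(2, "incremental"),
    Backup(3, "incremental"),
    Backup(4, "differential"),
    Backup(5, "incremental"),
]
print(restore_chain(week, 3))  # full, then both incrementals
print(restore_chain(week, 5))  # full, the differential, then one incremental
```

Every extra link in the chain is another file that has to be intact at restore time, which is one way to read the ebook’s preference for full backups.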

The final 2 chapters of this section describe maintaining the backup solution beyond its initial implementation. This includes monitoring and testing backups for data integrity, and keeping systems up to date and in good health. Security patches and upgrades will be in the form of scheduled or automatic regular updates, and less frequent one-off events like drivers, firmware, or hot patches for exposed vulnerabilities or fixes. Often newly implemented systems look great on day 1, but come back on day x in 6 or 12 months and things can look different. It’s refreshing to see the ebook call this out and plan for the ongoing lifecycle of a solution as well as the initial design and implementation.
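As one small example of what an integrity check might look like, the sketch below compares a backup file’s SHA-256 checksum against a value recorded when the backup was taken. This is an assumption on my part about how such a check could be done, not a method from the ebook, and the function names are hypothetical. A matching checksum only shows the file has not changed since it was written; it does not prove the backup can be restored, so it would sit alongside test restores rather than replace them.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> bool:
    """True if the file's current checksum matches the one recorded at backup time."""
    return sha256_of(path) == expected_sha256.lower()
```

In use, the checksum would be recorded when the backup job finishes and `verify_backup` run on a schedule against the stored copy.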

The Backup Bible – Securing Backup Data

Part 3: disaster recovery & business continuity blueprint

In part 3 the ebook dives into business continuity, focusing on areas beyond technology, like making sure that office or physical space, people, and business dependencies are all taken care of to start recovery operations as quickly as possible. A theme that has remained consistent is asking questions and working with multiple subject-matter experts, in turn exposing more questions and considerations. Example configurations of hot, warm, and cold secondary sites are used, alongside the required actions to prepare each site should business continuity ever be invoked. These topics really get the reader thinking of the bigger picture, and how some of the technical planning earlier feeds into business priorities.

Chapter 15 moves on to replication with a thorough description of replication types, and when they are or are not applicable. Replication enables rapid failover to another site, and does not replace but rather complements backups where budgets allow. Clear direction is given here to ensure replication practices are understood, along with which tasks can be automated and which need to remain manual. The ebook points out that if configured incorrectly replication can also add overhead, and put data at risk, so it’s good to see both sides of the coin. A helpful tip in this chapter is to use external monitoring where possible, rather than rely on built-in email alerting, which could fail along with the system it is supposed to be monitoring!
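One common way to implement that kind of external monitoring is a heartbeat check: each system reports in regularly to a monitor running elsewhere, and the monitor raises the alarm when a system goes quiet. The sketch below is my own minimal illustration of that idea, with hypothetical system names and thresholds; it is not a description of any particular product or of the approach in the ebook.

```python
from datetime import datetime, timedelta, timezone


def silent_systems(
    last_seen: dict[str, datetime], now: datetime, max_silence: timedelta
) -> list[str]:
    """Names of systems whose last heartbeat is older than max_silence."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_silence)


# Illustrative data: the monitor records when each system last checked in.
now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "replica-a": now - timedelta(minutes=2),
    "replica-b": now - timedelta(minutes=45),
}
print(silent_systems(last_seen, now, timedelta(minutes=15)))  # ['replica-b']
```

Because the alert is triggered by the absence of a message, it still fires when the monitored system, and its built-in email alerting, has gone down completely.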

The use of cloud services is probably beyond the scope of this ebook. That said, because cloud storage is becoming an option for backup targets it is mentioned, and implied to be more of a comfort blanket at present. Chapter 16 adds some further considerations, like not making an assumption that backups or disaster recovery are included in a service just because it is hosted in the cloud. Cloud services need to be architected for availability in the same way they do on-premises. Some Software-as-a-Service (SaaS) offerings may include this functionality, but if so it will be explicitly stated. The scope of using cloud for disaster recovery obviously varies between organisations already heavily invested in the cloud, or multiple cloud providers, and those not using it at all. Despite there being more talking points I think it’s right to keep the conversation on topic to avoid going down a rabbit hole.

We’re now at a stage where disaster recovery has been invoked and chapter 17 runs through what some of the business processes may look like. There’s a lot of compassion shown in this chapter and whilst flexibility and consideration for people are a given, going a step further and writing them into the business continuity documentation is something I must admit I hadn’t previously thought about. Having plans laid out ready as to how employees will be contacted in the event of a disaster will also save time and reduce confusion. There’s some good information here on implementing home working too, and this is particularly relevant as we still navigate through COVID-19 restrictions. Certain areas of industries like manufacturing and healthcare may need that secondary site mentioned earlier on, but a lot of jobs in the Internet era can be carried out from home or distributed locations.

As the ebook begins to draw to a close, the last 2 chapters provide guidance on how the disaster recovery strategy can be tested and maintained. This is crucial for many of the reasons we’ve read through, and we learn that automating testing in a ‘set and forget’ way is usually not sufficient. Some processes need to be manually checked and validated on a schedule to protect against unexpected behaviour. Chapter 19 calls once again on working together with teams outside of IT. This is more good advice: rather than IT setting baseline values, for example on backup frequency and retention, it encourages lines of business to take responsibility for their own data and applications.

The Backup Bible – Systems Fail

Summary

Overall The Backup Bible has been an enlightening read, containing some really useful guidance and personal narratives about situations the author has experienced. A substantial number of templates for documentation and process checklists are included at the end of the ebook, ranging from backup and restore documentation to data protection questionnaires to testing plans. The ebook does enough of the explanatory legwork, without assuming prior knowledge, to make sure that a reader from any area of the business can take something away from it.

There are of course references to Altaro software, but this is in no way a glorified advertisement. The points discussed are presented in a neutral manner with comprehensive detail for organisations to make informed decisions about the technologies and processes most suited to their own business. Rather than publishing a list of ‘must-haves’ for disaster recovery, the ebook acknowledges that businesses have different requirements and provides the tools for the reader to go about implementing their own version based on what they have learnt through the course of the ebook.

From a small business with very little protection against a disaster, to an enterprise organisation with processes already in place, anybody interested in disaster recovery will be able to gain something from reading this ebook. The Backup Bible can be downloaded for free here.


