Updating Disaster Recovery Plans
By Joseph A. Liguori
You arrive at the office following a long holiday weekend to learn that several employees have called in sick, in numbers significant enough to cause considerable concern throughout the bank. At first, your thoughts wander as you try to reason out what the source might be. Could it be disgruntled employees who planned this “sick-out”? How about the highly publicized avian flu? Quickly you conclude that the cause is really not the company’s primary concern. The bank has a major problem with meeting both its operations and service levels. As you comment to your colleagues, “This is quite the disaster,” you suddenly realize that your disaster recovery planning, training and testing did not prepare you for this kind of event. The bank’s systems are operating fine, and access to the data center is clearly available. What you find disturbing is that the number of available staff is not enough to execute your bank’s disaster recovery procedures should they be needed.
With the government’s issuance of an emergency response plan to deal with the pandemic flu, banks and other businesses have been asked to develop response plans of their own to cope with this type of scenario. This should not be new to those of us in banking. Following the terrorist attacks on Sept. 11, 2001, and the subsequent wave of anthrax threats, the scenario described here could very well have been played out, and actually was to a limited degree at a few government offices and public businesses. How many banks updated their disaster recovery plans to handle such a scenario following the anthrax threats? Very few did. Why? Except for two or maybe three cases, almost every incident was determined to be a hoax. And, of those that were real, terrorists were not seen as the source. Our response was like the villagers’ response to the boy who cried wolf.
I am not an ornithologist or microbiologist, or trained in any field of the biological sciences to be certain about whether or not the avian flu will cripple our country as some say it will, but I do know what the impact upon our businesses will be should it manifest itself that way. I am not a computer programmer either, but I do know the impact of a major failure of a bank’s core operating software upon its ability to serve its customers. Regardless of what our formal education is, prudence dictates that we have plans for those events we determine to be at a risk level that requires action plans. The key word here is “risk.”
Most of us have written our disaster recovery and business continuity plans in the traditional sense. In other words, we brainstorm the likelihood of certain scenarios for the geographic area in Maine where our data center or computer room resides. Many of us have plans written that respond to situations affecting our core processing system. Hence, we have planned for the most likely events and mitigated their impact upon our business in a way that makes the most sense. What about our network applications? Many of our banks rely upon critical network applications to run the business and deliver services to their customers, and in this day and age, customers expect a much quicker recovery than that which is written in most of our plans. Have we developed procedures in our plans to respond to impacts upon critical applications? And what about testing? We perform the required test every year to make sure that the things we want to work, work at an acceptable level, including ensuring that team members and other staff are familiar with the plan. Our test results should tell us what needs to be revised. But first our test plans need to be revisited and most likely revised. When was the last time your plan was thoroughly reviewed using a risk-based approach?
Business Impact Analysis
As previously mentioned, most of us have traditionally identified the events most likely to occur and developed our recovery and continuity plans around them. A risk-based Business Impact Analysis (BIA) is the core document that will drive the revision of your overall plan. This exercise will evaluate both the likelihood and the impact of certain scenarios upon your business and identify the threats to your bank that pose the greatest risks. For instance, you may decide a particular event has a low probability of occurrence, but its impact may devastate the organization should it occur. Combined, the two may translate into a high risk that requires response procedures, whereas before there was no concern because only the event’s low likelihood of occurrence was considered. As you develop your bank’s risk-based BIA, it will result in a rewrite of your response plan and a revision of your testing plans. The risk criteria of the BIA should follow the standard established by your bank’s risk management policy and program. If no such policy and program exists, one should be developed, as it serves as the governing document for all other risk-based assessments performed within your organization.
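To make the difference between a likelihood-only view and a risk-based view concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the 1-to-5 scales, the multiplication of likelihood by impact, the thresholds and the sample scenarios and their scores. Your bank’s risk management policy, not this sketch, should define the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (devastating)


def risk_score(s: Scenario) -> int:
    # Assumed scoring rule: likelihood multiplied by impact.
    return s.likelihood * s.impact


def risk_rating(s: Scenario) -> str:
    # Assumed thresholds; a real BIA takes these from the risk policy.
    score = risk_score(s)
    if score >= 10:
        return "high"
    if score >= 5:
        return "moderate"
    return "low"


def likelihood_only_rating(s: Scenario) -> str:
    # The traditional view: impact is ignored entirely.
    if s.likelihood >= 4:
        return "high"
    if s.likelihood == 3:
        return "moderate"
    return "low"


# Illustrative scenarios with made-up scores.
scenarios = [
    Scenario("Widespread staff illness", likelihood=2, impact=5),
    Scenario("Aging hardware failure", likelihood=3, impact=4),
    Scenario("Brief power flicker", likelihood=4, impact=1),
]

for s in scenarios:
    print(f"{s.name}: likelihood-only={likelihood_only_rating(s)}, "
          f"risk-based={risk_rating(s)}")
```

Under these assumed numbers, the staff illness scenario is rated low when only likelihood is considered but high once impact is factored in, which is the shift in thinking the BIA is meant to produce.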
The initial step of any BIA requires brainstorming performed by a committee of individuals who bring different perspectives and levels of concern to the table. Many of our plans were developed following “likelihood assessments” that focused on natural disasters, human error, hardware/software failures, etc. with the assumption that we either cannot get to our data center, or its hardware/software has been damaged or destroyed. The category of various disaster scenarios has not changed, but our way of thinking must change. The failure of hardware, for example, is in everyone’s plan, but its likelihood of occurrence and impact upon the company has probably never been re-evaluated from a risk perspective. Let me explain why this is a concern.
Many of our organizations are forced to stretch their resources as far as they can in an effort to control or minimize expense. When we stretch hardware beyond its expected life, has the likelihood of the failure scenario not increased? We know the impact is great, so how does the combined effect of the two translate into a risk rating of this event for the organization? And what will that rating look like next year if we do not replace the hardware? Or, what if we do replace the hardware? We have all evaluated the likelihood of a severe hurricane making landfall in Maine, but have written it off as unlikely. Would not an annual or at least a more regular re-evaluation of our BIA allow us to consider the changes in weather patterns and other scientific data when we assess the risk of such an event? The floods in southern Maine and New Hampshire reached a degree not witnessed in decades, perhaps in more than a century. The damage was devastating for many. And how do corporate policies and strategic decisions beyond the confines of operations and technology ultimately affect our disaster recovery and business continuity postures? To base our assessments of various scenarios solely on the likelihood of their occurrence is not in the best interest of our business, its customers or its employees.
When an organization identifies the higher risks to its business through a formal BIA, the disaster recovery plan will be revised in order to mitigate or eliminate the risks, as much as is humanly possible of course. The change in direction that your plan undergoes will also make your test plans more involved. Test plans or scripts will test applications and systems that were not previously tested. They will also result in more thorough testing and more specific actions. The plans need to ensure that desired outcomes are provided as benchmarks, test results are documented and action plans are identified to correct any discrepancies. The testing process will serve not only as a means of validating the plan, but also as necessary documentation to further revise the plan and identify upgrades to technology, facilities, third-party arrangements and even certain company policies. Therefore, the BIA and its effect upon your plan and testing are not limited to your disaster recovery or business continuity plans. They tie back to strategic plans, budgets, policies, procedures and third-party contracts as well.
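One way to picture a test script that carries its own benchmarks is sketched below in Python. The record layout, the use of recovery time in minutes as the benchmark, and the sample systems and figures are all hypothetical and chosen only for illustration. The point is simply that each tested item has a desired outcome, a documented result and, where the two differ, an action item.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryCheck:
    system: str
    benchmark_minutes: float  # desired outcome: maximum acceptable recovery time
    actual_minutes: float     # documented result from the test

    @property
    def passed(self) -> bool:
        # A check passes when recovery finished within the benchmark.
        return self.actual_minutes <= self.benchmark_minutes


def action_items(checks: List[RecoveryCheck]) -> List[str]:
    # Every failed check becomes an action item to be assigned and tracked.
    return [
        f"{c.system}: recovered in {c.actual_minutes:g} min "
        f"against a {c.benchmark_minutes:g} min benchmark"
        for c in checks
        if not c.passed
    ]


# Made-up results from a hypothetical annual test.
results = [
    RecoveryCheck("Core processing", benchmark_minutes=240, actual_minutes=180),
    RecoveryCheck("Online banking", benchmark_minutes=60, actual_minutes=95),
]

for item in action_items(results):
    print("ACTION NEEDED:", item)
```

A list like this, kept from year to year, also gives you the documentation needed to justify upgrades or contract changes when the same discrepancy keeps recurring.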
Never an Easy Task
Since Y2K (goodness, did I say Y2K?), we have taken a risk-based approach with many of our plans, programs and assessments. Regulators and auditors aside, such an approach is simply good business. Our disaster recovery plans sit on a shelf and get dusted off once a year. They pass their annual tests because we haven’t changed the test criteria, and the criteria have not changed because we have not re-evaluated our plans. The BIA is an important exercise that will produce an extremely valuable document from which a new and improved plan will emerge, bringing with it a discipline that ensures the process remains dynamic and is integrated into the entire business process of your bank.
Joe Liguori (email@example.com) has been responsible for bank technology, operations, information security, physical security, disaster recovery and facilities as former senior vice president with Androscoggin Bank and vice president with Camden National Corp.