Information Technology Resource Contingency Planning
Abstract
In a world marked by unpredictable natural and human-made catastrophic events, organizational management needs to understand, develop, and test a contingency plan. This paper examines the steps in developing a successful contingency plan: developing a contingency planning policy; conducting a business impact analysis; identifying preventive controls; developing recovery strategies; developing a contingency plan; plan testing, training, and exercises; and plan maintenance. The possible recovery options are hot, warm, or cold sites, with variations such as mirrored or mobile sites. Tests of the business assets that a disaster can affect range from plan reviews and walk-through tests to full interruption tests. Finally, a recommendation for a 24-month testing cycle is presented. In the recommendation, the first and second months entail a plan review, the third and fourth months cover notification procedures, and the fifth and sixth months involve a live walk-through. The seventh and eighth months are reserved for a validation test, while the ninth and tenth months are used to update the continuity book. The eleventh and twelfth months are for a full interruption or drill test. From the thirteenth to the twenty-fourth month, management should repeat the procedures of the first twelve months, with adjustments where necessary.
Information Technology Resource Contingency Planning
Introduction
In a world marked by unpredictable natural and human-made catastrophic events, organizational management needs to understand, develop, and test a contingency plan (Harris & Grimalia, 2008). Technology has expanded broadly, and the majority of companies now rely heavily on information technology (IT) systems to develop, monitor, and store the vital data that keeps operations running smoothly (Harris & Grimalia, 2008). Although developing a business contingency plan can be costly, depending on the size of the firm and how critical IT systems are to its success, the plan pays off handsomely in the event of a major crisis or disaster (Harris & Grimalia, 2008).
A contingency plan is a document containing procedures and measures that aim to assist in the recovery of an organization’s critical information during a disruption (Swanson, Bowen, Phillips, Gallup, & Lynes, 2010). To succeed, it must be well organized, documented, and tested (Harris & Grimalia, 2008). It must be able to help a business withstand and recover from a disaster (Swanson et al., 2010). For instance, Janczewski and Colarik (2005) indicate that after the Kyoto earthquake, the prepared companies, which had a well-developed contingency plan, recovered very quickly and began resuming normal business routines within a week.
The contingency planning process includes seven crucial steps defined by the National Institute of Standards and Technology (NIST) (Harris & Grimalia, 2008; Swanson et al., 2010). These steps are: developing a contingency planning policy; conducting a business impact analysis; identifying preventive controls; developing recovery strategies; developing a contingency plan; plan testing, training, and exercises; and plan maintenance (Harris & Grimalia, 2008; Swanson et al., 2010; Wees, 2013).
Steps in the Contingency Planning Process
Developing a contingency planning policy
This step starts with the realization that a contingency planning process is necessary. It requires approval from senior management; without that approval, any further progress is wasted effort. Once the need is determined, the policy statement should be drafted, indicating the overall objective of planning as well as the framework and division of tasks in contingency planning. It should identify the statutory and regulatory requirements of the plan, and once ready, it should be published (Harris & Grimalia, 2008; Swanson et al., 2010).
Swanson et al. (2010) indicate that the policy statement captures several elements. The first is roles and responsibilities, which provide clear guidelines on how tasks are divided among personnel in a time of crisis. The second is the scope of the contingency plan in relation to other organizational operations. The third is the resources required to plan, test, and train personnel to handle contingency operations when disruptions occur. The fourth is the training requirements, which outline the resources needed to teach personnel how to apply the written plan in practice or in real situations. The fifth is the exercise and testing schedules, which prepare employees for disaster management and also ensure that the plan is functional before a disaster occurs. The sixth is the plan maintenance schedules, which ensure that the plan is kept up to date so that it meets the requirements of a successful recovery. The last element is the backups, indicating their number and storage location.
The objectives guiding the policy statement are: protection of human life, reduced company loss and risk, increased probability of recovery, and protection of the company from lawsuits. The statement additionally covers maintenance of a competitive position, preservation of customer confidence and goodwill, threatened operations, an overview of the business impact analysis, and brief recovery strategies (Harris & Grimalia, 2008). Following the elements and objectives stated above provides a strong foundation for a successful contingency plan.
Conducting a business impact analysis (BIA)
The BIA is a crucial step performed at the initial stage of business establishment. It involves identifying the processes that are vital to the success of the business, for instance, procurement or the supply chain. The identified processes are then mapped to the IT systems that support business operations. This step also entails assessing the direct and indirect costs accompanying these processes, e.g., monetary costs and the costs of a threat or mission failure (Harris & Grimalia, 2008). Once the critical components, processes, and interrelatedness of operations have been identified, the impact of a disruption or catastrophe is analyzed.
According to Swanson et al. (2010), the BIA is attained through three steps. The first step is the determination of business processes and recovery criticality. It is essential in unfolding the complexity and dependability of both IT systems and business processes. Consequently, this reveals the vulnerability of systems to a disruption. The second step is identifying a resource requirement, which is an intense evaluation of materials, human resources, records, and systems necessary for successful recovery effort. The final step is identifying the recovery priorities of system resources while understanding that certain processes or system components are more important than others. This should lead to the development of a ‘priority hierarchy’ based on the BIA.
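The priority hierarchy described above can be illustrated with a minimal Python sketch. It is not part of the NIST guidance. The process names, the downtime tolerances, the hourly loss figures, and the ranking rule (least tolerable downtime first, ties broken by the higher hourly loss) are all hypothetical assumptions chosen only to show how BIA findings might be turned into an ordered list.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    max_tolerable_downtime_hours: float  # hypothetical BIA estimate
    hourly_loss: float                   # hypothetical cost of disruption per hour


def priority_hierarchy(processes):
    """Rank processes for recovery.

    Assumed rule: the process that tolerates the least downtime comes
    first; ties are broken by the higher hourly loss.
    """
    return sorted(
        processes,
        key=lambda p: (p.max_tolerable_downtime_hours, -p.hourly_loss),
    )


# Illustrative figures only; a real BIA would supply these values.
processes = [
    BusinessProcess("procurement", 24, 1_000),
    BusinessProcess("order processing", 4, 5_000),
    BusinessProcess("payroll", 72, 300),
]

ranked = [p.name for p in priority_hierarchy(processes)]
print(ranked)
```

An organization could just as reasonably rank by financial loss alone or by regulatory exposure; the sketch shows one possible ordering, not a prescribed one.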
Identifying preventive controls
Some impacts can be handled or avoided using preventive measures. This step involves checking the availability of preventive options that may detect, deter, and reduce the effect of a disruption on the business systems (Swanson et al., 2010). Where possible, it is better to prevent a disruption than to wait for a serious circumstance that requires implementing the contingency plan. At this step, the costs of preventive measures are weighed to make sure they do not exceed the value of the systems being protected (Harris & Grimalia, 2008). Preventive measures may include fire detectors, security staff, alternative power sources, high-capacity air conditioners, and offsite backups, which should be installed based on the process and resource priorities (Swanson et al., 2010).
Developing recovery strategies
This is the fourth step of the contingency planning process. Incidents to which preventive measures cannot apply should be catered for by strategies that keep the business processes identified in step two (BIA) running. The recovery strategies should be priority-based and should offer a variety of recovery options from within or outside the organization. Swanson et al. (2010) describe the recovery strategies provided by NIST. According to the authors, specific recovery methods should include commercial contracts with alternative site vendors, mirrored sites, mobile sites, reciprocal agreements with internal and external organizations, and service level agreements (SLAs) with equipment vendors. In addition, technologies such as redundant arrays of independent disks (RAID), automatic failover, uninterruptible power supplies (UPS), server clustering, and mirrored systems should be considered when developing a system recovery strategy (Swanson et al., 2010, p. 20).
Developing IT contingency plan
This is the fifth and a vital step in the planning process. The work of the first four steps is of little value if it is not documented; without documentation, the organization is effectively back where it started, with no plan at all. Although some companies have been documented to ignore their plans (Johnson, 2006), applying a plan yields positive results. Employees, including senior management, tend to make costly mistakes when they act on individual judgment rather than following a guideline (plan). Contingency plan (CP) development is unique to each organization, and according to Swanson et al. (2010), NIST provides only a starting format for the plan. The plan should serve as a reference tool, since it captures the details of each section of the planning process (Harris & Grimalia, 2008). After completion, it should be approved by management together with other business plans.
Plan testing, training, and exercises
A developed CP is incomplete without evidence that it will work for the intended contingency. Discovering during a real situation that the plan does not work can be very disappointing and detrimental to the business (Harris & Grimalia, 2008). The purpose of the sixth step of the planning process is to identify gaps that require fixing and to provide a practical platform for exercising strategies that otherwise exist only on paper (Swanson et al., 2010). Additionally, successful tests validate that recovery to normal operations is possible (Barry, 2012). Training prepares the personnel involved in the recovery phase for their roles and procedures during a disaster, and exercises ensure that the plan and the personnel are effective in executing the disaster recovery (Grance et al., 2006; Swanson et al., 2010). Proper tests, training, and exercises increase the probability that a contingency plan will succeed.
Plan maintenance
Businesses do not remain static; they change, mostly because of shifting business requirements, technological advancement, and new organizational policies (Swanson et al., 2010). Therefore, the plan needs frequent review and updating to keep up with system functionality. Using an outdated plan is dangerous, since it will not restore normal operation and will seriously affect the competence of the systems (Harris & Grimalia, 2008; Swanson et al., 2010).
Possible Recovery Options
The purpose of any business contingency plan is to allow for the recovery of business assets during a contingency. Recovery therefore occupies the largest portion of the plan, for several reasons. First, personnel require ample time to deploy the alternative processes. Second, testing the recovery plan takes time and is costly (Wees, 2013).
Backups
Organizations need to create backups of their systems on a regular basis (Swanson et al., 2010). The backups should be stored away from the business environment, at what is known as an offsite location, to reduce the chance that they are destroyed during a disruption (Swanson et al., 2010; Wees, 2013). The offsite locations should be easily accessible to reduce the recovery time when the backups are required (Wees, 2013). According to Barry (2012), the backups should cover all assets and information that the business depends on, including operating systems, databases, and software.
Recovery options
The selection of a recovery option depends heavily on the cost of the option as well as the criticality of the business to customers and investors. Three options exist, namely hot, warm, and cold recovery sites (Barry, 2012). Consultation on the best site to use is essential for the success of a business.
Hot site
A hot site duplicates the original business system in that it contains all the operational necessities and hardware that allow immediate loading of the existing system data and backups (Barry, 2012). This allows the business to recover almost instantly from a disaster, although the cost of such a site is considerably high (Swanson et al., 2010; Barry, 2012; Wees, 2013). The choice of a hot site should be based on an analysis of whether the cost of system loss is higher than the cost of the site (Wees, 2013). The site may be the storage location of backups (Barry, 2012; Wees, 2013) or an entirely separate site (Swanson et al., 2010).
Warm site
A warm site is partially equipped with the necessities, e.g., software and hardware (Swanson et al., 2010; Barry, 2012; Wees, 2013). Operations may resume at this site, but it lacks the up-to-date configurations present in the current business operations (Wees, 2013). Like hot sites, warm sites may serve as the offsite storage locations of backups and operational data, in which case not much work is needed to resume operations (Barry, 2012).
Cold site
A cold site offers only the space (facility), with no system infrastructure, hence its low cost of acquisition. However, it takes longer for a business to resume operations, since backups from offsite locations and operational necessities, including hardware and software, have to be acquired first (Swanson et al., 2010; Barry, 2012; Wees, 2013). While recovery options revolve around these three types of site, variations exist, including mobile and mirrored sites (Swanson et al., 2010). Mobile sites are movable and equipped with the specific equipment needed to meet the system requirements. Mirrored sites are duplicate sites that are automatically updated with information from the primary site and hence technically resemble the current site (Swanson et al., 2010).
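The trade-off between the cost of a site and the cost of downtime can be illustrated with a rough Python sketch. It is not a method taken from the cited sources. The annual site costs, recovery times, hourly loss figures, and disruption frequency are all hypothetical, and the model (annual site cost plus expected downtime loss) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class RecoverySite:
    kind: str
    annual_cost: float      # hypothetical yearly cost of keeping the site
    recovery_hours: float   # hypothetical time to resume operations


def expected_annual_cost(site, hourly_loss, disruptions_per_year):
    """Assumed model: site cost plus the expected loss from downtime."""
    downtime_loss = site.recovery_hours * hourly_loss * disruptions_per_year
    return site.annual_cost + downtime_loss


def cheapest_site(sites, hourly_loss, disruptions_per_year):
    """Pick the site with the lowest expected annual cost."""
    return min(
        sites,
        key=lambda s: expected_annual_cost(s, hourly_loss, disruptions_per_year),
    )


# Illustrative figures only.
sites = [
    RecoverySite("hot", 500_000, 1),
    RecoverySite("warm", 150_000, 24),
    RecoverySite("cold", 40_000, 168),
]

# A business losing 50,000 per hour of downtime versus one losing 200.
critical = cheapest_site(sites, hourly_loss=50_000, disruptions_per_year=0.5)
minor = cheapest_site(sites, hourly_loss=200, disruptions_per_year=0.5)
print(critical.kind, minor.kind)
```

Under these assumed figures, the hot site is the cheapest option only when downtime is very expensive, which is consistent with the guidance that a hot site is justified when the cost of system loss exceeds the cost of the site.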
Recommended Testing Requirements
As mentioned earlier, a contingency plan with no proof that it works when required is as useless as having no plan at all (Harris & Grimalia, 2008). Carrying out tests on the contingency plan makes it possible to identify gaps in a simulated recovery phase and to make the adjustments needed to address them (Swanson et al., 2010). Mitts, a researcher investigating business continuity and disaster planning, recommended seven tests necessary for a successful testing plan (Harris & Grimalia, 2008).
The first test is the Drill, which is conducted on particular assets in the business. The second is the Orientation Walk-through, a discussion intended to introduce and orient the key personnel to part or all of the contingency plan (CP). The third is the Tabletop Walk-through, which consists of brainstorming sessions or exercises covering part or all of the CP (Wees, 2013). The fourth is the Live Walk-through, which entails executing the CP as in a real disaster. The fifth is the Parallel Test, which is carried out alongside critical operations at the primary site to ensure that the recovery site runs accurately. The sixth is the Simulation Test, which entails an imaginary disruption that requires recovery using material at the offsite locations. The seventh is the Full Interruption Test, which involves shutting down normal operations to test recovery (Harris & Grimalia, 2008; Wees, 2013).
When selecting an alternative site for recovery, especially a hot site, the costs of information loss should exceed the costs of the site (Wees, 2013). This calls for proper documentation of costs in the contingency plan budget. However, these are not the only costs that matter in the contingency planning process. The costs of personnel training and of testing also need careful consideration. Likely costs in the testing phase include indirect costs, e.g., hours lost while testing the plan and training personnel, as well as operational costs and the costs of purchased materials. Moreover, preparing the staff for a disaster (training and exercises) requires resources such as hired external sites, supplies, and outsourced vendors (Wees, 2013).
Recommendation for a 24-Month Cycle Business Contingency Testing Plan
Effective testing would require an even distribution of activities across the proposed months. Therefore, I would recommend a two-month interval for each activity in the testing plan. The first two months (months 1 and 2) would be devoted to testing the accuracy of the plan (plan review). Here, all the documented information and relevant items should be given to the key personnel, who will review them and run through them to check for accuracy. At the same time, non-key personnel should be trained so that they can take over if the key personnel are not available. For months 3 and 4, I would recommend testing the communication channel used to give notification of an emergency. This should be done through tabletop practices by running through all on-duty and off-duty emergency contacts and updating them where necessary (Wees, 2013).
Months 5 and 6 should be used to test the alternate procedures for normal business processes. Here, the personnel in charge of operations should be instructed to conduct a Live Walk-through test using systems that the administrators have recovered from backups. The test should identify gaps in procedures, data that has not been backed up, or incorrect configurations that need to be updated. Months 7 and 8 should be used for system validation. This should be done by testing the operating capabilities of the systems (reconstitution test). Here, management should rotate personnel between operating positions for a specific timeframe and have them try to conduct normal tasks with the help of the continuity book. The alternate personnel will be essential substitutes when key personnel are not available (Wees, 2013).
Months 9 and 10 should be used to update the continuity book in line with the findings of the tests in months 7 and 8. This should be done in readiness for a full drill test in months 11 and 12, in which all levels of business operations will be tested. The drill should start by selecting a threat and then applying all the tests from months 1 to 8 that provide a ‘green light’ to move to an alternative site, e.g., a hot site (Wees, 2013). At the new site, employees should attempt to resume and continue operations while identifying gaps and updating the plan accordingly. At this stage, I would recommend notifying business partners and customers so that the test does not affect the functionality of the business. Since it is a full interruption test, the risk to operations is high, and it is therefore recommended for businesses with a hot site. Finally, months 13 to 24 should repeat the tests from months 1 to 12, either using different procedures and time intervals or as a direct duplicate, depending on the assessment by management. This should enhance understanding of the plan and, hence, employee confidence (Wees, 2013).
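The recommended cycle can be laid out as a simple calendar. The following Python sketch is only an illustration of the schedule described above. The activity labels paraphrase the recommendation, and the function and variable names are hypothetical.

```python
# Six activities, each assigned a two-month slot, as recommended above.
CYCLE_ACTIVITIES = [
    "Plan review and training of non-key personnel",
    "Notification procedures (tabletop)",
    "Live walk-through of alternate procedures",
    "System validation (reconstitution test)",
    "Continuity book update",
    "Full interruption or drill test",
]


def build_schedule(total_months=24, interval=2):
    """Map each month number (starting at 1) to its testing activity.

    The six activities fill the first twelve months and then repeat,
    so months 13 to 24 mirror months 1 to 12.
    """
    schedule = {}
    for month in range(1, total_months + 1):
        slot = ((month - 1) // interval) % len(CYCLE_ACTIVITIES)
        schedule[month] = CYCLE_ACTIVITIES[slot]
    return schedule


schedule = build_schedule()
for start in range(1, 25, 2):
    print(f"Months {start}-{start + 1}: {schedule[start]}")
```

The sketch assumes a direct duplicate in the second year. If management decides to vary the procedures or intervals for months 13 to 24, the second half of the schedule would have to be adjusted by hand.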
Conclusion
In a world of threats and technological advancement, it is crucial for businesses to be prepared. The best way to prepare is to have a business contingency plan that is well documented and tested. A properly written contingency plan follows these steps: developing a contingency planning policy; conducting a business impact analysis; identifying preventive controls; developing recovery strategies; developing a contingency plan; plan testing, training, and exercises; and plan maintenance. The recovery options that deserve careful consideration include hot, warm, and cold sites. The final step is validating the plan and the alternate procedures by training personnel and testing the plan, from plan reviews through walk-through tests to full interruption tests.