Business Continuity and Disaster Recovery Plan
Abstract
The terms business continuity and disaster recovery are distinct, but tend to overlap in scope. For this reason, most experts consider the two concurrently. This discourse will first set out the two definitions, highlight the areas of overlap, and give a detailed description of the major areas of concern. The paper will also reflect on the key measures that a company with e-Commerce and IT-related operations can initiate in order to cope with disasters or other unforeseen circumstances.
Key words: business continuity, disaster recovery
Introduction
The terms business continuity and disaster recovery are distinct but overlap in scope. Beyond the overlap, there is general recognition that business management and IT experts need to work collaboratively in designing the procedures for business continuity and recovery. Consequently, most discourses consider the two terms together (Kurt & Douglas, 2011). Generally, disaster recovery refers to the procedures firms take to ensure that operations return to normal in the event of natural disasters or emergencies. In the field of Information Technology, these steps may include restoration of servers, mainframes, and local area networks, among other vital services that businesses need.
Business continuity, similarly, refers to the processes and steps organizations take to ensure that vital aspects of their operations continue normally during and after a disaster. The measures that constitute business continuity prevent the disruption of critical operations and ensure swift and smooth resumption of normal operations (Kurt & Douglas, 2011). Business continuity, in contrast to disaster recovery, covers more comprehensive, long-term plans that may address events such as staffing issues, supply chain failures, and system malfunctions.
A series of events has led to the development of business continuity and disaster recovery into a complete discipline. This discourse will examine the details of business continuity and disaster recovery issues and the possible solutions that IT firms may adopt.
Business Continuity and Disaster Recovery Issues
Most firms today derive their strength from the value of their data and from online communication platforms. A firm's data contains varied information, ranging from employee profiles to daily transactions. Similarly, online communication offers a new platform for sales and marketing of products (Michael & Lawrence, 2011). These areas are so important to any business that their interruption, even for minutes, can be very costly. Given the growing reliance on online sales and marketing platforms and on stored data, a larger customer base, and exposure to natural disasters, criminal activity, and unreliable suppliers, companies must plan for business continuity. Continuity plans must also address the global nature of most businesses, supply systems, speed, the increased value of data, and IT-dependent operations. Such issues make businesses vulnerable to workplace violence, terrorist activities, an unreliable workforce, computer and cybercrimes, power outages, communication failures, system malfunctions, and human negligence.
Business Continuity and Recovery Solutions
The most critical issue for any business is to ensure that its systems are ready, available, and in continuous operation. There are three key aspects of continuity: resilience, recovery, and contingency. Whereas resilience implies that businesses should survive disasters, recovery deals with the restoration of critical functions (Michael & Lawrence, 2011). Contingency, in turn, details separate plans for continued operation in case of failure to recover from or survive a disaster. Firms should realistically achieve these goals even in the face of power outages, disk crashes, maintenance routines, and communication failures. Identifying the key areas that require continuity entails the application of business impact analysis (p.23). Several areas call for specific measures, as outlined below.
Operations Recovery
These processes address the loss of functionality due to equipment failure. Methods such as a redundant array of inexpensive disks (RAID) and power backups can mitigate these problems. Hardware failures are nonetheless critical and require attention. Considerations such as mean time between failures (MTBF) and mean time to repair (MTTR) are crucial for an IT firm when purchasing machines (Michael & Lawrence, 2011). The idea of fault tolerance also applies at the server level. Clustering makes end users view a group of servers as a single unit; if one server fails, the remaining servers pick up the load and operations continue. Similarly, at the drive level, fault tolerance uses RAID, which breaks up data and writes it across several disks. With a redundant RAID configuration, it is possible to remove faulty disks without interrupting operations.
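The role of MTBF and MTTR in a purchasing decision can be illustrated with a short calculation. The sketch below uses the standard steady-state availability formula, MTBF / (MTBF + MTTR), and then estimates the availability of a server cluster under the simplifying assumption that servers fail independently and that any one surviving server can carry the load. The function names and the figures are hypothetical and do not come from the cited sources.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a machine is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that stays up while at least one node is up.

    Assumes independent failures, which real clusters only approximate.
    """
    if not 0.0 <= node_availability <= 1.0 or nodes < 1:
        raise ValueError("availability must be in [0, 1] and nodes >= 1")
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical figures for one server: fails every 990 hours on average,
# and takes 10 hours on average to repair.
single = availability(mtbf_hours=990.0, mttr_hours=10.0)
pair = cluster_availability(single, nodes=2)
print(f"single server: {single:.4f}, two-server cluster: {pair:.6f}")
```

Under these assumptions, a shorter repair time raises availability as effectively as a longer interval between failures, which is why both figures matter when comparing machines.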
Data and Information Recovery
The methods available here include offsite storage, backups, and remote journaling. These methods use backup media such as disks, tapes, and cassettes to store information in case of failure (p.291). Other technologies, such as MAID (massive array of idle disks), can store data for applications to use in multiple ways. In addition to the devices, firms should implement the backup pattern that suits them (Michael & Lawrence, 2011). The options include full backup (backs up everything), differential backup (backs up data changed since the last full backup), incremental backup (backs up only data added or changed since the previous backup), and continuous backup.
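The difference between the backup patterns comes down to which files each one selects. The following is a minimal sketch, assuming the conventional definitions: a full backup copies everything, a differential backup copies whatever changed since the last full backup, and an incremental backup copies whatever changed since the most recent backup of any kind. The `FileRecord` type, the integer timestamps, and the function name are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileRecord:
    path: str
    modified_at: int  # hypothetical timestamp, e.g. hours since some epoch


def select_for_backup(files: List[FileRecord], mode: str,
                      last_full: int, last_any: int) -> List[str]:
    """Return the paths a backup run of the given mode would copy."""
    if mode == "full":
        return [f.path for f in files]
    if mode == "differential":
        cutoff = last_full   # everything changed since the last full backup
    elif mode == "incremental":
        cutoff = last_any    # only changes since the most recent backup
    else:
        raise ValueError(f"unknown backup mode: {mode}")
    return [f.path for f in files if f.modified_at > cutoff]


# Illustrative data: a full backup ran at hour 3, an incremental at hour 7.
files = [FileRecord("a.db", 1), FileRecord("b.log", 5), FileRecord("c.txt", 9)]
for mode in ("full", "differential", "incremental"):
    print(mode, select_for_backup(files, mode, last_full=3, last_any=7))
```

The trade-off is visible in the selection sizes: incremental runs copy the least data, but a restore needs the last full backup plus every incremental since, whereas a differential restore needs only the last full backup and the latest differential.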
Backups should be stored in very secure offsite locations and managed efficiently for easy retrieval and use. However, backups are never fully reliable and may fail at some point. To reduce this risk, the use of tape rotation methods is vital (p.315). These techniques range from simple schemes to complex ones such as the Tower of Hanoi method. In each case, they help ensure that the tapes remain functional throughout the backup period.
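The Tower of Hanoi rotation can be sketched in a few lines. In the common formulation assumed here, the first tape is used every second day, the second every fourth day, the third every eighth day, and so on, with the last tape absorbing all remaining slots. The tape for a given day then follows from the number of trailing zero bits in the day number. The function names are hypothetical, and this is one conventional description of the scheme rather than the exact procedure given in the cited handbook.

```python
from typing import List


def hanoi_tape(day: int, num_tapes: int) -> int:
    """Return the zero-based tape index to use on a 1-based day number."""
    if day < 1 or num_tapes < 1:
        raise ValueError("day and num_tapes must be at least 1")
    # day & -day isolates the lowest set bit; its position is the number
    # of trailing zero bits in the binary form of the day number.
    trailing_zeros = (day & -day).bit_length() - 1
    # The last tape takes every slot that would need a tape we do not have.
    return min(trailing_zeros, num_tapes - 1)


def rotation_schedule(days: int, num_tapes: int) -> List[str]:
    """Label tapes A, B, C, ... and list the tape used on each day."""
    if num_tapes > 26:
        raise ValueError("this sketch labels at most 26 tapes")
    return [chr(ord("A") + hanoi_tape(d, num_tapes)) for d in range(1, days + 1)]


print("".join(rotation_schedule(days=8, num_tapes=3)))
print("".join(rotation_schedule(days=8, num_tapes=4)))
```

Because the higher-numbered tapes are overwritten rarely, the set always holds a mix of recent and older restore points while the most frequently used tape carries most of the wear.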
Other data backup methods include database shadowing, electronic vaulting, remote journaling, and storage area networks. Firstly, database shadowing writes data to two disks. It has the advantage of being fast (Michael & Lawrence, 2011). Secondly, electronic vaulting copies data to a secure location. It has the advantage of copying all the existing records. Thirdly, remote journaling offers continuous synchronization of data with the offsite backup system alongside the primary onsite storage. Lastly, a storage area network can support disks, mirroring, restoration, retrieval, and archiving of data from one device to another.
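Remote journaling can be pictured as a log of writes that the offsite system replays. The sketch below is a deliberately simplified, in-memory model under that assumption: the primary store appends every write to a journal, and the offsite replica applies only the entries it has not yet seen. The class and method names are hypothetical, and a real deployment would ship the journal over a network and handle failures that are ignored here.

```python
from typing import Dict, List, Tuple

JournalEntry = Tuple[str, str]  # (key, value) of one write


class PrimaryStore:
    """Onsite storage that records every write in an ordered journal."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.journal: List[JournalEntry] = []

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self.journal.append((key, value))


class OffsiteReplica:
    """Offsite copy that replays journal entries it has not applied yet."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.applied = 0  # number of journal entries already replayed

    def sync(self, journal: List[JournalEntry]) -> int:
        pending = journal[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied = len(journal)
        return len(pending)


primary, replica = PrimaryStore(), OffsiteReplica()
primary.put("order-1", "paid")
primary.put("order-2", "pending")
replica.sync(primary.journal)
primary.put("order-2", "paid")
replica.sync(primary.journal)  # only the one new entry is replayed
print(replica.data == primary.data)
```

Replaying only the unseen entries is what keeps the synchronization continuous and cheap compared with recopying all existing records each time.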
Communications Recovery
The loss of vital equipment can also stall communication processes. Protecting the communication lines with fault-tolerant measures is critical. The measures available include redundant WAN links, alternate routing, and diverse routing (p.267). Diverse routing involves channeling traffic through different cable lines, but this method is not cheap. The second method, alternate routing, uses a redundant channel that offers another line of communication in case the first channel fails (Michael & Lawrence, 2011). Examples include cell phones instead of landlines, or a microwave channel instead of fiber optics. Other relevant techniques include last mile channel protection, free space optics, and recovery of voice communication.
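The logic of alternate routing is a simple failover: try the primary channel, and fall back to the next one when it fails. The following is a minimal sketch under that reading. The channel functions `fiber_link` and `microwave_link` are hypothetical stand-ins that simulate an outage and a working backup; they do not represent any real networking API.

```python
from typing import Callable, Sequence, Tuple

Channel = Tuple[str, Callable[[str], bool]]  # (name, send function)


def send_with_failover(message: str, channels: Sequence[Channel]) -> str:
    """Try each channel in priority order; return the name of the one that worked."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except OSError:
            # Treat an I/O error as a failed channel and try the next one.
            continue
    raise ConnectionError("all communication channels failed")


def fiber_link(message: str) -> bool:
    """Hypothetical primary channel, simulated here as being cut."""
    raise OSError("fiber link down")


def microwave_link(message: str) -> bool:
    """Hypothetical backup channel, simulated here as working."""
    return True


used = send_with_failover(
    "status: ok", [("fiber", fiber_link), ("microwave", microwave_link)]
)
print(f"message delivered over: {used}")
```

The ordering of the channel list encodes the firm's preference, so the cheaper or faster medium is used whenever it is available and the backup carries traffic only during an outage.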
Facility and Supply Recovery
Disasters such as fires and hurricanes can affect businesses by damaging their physical premises. This leads to the loss of transportation, buildings, and communication equipment. An IT firm can seek alternative sites for its operations (Michael & Lawrence, 2011). Alternatively, some firms can take advantage of their offsite premises or arrange with other firms for temporary space. The firm may also set up a temporary prefabricated structure to accommodate its operations. The available sites are classified as cold, warm, or hot, depending on the readiness of the site and the time required to make it operational. Hot sites can be operational within hours and are suitable for critical services. In contrast, cold sites may take days to become operational (p.335) and are suitable for non-critical services.
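The choice among cold, warm, and hot sites can be framed as matching each service's tolerable downtime against the time a site needs to become operational. The sketch below assumes illustrative activation times of 72, 24, and 4 hours for cold, warm, and hot sites. These numbers are placeholders chosen only to reflect the ordering described above (hot within hours, cold within days), and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (site type, assumed hours to become operational), listed cheapest first.
# The hour values are illustrative assumptions, not figures from the sources.
SITE_OPTIONS: List[Tuple[str, float]] = [
    ("cold", 72.0),
    ("warm", 24.0),
    ("hot", 4.0),
]


def choose_site(tolerable_downtime_hours: float) -> Optional[str]:
    """Return the cheapest site type that can be running within the limit.

    Returns None when even a hot site is too slow, which suggests the
    service needs continuous redundancy rather than a recovery site.
    """
    for site, activation_hours in SITE_OPTIONS:
        if activation_hours <= tolerable_downtime_hours:
            return site
    return None


# Hypothetical services and the downtime each can tolerate, in hours.
for service, limit in [("archive reports", 96.0), ("payroll", 30.0), ("online store", 6.0)]:
    print(service, "->", choose_site(limit))
```

Scanning from the cheapest option upward reflects the usual cost reasoning: a firm pays for a hot site only for services that cannot wait for a warm or cold one.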
User Recovery
In most firms, the workforce is large and present on the premises all at once. This makes changeovers or replacements difficult, which affects business continuity. In case of failure or injury, standby personnel should be able to take over immediately to prevent interruptions (Michael & Lawrence, 2011). However, without space, computers, means of transport, and a standby workforce, user recovery has little practical value.
Conclusion
The outline above presents the major business continuity and recovery issues, along with corresponding solutions, that suit a firm with onsite data, e-Commerce, online sales, and marketing.
References
Kurt, J., & Douglas, M. (2011). Business continuity and risk management: essentials of organizational resilience. New York, NY: Rothstein Associates.
Michael, W., & Lawrence, W. (2011). The disaster recovery handbook: a step-by-step plan to ensure business continuity and protect vital operations, facilities, and assets. New York, NY: AMACOM.