Additional Disaster Recovery Planning, Testing, and Training Are Needed for Data Communications

 

April 2004

 

Reference Number:  2004-20-079

 

 

This report has cleared the Treasury Inspector General for Tax Administration disclosure review process and information determined to be restricted from public release has been redacted from this document.

 

April 9, 2004

 

 

MEMORANDUM FOR CHIEF INFORMATION OFFICER

 

FROM:     Gordon C. Milbourn III /s/ Gordon C. Milbourn III

                 Acting Deputy Inspector General for Audit

 

SUBJECT:     Final Audit Report - Additional Disaster Recovery Planning, Testing, and Training Are Needed for Data Communications (Audit # 200320019)

 

This report presents the results of our review of the telecommunications disaster recovery strategy.  The overall objective of this review was to determine whether the Internal Revenue Service (IRS) developed and tested an effective telecommunications disaster recovery strategy.

To allow users and taxpayers fast and efficient access to applications and services, the IRS must have a robust, responsive telecommunications infrastructure that provides high-speed, high-availability network connectivity. The IRS Enterprise Networks organization is responsible for managing the design and engineering of the telecommunications environment, which includes approximately 181,500 network devices and 1,200 network connection addresses.

In summary, the IRS has implemented several measures to create a robust and resilient network architecture to support continuous data communications.  For example, it has made significant upgrades to its data communications network, including redundant connections and diverse data traffic routing for key facilities, and standardization and redundancy in network hardware.  The IRS has also taken additional measures at its facilities to reduce the vulnerability of the network, including off-premises storage of network documentation, network system backups, installation of an uninterruptible power supply, and identification and reduction of single points of failure within the network.  In addition, the Enterprise Networks organization has ongoing projects to evaluate its data communications network to improve and upgrade the infrastructure, while at the same time trying to reduce network operations costs.  However, additional actions could further improve the disaster recovery strategy for data communications.

While each of the four facilities we visited prepared a disaster recovery plan for data communications and stored the plan at its off-premises location, the plans did not contain all of the required components and sufficient training had not been conducted for the disaster recovery teams.  Inadequate disaster recovery plans and training for the disaster recovery personnel diminish the assurance that the IRS can rapidly recover data communications at a site in an emergency and that the disaster recovery activities can be conducted efficiently.  In addition, the plans had not been comprehensively exercised.  While the day-to-day operational measures taken by management and staff in response to daily data communications interruptions may diminish the need for testing system restoration, exercising the remaining plan elements, such as plan activation and team member notification and reporting procedures, would improve the site’s ability to recover timely.

Presidential Decision Directive 63, Critical Infrastructure Protection (CIP), dated May 1998, requires that each Federal Government department and agency prepare a plan for protecting its own critical infrastructure. The infrastructure includes systems essential to the minimum operations of the economy and the Federal Government, such as telecommunications, banking and finance, energy, and transportation. As part of its CIP Program, the IRS identified 19 critical assets, which included the data communications network. The IRS also completed a vulnerability assessment in November 2000 for each of the critical assets.  However, the IRS has not completed the disaster recovery planning and risk management activities for data communications, which could result in the inability of the IRS to timely restore critical data communications in the event of a disaster, potentially affecting the IRS’ ability to accomplish its mission and serve taxpayers.

Lastly, the IRS engaged a vendor to assess the old network, propose a new network design, and provide cost estimates for a new network.  The vendor concluded that the proposed design and configuration presented the least amount of complexity and cost while delivering the maximum level of capabilities and benefits, including alternate routing access and recovery. However, the IRS did not prepare a formal cost-benefit analysis which may have resulted in the IRS not selecting the most feasible or cost-effective data communications network design and recovery strategy that would support the needs of the business units.  In addition, our site survey results showed that a bi-directional ring connecting the Campus and Territory Office in Atlanta was not being used as advantageously as possible.  For example, the Territory Office currently does not use the bi-directional ring for routing its data traffic; instead, the data are being sent over a separate circuit.  By implementing a solution that would permit the Territory Office to shift its data traffic to the bi-directional ring, management could remove the circuit and realize potential cost savings of $315,000 over 5 years.

We recommended the Chief, Information Technology Services, ensure each site reviews the disaster recovery plan for completeness and accuracy quarterly or whenever significant changes occur to any plan element, periodically trains employees in their disaster recovery roles and responsibilities, and performs at least one exercise of each disaster recovery plan element annually.  In addition, we recommended the Chief, Information Technology Services, complete the additional disaster recovery and risk management measures outlined in the IRS’ CIP Program for the data communications network, ensure a cost-benefit analysis is prepared for projects redesigning the network architecture that result in a significant investment, and ensure the current IRS project tasked with optimizing the data communications network also assesses the use of the bi-directional rings.

Management’s Response:  IRS management agreed to the recommendations presented in the report.  Planned corrective actions include performing quarterly reviews of the disaster recovery plans, conducting yearly training sessions and disaster recovery tests, and identifying critical points of failure within the local area networks. Enterprise Networks organization management will include the names, responsible program areas, and contact numbers in site-specific disaster recovery plans.  All future risk assessments of the network(s) will be processed under the Treasury Communications Enterprise managed services contract.  In addition, Enterprise Networks organization management will develop a suite of business case and alternative analysis processes for evaluating significant investment projects and will include an evaluation of bi-directional rings when optimizing the data communications network.  Management’s complete response to the draft report is included as Appendix VII.

Copies of this report are also being sent to the IRS managers affected by the report recommendations. Please contact me at (202) 622-6510 if you have questions or Margaret E. Begg, Assistant Inspector General for Audit (Information Systems Programs), at (202) 622-8510.

 

Table of Contents

Background

Several Measures Have Been Taken to Deliver Uninterrupted Data Communications

Improved Site Disaster Recovery Plans and Increased Testing and Training Are Needed for Data Communications

Recommendation 1:

Recommendations 2 and 3:

The Data Communications Network Requires Additional Disaster Recovery and Risk Management Measures

Recommendation 4:

Significant Investments to Enhance Network Availability and Recovery Capability Should Require a Cost-Benefit Analysis

Recommendations 5 and 6:

Appendix I – Detailed Objective, Scope, and Methodology

Appendix II – Major Contributors to This Report

Appendix III – Report Distribution List

Appendix IV – Outcome Measures

Appendix V – Status of Additional Measures by Site to Ensure Uninterrupted Data Communications

Appendix VI – Status of Site Disaster Recovery Plans for Data Communications

Appendix VII – Management’s Response to the Draft Report

 

Background

One of the Internal Revenue Service’s (IRS) major strategies contained in the IRS Strategic Plan Fiscal Years 2000-2005 is to provide high-quality, efficient, and responsive information services.  This strategy includes building a robust, responsive telecommunications infrastructure that provides high-speed, high-availability network connectivity to allow users and taxpayers fast and efficient access to authorized IRS applications and services.  The IRS Enterprise Networks organization is responsible for managing the design and engineering of the telecommunications environment, which includes approximately 181,500 network devices and 1,200 network connection addresses.

To ensure network availability, controls should be implemented that are designed both to prevent interruptions and to promptly recover data communications service should unexpected events occur.  Business continuity planning is the process of establishing, testing, and maintaining policies, procedures, and physical resources to effect the timely resumption of critical business processes in the event of a disaster.  A key component of business continuity planning is disaster recovery planning, which is the advance planning and preparations from a technology aspect that are necessary to minimize loss and ensure continuity of the critical business functions.

In the IRS-Wide Business Continuity Planning – Case for Action, dated November 30, 2001, the IRS reported weaknesses in its ability to perform disaster recovery.  For example, the IRS reported that many of its business continuity plans were not tested and updated on a regular basis.  In December 2002, we reported that the IRS had made substantial progress in its business continuity program.  Activities initiated by the IRS included increasing the visibility and management oversight of business continuity issues, improving physical security at its offices, and developing plans to improve the recovery capability of its mainframe computers.  However, the General Accounting Office (GAO) reported in May 2003 that the IRS had not developed disaster recovery plans for certain key systems at some facilities and had not tested the plans at other facilities.  A disaster recovery plan defines the resources, actions, tasks, and data required to manage the restoration process for an application or system within the stated disaster recovery goals, thereby minimizing the effects of a major disruption.

This review was performed in the Enterprise Networks office at the IRS National Headquarters in New Carrollton, Maryland; the Tennessee Computing Center (TCC) in Memphis, Tennessee; the Martinsburg Computing Center (MCC) in Martinsburg, West Virginia; and the IRS Campus and Territory Office in Atlanta, Georgia, during the period September through December 2003.  The audit was conducted in accordance with Government Auditing Standards.  Detailed information on our audit objective, scope, and methodology is presented in Appendix I.  Major contributors to the report are listed in Appendix II.

Several Measures Have Been Taken to Deliver Uninterrupted Data Communications

Maintaining uninterrupted data communications is critical to the IRS to accomplish its mission of providing top-quality service to taxpayers.  As a result, the IRS has implemented several measures to create a robust and resilient network architecture to support continuous data communications.  As reflected in the Data Communications Utility (DCU) Network Border Router Configuration and Redundancy Design, dated April 2000, and the Infrastructure Architecture Modernization Assessment, dated February 2002, the IRS has made significant upgrades to its network including:

·        Implementation of Asynchronous Transfer Mode (ATM) as the backbone transport.

·        Redundant connections between IRS campuses and computing centers.

·        The use of bi-directional ring topology and microwave to provide diverse and redundant data traffic routing for the computing centers.

·        Standardization and redundancy in network hardware at each of the border router locations.

The results of our site visits to four IRS facilities also reflected that additional measures were being taken to reduce the vulnerability of the data communications network at those sites. Detailed information on our site visits is presented in Appendix V.  These measures included off-premises storage of network documentation, network system backups, installation of an uninterruptible power supply, and identification and reduction of single points of failure within the network.  In addition, the sites maintained some spare parts for network equipment and had service level agreements with vendors for repairs.  The Enterprise Networks organization also has ongoing projects to evaluate its data communications network to improve and upgrade the infrastructure, while at the same time trying to reduce network operations costs.

Improved Site Disaster Recovery Plans and Increased Testing and Training Are Needed for Data Communications  

Office of Management and Budget (OMB) Circular A-130, Security of Federal Automated Information Resources, requires that agency plans assure they can recover and provide sufficient service to meet the minimal user needs of the system in the event of a disaster.  Disaster recovery is the ability to respond to an interruption in services by implementing a disaster recovery plan to restore an organization’s critical business functions. The IRS Internal Revenue Manual (IRM) contains specific requirements for developing a disaster recovery plan for all mission critical systems at each facility.  The IRS has also developed a disaster recovery plan template to assist site management in the development of their respective plans. 

Major components of a site’s disaster recovery plan for data communications should include an overview of the disaster recovery strategy, recovery team information, notification procedures, network/circuit diagrams, hardware and software inventory, system backup requirements, off-premises storage information, and a telephone listing of external contacts such as vendors and suppliers.  The disaster recovery plan should also contain recovery priorities and step-by-step restoration procedures to prevent difficulty or confusion in an emergency.  The IRM stipulates that each site store a complete copy of the plan in both magnetic media and hard copy at the off-premises storage facility for that site.

The IRM also contains requirements for maintaining and testing the disaster recovery plans to assure the system can be recovered in a timely manner.  To be effective, the plan must be reviewed and updated regularly since frequent changes can occur with the names and contact information for team members and with system requirements and procedures as a result of shifting business needs and technology upgrades.  Therefore, the IRM requires that the plan be reviewed quarterly, tested annually, and updated as needed to provide for the reasonable restoration of operations.  According to the National Institute of Standards and Technology (NIST), testing of the disaster recovery plan should include exercising each plan element to identify planning gaps and address plan deficiencies, thereby improving plan effectiveness and overall agency preparedness.  The disaster recovery personnel should also be trained at least annually to prepare them to execute their respective recovery procedures during plan activation.

As illustrated in Exhibit 1, a review of the disaster recovery plans and preparedness activities for data communications at four IRS facilities identified areas where improvements are needed.  Detailed information on our review of the sites’ disaster recovery plans is contained in Appendix VI.  

Exhibit 1:  Status of Disaster Recovery Activities
 for Data Communications

IRS Facility

Plan
Prepared

Plan Complete

Plan Stored Offsite

Comprehensive Exercise of
the Plan

Sufficient Training Conducted

MCC

Yes

No

Yes

No

Yes

TCC

Yes

No

Yes

No

No

Atlanta Campus

Yes

No

Yes

No

Yes

Atlanta Territory Office

Yes

No

Yes

No

No

Source:  The Treasury Inspector General for Tax Administration’s review of site disaster recovery plans and discussions with management using requirements contained in NIST and IRS guidelines.

While each facility prepared a disaster recovery plan for data communications and stored it at its off-premises location, the plans did not contain all of the required components.  In addition, the plans had not been comprehensively exercised and sufficient training had not been conducted for the disaster recovery teams.

The disaster recovery plans require additional information

The disaster recovery plans prepared by each site for data communications contained many of the required components.  In general, the disaster recovery plans contained an overview of the recovery strategy, recovery team member names and telephone numbers, recovery team responsibilities, notification procedures, contact information for vendors and suppliers, network/circuit diagrams, system backup requirements, and off-premises storage information.  However, most of the plans did not contain the following information required by NIST and IRS guidelines:

·        Recovery priorities and step-by-step restoration procedures.

·        An inventory of hardware and software.

·        A listing of Internet Protocol (IP) addresses and circuits.

·        A record of updates to the plan.

While management at the sites we visited did maintain an inventory of hardware and a listing of IP addresses, and stored this information at their off-premises locations, they did not include this information as part of their disaster recovery plans.  Inadequate disaster recovery plans diminish the assurance that the IRS can rapidly recover data communications at a site in an emergency and that the disaster recovery activities can be conducted efficiently.  Management did not develop adequate disaster recovery plans because they were uncertain about exactly what information should have been included in the plans.

Additional testing of the plans and training of the disaster recovery personnel are needed

Each of the sites had not completed a comprehensive exercise of its disaster recovery plan for data communications.  Management explained that the recovery of failed data communications devices is a day-to-day operational issue.  While sites may not specifically document disaster recovery testing, they exercise their disaster recovery capabilities throughout the year in response to incidents, including the restoration of routers.  Management also performs tests by annually powering off and restoring equipment and by participating in the disaster recovery exercises of other systems (e.g., mainframe computers).  In addition, management attributed the absence of a formal disaster recovery test for data communications to their concern for disrupting operations.

According to NIST guidelines, a disaster recovery test should include exercising each plan element, such as plan activation, team member notification and reporting procedures, and system restoration from backup media.  While the day-to-day operational measures taken by management and staff in response to daily data communications interruptions may diminish the need for testing system restoration, exercising the remaining plan elements would improve the site’s ability to recover timely.  To obtain the most benefit from disaster recovery testing, the test plan should contain detailed information, including the scenario, test elements, evaluation criteria, and time periods.  The results of the test should be documented and lessons learned identified to improve plan effectiveness.

Training was inadequate for the disaster recovery personnel because management was unsure what the training should entail for their disaster recovery teams.  According to the NIST, recovery personnel should be trained at least annually on the following elements:

·        Purpose of the plan.

·        Cross-team coordination and communication.

·        Reporting procedures and security requirements.

·        Team-specific processes and individual responsibilities.

The goal of disaster recovery training should be to train the disaster recovery personnel to the extent that they are able to execute initial recovery procedures without aid of the actual document, since a paper or electronic version of the plan may be unavailable for the first few hours as a result of the disaster.

Recommendations

The Chief, Information Technology Services, should ensure each site:

1.      Reviews the disaster recovery plan for completeness and accuracy quarterly or whenever significant changes occur to any plan element.

Management’s Response:  The MCC and TCC developed a process to perform quarterly reviews.  The first review will be completed by April 1, 2004.  The Atlanta Territory Manager implemented a controlled response process to ensure the disaster recovery plan was reviewed.  The responses are due March 31, June 30, September 30, and December 31 requiring verification that each team has met and their respective disaster recovery plans have been reviewed for accuracy.  A Plan Changes or Reviews sheet has been added to the plans to document all changes to and reviews of the plans.

2.      Periodically trains employees in their disaster recovery roles and responsibilities.

Management’s Response:  Both the MCC and TCC will conduct yearly training sessions beginning in September 2004 during the preplanning phase for this year’s disaster recovery exercise.  The Atlanta Territory Manager will ensure the Telecommunications organization conducts an independent biannual disaster recovery table exercise and documents it in the plan.

3.      Performs at least one exercise of each disaster recovery plan element annually.

Management’s Response:  Testing at the MCC and TCC is conducted more frequently than on an annual basis.  This includes participation in disaster recovery of other systems (e.g., mainframe disaster recovery exercise).  Testing for this calendar year will be conducted by December 1, 2004.  The Atlanta Campus and Atlanta Territory Manager will coordinate with the Mission Assurance Office to ensure annual disaster recovery testing is conducted.

The Data Communications Network Requires Additional Disaster Recovery and Risk Management Measures

Presidential Decision Directive (PDD) 63, Critical Infrastructure Protection (CIP), dated May 1998, calls for a national effort to assure the security of the nation’s critical infrastructure.  The infrastructure includes systems essential to the minimum operations of the economy and Federal Government, such as telecommunications, banking and finance, energy, and transportation.  PDD 63 also requires that each Federal Government department and agency prepare a plan for protecting its own critical infrastructure. Executive Order 13231, Critical Infrastructure Protection in the Information Age, issued October 2001, reaffirms the need to continually take actions to secure information systems, emergency preparedness communications, and physical assets.

The Department of the Treasury Critical Infrastructure Protection Plan (TCIPP), dated August 30, 2002, stipulated that each departmental office and bureau is responsible for identifying the critical assets under its control, assessing the vulnerabilities of those assets, and assuring their availability, integrity, confidentiality, survivability, and adequacy. According to the TCIPP, critical infrastructure would include the physical and cyber assets that support critical missions.  Physical assets include the facilities providing service to the public, while cyber assets include networks, computers, applications, data, and information.  Each departmental office and bureau is also required to develop its own CIP Management Plan addressing governance, risk management, critical asset management, threat assessment, vulnerability/risk assessment, disaster recovery planning and management, incident reporting and handling, and training and awareness.

In February 2003, we reported that, while the IRS had not yet completed its CIP Management Plan, it had taken significant steps in protecting its critical assets.  Some of the required activities identified in the IRS’ draft CIP Management Plan included: 

·        Critical asset identification.

·        Vulnerability assessment.

·        Disaster recovery planning.

·        Risk management.

As part of its CIP Program, the IRS identified 19 critical assets, which included the data communications network.  The IRS also completed a vulnerability assessment in November 2000 for each of the critical assets.  However, the IRS has not completed the disaster recovery planning and risk management activities for data communications, which could result in the inability of the IRS to timely restore critical data communications in the event of a disaster, potentially affecting the IRS’ ability to accomplish its mission and serve taxpayers.

According to the draft CIP Management Plan, critical asset owners shall ensure that disaster recovery plans cover their critical assets and that those plans appropriately prioritize actions with respect to those critical assets.  For data communications, the disaster recovery plan should address the compromise or incapacitation of the critical asset as a result of physical or cyber attacks as well as natural disasters.  Critical asset owners were also required to develop and maintain a risk management plan.  Risk management encompasses those activities taken to identify, control, and reduce risks.  The risk management plan should be reviewed and revised annually or more frequently in response to changes in the assessed risk. 

IRS management explained that a disaster recovery plan and risk management plan were not developed for the data communications network because they were notified by the Department of the Treasury that critical assets were going to be reidentified by the National Critical Infrastructure Assurance Office.  However, the IRS has not received any updated listing of its critical assets.  The CIP Program efforts have also stalled to some extent since the stand-up of the Department of Homeland Security (DHS), which resulted in the former Department of the Treasury’s Critical Infrastructure Protection Officer transferring to the DHS.

Recommendation

4.      The Chief, Information Technology Services, should complete the additional disaster recovery and risk management measures outlined in the IRS’ CIP Program for the data communications network.

Management’s Response:  The Enterprise Networks organization will partner with the End User Equipment and Services organization to identify critical points of failure within the IRS’ local area networks. The Enterprise Networks organization will also provide the names, responsible program areas, and contact number of its management team to be included in site-specific disaster recovery plans. 

As the Treasury Communications System will soon be replaced with the Treasury Communications Enterprise (TCE) managed services contract, all future risk assessments of the wide or local area network(s) should be processed under the TCE umbrella.  The Enterprise Networks organization will begin transitioning to the TCE in Fiscal Year 2005.

Significant Investments to Enhance Network Availability and Recovery Capability Should Require a Cost-Benefit Analysis

OMB Circular A-130 requires that agencies take cost-effective steps to manage any disruption of service in the event of a disaster.  In addition, the Clinger-Cohen Act of 1996 (also referred to as the Information Technology Management Reform Act) requires each Federal Government agency to establish effective and efficient capital planning processes for selecting, managing, and evaluating the results of all its major investments in information systems.

According to the NIST, agencies should perform a cost-benefit analysis to identify the optimum recovery strategy.  The cost-benefit analysis should include the following for each alternative considered:  

·        Assumptions and constraints of the business need/problem.

·        A description of the alternative being considered.

·        The benefits and costs on a full life-cycle basis.

·        A risk analysis that addresses both technical and organizational risk.

In April 2000, a team of IRS network engineers and contracted consultants prepared the proposal for the IRS’ current ATM/Frame Relay data communications network.  The network topology in Exhibit 2 shows the hierarchal ATM network design for connections among the computing centers, campuses, and Territory Offices.  The posts-of-duty have Frame Relay connectivity to the Territory Offices. 

Exhibit 2:  Network Topology

Exhibit 2 was removed due to its size.  To see Exhibit 2, please go to the Adobe PDF version of the report on the TIGTA Public Web Page.

 

The goal was to design a consistent, highly available system architecture that could be scaled to meet the current and future requirements.  As illustrated in Exhibit 3, the design provided for standardization of the border router configuration within the IRS network and redundancy at each of the border router locations.  The switches are paired with the border routers to avoid single points of failure and to provide more than one access point into the ATM service provider.

Exhibit 3:  Network Border Router Configuration

Exhibit 3 was removed due to its size.  To see Exhibit 3, please go to the Adobe PDF version of the report on the TIGTA Public Web Page.

 

The design provided for the capability that, in the event of a failure in the primary or secondary communication path, the unaffected path would provide alternate routing access and recovery.  While the vendor concluded that the proposed design and configuration presented the least amount of complexity and cost while delivering the maximum level of capabilities and benefits, the IRS did not prepare a formal cost-benefit analysis. Instead, the IRS engaged the vendor to assess the old network, propose a new network design, and provide cost estimates for the new ATM/Frame Relay network. 

The proposed ATM/Frame Relay data communications network was estimated to cost $4.9 million and was largely comprised of the vendor’s products and equipment. Not conducting a formal cost-benefit analysis may have resulted in the IRS not selecting the most feasible or cost-effective data communications network design and recovery strategy that would support the needs of the business units. IRS management explained that an immediate and significant upgrade to the data communications network was necessary at the time and that the absence of a cost-benefit analysis occurred primarily because they did not consider the redesign of the network to be a separate information technology investment project. 

IRS management recognizes that, while there is a strong argument in favor of ease of operations and management to use a single vendor environment, it hinders their ability to leverage the IRS’ purchasing power.  In fact, the Enterprise Networks organization is actively assessing its data communications network to implement improvements while reducing operational costs.  For example, a current IRS project is tasked with optimizing the data communications network since it was based on the IRS’ organizational structure prior to the reorganization, which has resulted in architectural inefficiencies and operational issues. 

One of the effectiveness measures identified by the project is to identify potential cost savings opportunities (e.g., reduced hardware, circuits, etc.).  This effort should also include assessing the use of the bi-directional rings that provide diverse traffic routing at some IRS locations.  Our site survey results showed that a bi-directional ring connecting the Campus and Territory Office in Atlanta was not being used as advantageously as possible.  For example, the Territory Office currently does not use the bi-directional ring for routing its data traffic; instead, the data are being sent over a separate circuit.  By implementing a solution that would permit the Territory Office to shift its data traffic to the bi-directional ring, management could remove the circuit and realize potential cost savings of $315,000 over 5 years.

Recommendations

The Chief, Information Technology Services, should ensure:

5.      A cost-benefit analysis is prepared for projects redesigning the network architecture that result in a significant investment.

Management’s Response:  The Enterprise Networks organization will develop a suite of business case and alternative analysis processes for evaluating significant investment projects, which will be used as a critical decision factor in all recommendations and approvals.

6.      The current IRS project tasked with optimizing the data communications network also assesses the use of the bi-directional rings.

Management’s Response:  The Engineering Branch of the Enterprise Networks organization will include the use and evaluation of bi-directional rings when optimizing the data communications network.

 

Appendix I

 

Detailed Objective, Scope, and Methodology

 

The overall objective of this review was to determine whether the Internal Revenue Service (IRS) developed and tested an effective telecommunications disaster recovery strategy. To accomplish this objective, we:

I.                    Reviewed the policies and procedures for completing a cost-benefit analysis during the development of a disaster recovery strategy to ensure redundancy and resiliency in the data communications architecture.  We interviewed management and reviewed studies and analyses completed to establish the recommended disaster recovery strategy to determine whether a cost-benefit analysis was used to select the most efficient disaster recovery option.  We also reviewed the IRS’ network topology to determine if the selected strategy was incorporated into the current data communications architecture.

II.                 Reviewed the policies and procedures for developing and updating disaster recovery plans.  We interviewed management at the visited sites about the preparation of a disaster recovery plan for telecommunications and about the effectiveness and efficiency of the current disaster recovery architecture.  We also reviewed the disaster recovery plans at the visited sites to determine their adequacy and completeness for prompt recovery of data communications in the event of a disaster.  In addition, we determined if measures were implemented to ensure uninterrupted telecommunications and reviewed the network topology to assess whether single points of failure had been sufficiently eliminated.

III.               Reviewed the policies and procedures for conducting disaster recovery tests and evaluating test results.  In addition, we reviewed the disaster recovery test plans, test results, and test schedule at each site to identify the extent to which the disaster recovery capabilities for telecommunications were tested and whether identified deficiencies have been adequately addressed.  At each site, we also identified training provided to the telecommunications disaster recovery staff related to their disaster recovery responsibilities.

IV.              Reviewed the policies and procedures for the Critical Infrastructure Protection (CIP) Program to determine what additional actions the IRS requires for its critical assets. In addition, we interviewed management and reviewed documents prepared by the IRS to meet CIP Program requirements related to telecommunications.

 

Appendix II

 

Major Contributors to This Report

 

Margaret E. Begg, Assistant Inspector General for Audit (Information Systems Programs)

Gary Hinkle, Director

Danny Verneuille, Audit Manager

Paul Mitchell, Senior Auditor

Van Warmke, Senior Auditor

Olivia Jasper, Auditor

Linda Screws, Auditor

 

Appendix III

 

Report Distribution List

 

Commissioner  C

Office of the Commissioner – Attn:  Chief of Staff  C

Deputy Commissioner for Operations Support  OS

Chief, Information Technology Services OS:CIO:I

Director, End User Equipment and Services OS:CIO:I:EU

Director, Enterprise Networks  OS:CIO:I:EN

Acting Director, Portfolio Management  OS:CIO:R:PM

Chief Counsel CC

National Taxpayer Advocate  TA

Director, Office of Legislative Affairs  CL:LA

Director, Office of Program Evaluation and Risk Analysis  RAS:O

Office of Management Controls  OS:CFO:AR:M

Audit Liaisons:

        Chief, Information Technology Services  OS:CIO:I

        Director, End User Equipment and Services  OS:CIO:I:EU

        Director, Enterprise Networks  OS:CIO:I:EN

        Manager, Program Oversight and Coordination  OS:CIO:R:PM:PO

 

Appendix IV

 

Outcome Measures

 

This appendix presents detailed information on the measurable impact that our recommended corrective actions will have on tax administration.  This benefit will be incorporated into our Semiannual Report to the Congress.

Type and Value of Outcome Measure:

·        Cost Savings, Funds Put to Better Use – Potential; $315,000  (see page 11).

Methodology Used to Measure the Reported Benefit:

We reviewed the use of the bi-directional ring connecting the Campus and Territory Office in Atlanta, Georgia.  We determined that by shifting the Territory Office’s data traffic to the bi-directional ring, management could remove the current circuit for data traffic and realize potential cost savings of $315,000 over 5 years. 

 

Description

Amount

Estimated average current monthly recurring charge of circuit used for data traffic at the Territory Office.

$5,700

Estimated monthly recurring charge for using the bi-directional ring.

<$450>

Estimated monthly savings by shifting the data traffic to the bi-directional ring. 

$5,250

Estimated 5-year savings ($5,250 * 12 months * 5 years).

$315,000

 

Appendix V

 

Status of Additional Measures by Site to Ensure Uninterrupted Data Communications

 

Checks represent those measures implemented at the site.

Measure

Comments

Martinsburg Computing Center

Tennessee Computing Center

Atlanta Campus

Atlanta Territory Office

1.       Risk Assessment

All sites had risk assessments completed on their networks within the last 3 years.

ü

ü

ü

ü

2.       Backup Power Source

Each of the sites had an uninterruptible power supply device and generator.

ü

ü

ü

ü

3.       Multiple Demarcation Points

The Martinsburg Computing Center consisted of two buildings.  Each building had a demarcation point, and there was a separate fiber cable connecting the two buildings to provide redundancy.

ü

 

 

 

4.       Spare Parts Inventory

All sites maintained some spare parts for repairs.

ü

ü

ü

ü

5.       Service Level Agreements With Vendors

All sites had a service level agreement with vendors for repairs.

ü

ü

ü

ü

6.       Redundant Circuits

All sites had redundant circuits for network connectivity.

ü

ü

ü

ü

7.       Network Diversity

All sites used bi-directional ring topology or microwave to provide network diversity.

ü

ü

ü

ü

 

Checks represent those measures implemented at the site.

Measure

Comments

Martinsburg Computing Center

Tennessee Computing Center

Atlanta Campus

Atlanta Territory Office

8.       Multiple Carriers

Except for the Martinsburg Computing Center, all sites had only one local carrier for their data communications circuits.

ü

 

 

 

9.       System Backups

All sites were backing up critical files and storing them at their
off-premises location.

ü

ü

ü

ü

10.    Off-premises Storage of Documentation

All sites stored system recovery documentation at their
off-premises location.

ü

ü

ü

ü

Source:  The Treasury Inspector General for Tax Administration’s review of Internal Revenue Service documents and management discussions.

 

Appendix VI

 

Status of Site Disaster Recovery Plans for Data Communications

 

Checks represent those items contained in the site’s disaster recovery plan.

Plan Requirement and Description

Martinsburg Computing Center

Tennessee Computing Center

Atlanta Campus

Atlanta Territory Office

1.       Recovery Strategy Overview – A description of the methods that provide recovery capability over the full spectrum of incidents.

ü

ü

ü

ü

2.       Recovery Team Information – The name, role, and telephone number for the recovery team leaders and members.

ü

ü

ü

ü

3.       Notification Procedures – A description of the methods used to notify recovery personnel during business and nonbusiness hours.

ü

ü

ü

ü

4.       Recovery Team Responsibilities – An overview of team member roles and responsibilities in a contingency situation.

ü

ü

ü

ü

5.        Recovery Priorities – A prioritized sequence of recovery activities based upon the business impact analysis.

 

 

 

ü

6.       Restoration Procedures – Step-by-step procedures in sequential order to restore data communications.

 

 

 

 

7.       Vendor and Supplier InformationThe name, address, and telephone number of telecommunications vendors and suppliers.

ü

 

 

ü

8.       Critical Telephone ListThe name and telephone number of other critical personnel that may be needed during the recovery process.

ü

ü

ü

ü

9.       Network/Circuit DiagramsHigh- and low-level topologies that depict the interconnectivity between networks.

ü

 

 

ü

 

Checks represent those items contained in the site’s disaster recovery plan.

Plan Requirement and Description

Martinsburg Computing Center

Tennessee Computing Center

Atlanta Campus

Atlanta Territory Office

10.     Hardware and Software Inventory – A listing of physical hardware (i.e., circuits, routers, and switches) and computer software.

 

 

 

 

11.    System Backup Requirements File backup frequency and rotation schedule for critical files stored at the off-premises facility. 

 

ü

 

ü

12.    Listing of Internet Protocol (IP) Addresses and CircuitsA listing of the IP addresses and circuits for both the facility and other supported sites.

ü

 

 

 

13.    Off-premises Storage InformationThe name, address, and telephone number of the off-premises storage facility.

 

ü

 

ü

14.    Record of Changes – A record of plan modifications that includes the page number, change comment, and date of change.

 

 

 

 

Source:  The National Institute of Standards and Technology Special Publication 800-34, Contingency Planning Guide for Information Technology Systems, the Internal Revenue Service’s Internal Revenue Manual and Disaster Recovery Plan Template, and the Treasury Inspector General for Tax Administration’s review of site disaster recovery plans.

 

Appendix VII

 

Management’s Response to the Draft Report

 

The response was removed due to its size.  To see the response, please go to the Adobe PDF version of the report on the TIGTA Public Web Page.