Service Operations Command Center Management Can Do More to Benefit From Implementing the Information Technology Infrastructure Library
August 16, 2011
Reference Number: 2011-20-078
This report has cleared the Treasury
Inspector General for Tax Administration disclosure review process and
information determined to be restricted from public release has been redacted
from this document.
Phone Number | 202-622-6500
Email Address | TIGTACommunications@tigta.treas.gov
Web Site | http://www.tigta.gov
HIGHLIGHTS
SERVICE OPERATIONS COMMAND CENTER MANAGEMENT CAN DO MORE TO BENEFIT FROM IMPLEMENTING THE INFORMATION TECHNOLOGY INFRASTRUCTURE LIBRARY
Final Report Issued on August 16, 2011
Highlights of Reference Number: 2011-20-078 to the Internal Revenue Service Chief Technology Officer.
IMPACT ON TAXPAYERS
The Enterprise Operations organization’s Service Operations Command Center Branch (SOCCB) exists in part to ensure that normal information technology service operations are maintained for servers and mainframes by using the three Information Technology Infrastructure Library® (ITIL) processes of Event Management, Incident Management, and Problem Management. If the SOCCB does not effectively implement these ITIL best practices, service outages may not be addressed efficiently and the Internal Revenue Service (IRS) will not be effectively utilizing taxpayer resources.
WHY TIGTA DID THE AUDIT
The
overall objective of this review was to determine whether the SOCCB has
effectively implemented ITIL best practices to ensure service delivery
management for Enterprise Operations organization products and services.
WHAT TIGTA FOUND
SOCCB
management incorporated Event Management, Incident Management, and Problem
Management best practices into SOCCB policies and procedures and
daily operations. In addition, personnel resolved the majority of incident
tickets within the required time periods.
TIGTA analyzed the 312 Fiscal Year 2010
incident tickets worked by the systems administrators and computer support
specialists and determined that 145 (46 percent) of the tickets pertained
to three systems. Most of the incidents
occurred because of problems with software and were resolved by performing a
system reboot or stop/restart. TIGTA
determined that the SOCCB needs to examine incident reports to identify trends
within the information technology infrastructure, making its Problem Management
activities proactive.
TIGTA also determined that SOCCB management did not conduct a baseline assessment of SOCCB staffing and workload and does not have a documented strategic plan to communicate its goals and priorities with milestone and target dates. In addition, the current performance measures do not address whether work is performed efficiently and effectively.
WHAT TIGTA RECOMMENDED
TIGTA recommended that the Associate
Chief Information Officer, Enterprise Operations, ensure that SOCCB management revises
SOCCB procedures to address ticket trending, perform a staffing and workload
analysis of the SOCCB, update the Enterprise Operations
organization’s strategic plan whenever an SOCCB ITIL best practice is required
to support the goals or objectives of the organization, ensure development and
execution of a training plan, and identify and implement additional ITIL
performance measures.
In their response to the report, IRS officials
agreed with all of the recommendations. The IRS plans to revise procedures to account for trending
activities of its incident tickets, perform a staffing and workload analysis,
update the Enterprise Operations strategic plan for Problem Management, develop
and execute a training plan, and identify and implement performance measures.
August 16, 2011
MEMORANDUM FOR CHIEF TECHNOLOGY OFFICER
FROM: Michael R. Phillips /s/ Michael R. Phillips
Deputy Inspector General for Audit
SUBJECT: Final Audit Report – Service Operations Command Center Management Can Do More to Benefit From Implementing the Information Technology Infrastructure Library (Audit # 201020006)
This report presents the results of our review to determine whether the Service Operations Command
Center Branch has effectively implemented Information Technology Infrastructure
Library® best
practices to ensure service delivery management for Enterprise
Operations products and services. This review was included in our Fiscal Year
2010 Annual Audit Plan and addresses the major management challenge of
Modernization of the Internal Revenue Service.
Management’s complete response to the draft report is included as Appendix V.
Copies of this report are also being sent to the IRS managers affected by the report recommendations. If you have questions, please contact me at (202) 622-6510 or Alan R. Duncan, Assistant Inspector General for Audit (Security and Information Technology Services), at (202) 622-5894.
Appendices
Appendix I – Detailed Objective, Scope, and Methodology
Appendix II – Major Contributors to This Report
Appendix III – Report Distribution List
Appendix IV – Glossary of Terms
Appendix V – Management’s Response to the Draft Report
Abbreviations
| FY | Fiscal Year |
| IRS | Internal Revenue Service |
| ITAMS | Information Technology Assets Management System |
| ITIL | Information Technology Infrastructure Library® |
| MITS | Modernization and Information Technology Services |
| RCA | Root Cause Analysis |
| SOCCB | Service Operations Command Center Branch |
The Enterprise
Operations organization supports the Modernization and Information Technology
Services (MITS) organization by providing efficient, cost effective, secure,
and highly reliable computing (mainframe and server) services for all Internal
Revenue Service (IRS) business entities and taxpayers. The Enterprise Operations organization has
seven organizations that work to fulfill this mission. One of the organizations, the Enterprise
Computing Center,[1] is responsible for
providing support for the systems used to receive and process tax returns and
payments, all infrastructure servers enterprise-wide, and application servers
located in the 10 campuses and non-Enterprise Computing Center sites.
The Service Operations
Command Center Branch (SOCCB) – also referred to as the Command Center – falls
under the purview of the Enterprise Computing Center and ensures that normal
information technology service operations are maintained for mainframes and servers. The SOCCB consists of two sections:
· Service Operations Command Center Section – employs 64 systems administrators and computer support specialists to perform Event Management and Incident Management activities on the IRS’s mainframes and servers.
· Service Operations Management Section – employs 11 information technology specialists to perform Problem Management activities for the entire MITS organization and to facilitate the Service Restoration Team’s part of Incident Management.
Event Management, Incident Management, and Problem Management processes originate from a set of concepts and techniques called the Information Technology Infrastructure Library® (ITIL). The ITIL provides a set of best practices for managing information technology services and aims at delivering those services to satisfy the business requirements of the organization. The ITIL also provides a common terminology for use across MITS organization operations, which helps increase customer service and reduce costs. Because the ITIL provides general guidance for what to do rather than how to do it, it is often described as a framework or approach. Figure 1 summarizes a timeline of events that SOCCB management highlighted regarding its accomplishments in standing up and implementing ITIL best practices over the last 5 years.
Figure 1: Timeline of ITIL Best Practices Implementation at the SOCCB
| Date | Highlighted Events |
| October 1, 2006 | The SOCCB stood up. The Service Restoration Team process and limited Root Cause Analysis (RCA) support were in place at stand-up. |
| Fiscal Year (FY) 2007 | SOCCB management negotiated a stand-up/reassignment agreement with the National Treasury Employees Union, solicited volunteers to come to the SOCCB, and competitively announced/filled remaining vacancies. |
| FY 2008 | SOCCB management reported that, to their knowledge, the SOCCB was the first organization with employees outside of the Enterprise Computing Center–Martinsburg supporting workloads (without increasing authorized staffing). |
| May 2009 | The SOCCB established the Knowledge Database. |
| September 2009 | The SOCCB completed migration of monitoring/triage/resolution for Priority 1/Priority 2 service tickets for Enterprise Computing Center networked production servers. |
| September 2010 | The SOCCB completed migration of monitoring/triage/resolution for Priority 1/Priority 2 service tickets for Enterprise Computing Center mainframe workloads. |
| FY 2011 | The SOCCB implemented a process to ensure follow-up and Knowledge Database updates for all SOCCB tickets that have to be reassigned for lack of documentation. |
Source: SOCCB management.
This review was performed at the SOCCB locations in Martinsburg, West Virginia, and Memphis, Tennessee, during the period July 2010 through April 2011. We conducted this performance audit in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objective. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objective. Detailed information on our audit objective, scope, and methodology is presented in Appendix I. Major contributors to the report are listed in Appendix II.
Actions Have Been Taken to Implement Information Technology Infrastructure Library Best Practices and Provide Timely Service
The SOCCB updated its policies and procedures to incorporate ITIL best practices
In September 2010, the Chief Technology
Officer outlined a goal to have the MITS organization implement ITIL best
practices over the next several years. The
SOCCB has incorporated the ITIL best practice principles of Event Management,
Incident Management, and Problem Management into its Concept of Operations and policies
and procedures. In addition, the SOCCB
has made these best practices a part of the way it does business by utilizing a
Knowledge Database. This database
provides personnel with the source for resolving incident tickets and is
continually updated with new information.
Incident ticket resolutions occur within documented service level agreement time periods
The MITS organization
service level agreement provides for a 4-hour response time on a Priority 1
ticket and an 8-hour response time on a Priority 2 ticket. Our analysis of Information Technology Assets
Management System (ITAMS) data showed that 305[2] of the 312 tickets worked by Command Center
personnel were resolved within the documented time periods. Figure 2 shows the average time to
resolve these types of tickets.
Figure 2: Average Resolution Times for Priority 1 and 2 Tickets
| FY 2010 | Priority 1 | Priority 2 |
| Number of Tickets Received | 50 | 262 |
| Average Time to Resolve | 1 hour 39 minutes | 55 minutes |
| Response Time Per Service Level Agreement | 4 hours | 8 hours |
Source: Our analysis of ITAMS data.
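The compliance result reported above can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the ticket counts and service level agreement limits stated in this section. The names `SLA_LIMITS` and `within_sla` are illustrative assumptions and do not correspond to ITAMS fields or any IRS tooling.

```python
from datetime import timedelta

# Service level agreement limits stated in the text, keyed by ticket priority.
SLA_LIMITS = {1: timedelta(hours=4), 2: timedelta(hours=8)}

# Counts reported in the text and in Figure 2 (FY 2010).
TICKETS_RECEIVED = {1: 50, 2: 262}
RESOLVED_WITHIN_SLA = 305


def within_sla(priority: int, time_to_resolve: timedelta) -> bool:
    """Hypothetical helper: True if a ticket met the limit for its priority."""
    return time_to_resolve <= SLA_LIMITS[priority]


total = sum(TICKETS_RECEIVED.values())
compliance_pct = 100 * RESOLVED_WITHIN_SLA / total

print(f"{RESOLVED_WITHIN_SLA} of {total} tickets met the SLA "
      f"({compliance_pct:.1f} percent)")
```

Under these figures, roughly 98 percent of the tickets were resolved within the documented time periods, and both average resolution times in Figure 2 fall well inside their limits.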
While the SOCCB
has implemented Event Management, Incident Management, and Problem Management
best practices into its environment, additional improvements are needed to show
continued progress with and demonstrate efficiencies gained from implementing them.
Service Operations Command Center Branch Management Can Do More to Become Proactive and Closer to the Target State
We reviewed the incident tickets worked by SOCCB personnel during FY
2010, the current staff size within the SOCCB, and a recent independent
contractor’s ITIL assessment of the SOCCB’s efforts to adopt ITIL best
practices in evaluating whether the SOCCB is moving closer to the target state. Our analyses found that SOCCB management can
improve the following activities to help them become more proactive.
Command Center Section personnel should examine incident reports to identify trends within the information technology infrastructure
We
analyzed the 312 FY 2010 incident tickets worked by the systems administrators
and the computer support specialists looking for trends among the project type,
cause of the incident, and how the incident was resolved. Figure 3 shows that 145 (46 percent) of
the tickets pertained to 3 systems.
Most of the incidents occurred because of problems with software (e.g., system,
server, or application) and were resolved by performing a system reboot or
stop/restart.
Figure 3: Top Three Causes and Most Common Resolution of Incident Tickets by System Name
| System Name | # of Incident Tickets | Top Three Ticket Causes | Most Common Resolution |
| Account Management Services | 63 | Application Software (28); System Error[3] (8); No Trouble Found (8) | Stop/Restart and Reboot |
| E-Services | 53 | Unknown (18); Application Software (12); Other[4] (10) | Reboot |
| Totally Automated Personnel System | 29 | System Error (22); Application Software (3); Server Software (3) | Stop/Restart and Reboot |
| Total | 145 | | |
Source: Our analysis of ITAMS data.
When
asked to provide details about the incident ticket trends we observed, SOCCB
management explained that the high volume of tickets for Account Management
Services occurred partly because of a new release of the system for which
Command Center personnel had to perform the work. In addition, Account Management Services
generally has the most users, thus increasing the likelihood for generating
more incident tickets. SOCCB management
also described e-Services as a complex system requiring a MITS-wide approach to
resolution, while simultaneously implementing workarounds (e.g., reboots) to
minimize impact to the customer until the underlying root cause was addressed.
According to ITIL foundations, Problem Management
is both a reactive and proactive process.
Reactive means solving problems uncovered by Incident Management or
other sources, whereas proactive means looking for potential problems before they
are reported as an incident. The Operations
Management Section of the SOCCB conducts RCAs to facilitate its Problem
Management activities and mitigate the impact of system downtime/lost
productivity by identifying the root cause, a workaround, and ultimately a
permanent solution. However, the current
RCA process performed by the Operations Management Section is reactive and only
performed if requested by an internal MITS organization stakeholder or if
directed by an IRS executive.
Reviewing
incident ticket data to discover the types of problems that occur more
frequently can help the SOCCB identify problems that may occur in other places
within the IRS’s information technology infrastructure, as well as show that
repeated failures have not been adequately resolved and are likely to continue
to occur. This move toward proactive
Problem Management will help the SOCCB improve its overall effectiveness by
positioning itself to identify problems before they occur and result in outages
or lost productivity for its customers.
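One way to picture the ticket trending described above is a simple count of incident tickets by system, with systems above a cutoff flagged for root cause analysis. The sketch below is illustrative only. It uses the system-level counts reported in Figure 3 and the 312-ticket total, while the cutoff `REVIEW_THRESHOLD` and the function `systems_to_review` are assumptions made for this example, not SOCCB procedures.

```python
from collections import Counter

TOTAL_TICKETS = 312  # FY 2010 incident tickets analyzed

# Ticket counts per system as reported in Figure 3.
tickets_by_system = Counter({
    "Account Management Services": 63,
    "e-Services": 53,
    "Totally Automated Personnel System": 29,
})
# Everything not itemized in Figure 3 is lumped into one bucket.
OTHER = "All other systems"
tickets_by_system[OTHER] = TOTAL_TICKETS - sum(tickets_by_system.values())

REVIEW_THRESHOLD = 25  # assumed cutoff for flagging a system; not from the report


def systems_to_review(counts, threshold):
    """Return (system, count) pairs at or above the threshold, highest first."""
    return [(name, n) for name, n in counts.most_common()
            if n >= threshold and name != OTHER]


flagged = systems_to_review(tickets_by_system, REVIEW_THRESHOLD)
top_share = 100 * sum(n for _, n in flagged) / TOTAL_TICKETS

for name, n in flagged:
    print(f"{name}: {n} tickets")
print(f"Flagged systems account for {top_share:.0f} percent of all tickets")
```

With the Figure 3 counts, the three flagged systems account for about 46 percent of the tickets, which matches the share reported in this section. A real trending process would also group by cause and resolution, as the analysis in Figure 3 does.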
Figure 4 shows a recent ITIL assessment completed by an
independent contractor that identified the current and target state for each
SOCCB ITIL
best practice. The
independent contractor’s assessment supports our observations that the current
state of Problem Management activities within the SOCCB is reactive and notes
the desired target state of trending incidents as part of proactive Problem
Management.
Figure 4: Current and Future State of ITIL Best Practices at the SOCCB
| Best Practice | Current State | Target State |
| Event Management | The SOCCB reacts to and investigates service outages after they have been reported. Some events automatically create incident tickets. | The SOCCB resolves events before user notices/reports problem. Correlate repeat events to provide for additional automatic escalation to an incident. |
| Incident Management | All incidents are logged and tracked. Basic service level agreements are in place to monitor resolution. | Incidents proactively tracked and escalated to ensure service level agreements are met. |
| Problem Management | The bulk of problem resolution is recurring incident-based as opposed to proactive Problem Management. | Trend incidents to create problems on an ongoing basis. Report incident trends and resolutions through proactive management. |
Source: Enterprise Operations organization, ITIL Maturity Assessment dated September 30, 2010.
The Government
Accountability Office’s Internal Control Management and Evaluation Tool[5] explains that managers at all activity
levels should review performance reports, measure results against targets, and
analyze trends.
SOCCB management stated they currently do not have
sufficient staffing in the Operations Management Section to move more toward
proactively identifying recurring incidents and creating RCA tickets that will
identify why problems occur and provide a permanent solution. To address the staffing issue, Enterprise
Operations organization management negotiated the transfer of vacant positions
from other MITS organization programs to help perform this kind of work.
SOCCB management needs to baseline their staffing against workload
The SOCCB formally stood
up in FY 2007, with approximately 25 personnel operating in a
24 hours a day, 7 days a week, 365 days a year environment. At that time, SOCCB management submitted a
request for organization change indicating that a total of 92 personnel
(systems administrators, information technology specialists, and customer
support specialists) were needed to support program activities. As of July 2010, the SOCCB employed 75
personnel to handle Event Management, Incident Management, and Problem Management
activities.
The 64 systems administrators
and computer support specialists (who handle Event and Incident Management) support
three different workloads split between mainframes and servers. According to SOCCB management, there are more
than 3,300 servers and 15 different mainframes that require monitoring. During FY 2010, these personnel resolved 312
incident tickets (an average of 5 tickets per employee), updated the Knowledge Database, performed
event monitoring activities, and reviewed and updated more than 700 Probe and
Response guides. According to SOCCB
management, these personnel also assist with other work (e.g., communicating
with stakeholders regarding changes to monitoring thresholds) that supports
Event and Incident Management activities. SOCCB management also indicated that the
amount of time spent per activity varies based on the task/situation.
According to industry expert Gartner Group,[6] “developing a portfolio of standardized information technology services with repeatable process methodologies for service delivery and support will help information technology organizations improve information technology service quality, reduce costs, and increase business value and agility,” and “a cost-effective information technology organization can do more with less – or at least do more with the same.”
One way for an organization to measure its success and know
if it is “doing more with less” is to perform a staffing and workload
analysis. When the SOCCB stood up in FY
2007, it did not initially baseline its staffing needs against its workload,
and it has yet to perform a staffing and workload analysis. Having this information readily available
will allow SOCCB management to demonstrate to upper management the direct
relationship between staffing and support levels and the impact any reductions
or increases in staffing might have on service levels.
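As a rough illustration of where a staffing and workload baseline could start, the sketch below derives a few ratios from figures reported in this section. It is a minimal Python example under stated assumptions: the variable names are hypothetical, the server count is treated as a lower bound because the text says "more than 3,300," and the ratios ignore time spent on monitoring, Knowledge Database updates, and Probe and Response guide reviews, which SOCCB management said varies by task.

```python
# Figures reported in this section.
EVENT_INCIDENT_STAFF = 64   # systems administrators and computer support specialists
INCIDENT_TICKETS = 312      # incident tickets resolved in FY 2010
SERVERS_MONITORED = 3300    # reported as "more than 3,300", so a lower bound
MAINFRAMES_MONITORED = 15
STAFF_REQUESTED = 92        # total personnel requested at stand-up
STAFF_JULY_2010 = 75        # total SOCCB personnel as of July 2010

# Simple baseline ratios; a full analysis would also need time per activity.
tickets_per_employee = INCIDENT_TICKETS / EVENT_INCIDENT_STAFF
servers_per_employee = SERVERS_MONITORED / EVENT_INCIDENT_STAFF
staffing_gap = STAFF_REQUESTED - STAFF_JULY_2010

print(f"Incident tickets per employee: {tickets_per_employee:.1f}")
print(f"Servers monitored per employee (at least): {servers_per_employee:.1f}")
print(f"Positions below the original request: {staffing_gap}")
```

The first ratio rounds to the average of 5 tickets per employee cited above. Ratios like these only become meaningful once they are tracked over time against service levels, which is the purpose of the recommended analysis.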
Management’s Responsibility for Internal Control (Office of Management and Budget Circular A-123)
states that managers are responsible for increasing productivity and
controlling costs of agency operations.
The Government Accountability Office’s Standards for Internal Control
in the Federal Government[7] stipulates that program managers need both
operational and financial data to determine whether they are meeting their
goals for effective and efficient use of resources.
Recommendations
The Associate Chief Information Officer, Enterprise
Operations, should:
Recommendation 1: Ensure SOCCB
management revises SOCCB standard operating procedures to account for MITS-wide
trending activities of its incident tickets to help identify repeat occurrences
of incidents that can be used to proactively address problems, increase
efficiency, and result in fewer repeat incidents in the future.
Management’s Response: The IRS agreed with this recommendation. The SOCCB has completed the hiring of the
additional Problem Management staff, and all were onboard on June 19,
2011. The SOCCB will revise the standard
operating procedures to account for MITS-wide trending activities of its
incident tickets.
Recommendation
2: Perform a staffing and workload analysis of the
SOCCB to demonstrate the relationship between staffing and service
levels and help identify opportunities to improve information technology
service quality, reduce costs, and increase business value.
Management’s Response: The IRS agreed with this
recommendation. Currently, the SOCCB has
the minimum staff per each work specialty and, due to budgetary, National
Treasury Employees Union, and Labor Relations constraints, is unable to make
any adjustments to the staffing model.
However, the SOCCB will perform a staffing and workload analysis to
determine the relationship between staffing and service
levels and help identify opportunities to improve information technology
service quality, reduce costs, and increase business value in anticipation of
the ability to acquire or adjust resources.
Additional Management Actions Are Needed to Ensure Long-Term Success of the Service Operations Command Center Branch
We
reviewed the organizational goals, personnel training history, and currently
established performance measures to evaluate whether the SOCCB can effectively
and efficiently accomplish its mission.
Our review found that SOCCB management can make improvements in the
following areas to help ensure future program success.
The SOCCB needs a strategic plan and vision for maturing the implementation of the ITIL best practices
Although SOCCB
management has incorporated the ITIL best practice principles of Event
Management, Incident Management, and Problem Management into SOCCB daily
operations and updated its policies and procedures, they have not documented
any milestones leading to maturing these best practices or long-range goals
that will show the intended benefits.
SOCCB management has been more focused on getting policies and
procedures updated and responding to new workload requests versus documenting a
baseline organization, vision for the future, and a plan to show how they will
attain their future vision. SOCCB
management has developed an action plan for when they transition new work, but the
plan does not show specific time periods associated with completing the
transition (of planned new workloads) or the desired outcomes of the transition
efforts to include how those efforts will help the SOCCB most efficiently and
effectively leverage and mature implementation of ITIL best practices.
The first quarter FY 2011 version of the Enterprise Operations organization’s
5-year strategic plan,[8] which SOCCB
management provides input to, contains two objectives and goals regarding the ITIL. To support the objective of improving filing
season execution, the strategic plan states a goal of continuously improving
service to customers and identifies that the SOCCB will expand its Problem
Management activities. However, the
strategic plan does not contain any milestones or target dates for the
expansion, disclosing that these actions are pending the additional resources
negotiated from the other MITS organizations.
In addition, the strategic plan states another goal of delivering
improved business capabilities and governance, identifying that building the
foundation of the ITIL will help strengthen program management capabilities and
accountability. The strategic plan does
not contain a description of what this means, reflect any milestones or target
dates, or provide a list of anticipated benefits.
According to
industry best practices, to be successful in the ITIL, a deliberate,
well-planned project management approach should be adopted. Key activities in this approach include
creating an overall vision and strategy, developing a project plan and managing
it effectively, and assigning accountability for desired outcomes. A strategic plan outlines an organization’s
priorities and communicates to employees, internal stakeholders, and external
stakeholders how it plans to accomplish those priorities. Without a documented plan and strategy for maturing
ITIL best practices relevant to the SOCCB, it will be difficult for management
to demonstrate their efforts to make Command Center processes more efficient
and influence decisions among their stakeholders.
Personnel need customized training to effectively implement the ITIL
In our review of the
training records, we determined 44 of 50 SOCCB personnel who received ITIL
training completed an ITIL foundations course.
The foundations course is an entry-level course designed to provide
candidates with a general awareness of the key elements, concepts, and
terminology used in the ITIL. We also
determined that almost 90 percent (44 of 50) of the SOCCB personnel
received training from 2 years to 4 years ago (during Calendar Years 2006
through 2008). In addition, some
personnel completed training in an ITIL best practice area (e.g., financial and
security management, service level, and capacity management) that did not align
with the specific work completed by the SOCCB.
According to SOCCB management, this training was completed as part of
mandatory annual Federal Information Security Management Act[9] security training requirements.
Industry best
practices require that ITIL training be customized to suit individual roles and
responsibilities. SOCCB management
previously expressed concerns about their ability to fund ITIL training for their
personnel, and this might explain why a formal training plan has not been developed
and implemented. Creating and executing a
training plan will allow SOCCB management to ensure that their personnel
consistently remain a valuable part of a highly skilled and high-performing
workforce to support the SOCCB and IRS missions.
Additional measures are needed to capture the improved efficiency and effectiveness resulting from ITIL implementation
The SOCCB workload primarily consists of monitoring operations and
working incident tickets. An incident
ticket is routed to systems administrators or computer support specialists by
one of three methods: 1) a user calls in
a ticket, 2) the ITAMS generates a ticket, or 3) the Command Center receives an
alert announcing, for example, that a server is going down. An alert is designed to prevent/minimize a
work stoppage. When the SOCCB receives a
ticket, personnel will search for a resolution via the Knowledge Database. Generally, systems administrators and computer
support specialists have 30 minutes to close and/or reassign an incident ticket
before it is elevated to a subject matter expert.
Currently,
SOCCB management has established the following goals to measure SOCCB performance
each fiscal year:
· Tickets Opened for Errors Recognized Through Event Management. Goal: 20 percent increase by 4th quarter (compared to 1st quarter).
· Probe and Responses Updated to Reflect the SOCCB as the Primary Assignment Group for Priority 1 and 2 Tickets. Goal: 100 percent for Enterprise Computing Center networked servers.
· Number of Priority Tickets Triaged/Closed. Goal: 80 percent worked/closed without reassignment.
However, the first 2 of the
3 measures are broad goals that do not address whether the work is performed
efficiently [i.e., work completed within reported time periods (e.g., number and
percentage of tickets resolved in 30 minutes)] or effectively (i.e., quality
resolutions or nonrecurrence of problems).
In addition, none of the current measures allow SOCCB management to
track continuous improvements, and there are no measures to ensure that the SOCCB is
meeting its RCA goals.[10]
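To make the distinction between efficiency and effectiveness measures concrete, the following minimal Python sketch computes one example of each. The ticket records are hypothetical sample data, not audit data, and the functions `pct_closed_within` and `repeat_rate` are illustrative assumptions rather than measures the SOCCB has adopted. Only the 30-minute target comes from the text.

```python
from datetime import timedelta

CLOSE_TARGET = timedelta(minutes=30)  # time allowed before escalation, per the text

# Hypothetical sample of (ticket id, system, time to close); not audit data.
sample_tickets = [
    ("T1", "System A", timedelta(minutes=12)),
    ("T2", "System A", timedelta(minutes=45)),
    ("T3", "System B", timedelta(minutes=28)),
    ("T4", "System A", timedelta(minutes=30)),
]


def pct_closed_within(tickets, target):
    """Efficiency: percentage of tickets closed within the target time."""
    on_time = sum(1 for _, _, elapsed in tickets if elapsed <= target)
    return 100 * on_time / len(tickets)


def repeat_rate(tickets):
    """Effectiveness proxy: percentage of tickets for a system seen earlier."""
    seen, repeats = set(), 0
    for _, system, _ in tickets:
        if system in seen:
            repeats += 1
        seen.add(system)
    return 100 * repeats / len(tickets)


print(f"Closed within 30 minutes: {pct_closed_within(sample_tickets, CLOSE_TARGET):.0f} percent")
print(f"Repeat incidents: {repeat_rate(sample_tickets):.0f} percent")
```

Tracking both values by quarter would give a view of continuous improvement that the current three measures do not provide. The repeat-incident rate here is a crude proxy, since a real measure would need to match tickets on cause as well as system.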
Industry best practices emphasize that identifying the
appropriate measures, creating a process for collecting and analyzing the data,
and effectively using the data to guide and direct continued improvement are
essential to establishing a successful measurement process. Meaningful key performance indicators should
align with organizational goals and provide insight into the following:
Also,
metrics should be specific, measurable, attainable, realistic, and time
driven. Metrics help to ensure that the
process in question is running effectively and efficiently.
Figure 5 shows the
SOCCB experienced declines in each of its measures from FY 2009 to
FY 2010.
Figure 5: Comparison of FYs 2009 and 2010 Measures
| Goal/Measure | FY 2009 | FY 2010 |
| Tickets Opened for Errors Through Event Management | 9,432 | 8,241 |
| Probe and Response Guide Changed | 140 | 71 |
| Priority Tickets Closed | 86% | 69% |
Source: Enterprise Computing Center web site.
When asked about the
decline in the percentage of priority tickets closed, SOCCB management
attributed this to the types of tickets the SOCCB received during the later
months of the fiscal year, introduction of a new workload, and more complete
ticket reporting.
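The size of the declines in Figure 5 can be expressed as year-over-year changes. The sketch below is a minimal Python example that uses only the values in Figure 5 and the 80 percent goal stated earlier for priority tickets. The helper `pct_change` and the dictionary layout are assumptions made for illustration.

```python
# (FY 2009, FY 2010) values from Figure 5 for the two count-based measures.
count_measures = {
    "Tickets opened for errors through Event Management": (9432, 8241),
    "Probe and Response guides changed": (140, 71),
}

# Priority tickets closed is already a percentage, so compare in percentage points.
PRIORITY_CLOSED_PCT = (86, 69)
PRIORITY_CLOSED_GOAL = 80  # goal: 80 percent worked/closed without reassignment


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


changes = {name: pct_change(old, new) for name, (old, new) in count_measures.items()}
point_change = PRIORITY_CLOSED_PCT[1] - PRIORITY_CLOSED_PCT[0]

for name, change in changes.items():
    print(f"{name}: {change:.1f} percent")
print(f"Priority tickets closed: {point_change} percentage points")
print(f"FY 2010 met the 80 percent goal: {PRIORITY_CLOSED_PCT[1] >= PRIORITY_CLOSED_GOAL}")
```

On these figures, tickets opened through Event Management fell by about 13 percent, guide changes fell by about half, and the priority ticket closure rate dropped 17 percentage points, from above the 80 percent goal to below it.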
A September 2010 independent
contractor assessment of the ITIL within the Enterprise Operations organization
also identified additional performance measures that will allow the SOCCB to
ensure it makes progress in achieving its future state.
According
to SOCCB management, there is limited staffing available to perform ticket
trending that would lead to identifying new performance measures to implement. There is currently only one person available
to do any ticket trending, and this individual has other duties to perform
within the SOCCB. Another reason
SOCCB management has not implemented additional measures is an
initiative to improve the accuracy of ITAMS incident ticket data. The Customer Relationship and Service
Delivery staff began this process in summer 2009 by training its staff on
ticket accuracy and in fall 2009 targeted Priority 1 and Priority 2 tickets to ensure
the accuracy of the data input into key fields (e.g., ticket start time, ticket
stop time, and cause code).
Without effective measures to demonstrate continued improvements and
quantify cost savings resulting from implementing new processes, like the ITIL, the SOCCB is at risk of being unable to
fully support the MITS organization in its efforts to effectively reallocate
operational savings to program enhancements.
Recommendations
The Associate Chief Information Officer, Enterprise
Operations, should:
Recommendation 3: Update
the Enterprise Operations organization strategic plan whenever an SOCCB ITIL
best practice is required to support the goals or objectives of the organization. The update needs to address the goals, as
well as milestone and target dates, completion dates, benefits, and any
associated risks.
Management’s Response: The
IRS agreed with this recommendation. The
SOCCB will update the Enterprise Operations strategic plan for Problem
Management whenever an SOCCB ITIL best practice is required to support the goals or
objectives of the organization.
Recommendation 4: Ensure SOCCB management develops and executes
a training plan to ensure personnel continue to receive customized training in
ITIL best practices relevant to the Command Center.
Management’s Response: The IRS agreed with this recommendation. The SOCCB will develop and execute a training
plan to ensure personnel continue to receive customized training in ITIL best practices
relevant to the Command Center. The
training plan will be based on available online courses and other courses as
budget constraints allow.
Recommendation 5: Identify and implement performance measures
that will demonstrate the efficiencies and effectiveness of implementing Event
Management, Incident Management, and Problem Management.
Management’s Response: The IRS agreed with this recommendation. The SOCCB will identify and implement
performance measures that will demonstrate the efficiencies and effectiveness
of implementing Event Management, Incident Management, and Problem Management.
Appendix I
Detailed Objective, Scope, and Methodology
Our overall
objective was to determine whether the SOCCB had effectively implemented
ITIL[11] best practices to ensure service delivery
management for Enterprise Operations products and services. In prior audits,[12] our overall
assessment has been that ITAMS data are of undetermined reliability. However, in our opinion, using these data did
not weaken our analysis or lead to an incorrect or unintentional message. Prior audit reports included language that
clearly stated the data limitations. To accomplish our objective, we:
I. Reviewed SOCCB program management controls over ITIL implementation.
A. Interviewed management to obtain their understanding of ITIL best practices, how they communicated to employees and stakeholders regarding implementation of the ITIL, and the training provided to employees.
B. Reviewed standard operating procedures, the Internal Revenue Manual, and the Concept of Operations to determine whether SOCCB policies and procedures were updated to reflect ITIL best practices. In addition, we reviewed documentation which showed what benefits the SOCCB expected to accomplish by implementing the ITIL.
C. Evaluated the process the SOCCB has in place to ensure continuous improvements and whether those improvements are delivered.
II. Determined whether the SOCCB is performing its Event Management, Incident Management, and Problem Management functions in accordance with the ITIL.
A. Reviewed Event Management statistics maintained on the Enterprise Computing Center web site and interviewed SOCCB management about Event Management activities.
B. Determined the maturity/status of the SOCCB’s Incident Management activities.
1. Analyzed all 312 FY 2010 Priority 1 and Priority 2 tickets obtained from SOCCB management and the ITAMS to identify average ticket resolution time.
2. Interviewed Customer Relationship and Service Delivery function management about the reports it generates for the SOCCB.
3. Using the ITAMS, selected a judgmental sample of 20 from 97 Priority 1 and Priority 2 tickets resolved during FY 2010 and traced them to the Knowledge Database to ensure it was updated. We selected tickets from those systems that had a higher volume of reported incidents (e.g., Account Management Services, E-Services, and the Totally Automated Personnel System) and for which the causes of the incidents included System Error, Unknown, or Application Software. We used judgmental sampling because we did not intend to project our results.
C. Interviewed SOCCB management about their Problem Management activities.
D. Reviewed policies and procedures that define how the SOCCB adds new services and applications to its workload.
III. Determined whether the SOCCB is monitoring and measuring program performance in accordance with the ITIL.
A. Interviewed management to determine how they monitored and measured program performance for FY 2010. We determined how and when results are communicated to employees and stakeholders.
B. Compared SOCCB measures to determine whether they align with Enterprise Operations organization goals.
C. Identified the long-range goals that management envisions will show progress in SOCCB operations based on ITIL implementation.
D. Identified the type of quality review process performed and how that information influences overall performance.
E. Determined whether SOCCB management performs any trend analyses of performance metrics to ensure repeat incidents have been identified and resolved.
F. Evaluated how SOCCB management quantifies the efficiencies they gain through proactive monitoring (i.e., cost savings).
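For readers who wish to reproduce an analysis similar to Step II.B.1, the following Python sketch shows one way an average ticket resolution time could be computed from ticket start and stop times. It is a minimal illustration under stated assumptions: the sample records, the field names, the timestamp format, and the helper functions are all hypothetical and are not taken from the ITAMS or from our audit workpapers.

```python
from datetime import datetime
from statistics import mean

# Assumed timestamp format for the hypothetical records below.
FMT = "%Y-%m-%d %H:%M"

# Hypothetical tickets; values are invented for illustration only.
tickets = [
    {"priority": 1, "start": "2010-03-01 08:00", "stop": "2010-03-01 10:00"},
    {"priority": 1, "start": "2010-03-05 13:00", "stop": "2010-03-05 17:00"},
    {"priority": 2, "start": "2010-04-10 09:00", "stop": "2010-04-10 15:00"},
    {"priority": 2, "start": "2010-04-12 09:00", "stop": "2010-04-12 19:00"},
]


def resolution_hours(ticket):
    """Elapsed hours between a ticket's start time and stop time."""
    start = datetime.strptime(ticket["start"], FMT)
    stop = datetime.strptime(ticket["stop"], FMT)
    return (stop - start).total_seconds() / 3600


def average_by_priority(records):
    """Average resolution time in hours, grouped by ticket priority."""
    by_priority = {}
    for t in records:
        by_priority.setdefault(t["priority"], []).append(resolution_hours(t))
    return {p: mean(hours) for p, hours in by_priority.items()}


averages = average_by_priority(tickets)
for priority, hours in sorted(averages.items()):
    print(f"Priority {priority}: {hours:.1f} hours on average")
```

Any such calculation is only as reliable as the start and stop times recorded on the tickets.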
Internal controls methodology
Internal controls
relate to management’s plans, methods, and procedures used to meet their
mission, goals, and objectives. Internal
controls include the processes and procedures for planning, organizing,
directing, and controlling program operations.
They include the systems for measuring, reporting, and monitoring
program performance. We determined the
following internal controls were relevant to our audit objective: the MITS
organization’s policies and procedures for implementing an effective SOCCB to address the
critical issues of addressing service outages efficiently and utilizing taxpayer
resources effectively. We evaluated
these controls by interviewing management and reviewing policies and procedures,
such as the Internal Revenue Manual, Federal guidance such as the Clinger-Cohen
Act of 1996,[13] and Office of Management and Budget Circulars and relevant
supporting documentation.
Appendix II
Major Contributors to This Report
Alan Duncan, Assistant Inspector General for Audit (Security and
Information Technology Services)
Danny Verneuille, Director
Diana Tengesdal, Audit Manager
Mark Carder, Senior Auditor
Myron Gulley, Senior Auditor
Allen Henry, Program Analyst
Sarah White, Program Analyst
Appendix III
Report Distribution List
Commissioner C
Office of the Commissioner – Attn: Chief of Staff C
Deputy Commissioner for Operations Support OS
Deputy Chief Information Officer for Operations OS:CTO
Associate Chief Information Officer, Enterprise Operations OS:CTO:EO
Associate Chief Information Officer, Strategy and Planning OS:CTO:SP
Director, Enterprise Computing Center OS:CTO:EO:EC
Chief Counsel CC
National Taxpayer Advocate TA
Director, Office of Legislative Affairs CL:LA
Director, Office of Program Evaluation and Risk Analysis RAS:O
Office of Internal Control OS:CFO:CPIC:IC
Audit Liaison: Director, Risk Management Division OS:CTO:SP:RM
Appendix IV
Glossary of Terms
Term | Definition
Account Management Services | A project that will modernize the capability to collect, view, retrieve, and manage taxpayer information.
Best Practice | A technique or methodology that, through experience and research, has proven to reliably lead to a desired result.
Campus | The data processing arm of the IRS. The campuses process paper and electronic submissions, correct errors, and forward data to the Computing Centers for analysis and posting to taxpayer accounts.
Computer Support Specialist | Position within the SOCCB whose duties include monitoring the mainframes.
Concept of Operations | A framework that includes a defined vision, strategic goals, operational themes, and program capabilities. It identifies key organizational concepts required to achieve the organization’s vision.
E-Services | Provides a set of web-based business products as incentives to third parties to increase electronic filing, in addition to providing electronic customer account management capabilities to all businesses, individuals, and other customers.
Enterprise Computing Center | Supports tax processing and information management through a data processing and telecommunications infrastructure.
Event Management | The first line of defense for preventing an interruption to or reduction in the quality of service. Event monitoring is used to sustain and improve quality service, identify significant events and initiate actions before an incident occurs, and automate the process.
Incident Management | The process for managing incidents with the goal of restoring service as quickly as possible and minimizing the adverse impact on the customer.
Information Technology Assets Management System | The workflow tool for all MITS organization service providers. This module reports and tracks all MITS organization incidents and service requests.
Information Technology Infrastructure Library® | A set of concepts and techniques for managing information technology infrastructure, development, and operations.
Mainframe | A powerful, multiuser computer capable of supporting many hundreds of thousands of users simultaneously.
Priority 1 Ticket | An incident ticket exhibiting the following characteristics: 1) resulting in severe mission-critical work stoppage or any issue relating to safety or health (e.g., fire, electrical shock), 2) impacting on vital IRS customer commitments of national or area-wide scope, 3) affecting multiple internal or external customers and service to taxpayers, and 4) requiring immediate action.
Priority 2 Ticket | An incident ticket with the potential to result in a work stoppage (could have a direct impact on the service to taxpayers or if scope is multi-user and there is no
Probe and Response | Provides information for enhanced triage, first contact resolution, timely incident assignment, standard incident coding, and resolution solutions.
Problem Management | Consists of performing an RCA to mitigate the impact of system downtime/lost productivity caused by errors in the information technology infrastructure and to prevent the recurrence of events.
Release | A specific edition of software.
Server | A computer that carries out specific functions (e.g., a file server stores files, a print server manages printers, and a network server stores and manages network traffic).
Service Level Agreement | A document that describes the minimum performance criteria a provider promises to meet while delivering a service, typically also setting out the remedial action and any penalties that will take effect if performance falls below the promised standard.
Service Restoration Team | A group of individuals who work to rapidly escalate, coordinate, and resolve Priority 1 or Priority 2 outages. The team can leverage resources from outside the Enterprise Computing Center to assist with restoration activities.
Systems Administrator | Position within the SOCCB whose duties include monitoring Enterprise Computing Center networked production servers as well as participating in RCA and Service Restoration Team efforts.
Totally Automated Personnel System | Automated personnel system used by management for processing requests for personnel actions, as well as employee information report generation.
Triage | Refers to the analysis work that determines the priority of computer applications that need to be remediated.
Appendix V
Management’s Response to the Draft Report
DEPARTMENT OF THE TREASURY
INTERNAL REVENUE SERVICE
WASHINGTON, D.C. 20224
CHIEF TECHNOLOGY OFFICER
JULY 20, 2011
MEMORANDUM FOR DEPUTY INSPECTOR GENERAL FOR AUDIT
FROM: Terence V. Milholland
/s/ Terence V. Milholland
Chief Technology Officer
SUBJECT: Draft Audit Report -
Service Operations Command Center Management Can Do More to Benefit From
Implementing the Information Technology Infrastructure Library (Audit #
201020006) (e-trak # 2011-22851)
Thank you for the
opportunity to review and respond to the subject audit report.
We appreciate your
comments and observations on how the Enterprise Operations organization is
implementing the Information Technology Infrastructure Library (ITIL) best
practices in the Service Operations Command Center Branch (SOCCB). We have made
considerable strides and are working toward fully utilizing our resources in
each of the process areas, Event Management, Incident Management and Problem
Management.
We agree with the five
recommendations and as we continue our tasks toward ITIL Level 3, we will
implement them on or before December 30, 2012.
We value your
continued support and the guidance your team provides. If you have any questions,
please contact me at (202) 622-6800 or Andrea Greene-Horace, Senior Manager of Program
Oversight, at (202) 283-3427.
Attachment
RECOMMENDATION #1: The Associate Chief Information Officer, Enterprise
Operations should ensure SOCCB management revises its standard operating
procedures to account for MITS-wide trending activities of its incident tickets
to help identify repeat occurrences of incidents that can be used to
proactively address problems, increase efficiency, and result in fewer repeat
incidents in the future.
CORRECTIVE ACTION #1:
We agree with the recommendation. SOCCB completed the hiring of the additional
problem management staff and all were onboard on June 19, 2011. We will
revise the standard operating procedures to account for MITS-wide trending
activities of its incident tickets to help identify repeat occurrences of
incidents that can be used to proactively address problems, increase
efficiency, and result in fewer repeat incidents in the future.
IMPLEMENTATION DATE: December 30, 2012
RESPONSIBLE OFFICIAL: Associate
Chief Information Officer, Enterprise Operations
CORRECTIVE ACTION MONITORING PLAN: We enter
accepted Corrective Actions into the Joint Audit Management Enterprise System
(JAMES) and monitor them on a monthly basis until completion.
RECOMMENDATION #2: The Associate Chief Information Officer, Enterprise Operations should
perform a staffing and workload analysis of the SOCCB to demonstrate the
relationship between staffing and service levels and help identify
opportunities to improve information technology service quality, reduce costs,
and increase business value.
CORRECTIVE ACTION #2: We agree with the recommendation. Currently, SOCCB has the minimum
staff per each work specialty and due to budgetary, NTEU and LR constraints
SOCCB is unable to make any adjustments to the staffing model. However, SOCCB
will perform a staffing and workload analysis to determine the relationship
between staffing and service levels and help identify
opportunities to improve information technology service quality, reduce costs, and
increase business value in anticipation of the ability to acquire or adjust
resources.
IMPLEMENTATION DATE: December 30, 2012
RESPONSIBLE OFFICIAL: Associate Chief Information Officer, Enterprise
Operations
CORRECTIVE ACTION MONITORING PLAN: We enter
accepted Corrective Actions into the Joint Audit Management Enterprise System
(JAMES) and monitor them on a monthly basis until completion.
RECOMMENDATION #3: The Associate Chief Information Officer, Enterprise Operations should
update the Enterprise Operations strategic plan whenever a SOCCB ITIL® best
practice is required to support the goals or objectives of the organization.
The update needs to address the goals, as well as milestone and target dates,
completion dates, benefits and any associated risks.
CORRECTIVE ACTION #3:
We agree with the recommendation. SOCCB will update the Enterprise Operations
strategic plan for Problem Management whenever a SOCCB ITIL® best practice is
required to support the goals or objectives of the organization. The update
will address the goals, as well as milestones and target dates, completion
dates, benefits, and any associated risks.
IMPLEMENTATION DATE:
December 30, 2012
RESPONSIBLE OFFICIAL: Associate Chief
Information Officer, Enterprise Operations
CORRECTIVE ACTION MONITORING PLAN: We enter
accepted Corrective Actions into the Joint Audit Management Enterprise System
(JAMES) and monitor them on a monthly basis until completion.
RECOMMENDATION #4:
The Associate Chief Information Officer, Enterprise Operations should ensure
SOCCB management develops and executes a training plan to ensure personnel continue
to receive customized training in ITIL® best practices relevant to the Command
Center.
CORRECTIVE ACTION #4:
We agree with the recommendation. SOCCB will develop and execute a training
plan to ensure personnel continue to receive customized training in ITIL best practices
relevant to the Command Center. The training plan will be based on available
online courses and other courses as budget constraints allow.
IMPLEMENTATION DATE:
December 30, 2012
RESPONSIBLE OFFICIAL: Associate Chief
Information Officer, Enterprise Operations
CORRECTIVE ACTION MONITORING PLAN: We enter
accepted Corrective Actions into the Joint Audit Management Enterprise System
(JAMES) and monitor them on a monthly basis until completion.
RECOMMENDATION #5:
The Associate Chief Information Officer should identify and implement
performance measures that will demonstrate the efficiencies and effectiveness
of implementing Event Management, Incident Management, and Problem Management.
CORRECTIVE ACTION #5: We agree with
the recommendation. SOCCB will identify and implement performance measures that
will demonstrate the efficiencies and effectiveness of implementing Event
Management, Incident Management, and Problem Management.
IMPLEMENTATION DATE:
December 30, 2012
RESPONSIBLE OFFICIAL: Associate Chief
Information Officer, Enterprise Operations
CORRECTIVE ACTION MONITORING PLAN: We enter
accepted Corrective Actions into the Joint Audit Management Enterprise System
(JAMES) and monitor them on a monthly basis until completion.
[1] See Appendix IV for a glossary of terms.
[2] This excludes the Priority 1 and Priority 2 tickets considered Unscheduled Maintenance Requests and worked by an outside vendor.
[3] SOCCB management described a system error as one that occurs due to a problem with the service software suite.
[4] SOCCB management indicated this cause code is used synonymously with “Unknown” (i.e., the employee could not identify what caused the incident ticket to occur).
[5] GAO-01-1008G, dated August 2001.
[6] IT Infrastructure and Operations Leaders Key Initiative: ITIL and Process Improvement, dated January 16, 2009.
[7] GAO/AIMD-00-21.3.1, dated November 1999.
[8] Enterprise Operations organization management reviews and updates its strategic plan quarterly.
[9] 44 U.S.C. Sections 3541 – 3549.
[10] The goals include mitigating the impact of system downtime/lost productivity and preventing the recurrence of incidents.
[11] See Appendix IV for a glossary of terms.
[12] Management Advisory Report: Review of Lost or Stolen Sensitive Items of Inventory at the Internal Revenue Service (Reference Number 2002-10-030, dated November 29, 2001), Progress Has Been Made in Using the Tivoli® Software Suite, Although Enhancements Are Needed to Better Distribute Software Updates and Reconcile Computer Inventories (Reference Number 2006-20-021, dated December 14, 2005), and Management Practices Over End-user Computer Server Storage Need Improvement to Ensure Effective and Efficient Storage Utilization (Reference Number 2007-20-103, dated July 3, 2007).
[13] Pub. L. No. 104-106, 110 Stat. 642 (codified in scattered sections of 5 U.S.C., 5 U.S.C. app., 10 U.S.C., 15 U.S.C., 16 U.S.C., 18 U.S.C., 22 U.S.C., 28 U.S.C., 29 U.S.C., 31 U.S.C., 38 U.S.C., 40 U.S.C., 41 U.S.C., 42 U.S.C., 44 U.S.C., 49 U.S.C., 50 U.S.C.).