
SLAs and KPIs in IT Outsourcing: A Complete Guide to Performance Management

Master the process of defining, implementing, and optimizing SLAs and KPIs in IT outsourcing agreements.

This guide shows you how to establish performance metrics that matter and how to improve SLA performance.

SLAs and KPIs in IT Outsourcing: Definition, Best Practices, and Penalty Examples

The SLA structure is one of the most distinctive and complex aspects of an outsourcing engagement. While you may have experience with operational measures within an internal IT organization, managing IT performance under an outsourced arrangement is very different.

What Are Service Level Agreements (SLAs) and Key Performance Indicators (KPIs)?

Service Level Agreements (SLAs): the primary measures that the client and vendor agree will be used to evaluate the vendor's performance under the contract. The SLA commercial construct includes provisions for financial penalties, and even termination for cause, in the event the vendor fails to meet the agreed-upon performance standards.

Key Performance Indicators (KPIs): the second level of measures, which are important to the client but not important enough to have financial penalties associated with them. The vendor is expected to adhere to the performance targets in the same manner as for an SLA. The agreement normally includes provisions that allow the client to promote a KPI to an SLA in the event the vendor is not performing as expected.

Key Elements of a Well-Defined SLA and KPI:

The definitions of the SLA and KPI will include the following items:

  • Name and description: including the formula of the measure. It is also important that the parties agree on the source of the data to be used for the calculation.

  • Performance requirement: the performance target for each task or event

  • Expected performance: the percentage of successful tasks or events that the vendor should deliver

  • Minimum performance: the percentage that triggers the potential assessment of penalties

  • Significant minimum: the threshold for an individual event; if any single event (not the aggregate) falls below it, penalties will be incurred
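The elements above can be captured as a small record so that every measure is documented the same way. The following is a minimal sketch, not a prescribed schema: the class name, field names, and the data source label are hypothetical, and the sample values are taken from the Severity 2 row of the incident management example in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceLevelDefinition:
    """One SLA or KPI definition (field names are illustrative assumptions)."""
    name: str
    description: str               # should state the formula of the measure
    data_source: str               # agreed source of the data for the calculation
    performance_requirement: str   # target for each task or event
    expected_pct: float            # share of events the vendor should pass
    minimum_pct: float             # level that triggers potential penalty assessment
    significant_minimum: Optional[str] = None  # per-event threshold, if one is agreed
    has_penalties: bool = True     # True for an SLA, False for a KPI

    def __post_init__(self) -> None:
        # The minimum can never sit above the expected level.
        if not 0 <= self.minimum_pct <= self.expected_pct <= 100:
            raise ValueError("require 0 <= minimum_pct <= expected_pct <= 100")


sev2_resolution = ServiceLevelDefinition(
    name="Severity 2 Incident Resolution",
    description="Severity 2 incidents resolved on time / all closed Severity 2 incidents",
    data_source="ticketing system export (hypothetical)",
    performance_requirement="resolved within 4 hours",
    expected_pct=95.0,
    minimum_pct=90.0,
)
```

Keeping the formula and data source in the same record as the targets makes it harder for a measure to reach the contract without both parties having agreed on how it is calculated.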

 

Measures:  each key service performed by the vendor will have one or more quantifiable performance metrics identified. As tasks are delivered, they are evaluated as to achievement of the agreed upon performance requirement and determined to have either passed or failed. At the end of the reporting period, the passed/failed results will be totaled.

Real-World Example of SLA Performance Measurement: Incident Management

Incident Management typically has two types of measures: response time and resolution time. The performance requirements vary based on the severity of the incident, so with four severity levels there are actually 8 individual measures, each evaluated independently.

 

SLA Definition – Incident Management:

Performance Requirements

Severity    Resolution Time      Expected    Minimum
1           2 hours              99%         97%
2           4 hours              95%         90%
3           5 business days      90%         85%
4           30 business days     90%         85%

 

At the end of the month, each closed incident is evaluated against the response and resolution times (individually) to determine whether it passed or failed the performance requirement, and the results are expressed as a percentage.

For our example, we will use the Severity 2 Resolution measure and assume that there were 10 Severity 2 incidents, of which 9 were resolved in under 4 hours, making the Severity 2 Incident Resolution metric 90%.

 

In this case the vendor did NOT meet the expected performance of 95% and is only at the minimum performance level of 90%; therefore, an evaluation as to the application of penalties needs to be conducted.
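The arithmetic in this example can be sketched in a few lines. This is an illustration only: the function and field names are hypothetical, and flagging a result that is exactly equal to the minimum for penalty evaluation is an assumption that a real contract would need to state explicitly.

```python
from typing import NamedTuple


class Evaluation(NamedTuple):
    result_pct: float
    met_expected: bool
    needs_penalty_evaluation: bool


def evaluate_measure(passed: int, total: int,
                     expected_pct: float, minimum_pct: float) -> Evaluation:
    """Score one measure for one reporting period from pass/fail counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    result_pct = 100.0 * passed / total
    return Evaluation(
        result_pct=result_pct,
        met_expected=result_pct >= expected_pct,
        # Assumption: a result at or below the minimum is flagged for review.
        needs_penalty_evaluation=result_pct <= minimum_pct,
    )


# Severity 2 Resolution: 9 of 10 incidents resolved within 4 hours.
sev2 = evaluate_measure(passed=9, total=10, expected_pct=95.0, minimum_pct=90.0)
print(sev2)
```

For the Severity 2 figures above, the result is 90.0%, the expected level is not met, and the measure is flagged for penalty evaluation.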

Sample SLAs for a Managed Services Outsourcing Engagement:

Service Level Agreements (SLAs) are the foundation for managing vendor performance and ensuring consistent service quality in IT outsourcing engagements. The following key SLA measures represent critical areas that organizations should define, monitor, and enforce as part of their contract and ongoing service governance.

Why Balanced Scorecards Matter in IT Outsourcing

While SLAs and KPIs focus on measuring specific service outcomes and performance levels, an IT Balanced Scorecard provides a broader, strategic view of organizational health.

 

It connects operational performance to customer satisfaction, financial efficiency, and innovation goals — offering executives and IT leaders a comprehensive dashboard to guide decision-making, prioritize improvements, and align outsourcing efforts with business objectives.

​

A Balanced Scorecard has the added advantage of moving towards an evaluation of outcomes that provide a more holistic view of the business value being driven by the engagement.​​

Reach out to us to explore establishing a Balanced Scorecard for your company 

Top 5 SLA & KPI Mistakes and How to Avoid Them

Establishing SLAs and KPIs is critical to outsourcing success. However, many organizations fall into common traps that can weaken vendor accountability and drive unexpected costs. Below, we explore the top 5 mistakes and how to avoid them.

Poorly Defined Metrics:  Companies often write SLAs with vague or incomplete definitions. If the metric isn't measurable, observable, and objective, it will be hard to enforce, leading to disputes.

To prevent poorly defined SLAs, document each measure with a clear definition, including the formula and the data source that will be used to perform the calculation. Perform due diligence on the ability to extract the required data from the source systems, confirm its accuracy, and finalize the reporting requirements and oversight approach.

Overlapping Accountabilities:  A vendor can only be held accountable for delivery of the services that they are responsible for.

To ensure the viability of the SLAs being proposed, evaluate the level of ownership the vendor has in delivering the service being measured. If other parties have significant influence on the success of the outcome, you may need to decompose the measure into a calculation that aligns to the vendor's level of responsibility, to avoid finger pointing.

Not Aligning SLAs to Business Impact:  A missed SLA on a key financial system is not the same as a missed SLA on a low-priority internal tool.

To align SLAs to business impact, consider using a structure that recognizes a tiered ranking for applications, such as a Gold, Silver and Bronze categorization. Then set the performance targets for each category to a level that matches the importance of the application. As an example, the Severity 1 resolution time for a Gold application may be 2 hours, whereas for a Bronze application it may be 8 hours. This also allows you to place a higher penalty on missing the Gold applications than the Bronze ones, shifting the vendor risk to align to the business value.


Forcing a Vendor to Perform Low Value Work:  Including a measure as an SLA requires the vendor to spend resources to complete the effort within the required timelines, thereby costing you money. As an example, many clients include Severity 4 incidents in the SLAs. Severity 4 incidents are by definition low priority items, many of which should not be done.

To drive maximum value from your SLAs, evaluate how critical the work is and whether you are giving the vendor pre-approval to complete all activities within the scope of the measure. An alternative to assigning an SLA is to place the service into a category that gets prioritized and assign a KPI to the activity. If necessary, the KPI can be promoted to an SLA.

Not Changing Your SLAs:  Performance measures are not intended to be static. The term of most managed services agreements is 3-5 years. During that time the things that are important to the client will change and the performance of the service provider will improve.

The evolution of your managed service outsourcing engagement is supported by changing the active SLAs. On a semi-annual basis, leadership should review the health of the engagement and adjust the SLAs in use to address areas of concern or to reinforce new areas of opportunity. SLAs no longer in use can be moved to the active KPI list, thereby keeping them in play in the event that vendor performance slips.

Reach out to us for a complimentary review of your IT Performance Framework

Managing vendor performance isn't just about setting SLAs; it's about defining exactly how results are measured, exceptions are handled, and penalties are enforced. The following sections cover essential rules used when evaluating performance, such as exemptions, small-sample protections, earn backs, application tiering, and dead bands, that ensure fair, enforceable accountability.

Understanding Exemptions in SLA Performance Management

Exemptions:  one of the core principles of an SLA framework is that the vendor can only be held accountable for something that they control. In the event that an outside party caused the vendor to miss the performance target, that task is excluded from the calculation.

The client must agree to each exclusion. If the revised calculation results in a measurement that is equal to or higher than the minimum performance target, then the penalties do not apply.
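The effect of an exemption on the calculation can be sketched as follows. The function names and the sample month (17 passes and 3 failures against a 90% minimum) are hypothetical; the rule that penalties do not apply when the revised result is equal to or higher than the minimum follows the paragraph above.

```python
def revised_result_pct(passed: int, failed: int, exempted_failures: int) -> float:
    """Recalculate the result after removing client-approved exemptions."""
    if min(passed, failed, exempted_failures) < 0:
        raise ValueError("counts must be non-negative")
    if exempted_failures > failed:
        raise ValueError("cannot exempt more failures than occurred")
    counted = passed + failed - exempted_failures  # exempted events leave the total
    if counted == 0:
        raise ValueError("no events left to evaluate")
    return 100.0 * passed / counted


def penalties_apply(passed: int, failed: int, exempted_failures: int,
                    minimum_pct: float) -> bool:
    """Penalties apply only when the revised result is below the minimum."""
    return revised_result_pct(passed, failed, exempted_failures) < minimum_pct


# Hypothetical month: 17 passed, 3 failed, minimum performance 90%.
print(revised_result_pct(17, 3, 0))     # 85.0 before any exemption
print(penalties_apply(17, 3, 0, 90.0))  # True
print(penalties_apply(17, 3, 2, 90.0))  # False: 17 of 18 is about 94.4%
```

Note that an exempted event is removed from both the failures and the total, so the denominator shrinks along with the miss count.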

The Impact of the Law of Small Numbers on SLA Penalties

Law of Small Numbers:  there is a principle in the SLA framework that a single miss should not trigger a financial penalty. In our Severity 2 Resolution example above, the vendor missed 1 out of 10 incidents, resulting in a 90% performance rating where the expected target was 95%. For a single miss not to cause a miss on this performance measure, there need to be at least 20 Severity 2 incidents in the month, i.e. 19 successful events against a total of 20 = 95% performance.

​

The typical approach to remedy this situation would be to carry over the results to the following month and aggregate the volume of the two months to reach the required volume of incidents for evaluation.
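The volume threshold and the carry-over approach can be sketched as below. A single miss out of n events yields (n - 1) / n, so the smallest volume that still meets a target t is the smallest n with n >= 1 / (1 - t). The function names and the February figures are hypothetical, and exact fractions are used because floating-point rounding would otherwise push the 95% case to 21.

```python
import math
from fractions import Fraction
from typing import List, Tuple


def min_volume_for_single_miss(target_pct) -> int:
    """Smallest volume n at which (n - 1) / n still meets target_pct.

    Pass target_pct as an int or a string such as "99.5" to keep it exact.
    """
    target = Fraction(target_pct) / 100
    if not 0 <= target < 1:
        raise ValueError("target_pct must be at least 0 and below 100")
    return math.ceil(1 / (1 - target))


def evaluation_windows(monthly: List[Tuple[str, int, int]],
                       target_pct) -> List[Tuple[str, int, int]]:
    """Carry results forward until the aggregated volume can be evaluated.

    monthly holds (month, passed, total); each returned window is
    (closing month, aggregated passed, aggregated total).
    """
    threshold = min_volume_for_single_miss(target_pct)
    windows = []
    passed = total = 0
    for month, month_passed, month_total in monthly:
        passed += month_passed
        total += month_total
        if total >= threshold:
            windows.append((month, passed, total))
            passed = total = 0  # start a fresh window
    return windows


print(min_volume_for_single_miss(95))  # 20
# January is the guide's example (9 of 10); February is hypothetical.
print(evaluation_windows([("Jan", 9, 10), ("Feb", 10, 10)], 95))
```

In this sketch January's 9 of 10 is not evaluated on its own; it is combined with February into a single window of 19 of 20, which meets the 95% target.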

​

Exception:  some measures are so important that a single miss can cause a financial penalty, for example application availability.

How Earn Back Provisions Affect Vendor Penalty Recovery

Earn Back:  in some agreements, the vendor has the right to earn back penalties that are incurred. The ability to earn back is based on the achievement or overachievement of performance targets in subsequent months.

It is worth noting that Earn Back provisions are no longer common in the competitive outsourcing marketplace.

Using Application Tiers to Set Different Service Level Targets

Application Tiers:  It is not unusual for large-scale organizations to split their application portfolios into different groups (e.g. Gold, Silver, Bronze) based on their criticality. In this case, each application tier has its own service levels.

​

As an example, Gold applications would have higher requirements for system availability than the Silver applications, such as Gold 99.99% availability and Silver 99.9% availability.
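To make the difference between these two targets concrete, the sketch below converts an availability percentage into the downtime it allows. It assumes a 30-day month with no excluded maintenance windows, which a real agreement would define explicitly; the function name is hypothetical.

```python
from fractions import Fraction


def allowed_downtime_minutes(availability_pct: str, days: int = 30) -> float:
    """Minutes of downtime permitted per period at a given availability target."""
    total_minutes = days * 24 * 60
    unavailable_share = 1 - Fraction(availability_pct) / 100
    return float(unavailable_share * total_minutes)


tier_targets = {"Gold": "99.99", "Silver": "99.9"}  # values from the example above
for tier, target in tier_targets.items():
    print(tier, allowed_downtime_minutes(target))
# Gold 4.32
# Silver 43.2
```

Under these assumptions a Gold application can be down for roughly 4 minutes a month, while a Silver application has roughly 43 minutes, a tenfold difference.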

Managing Volume Variations with Dead Bands in SLAs

Dead Bands:  when a vendor presents a solution it is based on a series of assumptions, one of the most critical being volume of work.  As an example, for application support engagements the number of incidents per month is an important factor in staffing and has a direct impact on the ability of the vendor to meet the agreed upon service levels.

 

Dead Bands are constructed to indicate when variability is deemed high enough to impact the vendor’s ability to perform, thereby giving them relief from service level penalties.

  

In this example, the expected volume of incidents is 1,500 per month. The parties have agreed to establish the dead bands at +/- 500 incidents per month. In January and February, the incident volumes were within the expected range.

Figure: Dead Band, showing the expected volume and the upper and lower levels beyond which SLAs are suspended.

In March there was an unexpected surge of incidents that pushed volumes above the upper range. Provided that these incidents were not the result of something the vendor did, relief would be granted for SLA misses that were “volume related” during that month. SLAs such as Incident Remediation for Severity 3-4 might be excused because Severity 3-4 incidents are the vast majority of the incidents. Remediation for Severity 1-2 incidents would not be excused because their volume is typically less than 5% of the total incidents.

June, July and August are all above the upper range. When 3 or more months are outside the range (above or below), a formal meeting is called to review the expected volume. Adjustments can be made to either the service level performance requirements (no cost impact) or the staffing needed to align with the new expected volume (potential cost impact +/-).
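The dead band check can be sketched as follows. The expected volume (1,500) and band (+/- 500) come from the example; the monthly volumes are hypothetical values chosen to match the months described, and treating the review trigger as 3 consecutive months outside the range is an assumption, since the agreement could also count non-consecutive months.

```python
from typing import Dict


def classify_volume(volume: int, expected: int = 1500, band: int = 500) -> str:
    """Place a monthly volume relative to the dead band."""
    if volume > expected + band:
        return "above"
    if volume < expected - band:
        return "below"
    return "within"


def review_needed(volumes: Dict[str, int], run_length: int = 3) -> bool:
    """True once run_length consecutive months fall outside the dead band."""
    streak = 0
    for volume in volumes.values():
        streak = streak + 1 if classify_volume(volume) != "within" else 0
        if streak >= run_length:
            return True
    return False


# Hypothetical volumes consistent with the months described above.
volumes = {"Jan": 1400, "Feb": 1600, "Mar": 2300, "Apr": 1500,
           "May": 1700, "Jun": 2100, "Jul": 2200, "Aug": 2150}

for month, volume in volumes.items():
    print(month, volume, classify_volume(volume))
print("formal review needed:", review_needed(volumes))
```

With these figures March is a one-off surge that earns relief for volume-related misses, while June through August form the sustained run that triggers the formal review of the expected volume.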

Critical Concepts for Contracting SLA(s) and KPI(s)

These concepts are crucial during SLA and KPI negotiations and often form the basis for dispute resolution when vendor performance issues arise.

Due Diligence for SLAs and KPIs

Before contract signature, vendors will typically conduct Due Diligence to validate that the client's requested SLA targets are realistic based on the current health of the systems and historical performance of IT. Vendors rely on the current state of operations to estimate their solutions, and if past performance does not meet the proposed targets, it signals potential underestimation.​

 

Example:  if a client is requiring an uptime of 99.99% but the systems have never been available more than 99.2%, then the SLA requested is not appropriate without remediation efforts.

​

Typical approach for conducting Due Diligence:  obtain the last 6 months of SLA data and compare the results against the performance targets being requested. The measures and data used must conform to those incorporated in the agreement; for the results to be valid, the formulas and data sources cannot be varied.
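A minimal sketch of that comparison is shown below. The six monthly availability figures are hypothetical, chosen to match the earlier example of a system that has never exceeded 99.2% against a requested 99.99%; the function name is also an assumption.

```python
from typing import List, Tuple


def due_diligence_gaps(history: List[Tuple[str, float]],
                       target_pct: float) -> List[Tuple[str, float]]:
    """Return the months whose historical result falls short of the requested target."""
    return [(month, result) for month, result in history if result < target_pct]


# Hypothetical last 6 months of availability for one system.
history = [("Jan", 99.2), ("Feb", 99.0), ("Mar", 98.9),
           ("Apr", 99.1), ("May", 99.2), ("Jun", 99.0)]

gaps = due_diligence_gaps(history, target_pct=99.99)
print(f"{len(gaps)} of {len(history)} months below target")
print("target supported by history:", not gaps)
```

When every month falls short, as here, the requested target is not supported by the current environment and needs either remediation or adjustment before the vendor can reasonably accept accountability.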

​

Exceptions:  Due Diligence DOES NOT apply to all measures; only those that are heavily dependent on the current environment should be subject to Due Diligence. Measures that are within the vendor's control, such as quality measures, on-time/on-budget delivery, and training compliance, are examples which would NOT require that the client has achieved the performance targets for the vendor to accept accountability. The most common examples of measures that DO require Due Diligence include System – Application Availability, Severity 1-2 Incident Resolution, Application Performance, and N-1 Upgrade Compliance.

Options for Resolving Due Diligence Issues

Activities undertaken pre-contract

Recommended Approach for Resolving Due Diligence Issues: In virtually every engagement there are applications that do not meet the system availability requirements and systems whose patch levels are behind the new targets.

The simplest remedy is to exclude the offending systems from the SLA calculations until they are either brought into compliance or the SLA performance targets are adjusted to reflect the current environment.

The work to bring the systems into compliance can be performed by the client, or the vendor can submit a project estimate to complete it.

Service Commencement Date (SCD):

As part of the contracting process, the vendor and client will agree on a transition plan and timeline. Transition covers the period when the vendor secures the staffing to assume responsibility for the services and performs knowledge transfer from the incumbent resources. The date that the vendor resources take over responsibility for services is referred to as the “Service Commencement Date (SCD)”.

​

The SCD acts as the anchor for date commitments, e.g. SCD + 30 indicates that a deliverable is due 30 days after the Service Commencement Date. This eliminates the need to restate dates in the event the go-live date moves.
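The benefit of anchoring dates to the SCD can be sketched with simple date arithmetic. The function name and the sample dates are hypothetical, and the offset is treated as calendar days, which is an assumption the agreement should confirm.

```python
import re
from datetime import date, timedelta


def resolve_scd_offset(expression: str, scd: date) -> date:
    """Turn a commitment such as "SCD + 30" into a calendar date."""
    match = re.fullmatch(r"\s*SCD\s*(?:\+\s*(\d+))?\s*", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    offset_days = int(match.group(1) or 0)  # a bare "SCD" means the date itself
    return scd + timedelta(days=offset_days)


planned_scd = date(2025, 1, 1)   # hypothetical planned go-live
revised_scd = date(2025, 2, 1)   # hypothetical go-live after a one-month slip

print(resolve_scd_offset("SCD + 30", planned_scd))  # 2025-01-31
print(resolve_scd_offset("SCD + 30", revised_scd))  # 2025-03-03
```

Because the commitment is stored as an offset, moving the go-live date changes only the SCD value; every dependent due date follows without restating the contract.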

SLA Penalty Activation:

Service Levels are captured and reported beginning on the Service Commencement Date; however, the point at which the penalty structure begins varies by measure. The reason for the delay is that, while the new resources have been trained and are operational, they have not yet had enough experience to have financial penalties applied and are given time to gain additional proficiency. A standard delay from SCD to penalties is approximately 3 months, i.e. “SCD + 90”.

In some situations, not enough is known about the environment to establish service level performance targets at the time of contracting. These measures are deemed to need a “Baseline”, in which the actual performance of the vendor during the first 3 to 6 months of operation is used to calculate the performance targets. Measures where the targets are agreed but the penalty start is delayed are referred to as “Burn-In”, which is the time given to the resources to come further up to speed while penalties do not apply.

Additional Key Contract Concepts for SLAs:

Activities negotiated during contracting

SLAs with 100% Targets:

“Perfection is not achievable”; as such, service levels should not have an expected performance rating of 100%. There are very rare examples where the vendor may agree to one, such as compliance with laws or company policy, or adherence to training or certification requirements. But these should be the exception, not the norm.

Productivity on SLA Performance:

Often vendors will commit to improving the environment and eradicating systemic problems, which will result in productivity savings. Be sure that these productivity improvements are reflected in the agreement in such a way that the capacity of the vendor staff is not reduced without proof that the output remains at the same level. For example, without such protection, a commitment to increase development productivity could result in a reduction of developers without the same level of development work being delivered.

How we enhance IT Outsourcing SLAs & KPIs

How We Enable the Creation/Refinement of an SLA & KPI Management Structure: 

OAS will work with the appropriate parties to establish a Performance Management structure and train the team in the skills needed to manage the implementation and ongoing execution of performance governance. We can support your efforts during design, contract negotiation, due diligence, baselining and ongoing run. The traditional SLA and KPI structure can be expanded to include a Balanced Scorecard, which incorporates measures that evaluate IT's impact on the business. In this role, OAS can be your advisor in the room as you work with your IT vendors, or work with all parties to enhance the execution of performance leadership and oversight.

How We Facilitate Recovery of Service Performance in IT Outsourcing Engagements:

OAS will work with the delivery organizations when service delivery is not meeting performance targets or there is a disagreement over the application of the commercial constructs that govern the handling of changes and/or exceptions. If performance targets are not being achieved, OAS will work across organizations to facilitate the identification of the Root Cause of the deviations and help construct a roadmap to success. In the event there is a lack of alignment on how to overcome obstacles within the current framework, OAS will facilitate joint solutioning sessions to arrive at agreed-upon solutions.

Reach out to us for a complimentary review of your IT Performance Framework
