Difference between revisions of "Software Support Lifecycle"

From Simulace.info

Revision as of 15:57, 24 January 2015

Problem Recap

A software firm was contracted to develop a new customer-facing solution for a major banking institution. As part of the negotiation process, an SLA needs to be agreed. The banking institution provided the required issue resolution times and asked the software firm to price the contract appropriately and to provide reasoning for the pricing.

The software firm decided to simulate a typical month of the support cycle as a basis for approximating the resources needed to provide the support.

Approach

The model consists of incidents of varying severity, represented as entities, and several kinds of development resources, represented as resources in SIMPROCESS. The model aims to represent a reasonably simplified version of the real development process.

The model needs to represent developer shifts, “emergency holding” (where a developer does not work, but is available to start solving incidents within a reasonable amount of time) and overtime billing.

Model Structure

Entities

Incidents

There are several severity levels of incidents, represented as different types of entities. Apart from having different SLA requirements, the severity levels differ in how they flow through the process. Incidents of different severity are generated using different rules. The SLA terms of different incidents can be found here.

Incident Type   Severity (lower is less severe)   Probability of Occurrence (per hour)
Standard        1                                 Nor(0.4, 0.25, 1)
Severe          2                                 Nor(0.2, 0.25, 1)
Critical        3                                 Nor(0.075, 0.25, 1)
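
The generation rules above could be approximated outside SIMPROCESS roughly as follows. This is only a sketch under stated assumptions: the first two Nor(...) arguments are read as mean and standard deviation of an hourly occurrence probability, the third argument is assumed to be a random stream index and is ignored, the sampled value is clamped to [0, 1], and at most one incident of each type is generated per hour. The names `incidents_for_hour` and `simulate_month` are hypothetical.

```python
import random

# (mean, standard deviation) copied from the table above.
# Assumption: the third Nor(...) argument is a random stream index; it is ignored here.
HOURLY_PROBABILITY = {
    "Standard": (0.4, 0.25),
    "Severe": (0.2, 0.25),
    "Critical": (0.075, 0.25),
}


def clamp(value, low=0.0, high=1.0):
    """Keep a sampled value inside the valid probability range."""
    return max(low, min(high, value))


def incidents_for_hour(rng):
    """Return the incident types generated in one simulated hour."""
    generated = []
    for incident_type, (mean, std_dev) in HOURLY_PROBABILITY.items():
        # Sample this hour's probability, then draw once against it.
        probability = clamp(rng.gauss(mean, std_dev))
        if rng.random() < probability:
            generated.append(incident_type)
    return generated


def simulate_month(hours=30 * 24, seed=1):
    """Count generated incidents per type over a month of simulated hours."""
    rng = random.Random(seed)
    counts = {name: 0 for name in HOURLY_PROBABILITY}
    for _ in range(hours):
        for incident_type in incidents_for_hour(rng):
            counts[incident_type] += 1
    return counts


if __name__ == "__main__":
    print(simulate_month())
```

Clamping is needed because a normal distribution with these parameters frequently produces values below 0, especially for critical incidents; how SIMPROCESS itself treats such samples is not covered here.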

It is important to note that higher severity incidents can preempt lower severity incidents, which is desirable because higher severity incidents have stricter SLA terms.
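
A minimal sketch of this preemption rule, written in plain Python rather than SIMPROCESS. The `Developer` class, its method names and the incident labels are hypothetical, and the sketch assumes a preempted incident keeps its remaining work and resumes once nothing more severe is waiting.

```python
import heapq


class Developer:
    """One developer who always works on the most severe known incident."""

    def __init__(self):
        self.current = None  # (severity, name, remaining_hours) or None
        self.waiting = []    # heap keyed on negated severity (highest first)
        self._order = 0      # tie-breaker: first come, first served per severity

    def _push(self, incident):
        severity = incident[0]
        heapq.heappush(self.waiting, (-severity, self._order, incident))
        self._order += 1

    def arrive(self, severity, name, hours):
        incident = (severity, name, hours)
        if self.current is None:
            self.current = incident
        elif severity > self.current[0]:
            # Preempt: park the less severe incident with its remaining work.
            self._push(self.current)
            self.current = incident
        else:
            self._push(incident)

    def work(self, hours):
        """Spend `hours` on the current incident; pick the next one when done.

        Simplification: hours left over after finishing an incident are dropped.
        """
        if self.current is None:
            return
        severity, name, remaining = self.current
        remaining -= hours
        if remaining > 0:
            self.current = (severity, name, remaining)
        elif self.waiting:
            self.current = heapq.heappop(self.waiting)[2]
        else:
            self.current = None


if __name__ == "__main__":
    dev = Developer()
    dev.arrive(1, "S-1", 4)   # standard incident, 4 hours of work
    dev.work(1)
    dev.arrive(3, "C-1", 2)   # critical incident preempts the standard one
    print(dev.current)
```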

The normal distribution is sometimes considered problematic for generating entities, because many real distributions are not symmetrical and are instead “right-leaning”. I nevertheless believe that the normal distribution is sufficient for this scenario. A beta distribution seemed to be a somewhat more realistic alternative, but given its relatively small impact on the results, I chose the normal distribution, since it is far more accessible and requires less expertise to understand.

Technical Entities

Another type of entity in the system is a Release Trigger. The Release Trigger is responsible for triggering an automated software build every 24 hours.

Resources (Developers)

Developers are grouped into three tiers – standard, junior and senior. Each developer tier has different pricing (here) and might not be able to participate in all parts of the process. Developers are paid a fixed wage, regardless of their utilization, and work in 8x5 mode. This is problematic, however, for high-severity incidents, which have strict SLA terms.

Therefore, a new tier has been added: “Developer – Standard – Overtime”. The role of this tier is to hold “emergency” during the non-working hours (17:00 – 9:00 on work days, plus whole weekends). Holding emergency means that the developer is ready to start resolving critical bugs immediately from a home office. The developer is compensated as follows: 10% of the standard hourly wage for every hour spent holding emergency, regardless of the number of incidents (fixed cost), plus pay for every hour spent resolving incidents during the emergency hours (variable cost).
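
The compensation scheme can be written out as a small calculation. This is a sketch, not the billing logic of the model: it assumes that hours spent resolving incidents are paid at the full standard hourly wage (the text does not state the rate), and that the 10% holding pay covers every emergency hour, including those spent resolving. The function names and the wage of 50 are hypothetical.

```python
EMERGENCY_RATE = 0.10  # share of the hourly wage paid per hour of emergency hold


def weekly_emergency_hours(work_hours_per_day=8, work_days=5):
    """Hours per week outside the 8x5 schedule (nights on work days plus weekends)."""
    return 7 * 24 - work_hours_per_day * work_days


def overtime_cost(hourly_wage, hours_on_hold, hours_resolving):
    """Return (fixed, variable, total) cost for one overtime developer.

    Assumption: resolving hours are billed at the full standard hourly wage.
    """
    fixed = EMERGENCY_RATE * hourly_wage * hours_on_hold
    variable = hourly_wage * hours_resolving
    return fixed, variable, fixed + variable


if __name__ == "__main__":
    hold = weekly_emergency_hours()  # 168 - 40 = 128 hours per week
    # Hypothetical wage of 50 per hour and 6 hours of incident work in the week.
    print(hold, overtime_cost(50, hold, 6))
```

The fixed part is owed even in a week with no incidents, which is why the number of overtime developers matters for cost independently of their utilization.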

Support Process

The incident resolution process is as follows:

SoftwareSupport-Process.jpg

Things to note about the process:

  • Standard severity incidents are not eligible for hotfixing
  • Since junior developers do not have full knowledge of the system, they are excluded from the hotfix development and incident resolution activities
  • Hotfix development is a high-risk activity (the hotfix is deployed directly to production without proper testing), so standard developers need to pair up when developing a hotfix
  • Critical incidents are released “out-of-band”, meaning they do not wait for the next release and are released individually
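
The routing rules in the list above can be summarized as a few predicates. This is an illustrative sketch only: the activity names, tier names and function names are hypothetical labels, senior developers are assumed to work on hotfixes alone (the text only says that standard developers pair up), and non-critical incidents are assumed to wait for the next scheduled release.

```python
HOTFIX_ELIGIBLE_SEVERITIES = {"Severe", "Critical"}
JUNIOR_EXCLUDED_ACTIVITIES = {"hotfix development", "incident resolution"}


def hotfix_eligible(severity):
    """Standard severity incidents are not eligible for hotfixing."""
    return severity in HOTFIX_ELIGIBLE_SEVERITIES


def may_perform(tier, activity):
    """Junior developers are excluded from hotfix development and incident resolution."""
    return not (tier == "Junior" and activity in JUNIOR_EXCLUDED_ACTIVITIES)


def hotfix_team_size(tier):
    """Developers of one tier needed for a hotfix; standard developers pair up.

    Assumption: a senior developer works on a hotfix alone.
    """
    if not may_perform(tier, "hotfix development"):
        raise ValueError(f"{tier} developers do not develop hotfixes")
    return 2 if tier == "Standard" else 1


def release_mode(severity):
    """Critical incidents are released out-of-band; others wait for the next release."""
    return "out-of-band" if severity == "Critical" else "next release"


if __name__ == "__main__":
    for severity in ("Standard", "Severe", "Critical"):
        print(severity, hotfix_eligible(severity), release_mode(severity))
```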

Result

The conflicting goals of the task are obvious. On the one hand, we wish to minimize the time of incident resolution. Unless we consider rearchitecting the support process, this is done mainly by deploying additional resources. On the other hand, we wish to minimize the amount of deployed resources in order to optimize support costs and create room for margin generation.

The model shows that a good compromise between these two goals can be reached with the following resource deployment:

Resource Type                   Number of Resources
Junior Developer                1
Senior Developer                2
Standard Developer              4
Standard Developer - Overtime   2

The rest of this chapter aims to provide supporting evidence for this conclusion. I will refer to the above configuration as the 1-2-4-2 configuration (junior, senior, standard, standard overtime).

Utilization and Diminishing Returns on Additional Resources

The utilization of the above configuration is as follows:

Resource Type                   Utilization (%)
Junior Developer                49.93
Senior Developer                65.35
Standard Developer              79.1
Standard Developer - Overtime   83.09

Granted, the utilization above might seem low. Consider the case where we remove one Senior Developer from the team. Then the utilization will be as follows (1-1-4-2):

Resource Type                   Utilization (%)
Junior Developer                87.77
Senior Developer                94.92
Standard Developer              95.7
Standard Developer - Overtime   96.45

This looks much better. It is important to realize, however, that while utilization near 100% is good for product development teams, the situation is different for support teams. There, utilization near 100% leaves very little headroom for periods in which more incidents occur than expected. To illustrate this, let’s compare average incident resolution times between 1-2-4-2 and 1-1-4-2.
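
Before turning to the simulated numbers, a rough queueing-theory intuition shows why headroom matters. The sketch below is not part of the SIMPROCESS model: it borrows the textbook single-server (M/M/1) result that the mean time in the system equals the mean service time multiplied by 1 / (1 - utilization), and applies it to the Senior Developer utilization from the two tables above. The function name `congestion_factor` is hypothetical, and a real multi-developer team with preemption will behave differently in detail.

```python
def congestion_factor(utilization):
    """Mean time in system divided by mean service time for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    # Senior Developer utilization taken from the two tables above.
    for label, percent in (("1-2-4-2", 65.35), ("1-1-4-2", 94.92)):
        factor = congestion_factor(percent / 100)
        print(f"{label}: incidents spend about {factor:.1f}x their handling time in the system")
```

Under this simplification the factor grows from roughly 2.9 to roughly 19.7, so the last few percentage points of utilization are disproportionately expensive in resolution time.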