Author: Anton Sverlov
VSE email: svea01@vse.cz
Simulation in Simprocess

Problem definition

I currently work in the IT team of a start-up insurtech company that sells life insurance online and provides its clients with an application that scores their physical activity: the more sport a client does, the bigger the cashback on the price of the insurance contract. Because the company is software-based, the IT team plays a big role, but until recently it lacked a proper process for handling business requirements. The following sections describe the team's new, recently implemented process. The simulation is meant to identify the bottlenecks of that process, and its results should help decide how to distribute resources to increase the team's effectiveness.

Questions

The management of the company is discussing several possible solutions:

Hire 3 more full-stack developers
"Grow" one of the current developers into the Techlead's main assistant
Provide the team members with learning sessions (it is not yet clear which ones)
Add one more analyst to analyze the tasks faster

The following model is supposed to help the author of this work to choose the right move.

Method

Simprocess is a suitable tool for this kind of problem mainly because the situation can be described as a discrete-event process. There is a clear entity (a business requirement or a bug) that goes through a series of operations (the development process), and different types of events (delays, competition for resources, etc.) create queues. Simprocess and process modeling also allow the author to include decision points and to define time delays by a probability distribution. The questions stated above can be answered by comparing different scenarios.
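To illustrate the discrete-event idea outside Simprocess, the following minimal Python sketch simulates one activity with several identical resources and a single queue. It is not the author's Simprocess model: the function name simulate_queue, its parameters, and the example values (including a 16-hour mean service time, that is 2 working days of 8 hours) are assumptions chosen only for illustration.

```python
import heapq
import random


def simulate_queue(mean_interarrival, mean_service, servers, horizon, seed=0):
    """Simulate one queue served by several identical resources.

    All times share one unit (for example hours). Returns a tuple
    (arrived, completed, waiting, in_service) describing the state
    after the last event that happens before `horizon`.
    """
    rng = random.Random(seed)
    # Event list ordered by time; each event is (time, kind).
    events = [(rng.expovariate(1.0 / mean_interarrival), "arrival")]
    waiting = 0      # entities queued for a free resource
    in_service = 0   # resources currently busy
    arrived = 0
    completed = 0

    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break
        if kind == "arrival":
            arrived += 1
            waiting += 1
            # Schedule the next arrival (exponential inter-arrival time).
            next_time = time + rng.expovariate(1.0 / mean_interarrival)
            heapq.heappush(events, (next_time, "arrival"))
        else:
            # A resource finished its task and becomes free again.
            completed += 1
            in_service -= 1
        # Start work on queued entities while resources are free.
        while waiting > 0 and in_service < servers:
            waiting -= 1
            in_service += 1
            done = time + rng.expovariate(1.0 / mean_service)
            heapq.heappush(events, (done, "departure"))

    return arrived, completed, waiting, in_service


# Hypothetical example values, not taken from the Simprocess model.
arrived, completed, waiting, in_service = simulate_queue(
    mean_interarrival=1.6, mean_service=16.0, servers=9, horizon=2000.0, seed=1
)
print(arrived, completed, waiting, in_service)
```

The sketch keeps only counters because the entities are indistinguishable here; a fuller model would track each entity to measure cycle times, which is what Simprocess reports out of the box.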

Model

In order to create a Simprocess model, the author has to describe the process of the given development team, the available resources, the available historical information, and the limitations of the model.

Requirements gathering process

The main operation is the procedure for registering requests. Here, a requirement is understood as any formally described condition that the created solution must satisfy.

In particular, the requirements are:

New functionalities or change requests
Errors identified during internal or external testing on an already running solution

Requirements can come from various sources, for example:

Company management
Customers (through the Customer care line)
Tester as part of a project team

All the requirements are placed into Backlog in the Jira software tool and are monitored.

Resources in the simulation

Product owner

Formation of plans
Monitoring the implementation of plans
Organizational work (including the sources of the requirements)
Conceptual architecture of the solution
Part of the analytical work

The product owner is the first point of contact for business requirements. He works closely with the analyst to decide when a task is ready for the backlog (the business case or cost-benefit analysis of new functionality is done; in the case of defect management, bugs are either accepted or specified for the debugging process).

1x Analyst

Development of technical specification for functionality
Development of test plans
Conceptual testing of functionality
Development of user documentation

The analyst is available only 2 days a week

1x Techlead

Conceptual consultancy
Code review control
Production release

The Techlead can handle Code reviews and Releases only 2 days a week; the rest of the time is allocated to management activities and cooperation on other projects.

9x Developers

Development of functionality
Correction of errors in the code
Conducting initial testing of the code
Participates in complex code testing

All developers are full-time employees with the same level of seniority

1x Tester

Functionality testing in all of the environments
Writing UAT tests

Development process for the model

For the development process, the team has chosen the Kanban approach combined with the FIFO principle (first in, first out). The following process describes all the phases from the moment a task is passed to the team to its release.

The team’s Kanban board includes 8 phases each task goes through:

To develop
After the requirements to be developed in a given version are identified, a detailed development plan for the version is drawn up and added to the Kanban board. Then, following the plan and the technical specification, the Developer prepares the working draft together with the Techlead, who advises on conceptual architectural solutions.
In progress
After agreeing on the working draft, the Developer proceeds to its implementation. For sufficiently large requirements, the functionality is implemented in several parts.
Code review
After the development of one part, the code quality is checked by the Techlead. If deficiencies are revealed, the Developer eliminates them.
Acceptance
If needed, the functionality is demonstrated to the Business owner already on the development environment (rare cases).
Testing
After all parts of one requirement are implemented, the Developer demonstrates the developed functionality to the Tester, who tests it for compliance with user needs. If inconsistencies with user needs are revealed, the changes are implemented and the functionality is finalized immediately (in the case of minor comments that do not require changes to the business logic or the architectural model). If concept testing succeeds, the Developer proceeds to the next requirement in accordance with the version work plan.
Stage release
The functionalities tested on DEV are prepared for the stage release and wait for the Techlead to prepare the release.
UAT
The tester repeats the testing on the Stage environment.
Done
All the tasks which are ready for the PROD release are moved to the column Done and are waiting for the Techlead. After all the improvements planned in the version have been implemented, the Techlead builds the version and generates Release notes, in which:

Version parameters are described (number, release date etc.)
Requirements implemented in the version are listed (new functionality, fixed errors)
Version deployment information (steps) is included

Also, after agreeing on the new functionality, the Analyst updates the user documentation for the solution.

Model presumptions and limitations

After a thorough analysis of the team, the author decided to create a model with the following presumptions and limitations:

The model presumes that the product owner reacts quickly enough that no irrelevant tasks appear on the Kanban board, and that the team's cycle is fast enough that a task is never moved back to the old backlog during development
The model presumes that the analysis follows the FIFO principle
The model presumes that all developers (resources) have exactly the same seniority (in reality, even similarly senior developers do not perform identically on all types of tasks, or identically every day)
The model excludes the impact of unpredictable factors (such as new legal requirements, which can get higher priority, be handed over directly to the dev team, and be implemented as a hotfix)
The model excludes the Acceptance phase because this phase is often skipped; in reality this often leads to a long list of bugs reported later, but that is not the objective of this model
The model does not separate Testing and UAT, because UAT is considered insignificant for the purpose of this model
Bugs are given lower priority and are always served after Change requests (in reality, some bugs are so serious that they are served earlier)
There were not sufficient statistics on code review quality, so the Techlead's estimate was used; likewise, there were not enough statistics on testing failures, so the Tester's estimate was used
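The queue discipline described in the list above (Change requests always before Bugs, FIFO within each type) can be sketched as a priority queue. This is only an illustration in Python, not the Simprocess configuration; the class name Backlog, the PRIORITY table, and the task names are hypothetical.

```python
import heapq
import itertools

# Lower number is served first: Change requests always precede Bugs.
PRIORITY = {"change_request": 0, "bug": 1}


class Backlog:
    """Priority queue that is FIFO within each task type."""

    def __init__(self):
        self._heap = []
        # Monotonic counter records arrival order and breaks priority ties.
        self._counter = itertools.count()

    def add(self, kind, name):
        entry = (PRIORITY[kind], next(self._counter), kind, name)
        heapq.heappush(self._heap, entry)

    def next_task(self):
        _, _, kind, name = heapq.heappop(self._heap)
        return kind, name

    def __len__(self):
        return len(self._heap)


backlog = Backlog()
backlog.add("bug", "B1")
backlog.add("change_request", "C1")
backlog.add("bug", "B2")
backlog.add("change_request", "C2")
served = [backlog.next_task()[1] for _ in range(len(backlog))]
print(served)  # prints ['C1', 'C2', 'B1', 'B2']
```

Under this discipline a Bug is only served when no Change request is waiting, so with a steady stream of Change requests the Bugs can wait indefinitely.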

Values in the model

Generators "Change request" and "Bugs"

For both generators, the author presumed Poisson arrivals of 5 change requests/day and 2 bugs/day, which were transformed into exponentially distributed inter-arrival times Exp(1.6) and Exp(4.0) respectively (with the time unit Hour). This estimate is based on data from 5.10.2020 to 2.1.2021 (90 calendar days) exported from Jira. The data had to be cleaned of requests arriving on non-business days and of statistical outliers (days when somebody created a number of sub-issues irrelevant to this model or migrated a number of issues from the old backlog). In the cleaned data the mean and the variance are close to each other, which was taken as a hint that the daily number of requests can be treated as Poisson distributed.
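The conversion from a daily Poisson rate to an exponential inter-arrival time, and the mean-versus-variance check, can be sketched in Python as follows. The stated parameters 1.6 and 4.0 are consistent with an 8-hour working day (8/5 = 1.6 and 8/2 = 4.0), which is assumed here; the function names and the sample of daily counts are hypothetical and are not the Jira data.

```python
import statistics


def mean_interarrival_hours(arrivals_per_day, working_hours_per_day=8.0):
    """Mean time between arrivals for a Poisson process with a daily rate.

    Assumes arrivals happen only during working hours (8 h/day by default).
    """
    return working_hours_per_day / arrivals_per_day


def dispersion_index(daily_counts):
    """Sample variance divided by mean; close to 1 for Poisson-like counts."""
    return statistics.variance(daily_counts) / statistics.mean(daily_counts)


print(mean_interarrival_hours(5))  # change requests: 1.6 hours
print(mean_interarrival_hours(2))  # bugs: 4.0 hours

# Hypothetical daily counts, for illustration only.
sample_counts = [4, 5, 6, 5, 5, 3, 7]
print(dispersion_index(sample_counts))
```

A dispersion index near 1 supports the Poisson presumption; a value well above 1 would point to bursty arrivals, and a value well below 1 to arrivals that are more regular than Poisson.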

Batch "Product owner"

For an effective process of analysis and prioritization, the product owner collects several requests and explains them to the analyst in meetings for further specification. A batch size of 10 was chosen based on personal experience and intuitive estimation. The statistically derived value was considered irrelevant for the purpose of this model.

Delay "IT Analyst"

The analysis time of Exp(1) days per batch from the Product owner was derived from the average time in the usual process, in which the Analyst and the Product owner hold a one-to-one meeting and the Analyst assigns the tasks to the team members the next day. For the purposes of this simulation, the Analyst automatically moves requests to the backlog/Kanban board.

Delay and forks in To develop / Code review / Testing / Release

In the model, the Delay activity "To develop" does not mean the same as the column of that name on the real Kanban board: it represents the actual development work, and the number of tasks in this activity covers both the backlog and the tasks being developed at the same time. Exp(2) days was chosen because of the team policy to decompose requirements into requests of 2 to 3 MD. All of the following values come from empirical data and interviews with the Techlead and the Tester, who estimated the current effectiveness of the team as follows:

Half of the tasks have to be revised because of gaps in the knowledge of the business logic and of code clearance best practices
For the purposes of code review, the Techlead organizes 1.5 h sessions
The probability of mistakes found at the Testing stage was roughly estimated at 10 %
Preparation of a release was estimated at 0.5 MD (the estimate is reliable because the Techlead considers this task a routine activity)

Results and conclusion

To answer the questions stated in the first part, 100 replications of each of the following scenarios were run for the period from 1.1.2021 to 31.12.2021:

Default Scenario: Default settings
Scenario: 3 more developers are added to the resources
Scenario: One of the developers is moved to the Techlead position
Scenario: Learnings and sessions for the team

Default scenario

Averaged over 100 replications, 618 of 1713 Change requests and 771 of 780 Bugs were not solved.
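Expressed as shares, these averages show how differently the two entity types fare. The short Python calculation below uses only the four numbers reported above; the function name unsolved_share is hypothetical.

```python
def unsolved_share(unsolved, total):
    """Fraction of generated entities that did not leave the system."""
    return unsolved / total


# (unsolved, generated) averages reported for the Default scenario.
default_results = {"Change requests": (618, 1713), "Bugs": (771, 780)}

for name, (unsolved, total) in default_results.items():
    share = unsolved_share(unsolved, total)
    print(f"{name}: {total - unsolved} solved, {share:.1%} unsolved")
# prints:
# Change requests: 1095 solved, 36.1% unsolved
# Bugs: 9 solved, 98.8% unsolved
```

About a third of the Change requests remain unsolved, while almost no Bugs are solved at all, which is what one would expect when Bugs are always served after Change requests.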

The Default scenario shows that the analyst and the tester are definitely not the bottlenecks of the process, because their capacity is unused most of the time (approx. 50 % and 85 % respectively). For the management of the company this is a clear hint that either they can be used for other tasks as well, or tasks are stuck somewhere else and these resources are waiting for the preceding process (this is covered by the other scenarios).

At first sight, the main reason for the inefficiency of the process appears to be that entities are stuck waiting for the developers.

Three more developers

In the next scenario, the number of units of the resource type Full stack developer is increased to 12.

It is interesting to observe that the larger number of developers had only an insignificant impact on the team's results.

It also increased the load on testing capacity by only 1 per cent.

Thus, both hypotheses from the previous two paragraphs were incorrect.

Two Techleads

For the next scenario, one of the most promising developers is promoted to assistant of the Techlead and helps with Code reviews and Releases. For that, simulations are run with 8 units of the Full stack developer resource and 2 units of the Techlead resource.

The result of this change was worse than expected. It made only a minor change in the number of requests processed in the system and even worsened the utilization of the resources.

Learnings and the sessions for the team

Less obvious, but seemingly the most effective, is the scenario in which the team goes through sessions on code clearance and business logic, and checklists for code review are created. In the model this means that the probability that a Code review is successful is set to 0.8 instead of 0.5. From the business point of view, it means that the KPI of the team is to deliver better code and reduce the need for rework.

For this scenario, 100 replications are run with the value 0.8 on the connector Yes outgoing from the Branch "Is the code ok?" and the value 0.2 on the connector No.
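Why this single parameter matters can be seen from a simple calculation. Assuming that every Code review passes independently with the same probability p and that a failed review always leads to rework followed by another review, the number of reviews per task follows a geometric distribution with mean 1/p. The Python sketch below is an illustration under these assumptions, not part of the Simprocess model; the function names are hypothetical.

```python
import random


def expected_review_rounds(p_ok):
    """Mean number of reviews until the first pass (geometric distribution)."""
    if not 0.0 < p_ok <= 1.0:
        raise ValueError("p_ok must be in (0, 1]")
    return 1.0 / p_ok


def simulate_review_rounds(p_ok, tasks, seed=0):
    """Monte Carlo estimate of the mean number of reviews per task."""
    rng = random.Random(seed)
    total = 0
    for _ in range(tasks):
        rounds = 1
        # A failed review sends the task back for rework and a new review.
        while rng.random() >= p_ok:
            rounds += 1
        total += rounds
    return total / tasks


print(expected_review_rounds(0.5))  # default scenario: 2.0
print(expected_review_rounds(0.8))  # learning scenario: 1.25
print(simulate_review_rounds(0.8, tasks=20000, seed=1))
```

Under these assumptions, raising the pass probability from 0.5 to 0.8 cuts the expected number of review rounds per task from 2 to 1.25, which reduces the load on both the developers and the Techlead at the same time.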

This move had several outcomes:

Increased the number of tasks that exit the system within the given time frame
Increased the utilization of the tester's capacity
Reduced the average cycle time in the process by 2 weeks for Change requests
Reduced the number of tasks waiting in the "Backlog" (delay To develop)

Conclusion

To answer the question from the Problem section, the best scenario is the last one; the open question is which activities can guarantee the KPI. Also, the tester and the analyst can be allocated to other projects to use up their spare capacity. The policy that Change requests always go first leads to huge technical debt, so another prioritisation policy should be found, but that was not the purpose of this simulation. I would like to emphasise that adding more resources is an obvious solution but, as the simulation showed, not an effective one; even though management may overlook it, the effectiveness of the team is the main factor.

Code

PS. I was unable to upload the three other scenarios due to the error "This file contains HTML or script code that may be erroneously interpreted by a web browser." Therefore I am sending a link to the other scenarios, which are stored on Google Drive: https://drive.google.com/drive/folders/1RW-cRiQVDGPxrkpDmBnFokCtI7nnbwEb?usp=sharing

IT Team simulation
