Do data science teams use (or think they use) an agile process framework? To help find out, my student (Sucheta Lahiri) recently presented an academic paper that reported on data science agility across 16 organizations.
The key takeaway was that even if your team thinks it is being “agile”, several key aspects of agility are most likely missing from your team’s process.
Specifically, most agile concepts are critical to improving a data science project’s success. However, teams did not use these concepts (even teams that reported using an agile framework). This means that teams do not realize that they are not getting the key benefits of an agile framework (e.g., improving customer communication, focusing on the highest priority items).
Is your team in this situation? Read below to find out.
Who was interviewed?
Data scientists and data science managers from 16 diverse organizations were interviewed. Note that, to avoid biased results, none of the participants were part of our DSPA community.
As shown in the table below, everyone interviewed was an experienced data science professional, with 5 to 32 years of work experience. The data scientists worked in large and small teams, for North American and European organizations.
Aspects of data science agility teams used
While 62% of the organizations reported using a data science agile framework, none actually used all the key agile concepts.
Many organizations defined a process that incorporated one or more aspects of an agile process. However, these custom-defined process frameworks did not leverage the key aspects of an agile framework. For example, they did not have small incremental iterations during the project, with feedback from product owners / stakeholders.
The other organizations (38%) used a proprietary / ad-hoc approach without any agile concepts, often based on a proprietary data science life cycle.
Digging a little deeper
Only 12% of the teams used iterations – and none of those organizations had a product owner (or stakeholder) giving feedback on those iterations.
Only 18% of the organizations had a clearly-defined product owner (but none of those organizations used iterations).
The table below has more information on which aspects of agility the teams used.
How teams actually execute their projects
At a high level, we found that teams used one of the following four process frameworks.
Ad Hoc/Proprietary Process
Teams that used an ad-hoc/proprietary process did not try to leverage a life cycle approach, nor use any agile-specific concepts. Some of the teams focused on trusting the data science team lead, while others had a meeting to help ensure consistent information sharing across the team.
For example, one organization used a traditional project plan, with defined dates/milestones that were tracked. A different organization did not follow a formal project management framework (but did think of themselves as agile). Rather, that organization started with broad questions, such as “who is going to sign off on the outcome of the project?”, “what are our objectives?”, and “what are our risks?”. These are all good questions. However, it was not clear how the team knew they had good answers to these questions.
Proprietary Life Cycle Focused Process
These organizations used a proprietary process, which leveraged a phased approach. In a phased approach, each phase is part of a data science life cycle. While some organizations leveraged a life cycle similar to CRISP-DM, other organizations started from scratch to define the phases that made the most sense for their organization.
These organizations were waterfall-like in their process (since they focused on each project going through each phase of their life cycle). However, the teams often tried to be somewhat agile, such as being able to go back to a previous phase (as one can do in CRISP-DM).
For example, one organization leveraged CRISP-DM for its life cycle. As a reminder, CRISP-DM is the most commonly used data science life cycle framework, and has 6 phases (business understanding, data understanding, data preparation, modeling, evaluation, and deployment).
However, a different organization used a proprietary life-cycle-focused team process (which they called ML flow). Their life cycle had the following phases:
(1) requirement understanding
(2) data collection
(3) data engineering
(4) building the model
(5) model training
(6) model validation
(7) model deployment in their production environment
(8) performance monitoring.
This organization stated that their ML flow was used in conjunction with an agile approach. In reality, their life cycle steps were given precedence over agility, and the team typically followed a phase-based process, based on their ML flow life cycle, without leveraging any specific agile concepts.
Proprietary Process with Some Agile Concepts
Organizations that used a proprietary process with some agile concepts typically used some aspects of Scrum or Data Driven Scrum (they tailored the agile framework to their needs). However, none of the organizations in this category adhered to the Scrum Guide (or any other agile framework). Specifically, none of these organizations tried to use the concept of an iteration.
For example, one team viewed themselves as agile because the team collectively worked to define a project plan and because interactions across the team were viewed as more important than the process artifacts. They also used a Kanban-like board to track project progress via swim lanes and named columns. However, they also defined an overall project plan, with dates and deliverables, and struggled to get feedback from stakeholders.
In a different example, the organization defined a methodology that focused on a project management committee and a series of meetings. For example, the team did have a daily meeting. However, their daily meeting was a status meeting, with minimal focus on typical daily topics (such as roadblocks stopping work from being done that day).
Yet another organization viewed their approach as agile since they focused on what they referred to as POD (product-oriented delivery). However, much of their process was a more traditional phased project approach with separate teams doing business analysis, development, and testing. Using their POD approach, the organization leveraged some agile concepts, including the notion of a Scrum Master. Even though the team did not fully follow Scrum, that role did help the team think through the process they used.
Proprietary Process with Well-Defined Iterations
Teams in this category identified the benefit of small iterations. Note that these iterations were often not Scrum sprints. For example, for some teams, their iterations were planned in advance, and for other teams, iterations did not have a fixed-length, time-boxed duration (which is similar to DDS iterations).
For example, one organization divided a project into smaller increments that could provide quicker deliverables. This team was concerned that big projects require significant time and money to execute, which made them difficult to deliver. Hence, bigger projects were divided into smaller, more attainable efforts. However, these smaller increments were very loosely defined.
For another organization, their framework was a proprietary modification of Scrum, with pre-defined sprints. This organization’s typical project had a project plan with ten defined sprints. Due to the pre-defined sprints, the process could also be considered a phased approach.
Does your team have data science agility?
Does your team’s process framework leverage data science agility? Or does your organization use a process that matches one of the four approaches described above?
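One way to answer that question is to turn the four categories into a quick self-assessment. The sketch below is only illustrative and is not from the paper: the `TeamProcess` fields, the `classify` and `agility_gaps` functions, and the order in which the checks are applied are all assumptions made for this example. It maps a few yes/no answers to the closest of the four process frameworks, and lists which of the two aspects highlighted above (small iterations, and product owner / stakeholder feedback on them) are missing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeamProcess:
    """Hypothetical self-assessment answers (field names are illustrative)."""
    uses_life_cycle: bool             # phased life cycle, e.g. CRISP-DM or a custom one
    uses_agile_concepts: bool         # e.g. Scrum Master role, Kanban-like board
    uses_iterations: bool             # work is split into small iterations
    has_product_owner_feedback: bool  # product owner / stakeholder reviews the work


def classify(team: TeamProcess) -> str:
    """Map answers to one of the four categories described in this post.

    Assumption: the most agile trait present decides the category.
    """
    if team.uses_iterations:
        return "Proprietary Process with Well-Defined Iterations"
    if team.uses_agile_concepts:
        return "Proprietary Process with Some Agile Concepts"
    if team.uses_life_cycle:
        return "Proprietary Life Cycle Focused Process"
    return "Ad Hoc/Proprietary Process"


def agility_gaps(team: TeamProcess) -> List[str]:
    """List which of the two highlighted agile aspects are missing."""
    gaps = []
    if not team.uses_iterations:
        gaps.append("small incremental iterations")
    if not team.has_product_owner_feedback:
        gaps.append("product owner / stakeholder feedback")
    return gaps


if __name__ == "__main__":
    # Example: a team that follows a life cycle and has a Kanban-like board,
    # but has no iterations and no product owner feedback.
    team = TeamProcess(
        uses_life_cycle=True,
        uses_agile_concepts=True,
        uses_iterations=False,
        has_product_owner_feedback=False,
    )
    print(classify(team))
    print(agility_gaps(team))
```

A team that lands in any category and still has a non-empty list of gaps is in the situation described in this post: it may view itself as agile while missing key aspects of agility.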
If you have gotten this far, thank you for reading the entire post!