What is TDSP?

by | Last updated Mar 31, 2024 | Life Cycle

Team Data Science Process

If you combine Scrum and CRISP-DM, you will get something that looks like Microsoft’s Team Data Science Process. Launched in 2016, TDSP is “an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently.” (Microsoft, 2020 ).

This is a modern data science process that combines elements of the core data science life cycle, software engineering, and Agile processes.

Microsoft Team Data Science Process

Team Data Science Lifecycle (Based on Microsoft, 2020)

TDSP Components

TDSP has four main components:

  • A data science lifecycle definition
  • A standardized project structure
  • Recommended infrastructure and resources
  • Recommended tools and utilities

It comes as no surprise that these two often leverage Microsoft Azure; however, a team could use other tech stacks and still adhere to TDSP.

TDSP Life Cycle

Although the lifecycle graphic looks quite different,  TDSP’s project lifecycle is like CRISP-DM and includes five iterative stages:

  1. Business Understanding: define objectives and identify data sources
  2. Data Acquisition and Understanding: ingest data and determine if it can answer the presenting question (effectively combines Data Understanding and Data Cleaning from CRISP-DM)
  3. Modeling: feature engineering and model training (combines Modeling and Evaluation)
  4. Deployment: deploy into a production environment
  5. Customer Acceptance: customer validation if the system meets business needs (a phase not explicitly covered by CRISP-DM)

Team Definition

TDSP addresses the weakness of CRISP-DM’s lack of team definition by defining six roles:

  • Solution architect
  • Project manager
  • Data engineer
  • Data scientist
  • Application developer
  • Project lead

Microsoft defines the relevant tasks and artifacts for many of the team roles during each phase of the project life cycle.

Standardized Resources

Microsoft also provides standardized project documents such as project charters and data reports, infrastructure and resources for data science projects, and tools and utilities for project execution. The use of some of these artifacts is also mapped to the five phases.


Evaluation

Pros

  • Agile: Emphasizes the need for incremental deliverables.
  • Familiar: The product backlog, features, user stories, bugs, Git versioning, and sprint planning are familiar to those used to common software practices.
  • Data Science Native: TDSP acknowledges that data science and software engineering are different, and is built for data science teams working on production-bound projects.
  • Flexible: TDSP can be implemented as it is defined or in conjunction with other approaches such as CRISP-DM.
  • Thorough: Because of its rich team focus and detailed documentation, TDSP is arguably the most mature CRISP-derived project management approach. It is conceptually similar to Domino Data Lab’s Lifecycle but is more detailed.
  • Free Templates: Go to Microsoft Azure’s GitHub repository to get started.

Cons

  • Fixed Sprints: TDSP leverages fixed-length planning sprints which many data scientists struggle with.
  • Some Inconsistencies: Not all of Microsoft’s documentation is consistent.

Bottom Line

TDSP is a good option for data science teams who aspire to deliver production-level data science products. It may not be appropriate for one-team data scientists or for projects without a production goal.

Learn More

  • 72 Min Video from Open Data Science walking through a project using TDSP

Finally...a field guide for managing AI projects!

Artificial Intelligence is unique. It's time to start managing it as such.

Get the jumpstart guide to manage your next project.

Plus get monthly tips on managing AI projects and products.

You have Successfully Subscribed!

Share This