Data Drift

When your model suddenly starts making terrible predictions because the real world refused to stay the same.

Data drift refers to the phenomenon where the statistical properties of a machine learning model's input data change over time, which can degrade the model's performance. The shift can stem from changes in the underlying data distribution, evolving user behavior, or external environmental influences. Data drift is particularly critical in production environments, where deployed models make real-time predictions on incoming data. Understanding and managing it is essential for data scientists, machine learning engineers, and data governance specialists, because it directly affects the accuracy and reliability of predictive models.

Data drift can be categorized into two main types: covariate shift, where the distribution of input features changes, and prior probability shift, where the distribution of the target variable changes. Detecting data drift involves statistical tests that compare incoming data against a reference sample, such as the training set, alongside ongoing monitoring of model performance over time. If not addressed, data drift can lead to model obsolescence, necessitating retraining or adjustment of the model to align with the current data landscape.
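As a rough illustration of the statistical-test idea, the sketch below checks a single numeric feature for covariate shift with a two-sample Kolmogorov-Smirnov statistic, written in plain Python. The function names, the sample data, and the 5% significance constant (1.358, from the usual large-sample approximation) are assumptions made for this example, not something the entry prescribes.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest vertical gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The gap between two step functions can only change at an observed value,
    # so checking every observed value is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def drift_detected(reference, current, c_alpha=1.358):
    """Flag drift when the KS statistic exceeds the approximate critical value.

    c_alpha=1.358 corresponds to a 5% significance level in the large-sample
    approximation; treat it as an assumption and tune it for real use.
    """
    n, m = len(reference), len(current)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


# Hypothetical feature values: training-time sample vs. a shifted production sample.
rng = random.Random(0)
training_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
production_sample = [rng.gauss(1.5, 1.0) for _ in range(500)]

shifted_flag = drift_detected(training_sample, production_sample)
unchanged_flag = drift_detected(training_sample, training_sample)
```

A test like this looks at one feature at a time; with many features, some false alarms are expected unless the threshold is adjusted for multiple comparisons.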

In practice, data drift is managed through various strategies, including continuous monitoring, retraining models with fresh data, and employing synthetic data to simulate potential future scenarios. By proactively addressing data drift, organizations can maintain the integrity of their machine learning systems and ensure that they continue to deliver accurate insights and predictions.
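One way continuous monitoring might look in code is sketched below: each incoming batch of a feature is scored against a reference sample with the Population Stability Index (PSI), and batches above a threshold are flagged as retraining candidates. PSI is a swapped-in metric chosen for the example, and the function names, bin count, and 0.2 threshold (a common rule of thumb) are all assumptions.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` relative to `reference`.

    Bins are equal-width over the reference range; values outside that range
    fall into the first or last bin. `eps` avoids log(0) for empty bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    p = proportions(reference)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def batches_needing_retrain(reference, batches, threshold=0.2):
    """Return the indices of batches whose PSI exceeds the (assumed) threshold."""
    return [i for i, batch in enumerate(batches) if psi(reference, batch) > threshold]


# Hypothetical data: one batch matching the reference, one shifted upward.
reference = [float(x) for x in range(100)]
stable_batch = [float(x) for x in range(100)]
shifted_batch = [float(x + 50) for x in range(100)]

flagged = batches_needing_retrain(reference, [stable_batch, shifted_batch])
```

In a real pipeline the flagged indices would feed an alert or a retraining job rather than a list, and the reference sample would be refreshed after each retrain.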

Example in the Wild

"It's like realizing your favorite coffee shop changed their brew recipe—suddenly, your morning model predictions taste a bit off!"

Alternative Names

  • Covariate Shift
  • Model Drift
  • Concept Drift

Fun Fact

Data drift was first formally recognized in the machine learning community during the early 2000s, but the concept has roots in statistics dating back to the 19th century, when statisticians began to notice that data collected over time could exhibit changes that affected analysis outcomes.
