Loss of Documentation/Context:

Post by **Bappy10** » Tue May 27, 2025 4:47 am

The Problem: The statistical properties of the target variable (what you're trying to predict) change over time, so the relationship between the input features and the output no longer matches what the model learned. This is commonly called concept drift.
Example: A fraud detection model trained on historical fraud patterns might fail if new, sophisticated fraud methods emerge that deviate from the original concept of fraud.
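
One rough way to notice this kind of drift is to keep comparing the model's recent error rate against the error rate it had at validation time. The sketch below assumes you eventually receive the true label for each prediction; the class name `ErrorRateMonitor`, the window size, and the tolerance are all illustrative choices, not a standard API.

```python
from collections import deque


class ErrorRateMonitor:
    """Flags possible concept drift when recent errors exceed the baseline."""

    def __init__(self, baseline_error_rate, window=100, tolerance=0.10):
        self.baseline = baseline_error_rate   # error rate measured at validation time
        self.tolerance = tolerance            # allowed increase before alerting (assumed value)
        self.recent = deque(maxlen=window)    # 1 = wrong prediction, 0 = correct

    def record(self, predicted, actual):
        """Store whether the latest prediction was wrong."""
        self.recent.append(int(predicted != actual))

    def error_rate(self):
        """Error rate over the current window (0.0 if nothing recorded yet)."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drifting(self):
        """True only once the window is full and errors exceed baseline + tolerance."""
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and self.error_rate() > self.baseline + self.tolerance
```

For the fraud case, you would call `record()` each time a transaction's real outcome becomes known and retrain or investigate when `drifting()` turns true. A rising error rate does not prove the concept changed, but it is a cheap first signal.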
Data Skew/Bias Amplification:

The Problem: The dataset starts with, or develops, inherent biases that don't reflect the real-world distribution or that unfairly represent certain groups. This can happen if collection methods change or specific data points become over- or under-represented.
Example: A facial recognition dataset that disproportionately contains images of people from one demographic might fail to accurately identify individuals from other demographics, especially in a diverse country like Bangladesh.
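
A simple check for this is to compare each group's share of the dataset with the share you would expect in the real population. The sketch below is only an illustration: the function name `representation_gaps`, the `reference_shares` mapping, and the 0.8 cutoff are assumptions, and the reference shares would have to come from a source you trust (census data, for instance).

```python
from collections import Counter


def representation_gaps(groups, reference_shares, min_ratio=0.8):
    """Return groups whose dataset share falls below min_ratio * expected share.

    groups: one group label per record in the dataset.
    reference_shares: dict mapping group label to expected share (0..1).
    """
    counts = Counter(groups)
    total = len(groups)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if observed < expected * min_ratio:
            gaps[group] = (observed, expected)  # (share in data, share expected)
    return gaps
```

Running this on every new batch of collected data, not just once at the start, is what catches bias that develops over time rather than bias that was there from day one.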
Data Quality Degradation:

The Problem: Over time, data can become corrupted, inconsistent, duplicated, or riddled with errors due to flawed entry, migration issues, or lack of proper governance.
Example: Customer contact information in a CRM becomes a mess of incorrect phone numbers and email addresses due to manual entry errors or lack of validation, rendering it useless for marketing.
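
Validation at entry time plus a periodic audit goes a long way here. Below is a minimal sketch of such an audit for contact records. The field names `email` and `phone`, the function `audit_contacts`, and both regular expressions are assumptions for illustration; the patterns only check the rough shape of a value and are not full email or phone validation.

```python
import re

# Rough shape checks only (assumed patterns, not RFC or telecom-grade validation).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d{7,15}$")


def normalize_phone(raw):
    """Strip spaces, dashes, dots, and parentheses so formats compare equal."""
    return re.sub(r"[\s\-().]", "", raw)


def audit_contacts(records):
    """Return (record_index, issue) pairs for bad emails, bad phones, duplicates."""
    seen = set()
    issues = []
    for i, rec in enumerate(records):
        email = rec.get("email", "").strip().lower()
        phone = normalize_phone(rec.get("phone", ""))
        if not EMAIL_RE.match(email):
            issues.append((i, "bad email"))
        if not PHONE_RE.match(phone):
            issues.append((i, "bad phone"))
        key = (email, phone)
        if key in seen:  # same contact entered twice with different formatting
            issues.append((i, "duplicate"))
        seen.add(key)
    return issues
```

Tracking the number of issues per audit run over time shows whether quality is actually degrading or holding steady.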
Loss of Relevance (Feature Drift):

The Problem: The importance or meaning of input features changes. Some features might become irrelevant, or new, crucial features emerge that aren't captured.
Example: A model predicting housing prices based heavily on landline phone ownership would become less effective as mobile phone adoption soared and landlines declined.
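
One warning sign that a feature is losing relevance is that its distribution in live data no longer looks like the training data. A common way to quantify that is the Population Stability Index (PSI). The sketch below computes PSI for a categorical feature such as landline ownership; the function name `psi` and the `eps` floor are illustrative choices. Note that PSI only measures distribution shift, so it is a proxy: a feature can also lose predictive value while its distribution stays the same.

```python
import math
from collections import Counter


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two samples of a categorical feature.

    expected: feature values from training data.
    actual: feature values from recent/live data.
    eps: floor for shares, so a category missing from one sample avoids log(0).
    """
    categories = set(expected) | set(actual)
    e_counts, a_counts = Counter(expected), Counter(actual)
    total = 0.0
    for c in categories:
        e = max(e_counts[c] / len(expected), eps)  # share in training data
        a = max(a_counts[c] / len(actual), eps)    # share in live data
        total += (a - e) * math.log(a / e)
    return total
```

For example, if 80% of homes in the training data had a landline and only 20% do now, `psi([1]*80 + [0]*20, [1]*20 + [0]*80)` comes out well above the 0.25 level that is often quoted as a rule of thumb for a major shift. Treat that threshold as a convention, not a hard rule.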