Data Scientist
hard
ds-data-leakage

What is data leakage and how do you prevent it in ML projects?

Answer

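Data leakage happens when a model is trained on information that will not be available at prediction time. The usual symptom is validation performance that looks better than what the model achieves in production. Common causes:
- Using future data in features (values recorded after the moment of prediction)
- Leakage through target encoding (category statistics computed on rows that include the evaluation data)
- Improper train/test splitting (the same user in both sets)

Prevent it with strict splitting rules, feature audits, and a pipeline design that mirrors production inference.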
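The three causes above can each be guarded against in code. What follows is a minimal sketch in plain Python, not a definitive implementation: the helper names (group_train_test_split, fit_target_encoding, apply_target_encoding, features_as_of) and the field names (user, city, label, ts) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def group_train_test_split(rows, group_key, test_fraction=0.25, seed=0):
    """Split rows so every group (e.g. a user) lands entirely on one side."""
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


def fit_target_encoding(train_rows, cat_key, target_key):
    """Learn per-category target means from training rows only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in train_rows:
        sums[r[cat_key]] += r[target_key]
        counts[r[cat_key]] += 1
    global_mean = sum(sums.values()) / sum(counts.values())
    means = {c: sums[c] / counts[c] for c in counts}
    return means, global_mean


def apply_target_encoding(rows, cat_key, means, global_mean):
    """Encode rows with fitted means; unseen categories get the global mean."""
    return [means.get(r[cat_key], global_mean) for r in rows]


def features_as_of(events, cutoff):
    """Keep only events recorded strictly before the prediction time."""
    return [e for e in events if e["ts"] < cutoff]


# Hypothetical toy data: 6 users, 4 rows each.
rows = [
    {"user": f"u{i % 6}", "city": "A" if i % 2 else "B", "label": int(i % 3 == 0)}
    for i in range(24)
]

# 1) Split by user, so no user appears in both sets.
train, test = group_train_test_split(rows, group_key="user")

# 2) Fit the encoding on train only, then apply it to test.
means, global_mean = fit_target_encoding(train, "city", "label")
test_encoded = apply_target_encoding(test, "city", means, global_mean)

# 3) Build features only from events before the prediction time.
events = [{"ts": 1, "amount": 10}, {"ts": 5, "amount": 20}, {"ts": 9, "amount": 30}]
visible = features_as_of(events, cutoff=5)
```

The design point is that every statistic is fitted on the training side and only applied to the test side, which is the same order of operations production inference would follow. Within the training set itself, out-of-fold encoding is a common further refinement that this sketch leaves out.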

Related Topics

Best Practices, Machine Learning, Data Science