Five Agentic Workflows Automate Data Science Pipelines as 45% of Time Still Goes to Data Prep
Updated · KDnuggets · Jun 26
3 articles · Updated · KDnuggets · Jun 26
Summary
Five workflows are laid out for the main stages of a data science pipeline: automated EDA, feature engineering and selection, hyperparameter tuning, model monitoring, and self-healing orchestration.
Roughly 45% of a data scientist’s time still goes to repetitive preparation and cleaning work, the article argues, making tasks like null checks, EDA scripts, parameter search, and monitoring rules suitable for agent automation.
Concrete examples show the agents profiling a 5,000-row retail dataset in under 30 seconds, lifting RandomForest AUC on the 48,842-row Census Income dataset from 0.87 to 0.91, and triggering retraining when drift turns severe.
Databricks and similar platforms are already embedding agentic capabilities into core infrastructure, but the article stresses that humans still review model choices, thresholds, and production fixes while agents handle procedural work.
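The monitoring workflow summarized above (retrain when drift turns severe, with humans reviewing the thresholds) can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the Population Stability Index is swapped in here as one common drift measure, and the function names `psi` and `drift_action`, the bin count, and the 0.1 / 0.25 cutoffs are all assumptions chosen for the example.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges come from the reference sample; values in `current` that fall
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_action(score, warn=0.1, severe=0.25):
    """Map a drift score to an action. Thresholds are illustrative
    rule-of-thumb values that a human reviewer would be expected to tune."""
    if score >= severe:
        return "retrain"
    if score >= warn:
        return "alert"
    return "ok"
```

In a setup like this, an agent would compute `psi` per feature on a schedule and call `drift_action` on each score, while the choice of `warn` and `severe` stays with a human reviewer, in line with the division of labor the summary describes.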