Our paper about Interactive Curation of ML Pipelines was accepted to SIGMOD 2019

Democratizing Data Science through Interactive Curation of ML Pipelines

2019/03/22

Democratizing Data Science requires a fundamental rethinking of the way data analytics and model discovery is done. Available tools for analyzing massive data sets and curating machine learning models are limited in a number of fundamental ways. First, existing tools require well-trained data scientists to select the appropriate techniques to build models and to evaluate their outcomes. Second, existing tools require heavy data preparation steps and are often too slow to give interactive feedback to domain experts in the model building process, severely limiting the possible interactions. Third, current tools do not provide adequate analysis of statistical risk factors in the model development