Data Quality Issues in Big Data and Machine Learning Applications: Going Beyond Data Cleaning and Transformations
A Workshop Held in Conjunction with the 2017 IEEE International Conference on Big Data
11 - 14 December 2017
The workshop Data Quality Issues in Big Data and Machine Learning Applications: Going Beyond Data Cleaning and Transformations will be held in conjunction with the 2017 IEEE International Conference on Big Data in Boston, Massachusetts, 11 - 14 December 2017.
Minor data errors can cause major damage in Big Data applications. The damage takes various forms, including loss of revenue, operational inefficiency, and regulatory compliance failure. Moreover, these errors cascade through downstream applications and compound the damage. The goal of this workshop is to bring together data quality researchers and industry practitioners to share ideas and best practices and to identify and define important problems that will advance the field.
Scope of Research Topics for the Workshop
- Contextualizing vendor data by defining validity and consistency checks specific to the intended use.
- Reconciling differences and cross-linking data from multiple data vendors.
- Algorithms and approaches for spotting outliers and inconsistent data.
- Statistical and mathematical models for deriving missing data.
- Deterministic and probabilistic approaches to detecting duplicate data.
- Data quality metrics and trustworthiness.
- Maintaining data validity and consistency across recent and older datasets.
- The role of data quality in high-level semantic frameworks such as schema.org.
- Data quality issues in knowledge graph-driven semantic search.
- Data quality improvements through visual analytics.
- Data quality aspects that contribute to gender bias in machine learning.
- Data quality issues in application domains including but not limited to sensor data streams, linked data, data integration, scientific workflows, machine learning for natural language understanding, Internet of Things (IoT), prediction models in empirical software engineering, team software process frameworks, cyber-physical systems, assisted living systems, citizen science, and drug databases.
Please submit a full-length paper (up to 10 pages in IEEE 2-column format) through the online submission system.
Paper Formatting Instructions/Templates
Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines. Use the templates listed below.
- 8.5" x 11" Word template downloadable from here.
- 8.5" x 11" Word template (PDF) downloadable from here.
- LaTeX formatting macros downloadable from here.
Important Dates
- Oct 10, 2017: Due date for full workshop paper submissions
- Nov 1, 2017: Notification of paper acceptance to authors
- Nov 15, 2017: Camera-ready versions of accepted papers due
- Dec 11 - 14, 2017: Workshop window
For workshop-related questions, please contact any of the following organizers:
- Venkat N. Gudivada, East Carolina University, Greenville, North Carolina. email: email@example.com
- Junhua Ding, East Carolina University, Greenville, North Carolina. email: firstname.lastname@example.org
- Srividya Bansal, Arizona State University, Tempe, Arizona. email: Srividya.Bansal@asu.edu
- Dr. Kemafor Anyanwu Ogan, Department of Computer Science, NC State University, Raleigh, North Carolina.
- Dr. Paolo Papotti, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University.
- Haricharan Ramachandra, Senior Engineering Manager, LinkedIn, Mountain View, California.
- Dr. Maksims (Maks) Volkovs, Co-Founder & Principal Data Scientist, Layer6 AI (http://layer6.ai/), Toronto, Canada.
- Dr. Roberto V. Zicari, Frankfurt Big Data Lab, Goethe University Frankfurt, Germany.