Context-Driven Data Mining through Bias Removal and Data Incompleteness Mitigation

10/19/2019
by Feras A. Batarseh, et al.

The results of data mining endeavors are largely driven by data quality. Across such deployments, serious show-stopper problems remain unresolved, such as data collection ambiguities, data imbalance, hidden biases in data, lack of domain information, and data incompleteness. This paper is based on the premise that context can aid in mitigating these issues. In a traditional data science lifecycle, context is not considered. The Context-driven Data Science Lifecycle (C-DSL), the main contribution of this paper, is developed to address these challenges. Two case studies (using datasets from sports events) are developed to test C-DSL. Results from both case studies are evaluated using common data mining metrics such as the coefficient of determination (R² value) and confusion matrices. The work presented in this paper aims to redefine the lifecycle and introduce tangible improvements to its outcomes.
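For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how a coefficient of determination and a confusion matrix can be computed. It is not the authors' code: the function names, the `win`/`loss` labels, and all sample values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical regression targets and predictions (illustrative only).
scores_true = [1.0, 2.0, 3.0, 4.0]
scores_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r_squared(scores_true, scores_pred)

# Hypothetical classification outcomes (illustrative only).
actual = ["win", "win", "loss", "loss", "win"]
predicted = ["win", "loss", "loss", "win", "win"]
cm = confusion_matrix(actual, predicted, labels=["win", "loss"])

print(r2)
print(cm)
```

An R² close to 1 indicates that predictions explain most of the variance in the target, while the diagonal of the confusion matrix counts correct classifications per label.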

research
06/28/2020

Data Science: Challenges and Directions

While data science has emerged as a contentious new scientific field, en...
research
01/12/2023

Open Case Studies: Statistics and Data Science Education through Real-World Applications

With unprecedented and growing interest in data science education, there...
research
09/07/2020

Text Mining over Curriculum Vitae of Peruvian Professionals using Official Scientific Site DINA

During the last decade, Peruvian government started to invest and promot...
research
03/12/2018

Data Science Methodology for Cybersecurity Projects

Cyber-security solutions are traditionally static and signature-based. T...
research
09/19/2023

In Consideration of Indigenous Data Sovereignty: Data Mining as a Colonial Practice

Data mining reproduces colonialism, and Indigenous voices are being left...
research
05/14/2010

Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

Data analysis and data mining are concerned with unsupervised pattern fi...
