Adapting CRISP-DM for Idea Mining: A Data Mining Process for Generating Ideas Using a Textual Dataset

by W. Y. Ayele, et al.

Data mining project managers can benefit from using standard data mining process models. Standard process models, such as the de facto standard and most popular one, the Cross-Industry Standard Process for Data Mining (CRISP-DM), reduce cost and time. They also facilitate knowledge transfer and the reuse of best practices, and they minimize knowledge requirements. At the same time, digital innovation is increasingly needed to unlock the potential of ever-growing textual data such as publications, patents, social media data, and documents of various forms. Furthermore, the introduction of cutting-edge machine learning tools and techniques enables the elicitation of ideas. The processing of unstructured textual data to generate new and useful ideas is referred to as idea mining. Existing literature on idea mining largely overlooks the use of standard data mining process models. Therefore, the purpose of this paper is to propose a reusable model for generating ideas, CRISP-DM for Idea Mining (CRISP-IM). The CRISP-IM is designed and developed following the design science approach. The CRISP-IM facilitates idea generation through the use of Dynamic Topic Modeling (DTM), unsupervised machine learning, and subsequent statistical analysis on a dataset of scholarly articles. The adapted CRISP-IM can be used to guide the process of identifying trends in scholarly literature datasets, temporally organized patents, or any other textual dataset from any domain to elicit ideas. The ex-post evaluation of the CRISP-IM is left for future study.
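The abstract does not say which statistical analysis follows the topic modeling step. As a rough illustration only, the sketch below assumes that a topic model has already produced one share per topic per year, and ranks topics by the ordinary least-squares slope of that share over time. It does not implement DTM itself. The function names, topic labels, and numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def trend_slope(years: List[float], shares: List[float]) -> float:
    """Ordinary least-squares slope of a topic's share against time."""
    if len(years) != len(shares):
        raise ValueError("years and shares must have the same length")
    if len(years) < 2:
        raise ValueError("need at least two time points")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    return sxy / sxx


def rank_topics(
    years: List[float], topic_shares: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (topic, slope) pairs, steepest upward trend first."""
    slopes = {name: trend_slope(years, s) for name, s in topic_shares.items()}
    return sorted(slopes.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up illustration values, not results from the paper.
    years = [2015, 2016, 2017, 2018]
    topic_shares = {
        "topic_a": [0.10, 0.20, 0.30, 0.40],  # rising
        "topic_b": [0.40, 0.30, 0.20, 0.10],  # declining
    }
    for name, slope in rank_topics(years, topic_shares):
        print(f"{name}: slope per year = {slope:+.3f}")
```

A rising slope marks a topic as a candidate for closer reading. Whether it yields a useful idea still depends on the evaluation steps the paper describes.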




A toolbox for idea generation and evaluation: Machine learning, data-driven, and contest-driven approaches to support idea generation

The significance and abundance of data are increasing due to the growing...

CASP-DM: Context Aware Standard Process for Data Mining

We propose an extension of the Cross Industry Standard Process for Data ...

A Systematic Literature Review about Idea Mining: The Use of Machine-driven Analytics to Generate Ideas

Idea generation is the core activity of innovation. Digital data sources...

A Pipeline for Analysing Grant Applications

Data mining techniques can transform massive amounts of unstructured dat...

A contextual analysis of multi-layer perceptron models in classifying hand-written digits and letters: limited resources

Classifying hand-written digits and letters has taken a big leap with th...

How to Recognize Actionable Static Code Warnings (Using Linear SVMs)

Static code warning tools often generate warnings that programmers ignor...

Computer-Aided Data Mining: Automating a Novel Knowledge Discovery and Data Mining Process Model for Metabolomics

This work presents MeKDDaM-SAGA, computer-aided automation software for ...
