A Novel data Pre-processing method for multi-dimensional and non-uniform data
We are in the era of data analytics and data science which is on full bloom. There is abundance of all kinds of data for example biometrics based data, satellite images data, chip-seq data, social network data, sensor based data etc. from a variety of sources. This data abundance is the result of the fact that storage cost is getting cheaper day by day, so people as well as almost all business or scientific organizations are storing more and more data. Most of the real data is multi-dimensional, non-uniform, and big in size, such that it requires a unique pre-processing before analyzing it. In order to make data useful for any kind of analysis, pre-processing is a very important step. This paper presents a unique and novel pre-processing method for multi-dimensional and non-uniform data with the aim of making it uniform and reduced in size without losing much of its value. We have chosen biometric signature data to demonstrate the proposed method as it qualifies for the attributes of being multi-dimensional, non-uniform and big in size. Biometric signature data does not only captures the structural characteristics of a signature but also its behavioral characteristics that are captured using a dynamic signature capture device. These features like pen pressure, pen tilt angle, time taken to sign a document when collected in real-time turn out to be of varying dimensions. This feature data set along with the structural data needs to be pre-processed in order to use it to train a machine learning based model for signature verification purposes. We demonstrate the success of the proposed method over other methods using experimental results for biometric signature data but the same can be implemented for any other data with similar properties from a different domain.
READ FULL TEXT