A Research on Big Data Clustering with Improvisation in K-Means Clustering using Semi Supervised Clustering

Authors

  • Bindu Rani
  • Shri Kant

DOI:

https://doi.org/10.29027/IJIRASE.v2.i11.2019.362-367

Abstract

The rapid adoption of big data by organizations has changed how they use sophisticated information technologies and how they gain insight for proactive decision making. This data-oriented approach is remarkable because data is generated easily and made available by both living sources (ordinary users) and non-living sources (sensors, web media, etc.), and its volume is growing exponentially. Owing to advances in technology, storing data is no longer the difficulty; analyzing it is the major issue. For data analysis, the principal data mining techniques are association, classification, clustering and regression analysis. These techniques have a place in the design phase of the decision-making process. Clustering can acquire knowledge from data and can be considered the best technique for improving the decision-making process. Existing clustering algorithms are appropriate for small data sets, but clustering big data or real-life data is a challenging task, and no single clustering algorithm can be applied directly. Scaling, correct parameterization, parallelization and cluster validity are some of the problems in using clustering techniques. Data mining researchers are making continuous efforts to address these problems. This paper discusses big data clustering techniques, with the main focus on unsupervised K-means clustering algorithms and their limitations. In addition to unsupervised clustering, semi-supervised clustering methods are also reviewed and

Author Biographies

Bindu Rani

Sharda University, Greater Noida, India

Shri Kant

Greater Noida, India

Published

15-05-2019