Volume 14, Issue 2 (6-2022) | 2022, 14(2): 41-53


Fasanghari M, Bahrami H, Sadat Cheraghchi H. Clustering Large-Scale Data using an Incremental Heap Self-Organizing Map. International Journal of Information and Communication Technology Research 2022; 14(2): 41-53
URL: http://ijict.itrc.ac.ir/article-1-503-en.html
1- Iran Telecommunication Research Center, Tehran, Iran. fasanghari@itrc.ac.ir
2- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, New Zealand.
3- Iran Health Insurance Organization, Tehran, Iran.
Abstract:
In machine learning and data analysis, clustering large amounts of data is one of the most challenging tasks. Many fields, including research, health, social life, and commerce, rely on information generated every second, and the significance of this enormous volume of data in all facets of contemporary life has prompted numerous attempts to develop new methods for large-scale data analysis. In this research, an Incremental Heap Self-Organizing Map (IHSOM) is proposed for clustering vast and continually growing amounts of data. The incremental nature of IHSOM allows it to adapt quickly to changing and evolving environments and to the size of a dataset. The heap binary tree structure of the proposed approach offers several advantages over other structures. First, the topology, i.e., the neighborhood relationship between data points in the input space, is preserved in the output space. Second, outlier data are routed to the tree's leaf nodes, where they can be managed efficiently. This capability is provided by a probability density function that serves as a threshold: more similar data are allocated to a node's cluster, while less similar data are transferred to the next node. The node pruning and expansion process makes the algorithm noise-resistant, more precise in clustering, and memory-efficient. Moreover, the heap tree structure accelerates node traversal and reorganization after nodes are added or deleted. IHSOM's few, simple user-defined parameters make it a practical unsupervised clustering approach. The performance of the proposed algorithm is evaluated on both synthetic and real-world datasets and compared against existing hierarchical self-organizing maps and clustering algorithms. The results demonstrate IHSOM's proficiency in clustering tasks.
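The full IHSOM algorithm is not reproduced on this page, but the routing idea the abstract describes, a binary-tree SOM whose nodes absorb sufficiently similar inputs and pass dissimilar ones down toward leaf nodes, can be sketched minimally. The class name, the Gaussian kernel as the probability-density threshold, and the nearest-child routing rule below are all assumptions for illustration, not the paper's exact method:

```python
import numpy as np

class HeapSOMNode:
    """A node in a binary-tree SOM sketch: holds a prototype (weight vector),
    absorbs similar inputs, and routes dissimilar ones to its children."""

    def __init__(self, weight, lr=0.3, sigma=1.0):
        self.weight = np.asarray(weight, dtype=float)
        self.lr = lr          # learning rate for prototype updates
        self.sigma = sigma    # bandwidth of the Gaussian acceptance kernel
        self.left = None
        self.right = None

    def density(self, x):
        # Gaussian kernel around the prototype, used as the similarity test
        d2 = float(np.sum((x - self.weight) ** 2))
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def insert(self, x, threshold=0.5):
        """Absorb x into this node's cluster if it is similar enough,
        otherwise route it down the tree, growing a leaf if needed.
        Returns the node that finally accepted x."""
        x = np.asarray(x, dtype=float)
        if self.density(x) >= threshold:
            # similar enough: nudge the prototype toward x (SOM-style update)
            self.weight += self.lr * (x - self.weight)
            return self
        # dissimilar: send to the nearer child, creating one if absent
        if self.left is None:
            self.left = HeapSOMNode(x, self.lr, self.sigma)
            return self.left
        if self.right is None:
            self.right = HeapSOMNode(x, self.lr, self.sigma)
            return self.right
        dl = np.sum((x - self.left.weight) ** 2)
        dr = np.sum((x - self.right.weight) ** 2)
        child = self.left if dl <= dr else self.right
        return child.insert(x, threshold)

# Streaming usage: seed the root with the first sample, then insert the rest
root = HeapSOMNode([0.0, 0.0])
for sample in [[0.1, 0.0], [10.0, 10.0], [9.9, 10.1]]:
    root.insert(sample)
```

Outliers naturally sink toward the leaves here because each rejection pushes the point one level deeper, which mirrors the leaf-node outlier handling the abstract claims, though the paper's pruning and expansion steps are omitted.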
Full-Text [PDF 1084 kb]
Type of Study: Research | Subject: Information Technology

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.