Identifying Persian bots on Twitter; which feature is more important: Account Information or Tweet Contents? - International Journal of Information and Communication Technology Research

Volume 15, Issue 1 (Special Issue on AI in ICT 2023) itrc 2023, 15(1): 35-44

DOI: 10.61186/itrc.15.1.35


Mazoochi M, Asadi N, Rahmani F, Rabiei L. Identifying Persian bots on Twitter; which feature is more important: Account Information or Tweet Contents?. itrc 2023; 15 (1) : 4
URL: http://ijict.itrc.ac.ir/article-1-534-en.html

Identifying Persian bots on Twitter; which feature is more important: Account Information or Tweet Contents?

Mojtaba Mazoochi¹, Nasrin Asadi², Farzaneh Rahmani², Leila Rabiei³

1- Information Technology Research Faculty, ICT Research Institute, Tehran, Iran, mazoochi@itrc.ac.ir
2- Information Technology Research Faculty, ICT Research Institute, Tehran, Iran, mazoochi@itrc.ac.ir
3- Information Technology Research Faculty, ICT Research Institute, Tehran, Iran

Abstract:

The spread of the internet and smartphones in recent years has made social networks popular and easily accessible. Despite the benefits of these networks, such as ease of interpersonal communication and a space for free expression of opinions, they also create opportunities for destructive activities such as spreading false information or using fake accounts for fraud. Fake accounts are mainly managed by bots, so identifying and suspending bots could substantially help to increase the popularity and favorability of social networks. In this paper, we try to identify Persian bots on Twitter, a challenging task given the problems of processing colloquial Persian. To this end, a set of features based on user account information and user activity is added to content features of tweets, and users are classified with several machine learning algorithms, including Random Forest, Logistic Regression, and SVM. Experiments on a dataset of Persian-language users show that the proposed methods perform well. Random Forest is the most accurate of these classifiers, achieving a balanced accuracy of 93.86%.
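
The abstract reports balanced accuracy rather than plain accuracy. The sketch below assumes the common definition of that metric (the mean of per-class recall); the abstract does not spell out the formula, and the toy labels are hypothetical, not data from the paper. It illustrates why the metric may suit bot detection, where bots are often a small minority of accounts.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled example is required")

    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 8 human accounts and 2 bot accounts.
y_true = ["human"] * 8 + ["bot"] * 2
# A degenerate classifier that never flags a bot.
y_pred = ["human"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
toy_score = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {toy_score:.2f}")
```

On this toy data the classifier that misses every bot still gets most labels right, yet its balanced accuracy is much lower because recall on the bot class is zero.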

Article number: 4

Keywords: social networks, Twitter, bot detection, classification, Persian language


Type of Study: Research | Subject: Information Technology



Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.