The Effect of Data Augmentation Techniques on Persian Stance Detection

Farhoodi, Mojgan; Toloie Eshlaghi, Abbas; Motadel, Mohamadreza

doi:10.61186/itrc.15.1.63

Volume 15, Issue 1 (Special Issue on AI in ICT 2023) itrc 2023, 15(1): 63-71

Farhoodi M, Toloie Eshlaghi A, Motadel M. The Effect of Data Augmentation Techniques on Persian Stance Detection. itrc 2023; 15(1): 7
URL: http://ijict.itrc.ac.ir/article-1-559-en.html

Mojgan Farhoodi¹, Abbas Toloie Eshlaghi², Mohamadreza Motadel³

1- Department of Information Technology Management, Science and Research Branch, Islamic Azad University, Tehran, Iran
2- Department of Information Technology Management, Science and Research Branch, Islamic Azad University, Tehran, Iran, toloie@gmail.com
3- Central Tehran Branch, Islamic Azad University, Tehran, Iran

Abstract:

The purpose of stance detection is to identify an author's stance toward a particular topic or claim. Stance detection has become a key component in applications such as fake news detection, claim validation, argument search, and author profiling. Although significant progress has been made in stance detection for languages such as English, little attention has been paid to other languages, including Persian. One of the main obstacles to research on Persian stance detection is the shortage of suitable datasets. To address this problem, this article considers data augmentation, the artificial creation of training data, as a way to overcome dataset scarcity. We studied several data augmentation methods, including EDA, back-translation, and merging the source dataset with a similar English-language dataset. The experimental results indicate that combining the primary dataset with a translation of another dataset with similar content in another language (for example, English) results in a significant improvement in model performance.
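The abstract names three augmentation strategies. As a rough illustration only, the Python sketch below implements the four standard EDA operations (synonym replacement, random insertion, random swap, random deletion) and a merge step that appends machine-translated examples from a second dataset to the source data. Everything here is an assumption, not the authors' implementation: the toy `SYNONYMS` table (English words, whereas a Persian setup would need a Persian thesaurus or embedding neighbours), the `alpha` and `n_aug` parameters, the `(claim, text, label)` example layout, and the caller-supplied `translate` function standing in for a real MT system. Back-translation would call such an MT system twice (Persian to English, then back to Persian) and is not shown.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy synonym table, for illustration only.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "nice"],
    "bad": ["poor", "awful"],
    "news": ["report"],
}


def synonym_replacement(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words: List[str], n: int, rng: random.Random) -> List[str]:
    """n times, insert a synonym of a random known word at a random position."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        syn = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), syn)
    return out


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Swap the words at two distinct random positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    out = [w for w in words if rng.random() > p]
    return out if out else [rng.choice(words)]


def eda(sentence: str, alpha: float = 0.1, n_aug: int = 4, seed: int = 0) -> List[str]:
    """Produce n_aug variants, cycling through the four EDA operations."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words each operation touches
    ops = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, alpha, rng),
    ]
    return [" ".join(ops[k % len(ops)](words)) for k in range(n_aug)]


Example = Tuple[str, str, str]  # assumed layout: (claim, text, stance label)


def merge_with_translation(
    source: Sequence[Example],
    foreign: Sequence[Example],
    translate: Callable[[str], str],
) -> List[Example]:
    """Append translated foreign examples to the source set, keeping labels."""
    translated = [(translate(c), translate(t), y) for c, t, y in foreign]
    return list(source) + translated
```

For example, `eda("this is good news for the team", alpha=0.2)` returns four perturbed variants of the sentence. The merge helper assumes the two datasets already share a label set; in practice the stance labels of the foreign dataset may need to be mapped onto the source labels before merging.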

Article number: 7

Keywords: stance detection, data augmentation, fake news, dataset

Type of Study: Applicable | Subject: Information Technology

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.