Volume 17, Issue 2 (5-2025)                   itrc 2025, 17(2): 59-74

Asghari H, Mohtaj S. Multi-type Obfuscation Corpus for CrossLingual Plagiarism Detection. itrc 2025; 17 (2) :59-74
URL: http://ijict.itrc.ac.ir/article-1-632-en.html
1- Department of Advanced Information Systems, ICT Research Institute (ACECR), Tehran, Iran, habib.asghari@ictrc.ac.ir
2- Speech and Language Technology (SLT) Department, German Research Centre for Artificial Intelligence (DFKI), Labor Berlin, Berlin, Germany
Abstract:
In recent years, the wide availability of documents on the Internet has made plagiarism a serious issue in many fields of research. Moreover, machine translation systems make it easy to reuse textual content across languages. Detecting cross-lingual plagiarism, in which the source and target languages differ, is therefore of great importance. Various methods for automatic detection of text reuse have been developed to help human experts investigate suspicious documents for plagiarism cases. Evaluating the performance of these plagiarism detection systems and algorithms requires plagiarism detection corpora. In this paper, we propose an English-Persian plagiarism detection corpus comprising different types of paraphrasing. The goal is to simulate what humans would do to conceal plagiarized passages after translating the text into the target language. The proposed corpus includes seven types of paraphrasing methods that cover, but are not limited to, all of the obfuscation types used in previous works, combined into one integrated cross-lingual plagiarism detection (CLPD) corpus. To evaluate the corpus, an extrinsic evaluation approach was applied by running a wide variety of plagiarism detection algorithms as downstream tasks on the proposed corpus. The results show that the performance of the algorithms decreases as the obfuscation complexity increases.
 
Type of Study: Research | Subject: Information Technology


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.