International Journal of Information and Communication Technology Research
International Journal of Communications and Information Technology
Engineering & Technology
http://ijict.itrc.ac.ir
1
admin
2251-6107
2783-4425
doi
1652
25391
en
jalali
1388
6
1
gregorian
2009
9
1
1
3
online
1
fulltext
en
Corpus-Based Analysis for Multi-Token Units in Persian
Information Technology
Research
<p>Because of the joining behavior of the Persian script and its orthographic variation, the morphological and syntactic annotation of multi-token units faces various problems. Based on an analysis of the Perso-Arabic script and its problems, this paper describes the various collocation types of tokens, including compositional, non-compositional, and the new semi-compositional constructions. To illustrate these constructions, static and dynamic multi-token units are then presented for the generative and non-generative structures of the main categories: verbs, infinitives, prepositions, conjunctions, adverbs, adjectives, and nouns. One of the important results of this research is the definition of multi-token unit templates for these categories. The findings can serve as input to the segmentation module of the Persian Treebank generator system. The research can also be applied in the design and implementation of morphological analyzers and syntactic parsers.</p>
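The abstract describes two things: Persian orthographic variation, and multi-token unit templates that feed a segmentation module. The sketch below is a minimal illustration of how those two ideas can interact. It is not the authors' method. It rests on several assumptions:

- A template is simply a tuple of orthographic parts.
- The only variation handled is the choice of separator between parts: a space, a zero-width non-joiner (ZWNJ, U+200C), or nothing.
- Each matched unit is emitted in a ZWNJ-joined canonical form.

The names `TEMPLATES`, `template_pattern`, and `segment` are hypothetical, and the two sample templates are illustrative rather than taken from the paper.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, common inside Persian words

# Hypothetical templates for illustration only (not the paper's templates).
# Each template lists the orthographic parts of one multi-token unit.
TEMPLATES = [
    ("به", "خاطر", "این", "که"),  # "because"
    ("می", "روم"),                # "(I) go": verbal prefix + verb stem
]

def template_pattern(parts):
    """Regex for one template: parts may be joined by a space, a ZWNJ, or nothing."""
    sep = f"[ {ZWNJ}]?"
    return sep.join(re.escape(p) for p in parts)

# Try templates with more parts first; a match must be bounded by
# whitespace or the start/end of the text (ZWNJ is not whitespace).
_PATTERN = re.compile("|".join(
    rf"(?<!\S){template_pattern(t)}(?!\S)"
    for t in sorted(TEMPLATES, key=len, reverse=True)
))

# Map the separator-free spelling of each template to an assumed canonical form.
_CANONICAL = {"".join(t): ZWNJ.join(t) for t in TEMPLATES}

def segment(text):
    """Split text on whitespace, but keep each matched multi-token unit as one unit."""
    units, pos = [], 0
    for m in _PATTERN.finditer(text):
        units.extend(text[pos:m.start()].split())
        key = m.group(0).replace(" ", "").replace(ZWNJ, "")
        units.append(_CANONICAL[key])
        pos = m.end()
    units.extend(text[pos:].split())
    return units
```

With this sketch, the spellings "می روم", "میروم", and "می‌روم" all yield the same single unit. An unrelated verb such as "میروند" is left as an ordinary token. Variants in which a letter is dropped when the parts are joined (for example "بخاطر" for "به خاطر") are not covered. A real system would need templates that list such spellings explicitly.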
Persian script, orthographic variation, morphological and syntactic annotations, Persian Treebank generator system, syntactical parsers, morphological analyzers
15
26
http://ijict.itrc.ac.ir/browse.php?a_code=A-10-27-257&slc_lang=en&sid=1
Masoud
Sharifi Atashgah
1003194753284600926
Yes
Department of Literature and Human Sciences, Tehran University, Tehran, Iran
Mahmoud
Bijankhan
1003194753284600927
No
Department of Literature and Human Sciences, Tehran University, Tehran, Iran