This paper reports an improvement over the previous benchmark on the task of answering multiple-choice questions (MCQs) about the thematic similarity of poetic verses. We trained a Doc2Vec model on a corpus of Persian poems and used it to obtain vector representations of the verses. For each question, the model selected as its answer the option with the highest cosine similarity to the stem verse. The model answered 38% of the questions correctly, an improvement of 6% over the previous benchmark. If a large-scale thematic similarity MCQ dataset is developed, the performance of a language representation model on this task could serve as a novel benchmark for measuring the model's capacity to understand metaphorical language.
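The answer-selection step described above can be illustrated with a minimal sketch. It assumes the stem and option verses have already been turned into fixed-length vectors (for instance, by a trained Doc2Vec model); the function names `cosine_similarity` and `answer_mcq` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def answer_mcq(stem_vec: Sequence[float],
               option_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the option most similar to the stem verse."""
    scores = [cosine_similarity(stem_vec, opt) for opt in option_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors standing in for verse embeddings (illustrative only).
stem = [1.0, 0.0, 1.0]
options = [
    [0.0, 1.0, 0.0],    # orthogonal to the stem
    [0.9, 0.1, 1.1],    # nearly parallel to the stem
    [-1.0, 0.0, -1.0],  # opposite direction
    [1.0, 1.0, 0.0],    # partially aligned
]
predicted = answer_mcq(stem, options)
print(predicted)
```

Accuracy on a question set would then be the fraction of questions for which the returned index matches the answer key.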
Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.