Dr. Sandra Maria Aluísio - Universidade de São Paulo
A Large Eye-tracking Corpus of Reading Data for Automatic Sentence Readability Assessment in Portuguese
Currently, eye-tracking data are often used in the study of language complexity to evaluate models and metrics of syntactic difficulty, to improve or evaluate computational models of simplification via sentential compression, and to evaluate the quality of automatic translation with objective metrics. However, only a few such resources exist, and only for a small number of languages: English (Luke and Christianson, 2018; Cop et al., 2017), English and French (Kennedy et al., 2013), German (Kliegl et al., 2004), Russian (Laurinavichyute et al., 2018), Hindi (Husain et al., 2015) and Chinese (Yan et al., 2010). For Portuguese, there is no large eye-tracking corpus with predictability norms comparable to those cited above. This gap hinders research in Cognitive Psychology, Psycholinguistics and Natural Language Processing (NLP). In this project, we propose: (i) to create and make publicly available a large corpus of eye movements recorded during the reading of short paragraphs in Portuguese, with predictability norms that estimate, for each word in the paragraph, the predictability of its full orthographic form (traditional Cloze scores) and of its morphosyntactic and semantic information; and (ii) to help disseminate research that uses eye-tracking in both Psycholinguistics and NLP.
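As an illustration of the traditional Cloze score mentioned above, the following is a minimal sketch, assuming the usual definition: the proportion of participants whose guess for the next word matches the word that actually appears in the text. The function name `cloze_score` and the sample responses are hypothetical and are not taken from the project; the project's actual norming procedure may differ.

```python
from collections import Counter


def cloze_score(target: str, responses: list[str]) -> float:
    """Return the share of responses that match the target word.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption made for this sketch).
    """
    if not responses:
        return 0.0
    normalized = [r.strip().lower() for r in responses]
    counts = Counter(normalized)
    return counts[target.strip().lower()] / len(normalized)


# Hypothetical guesses from five participants for the same word position.
responses = ["casa", "casa", "rua", "Casa", "escola"]
print(cloze_score("casa", responses))  # 0.6
```

The morphosyntactic and semantic predictability norms could be sketched in the same way by comparing, for example, the part-of-speech tag of each response with that of the target instead of the orthographic form; this too is an assumption rather than the project's stated method.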