
RASTROS Project: a Large Eye-tracking Corpus of Reading Data for Automatic Sentence Readability Assessment in Portuguese

Posted by: 4th BR Workshop on Sentence Processing

Updated: Jan. 24, 2019

Dr. Sandra Maria Aluísio, Universidade de São Paulo


Eye-tracking data are now often used in the study of language complexity: to evaluate models and metrics of syntactic difficulty, to improve or evaluate computational models of simplification via sentence compression, and to evaluate the quality of machine translation with objective metrics. However, only a few such resources exist, and only for a small number of languages: English (Luke and Christianson, 2018; Cop et al., 2017), English and French (Kennedy et al., 2013), German (Kliegl et al., 2004), Russian (Laurinavichyute et al., 2018), Hindi (Husain et al., 2015) and Chinese (Yan et al., 2010). For Portuguese, there is no large eye-tracking corpus with predictability norms comparable to those cited above. This gap hinders research in Cognitive Psychology, Psycholinguistics and Natural Language Processing (NLP). In this project, we propose (i) to create and make publicly available a large corpus of eye movements recorded during the reading of short paragraphs in Portuguese, with predictability norms that estimate, for each word in the paragraph, the predictability of its full orthographic form (traditional Cloze scores) and of its morphosyntactic and semantic information, and (ii) to contribute to the dissemination of research using eye-tracking in both Psycholinguistics and NLP.

 
 
 



© 2019 4th BRAZILIAN WORKSHOP ON SENTENCE PROCESSING
