For many years, work on Part-of-Speech (PoS) tagging has concentrated mainly on standardized texts. However, interest in the automatic analysis of social media texts is growing considerably, in particular for microblogging texts such as tweets. Information found on Twitter has already been shown to be useful in a variety of applications for identifying trends and upcoming events in various fields. Social media texts clearly differ from standardized texts, both in their lexical items and in their distributional properties (short messages, emoticons and mentions, threaded messages, etc.). Natural Language Processing methods therefore need to be adapted before such texts can be processed reliably. The basis for this adaptation is a tagged social media corpus [Neunerdt et al. 2013] for training and testing automatic procedures. Various attempts to produce such specialised tools are described in the literature for other languages (e.g. [Gimpel et al. 2011; Derczynski et al. 2013; Neunerdt et al. 2013; Owoputi et al. 2013]), but Italian completely lacks such resources, both annotated corpora and dedicated PoS-tagging tools.
For these reasons, we proposed a task for EVALITA 2016 concerning the domain adaptation of PoS-taggers to Twitter texts.

For the proposed task we plan to re-use the tweets of the EVALITA 2014 SENTIPOLC corpus. Both its development set and its test set (4,041 and 1,749 tweets respectively, 5,790 in total) will be manually annotated and distributed together as the new development set. A new manually annotated test set of 600 to 700 tweets, drawn from the same time period, will then be produced. All annotations will be carried out by three different annotators.

We will distribute a tokenised version of the texts, both to avoid tokenisation mismatches among participants and to sidestep the well-known problem of tweets that are no longer available for download.

References

Derczynski, L., Ritter, A., Clark, S., Bontcheva, K. (2013). “Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data”, In Proceedings of RANLP 2013, 198-206.

Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., Smith, N.A. (2011). “Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments”, In Proceedings of ACL 2011.

Neunerdt, M., Trevisan, B., Reyer, M., Mathar, R. (2013). “Part-of-Speech Tagging for Social Media Texts”, In Language Processing and Knowledge in the Web, Springer, 139-150.

Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A. (2013). “Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters”, In Proceedings of NAACL 2013.