CORIS/CODIS, a corpus of written Italian, is available
on-line for research purposes. The project, designed and co-ordinated
by R. Rossini Favretti, was started in 1998, with the purpose of
creating a representative and sizeable general reference corpus of
written Italian which would be easily accessible and user-friendly.
CORIS contains 150 million words and has been updated every three years by
means of a built-in monitor corpus. It consists of a collection of
authentic and commonly occurring texts in electronic format chosen by
virtue of their representativeness of modern Italian.
The project is the result of research carried out
at the University of Bologna. It was made possible by
technological developments, by the opportunity to build on previous
experience and by the long period of preparatory study that preceded the
planning and construction phases.
It is aimed at a broad spectrum of potential
users, from Italian language scholars to Italian and foreign students
engaged in linguistic analysis based on authentic data and, in a wider
perspective, all those interested in intra- and/or interlinguistic analysis.
and implementation of a CORpus di Italiano
R. Rossini, F. Tamburini, A. Zaninello (2011), Exploiting corpus evidence
for automatic sense induction, Actas del III Congreso de la Asociación
Española de Lingüística de Corpus, Universitat Politècnica de València.
Grandi N., Montermini F., Tamburini F. (2011), Annotating large corpora for
studying Italian derivational morphology. Lingue e Linguaggio, X.2.,
R. Rossini Favretti (2009), "Corpus data and frame semantics", in Abstracts of the International conference of the Linguistics Society of Belgium on Framing: from grammar to application, Antwerp, p. 17. (Presentation)
R. Rossini Favretti, F. Tamburini (2009), "Exploring register variation through corpus evidence", in Abstracts of DGfS 2009 Workshop on Corpus, Colligation, Register Variation, Osnabrück, p. 155. (Presentation)
F. Tamburini (2009). "PoS-tagging Italian texts with CORISTagger". In Proc of EVALITA 2009. AI*IA Workshop on Evaluation of NLP and Speech Tools for Italian, Reggio Emilia, December 2009.
R. Rossini Favretti (2008), "Grounding frame elements identification in corpus collocational patterns", in Proceedings of the 41st Meeting of the British Association of Applied Linguistics, Swansea, pp. 91-92. (Presentation)
F. Tamburini, C. Seidenari, A. Bolognesi, R. Bernardi (2008). "Italian Lexical-Classes Definition Using Automatic Methods." In Rossini Favretti R. (ed.), Frames, Corpora and Knowledge Representation, Bologna: Bononia University Press, 95-120.
R. Bernardi, A. Bolognesi, C. Seidenari, F. Tamburini (2008). "Learning an Italian Categorial Grammar." In Rossini Favretti R. (ed.), Frames, Corpora and Knowledge Representation, Bologna: Bononia University Press, 185-200.
R. Rossini Favretti (2008). "Text, collocations and frames", in Rossini Favretti R. (ed.), Frames, Corpora and Knowledge Representation, Bologna, BUP, pp. 79-94.
R. Rossini Favretti (ed.) (2008), Frames, Corpora and Knowledge
Representation, Bologna, BUP, 301 pp.
R. Rossini Favretti, F. Tamburini, D. Proietti (2007), "Strumenti di esplorazione: i corpora", Tradurre per l'Europa, Accademia della Crusca, Florence, Italy.
R. Rossini Favretti (2007), "Multilinguismo e comunicazione in rete", in: Annali del Collegio Superiore. Anno Accademico 2007/08, G. Brandi (ed.), Bologna, BUP, 2007, pp. 175-185.
F. Tamburini (2007). "CORISTagger: a high-performance PoS tagger for Italian." Intelligenza Artificiale, IV(2), 14-15.
Onelli C., Proietti D., Seidenari C., Tamburini F. (2006). "The DiaCORIS project: a diachronic corpus of written Italian". In Proc. 5th International Conference on Language Resources and Evaluation - LREC 2006, Genova, 1212-1215.
Bernardi R., Bolognesi A., Seidenari C., Tamburini F. (2006). "POS tagset design for Italian". In Proc. 5th International Conference on Language Resources and Evaluation - LREC 2006, Genova, 1396-1401.
Bernardi R., Bolognesi A., Seidenari C., Tamburini F. (2005). "Automatic induction of a POS tagset for Italian". In Proc. Australasian Language Technology Workshop 2005, Sydney, 176-183.
Tamburini F. (2004). "Building Distributed Language Resources by Grid Computing". In Proc. 4th International Conference on Language Resources and Evaluation - LREC 2004, Lisbon, 1217-1220.
Bernardi R., Bolognesi A., Tamburini F., and Moortgat M. (2004). "Categorial Type Logic meets Dependency Grammar to annotate an Italian corpus". In Proc. Recent Advances in Dependency Grammars Workshop - COLING 2004, Geneva, 57-64.
Tamburini F. (2002). "A dynamic model for reference corpora structure definition". In Proc. Third International Conference on Language Resources and Evaluation - LREC2002, Las Palmas, Canary Islands, Spain, 1847-1850.
R. Rossini Favretti, F. Tamburini, C. De Santis (2002). "A corpus of written Italian: a defined and a dynamic model.", in A. Wilson, P. Rayson, T. McEnery (eds.), A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, Lincom-Europa, Munich.
F. Tamburini, C. De Santis, E. Zamuner (2002). "Identifying phrasal connectives in Italian using quantitative methods." In Phrases and Phraseology - Data and Descriptions, S. Nuccorini (ed.), Bern: Peter Lang.
R. Rossini Favretti (2002), "Corpus linguistics and Italian studies", In S. Nuccorini (ed.), Phrases and Phraseology - Data and Descriptions, Bern: Peter Lang, pp. 27-43.
R. Rossini Favretti (2001), "La linguistica dei corpora
in Europa: prospettive di analisi", in Lingua e Stile, XXVI, 2, pp.
R. Rossini Favretti (2001), "Interpretation and Representation in the Discourse of Economics". In P.L. Porta, R. Scazzieri, A. Skinner (eds.), Knowledge, Institutions and the Division of Labour, Cheltenham: Edward Elgar, pp. 65-74.
R. Rossini Favretti (2000). "Progettazione e costruzione di un corpus di italiano scritto: CORIS/CODIS", in R. Rossini Favretti (ed.), Linguistica e informatica. Multimedialità, corpora e percorsi di apprendimento, Bulzoni, Roma, pp. 39-56.
F. Tamburini (2000). "Annotazione grammaticale e lemmatizzazione di corpora in italiano.", in R. Rossini Favretti (ed.), Linguistica e informatica. Multimedialità, corpora e percorsi di apprendimento, Bulzoni, Roma, pp. 57-73.