Part I - Techniques for corpus creation and management

DATE	TOPIC	MATERIALS
14/11	What is a corpus?	- Slides [I.2] - [CL] - Chapter 1
14/11	Corpus representativeness and annotations	- Slides [I.2] - [CL] - Chapters 2 and 3
14/11	Concordances, collocations and measures of word association	- Slides [I.2] - [CL] - Chapters 2 and 3
15/11	Regular Expressions	- Slides [I.4] - [SLP] - Chapter 2 - Reg. Exp. Quick Start - Reg. Exp. Demo
15/11	Tokenisation and Sentence segmentation.	- Slides [I.3] - [Schmid, 2008]
18/11	Text Character Encoding.	- Slides [I.3b]
18/11, 21/11	Techniques for Text Retrieval: Search Engines and Indexing.	- Slides [I.2], [I.2b]
21/11	Corpus Querying with AntConc and Qwick.	- AntConc website (Local Copy) - Qwick instructions - English Demo Corpus
22/11, 25/11	Techniques for annotating texts with XML. Building and using a small, annotated XML corpus.	- Slides [I.5] - XAIRA Documentation (PTB PoS-tags, XAIRA Installer, Demo XML files)
28/11	Case study: - A review of Written and Spoken corpora (English/Italian) - Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS.	- Link1 (English), Link2 (Various languages) - Slides [I.6]
-	Laboratory session.

REFERENCES

[CL]
T. McEnery and A. Wilson (2001). Corpus Linguistics, Edinburgh University Press.

[SLP]
D. Jurafsky and J.H. Martin (2008). Speech and Language Processing, Prentice Hall. DRAFT