Part I - Techniques for corpus creation and management

DATE	TOPIC	MATERIALS
11/11	What is a corpus?	- Slides [I.2] - [CL] - Chapter 1
11/11	Corpus representativeness and annotations	- Slides [I.2] - [CL] - Chapters 2 and 3
14/11	Concordances, collocations and measures of word association	- Slides [I.2] - [CL] - Chapters 2 and 3
14/11	Regular Expressions	- Slides [I.3] - [SLP] - Chapter 2 - Reg. Exp. Quick Start - Reg. Exp. Demo
17/11	Corpus typology and design	- [Atkins et al. 1992]
18/11	Tokenisation and sentence segmentation	- Slides [I.4] - [Schmid, 2008]
18/11	Text character encoding	- Slides [I.5]
21/11	Techniques for Text Retrieval. Classic Information Retrieval	- Slides [I.2], [I.6], [I.7]
24/11	Corpus Querying with AntConc and Qwick	- AntConc website (Local Copy) - English/Arabic Demo Corpora - Qwick instructions
25/11, 28/11	Techniques for annotating texts with XML. Building and using a small, annotated XML corpus	- Slides [I.8] - XAIRA Documentation (PTB PoS-tags, XAIRA Installer, Demo XML files)
1/12	Case study: - A review of written and spoken corpora (English/Italian) - Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS	- Link1 (English), Link2 (Various languages) - Slides [I.9]

REFERENCES

[CL]
T. McEnery and A. Wilson (2001). Corpus Linguistics, Edinburgh University Press.

[SLP]
D. Jurafsky and J.H. Martin (in press). Speech and Language Processing, Prentice Hall. (3rd Edition DRAFT.)