Part I - Techniques for corpus creation and management

DATE | TOPIC | MATERIALS
12/11 | What is a corpus? | Slides [I.2]; [CL], Chapter 1
12/11, 15/11 | Corpus representativeness and annotations | Slides [I.2]; [CL], Chapters 2 and 3
18/11 | Concordances, collocations and measures of word association | Slides [I.2]; [CL], Chapters 2 and 3
18/11 | Regular Expressions | Slides [I.3]; [SLP], Chapter 2; Reg. Exp. Quick Start; Reg. Exp. Demo
19/11 | Corpus typology and design | [Atkins et al. 1992]
19/11 | Tokenisation and sentence segmentation | Slides [I.4]; [Schmid, 2008]
- | Text Character Encoding | Slides [I.5]
- | Techniques for Text Retrieval. Classic Information Retrieval | Slides [I.2], [I.6], [I.7]
- | Corpus Querying with AntConc and Qwick | AntConc website (Local Copy); English Demo Corpus; Qwick instructions
- | Techniques for annotating texts with XML. Building and using a small, annotated XML corpus | Slides [I.8]; XAIRA Documentation (PTB PoS-tags, XAIRA Installer, Demo XML files)
- | Case study: a review of Written and Spoken corpora (English/Italian); Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS | Link1 (English), Link2 (Various languages); Slides [I.9]; Laboratory session


REFERENCES

[CL]
T. McEnery and A. Wilson (2001). Corpus Linguistics, Edinburgh University Press.

[SLP]
D. Jurafsky and J.H. Martin (2008). Speech and Language Processing, Prentice Hall. DRAFT