Part I - Techniques for corpus creation and management

DATE
TOPIC
MATERIALS
What is a corpus?
- Slides [I.2]
- [CL] - Chapter 1
Corpus representativeness and annotations
- Slides [I.2]
- [CL] - Chapters 2 and 3
Concordances, collocations and measures of word association
- Slides [I.2]
- [CL] - Chapters 2 and 3
Tokenisation and sentence segmentation
- Slides [I.3]
- [Schmid, 2008]
Techniques for text retrieval
- Slides [I.2]
Regular expressions
- Slides [I.4]
- [SLP] - Chapter 2
- Reg. Exp. Quick Start
- Reg. Exp. Demo
16/11
17/11
Corpus querying with AntConc and Qwick
- AntConc website
- Qwick instructions
20/11
23/11
Techniques for annotating texts with XML.
Building and using a small, annotated XML corpus.
- Slides [I.5]
- XAIRA Documentation
  (PTB PoS-tags, XAIRA Installer)
24/11 Case study:
   - A review of Written and Spoken corpora (English/Italian)
   - Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS.
 
- Link1 (English), Link2 (Various languages)
- Slides [I.6]
- Laboratory session.


REFERENCES

[CL]
T. McEnery and A. Wilson (2001). Corpus Linguistics. Edinburgh University Press.

[SLP]
D. Jurafsky and J. H. Martin (2008). Speech and Language Processing (draft). Prentice Hall.