Part I - Techniques for corpus creation and management

DATE
TOPIC
MATERIALS
13/11
What is a corpus?
- Slides [I.2]
- [CL] - Chapter 1
13/11
Corpus representativeness and annotations
- Slides [I.2]
- [CL] - Chapters 2 and 3
14/11
Concordances, collocations and measures of word association
- Slides [I.2]
- [CL] - Chapters 2 and 3
14/11
Regular Expressions
- Slides [I.3]
- [SLP] - Chapter 2
- Reg. Exp. Quick Start
- Reg. Exp. Demo
17/11
Tokenisation and sentence segmentation
- Slides [I.4]
- [Schmid, 2008]
17/11
Text Character Encoding.
- Slides [I.5]
20/11
27/11
Techniques for Text Retrieval
Search Engines Indexing
Classic Information Retrieval
- Slides [I.2], [I.6], [I.7]
21/11
Corpus Querying with AntConc and Qwick.
- AntConc website (Local Copy)
- English Demo Corpus
- Qwick instructions
27/11
28/11
Techniques for annotating texts with XML.
Building and using a small, annotated XML corpus.
- Slides [I.8]
- XAIRA Documentation
  (PTB PoS-tags, XAIRA Installer, Demo XML files)
4/12
Case study:
   - A review of written and spoken corpora (English/Italian)
   - Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS.
 
- Link1 (English), Link2 (Various languages)
- Slides [I.9]
- Laboratory session.


REFERENCES

[CL]
T. McEnery and A. Wilson (2001). Corpus Linguistics, Edinburgh University Press.

[SLP]
D. Jurafsky and J.H. Martin (2008). Speech and Language Processing, Prentice Hall. DRAFT