Part I - Techniques for corpus creation and management

DATE
TOPIC
MATERIALS
14/11
What is a corpus?
- Slides [I.2]
- [CL] - Chapter 1
14/11
Corpus representativeness and annotations
- Slides [I.2]
- [CL] - Chapters 2 and 3
14/11
Concordances, collocations and measures of word association
- Slides [I.2]
- [CL] - Chapters 2 and 3
15/11
Regular Expressions
- Slides [I.4]
- [SLP] - Chapter 2
- Reg. Exp. Quick Start
- Reg. Exp. Demo
15/11
Tokenisation and sentence segmentation
- Slides [I.3]
- [Schmid, 2008]
18/11
Text Character Encoding
- Slides [I.3b]
18/11
21/11
Techniques for Text Retrieval
Search Engine Indexing
- Slides [I.2], [I.2b]
21/11
Corpus Querying with AntConc and Qwick
- AntConc website (Local Copy)
- Qwick instructions
- English Demo Corpus
22/11
25/11
Techniques for annotating texts with XML.
Building and using a small, annotated XML corpus.
- Slides [I.5]
- XAIRA Documentation
  (PTB PoS-tags, XAIRA Installer, Demo XML files)
28/11
Case study:
   - A review of written and spoken corpora (English/Italian)
   - Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS
 
- Link1 (English), Link2 (Various languages)
- Slides [I.6]
- Laboratory session.


REFERENCES

[CL]
T. McEnery and A. Wilson (2001). Corpus Linguistics, Edinburgh University Press.

[SLP]
D. Jurafsky and J.H. Martin (2008). Speech and Language Processing, Prentice Hall. DRAFT