Syllabus
- Part I: Techniques for corpus creation and management
  - Corpora and their construction: representativeness.
  - Annotations and querying. Web as a corpus.
  - Concordances, collocations and measures of word association.
  - Regular Expressions.
  - Tokenisation and sentence splitting.
  - Methods for Text Retrieval and Classic Information Retrieval.
  - XML corpora.
  - Corpus querying packages.
  - Case studies:
    - Written and spoken corpora (Italian/English): a review.
    - Corpora@FICLIT: CORIS/CODIS, BoLC and DiaCORIS.
- Part II: AI and Machine Learning for Textual Analysis
  - Introduction to AI & Machine Learning.
  - Supervised and Unsupervised Machine Learning.
  - Evaluation of Machine Learning systems.
  - Organisation of a Machine Learning experiment.
  - Practical ML experiments on texts with Orange.