Projects
Here you can find some project suggestions provided by the teacher. Students may propose alternative projects, provided that they are closely connected to the course topics and approved by the teacher.
It is better to contact the teacher BEFORE starting the project.
- Use the open-source software package NVIDIA NeMo to build a small isolated-word speech recognition system. Define an appropriate problem, create the audio samples for the training and test sets, train the system by fine-tuning standard models provided by NVIDIA, and evaluate its performance. Contact the teacher for further clarifications and remember that this project is computationally demanding (you need a suitable GPU).
- Create a small chatbot by using Google DialogFlow CX. Identify a suitable problem and design the dialogue flow inside the application.
- Write a program that can handle tokenization rules specified by regular expressions and exception lists (abbreviations, acronyms, etc.). Then complete the project by writing the rules needed to properly tokenize a language other than English, also taking emoticons and emojis into account.
- Use the HFST package to create a morphological analyzer/generator portion for a language of your choice, other than English.
- Implement an application for part-of-speech tagging based on deep neural networks, using the datasets from EVALITA 2007 (contact the teacher to get these corpora).
- Expand a Lexicalised Tree-Adjoining Grammar (LTAG) for Italian provided by the teacher and test it using the LTAG parser contained in the Linux Live lab.
- Download the program for WordNet creation and management. Create a WordNet for a specific Italian domain.
- Implement the word-sense disambiguation algorithms discussed during the lessons (the supervised Naive Bayes method, the Lesk-based lexical algorithm and the unsupervised method based on the EM algorithm).
- Experiment with some packages for creating word spaces or word embeddings, evaluating their performance on specific linguistic phenomena.
- Fine-tune a transformer model to solve a specific task (e.g. PoS-tagging, NER, WSD) and carefully evaluate it w.r.t. the state-of-the-art using a standard benchmark.
- Design a suitable prompt for an online LLM (e.g. ChatGPT) to solve a specific task (e.g. PoS-tagging, NER, WSD) and carefully evaluate it w.r.t. the state-of-the-art using a standard benchmark.
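
As a starting point for the tokenization project listed above, the combination of regular-expression rules and an exception list could be sketched as follows. This is only a minimal illustration, not a reference solution: the rule names, the patterns, the tiny Italian abbreviation list and the `tokenize` function are all assumptions made for this example, and a real project would need far richer rules.

```python
import re

# Hypothetical exception list: abbreviations that keep their final period.
EXCEPTIONS = {"dott.", "sig.", "prof.", "ecc."}

# Ordered, illustrative rules: earlier alternatives take priority.
RULES = [
    ("EMOTICON", r"[:;=]-?[()DPp]"),                     # e.g. :) ;-) =D
    ("EMOJI", r"[\U0001F300-\U0001FAFF\u2600-\u27BF]"),  # rough emoji ranges
    ("WORD", r"\w+\.?"),                                 # word, optional final period
    ("PUNCT", r"[^\w\s]"),                               # any other symbol
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))


def tokenize(text):
    """Split text into tokens using the rules and the exception list."""
    tokens = []
    for match in MASTER.finditer(text):
        tok = match.group()
        is_word = match.lastgroup == "WORD"
        # Detach a final period unless the token is a known abbreviation.
        if is_word and tok.endswith(".") and tok.lower() not in EXCEPTIONS:
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(tokenize("Il dott. Rossi arriva domani. :)"))
```

In this sketch the abbreviation "dott." stays a single token, while the sentence-final period after "domani" is split off. Known limitations of such a simple scheme include decimal numbers, clitics with apostrophes and multi-codepoint emoji sequences, all of which would need dedicated rules.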