Alice Fedotova

Hi, I’m Alice! 👋🏻

I’m an Estonian Language Technology Researcher and Startup Founder specializing in NLP. Previously, I was a Research Fellow and a Research Intern at the University of Bologna, where I worked on EPTIC, the European Parliament Translation and Interpreting Corpus, and on Multimodal Classification of Medical Television Programs.

My research interests include conversational semantic search; question answering systems; the training, domain adaptation, and evaluation of cutting-edge LLMs; and multimodal content analysis. I’m currently a member of AILC, the Italian Association of Computational Linguistics. Check out my CV for more info about my background, and feel free to contact me on LinkedIn or to send me any (anonymous) feedback 🙂!

Experience

Startup Founder, 2024 - Ongoing
Own Pre-Seed Startup
Research Fellow, 2023 - 2024
University of Bologna

Education

MA in Translation and Technology
University of Bologna
BA in Languages and Technologies for Intercultural Communication
University of Bologna

🗞️ Updates and highlights from the CLiC-it 2024 conference in Pisa, Italy

An overview of the main themes and research directions presented by leading researchers in the field of computational linguistics.

Dec 6, 2024

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

This paper presents a novel pipeline for constructing multimodal and multilingual parallel corpora, with a focus on evaluating state-of-the-art automatic speech recognition tools for verbatim transcription. Our findings indicate that current technologies can streamline corpus construction, with fine-tuning showing promising results in terms of transcription quality compared to out-of-the-box Whisper models.

Dec 6, 2024
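A common way to score ASR output against a verbatim reference transcript, as in the evaluation summarized above, is word error rate (WER). The summary does not say which metric the paper uses, so the sketch below is an assumption and not the paper's method; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a verbatim reference keeps the repetition and the
# filler, while the ASR hypothesis drops both.
reference = "well I I think that we uh should vote"
hypothesis = "well I think that we should vote"
wer = word_error_rate(reference, hypothesis)
```

Here the hypothesis is missing two of the nine reference words, so the score is 2/9. A verbatim reference penalizes a model for cleaning up disfluencies, which is one reason a metric like this is sensitive to how the transcription task is defined.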

🏆 Paper accepted at CLiC-it 2024, the Tenth Italian Conference on Computational Linguistics

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Sep 23, 2024

Expanding the European Parliament Translation and Interpreting Corpus: A Modular Pipeline for the Construction of Complex Corpora

The present paper introduces an expanded version of the European Parliament Translation and Interpreting Corpus (EPTIC), a multimodal parallel corpus comprising speeches delivered at the European Parliament along with their official interpretations and translations (see Bernardini et al., 2016; Bernardini et al., 2018). Constructing multimodal and parallel corpora for translation and interpreting studies (TIS) has been acknowledged as a “formidable task” (Bernardini et al., 2018), which – if automated, as we propose – involves a number of subtasks such as automatic speech recognition (ASR), multilingual sentence alignment, and forced alignment, each of which poses its own challenges. Yet tackling these subtasks also offers a unique way to evaluate state-of-the-art natural language processing (NLP) tools against a unique, multilingual benchmark. In this paper we discuss the development of a modular pipeline adaptable for each of these subtasks and address the broader implications of this work for the field of corpus construction.

Sep 15, 2024

🎉 Paper accepted at JTDH, the 14th Conference on Language Technologies and Digital Humanities

Expanding the European Parliament Translation and Interpreting Corpus: A Modular Pipeline for the Construction of Complex Corpora

Jul 5, 2024

CLiC-it 2024 Updates

Highlights from the CLiC-it 2024 conference, held in Pisa, Italy.

Feb 7, 2024

Decoding Medical Dramas: Identifying Isotopies through Multimodal Classification

Classifying audiovisual content using unimodal and multimodal transformer-based models. The study compares two classification strategies: a single multiclass classifier and a one-vs-the-rest approach, examining their performance in both unimodal and multimodal settings. Results show the multiclass multimodal approach achieves the best performance, with an F1 score of 0.723, outperforming the unimodal text-based one-vs-the-rest method.

Dec 23, 2023
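To make the one-vs-the-rest framing and the F1 metric mentioned in the summary above concrete, here is a minimal sketch in plain Python. It assumes a macro-averaged F1, which the summary does not specify, and the class labels "A", "B", "C" and all function names are hypothetical, not taken from the study.

```python
def one_vs_rest_targets(labels, classes):
    """Turn multiclass labels into one binary target list per class."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def binary_f1(gold, pred):
    """F1 for one binary task: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_labels, pred_labels, classes):
    """Average the per-class (one-vs-rest) F1 scores with equal weight."""
    gold_bin = one_vs_rest_targets(gold_labels, classes)
    pred_bin = one_vs_rest_targets(pred_labels, classes)
    return sum(binary_f1(gold_bin[c], pred_bin[c]) for c in classes) / len(classes)


# Hypothetical gold and predicted labels for four segments.
classes = ["A", "B", "C"]
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", "C"]
score = macro_f1(gold, pred, classes)
```

In a one-vs-the-rest setup, each binary target list would train its own classifier, whereas a single multiclass classifier predicts one label per segment directly. Either way, the predictions can be scored per class as above.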

✅ Started working on EPTIC, the European Parliament Translation and Interpreting Corpus

The aim of the project is to design a pipeline to expand the existing data and experiment with speech recognition models.

Oct 1, 2023