Alice Fedotova

Hi, I’m Alice! 👋🏻

I’m an Independent Researcher and Data Scientist using AI and ML to turn raw information into insights that inspire action and combat misinformation. Previously, I was a Research Fellow at the University of Bologna, where I worked on EPTIC, the European Parliament Translation and Interpreting Corpus.

My research interests include Text Mining, Data Labeling and Benchmarking, and Multimodal Content Analysis. I’m currently a member of AILC, the Italian Association of Computational Linguistics, and ITARDD, the Italian Association for Harm Reduction.

Check out my CV for more info about my background, and feel free to contact me on Telegram or send me any (anonymous) feedback 🙂! Also, have a listen to my mixes if you're into electronic music production and/or alternative DJ sets.

Experience

  • Data Scientist, 2025 - Ongoing
    Independent Researcher
  • Research Fellow, 2023 - 2024
    University of Bologna

Education

  • MA in Translation and Technology
    University of Bologna
  • BA in Languages and Technologies for Intercultural Communication
    University of Bologna

🇮🇹 Expanding my research focus - bridging academia and harm reduction activism in Italy

Building on my research experience at the University of Bologna by working with Italian harm reduction groups to promote data-driven public health strategies across Italy and Europe.

Jul 1, 2025

🗞️ Updates and highlights from the CLiC-it 2024 conference in Pisa, Italy

An overview of the main themes and research directions presented by leading researchers in the field of computational linguistics.

Dec 6, 2024

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

This paper presents a novel pipeline for constructing multimodal and multilingual parallel corpora, with a focus on evaluating state-of-the-art automatic speech recognition tools for verbatim transcription. Our findings indicate that current technologies can streamline corpus construction, with fine-tuning showing promising results in terms of transcription quality compared to out-of-the-box Whisper models.

Dec 6, 2024

🏆 Paper accepted at CLiC-it 2024, the Tenth Italian Conference on Computational Linguistics

Constructing EPTIC: A Modular Pipeline and an Evaluation of ASR for Verbatim Transcription

Sep 23, 2024

Expanding the European Parliament Translation and Interpreting Corpus: A Modular Pipeline for the Construction of Complex Corpora

The present paper introduces an expanded version of the European Parliament Translation and Interpreting Corpus (EPTIC), a multimodal parallel corpus comprising speeches delivered at the European Parliament along with their official interpretations and translations (see Bernardini et al., 2016; Bernardini et al., 2018). Constructing multimodal and parallel corpora for translation and interpreting studies (TIS) has been acknowledged as a "formidable task" (Bernardini et al., 2018), which – if automated, as we propose – involves a number of subtasks such as automatic speech recognition (ASR), multilingual sentence alignment, and forced alignment, each of which poses its own challenges. Yet tackling these subtasks also offers an opportunity to evaluate state-of-the-art natural language processing (NLP) tools against a unique multilingual benchmark. In this paper we discuss the development of a modular pipeline adaptable to each of these subtasks and address the broader implications of this work for the field of corpus construction.

Sep 15, 2024

🎉 Paper accepted at JTDH, the 14th Conference on Language Technologies and Digital Humanities

Expanding the European Parliament Translation and Interpreting Corpus: A Modular Pipeline for the Construction of Complex Corpora

Jul 5, 2024

CLiC-it 2024 Updates

Highlights from the CLiC-it 2024 conference, held in Pisa, Italy.

Feb 7, 2024

Decoding Medical Dramas: Identifying Isotopies through Multimodal Classification

A study on classifying audiovisual content with unimodal and multimodal transformer-based models. It compares two classification strategies, a single multiclass classifier and a one-vs-the-rest approach, examining their performance in both unimodal and multimodal settings. The multiclass multimodal approach achieves the best performance, with an F1 score of 0.723, outperforming the unimodal text-based one-vs-the-rest method.

Dec 23, 2023

✅ Started working on EPTIC, the European Parliament Translation and Interpreting Corpus

The aim of the project is to design a pipeline to expand the existing data and to experiment with speech recognition models.

Oct 1, 2023