
parallel-wikipedia

Feb 4, 2023 · 1 min read

A semi-automatic approach to in-domain parallel corpora extraction from Wikipedia.

Last updated on Dec 28, 2024
Sentence Similarity, Corpus Construction
Authors
Alice Fedotova
Language Technology Researcher

2025 © Alice Fedotova. This work is licensed under CC.
