LUCDH Lunchtime Speaker Series: MacBERTh: A Historically Pre-Trained Language Model for English (1450-1950)

Abstract

Researchers who interpret and analyse historical textual material are well aware that languages change over time, and that the way concepts and discourses of class, gender, norms and prestige function also varies across time periods. It is therefore important that textual and linguistic material from the past is not interpreted from a present-day point of view, which is why NLP models pre-trained on present-day language data are less than ideal candidates for the job. In this talk, Fonteyn and Manjavacas Arevalo present MacBERTh, a transformer-based language model pre-trained on historical English, and exhaustively assess its benefits on a large set of relevant downstream tasks. Their experiments highlight that, despite some differences across target time periods, pre-training on historical language from scratch outperforms models pre-trained on present-day language and later adapted to historical language.
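
For readers who want a concrete sense of how a BERT-style historical language model like MacBERTh would be used in practice, the sketch below loads such a model with the Hugging Face `transformers` library and extracts contextual token embeddings. The checkpoint identifier is an assumption for illustration and may differ from the actual published checkpoint.

```python
# Minimal sketch: loading a BERT-style historical language model with the
# Hugging Face `transformers` library and extracting contextual embeddings.
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed Hub identifier, shown for illustration only.
model_name = "emanjavacas/MacBERTh"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# A short Early Modern English sentence as example input.
sentence = "Thou art more lovely and more temperate."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Token-level contextual embeddings: (batch, sequence_length, hidden_size).
# These representations can feed downstream tasks such as sense
# disambiguation or period classification.
print(outputs.last_hidden_state.shape)
```

Such embeddings are the typical starting point for the downstream evaluations mentioned in the abstract, where a lightweight task-specific classifier is trained on top of the pre-trained representations.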

Date
Mar 2, 2022 11:00 AM — 12:00 PM
Location
P.J. Veth building, room 1.07 (Digital Lab) & Online