LUCDH Lunchtime Speaker Series: MacBERTh: A Historically Pre-Trained Language Model for English (1450-1950)

Abstract

Researchers who interpret and analyse historical textual material are well aware that languages change over time, and that the way concepts and discourses of class, gender, norms and prestige function also varies across time periods. It is therefore important that textual and linguistic material from the past is not interpreted from a present-day point of view, which is why NLP models pre-trained on present-day language data are less than ideal candidates for the job. In this talk, Fonteyn and Manjavacas Arevalo present MacBERTh, a transformer-based language model pre-trained on historical English, and exhaustively assess its benefits on a large set of relevant downstream tasks. Their experiments highlight that, despite some differences across target time periods, pre-training on historical language from scratch outperforms models pre-trained on present-day language and later adapted to historical language.
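
For readers who want a concrete sense of how a BERT-style historical language model like MacBERTh would be used in practice, the sketch below loads such a model with the Hugging Face `transformers` library and extracts contextual token embeddings. The checkpoint identifier is an assumption for illustration and may differ from the actual published checkpoint.

```python
# Minimal sketch: loading a BERT-style historical language model with the
# Hugging Face `transformers` library and extracting contextual embeddings.
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed Hub identifier, shown for illustration only.
model_name = "emanjavacas/MacBERTh"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# A short Early Modern English sentence as example input.
sentence = "Thou art more lovely and more temperate."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Token-level contextual embeddings: (batch, sequence_length, hidden_size).
# These representations can feed downstream tasks such as sense
# disambiguation or period classification.
print(outputs.last_hidden_state.shape)
```

Such embeddings are the typical starting point for the downstream evaluations mentioned in the abstract, where a lightweight task-specific classifier is trained on top of the pre-trained representations.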

Date
Mar 2, 2022 11:00 AM — 12:00 PM
Location
P.J. Veth building, room 1.07 (Digital Lab) & Online