MacBERTh and GysBERT

Language Models for Historical English and Dutch

PDI-SSH 2020

MacBERTh and GysBERT models

MacBERTh and GysBERT are language models (more specifically, BERT models) pre-trained on historical textual material (date range: 1450-1950).

Researchers who interpret and analyse historical textual material are well aware that languages change over time, and that concepts and discourses of class, gender, norms and prestige function differently in different time periods. It is therefore important that textual and linguistic material from the past is not interpreted from a present-day point of view, which is why NLP models pre-trained on present-day language data are less than ideal candidates for the job. That is where our historical models can help.

At present, MacBERTh, a model pre-trained on historical English (1450-1950), has been published on the Hugging Face model hub. The release of GysBERT, the Dutch historical model, is planned for 2022.
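Once published on the Hugging Face hub, the model can be queried like any other BERT model, for instance with a fill-mask pipeline from the `transformers` library. A minimal sketch follows; the hub identifier `emanjavacas/MacBERTh` and the example sentence are assumptions for illustration, so check the model's hub page for the exact name before running.

```python
# Minimal sketch: masked-token prediction with MacBERTh via `transformers`.
# NOTE: the hub id "emanjavacas/MacBERTh" is an assumption; verify it on the
# Hugging Face model hub. The first call downloads the model weights.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="emanjavacas/MacBERTh")

# Ask the model to complete a gap in an (invented) early-modern-style sentence.
for prediction in fill_mask("Thou art a [MASK] knave."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Because the model was pre-trained on material from 1450-1950, its completions should reflect historical rather than present-day usage, which is the point of preferring it over a modern BERT for this kind of analysis.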

How to cite:

We have written a paper describing the English model and its evaluation, which will be published soon. We will add the citation details as soon as they are available.

Past & Upcoming Talks

Related Publications

Have you worked with any of the MacBERTh models? Get in touch with the project team to list your publication here.