How to cite:
MacBERTh (English): When using the English model (MacBERTh), please cite the following paper (BibTeX can be found using the ‘cite’ button in ‘Project Publications’):
GysBERT (Dutch): When using the Dutch model (GysBERT), please cite the following paper (BibTeX can be found using the ‘cite’ button in ‘Project Publications’):
MacBERTh and GysBERT are language models (more specifically, BERT models) pre-trained on historical textual material (date range: 1450-1950 for English, 1500-1950 for Dutch).
Researchers who interpret and analyse historical textual material are well aware that languages change over time, and that concepts and discourses of class, gender, norms and prestige function differently in different time periods. It is therefore important that textual/linguistic material from the past is not interpreted from a present-day point of view, which is why NLP models pre-trained on present-day language data are less than ideal candidates for the job. That’s where our historical models can help.
MacBERTh, a model pre-trained on historical English (1450-1950), and GysBERT, a model pre-trained on historical Dutch (1500-1950), have both been published on the Hugging Face model hub.
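Since both models are distributed through Hugging Face, they can in principle be loaded with the `transformers` library. The following is a minimal sketch, not an official usage guide: the hub identifiers `emanjavacas/MacBERTh` and `emanjavacas/GysBERT` are assumptions that are not stated on this page and should be checked against the model hub, and the helper names (`model_id_for`, `load_fill_mask`, `top_predictions`) as well as the example sentence are purely illustrative.

```python
from typing import List, Tuple

# ASSUMPTION: these hub identifiers are not given on this page;
# verify them on the Hugging Face model hub before relying on them.
MODEL_IDS = {
    "english": "emanjavacas/MacBERTh",
    "dutch": "emanjavacas/GysBERT",
}


def model_id_for(language: str) -> str:
    """Map a language name to the (assumed) hub identifier of its model."""
    key = language.strip().lower()
    if key not in MODEL_IDS:
        raise ValueError(f"No historical model listed for language: {language!r}")
    return MODEL_IDS[key]


def load_fill_mask(language: str):
    """Build a masked-language-modelling pipeline for the chosen model.

    transformers is imported lazily so the helpers above can be used
    without the library being installed. The first call downloads the
    model weights, which requires network access.
    """
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id_for(language))


def top_predictions(fill_mask, template: str, k: int = 5) -> List[Tuple[str, float]]:
    """Fill the single {mask} slot in `template`; return (token, score) pairs."""
    # Use the tokenizer's own mask token rather than hard-coding one.
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=k)]


if __name__ == "__main__":
    fill = load_fill_mask("english")
    # Illustrative sentence only; it is not taken from the training data.
    for token, score in top_predictions(fill, "The {mask} hath spoken."):
        print(f"{token}\t{score:.3f}")
```

Swapping `"english"` for `"dutch"` would load GysBERT instead. A fill-mask pipeline is used here only because masked-token prediction is what BERT models are pre-trained on; for downstream tasks, the same identifiers could be passed to `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained`.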
Because of great efforts by the corpus-linguistic community, a large number of historical texts have been digitized (and sometimes even …
Machine-based exploration of culturally relevant datasets (e.g. newspapers, periodicals, correspondence or annals) often involves …
Linguistic variation and change have long been studied as properties of community-level language, i.e. the “shared system of …
The following teams have used and evaluated MacBERTh: