About DIRT

The DIRT corpus consists of Dutch-language reality TV series such as De Mol, Chateau Meiland, and Temptation Island. These are unscripted shows that feature relatively spontaneous, informal Dutch speech.

The first version of the DIRT corpus was created by Ulrike Vogl and Gauthier Delaby in 2021 as part of a student project for the course “Dutch Linguistics: The Contemporary Dutch Language System” and a research track on “Language Use in Reality TV” for bachelor’s students of Dutch. For this project, episodes of various reality shows were transcribed following a specific transcription protocol (Ghyselen et al. 2020). The corpus is a work in progress and is regularly supplemented with newly transcribed material. It currently contains approximately 200,000 words.

The corpus is enriched with metadata, providing information about the speakers’ regional background, gender, education, and age. It includes both older and contemporary shows, featuring Dutch as spoken in both Belgium and the Netherlands.