Dutch Parallel Corpus (DPC)

A multifunctional and multilingual corpus

Aligned parallel corpora are an indispensable resource for a wide range of multilingual applications, including machine translation (especially corpus-based approaches such as statistical and example-based MT), computer-assisted translation tools, cross-lingual information extraction, multilingual terminology extraction, and computer-assisted language learning.

Since high-quality parallel corpora with Dutch as the central language either do not exist or are not accessible to the research community because of copyright restrictions, the compilation of aligned parallel corpora is one of the priorities of the STEVIN programme.

In this project, we aim to construct a 10-million-word, high-quality, sentence-aligned parallel corpus for the language pairs Dutch-English and Dutch-French. As the corpus will be bidirectional (Dutch as both source and target language), it can also be used as a comparable corpus (to compare texts originally written in Dutch with translated Dutch texts). Part of the corpus will be trilingual and will contain Dutch texts translated into both English and French. The corpus will be enriched with linguistic annotations.

To guarantee the quality of the corpus and its suitability for multiple uses across the wider research community, each step in compiling, structuring and annotating the corpus will be validated by a user group of specialists in linguistics and language technology. Because Dutch is the pivot language, we will collaborate closely with the researchers of the D-COI project, who are compiling a 50-million-word pilot corpus of contemporary written Dutch.

To make the corpus accessible to the whole research community, copyright issues are handled in close collaboration with the TST-centrale, which is in charge of distributing the corpus.

 

Funding

The project is financed by the Nederlandse Taalunie (Dutch Language Union) within the framework of the STEVIN programme, a multi-year programme stimulating research in Dutch language and speech technology.

To realise all this, the project brings together the knowledge and expertise of the following partners:

 
