Pilotgrant DutchSemCor (“Dutch corpus with word senses from the Cornetto database”): NWO (Jan.-Aug. 2008)
Project coordinator of the pilot grant awarded by NWO Geesteswetenschappen (Humanities). The pilot grant was used to prepare an NWO ‘middelgroot investeringsproject’ (medium-sized investment project) proposal, submitted in September 2008. The goal of that follow-up project is to deliver a corpus fully annotated with word senses, ontology tags, and domain tags from the Cornetto database. This corpus will play a key role in language technology research for Dutch, and also in linguistic and cognitive research that relates linguistic form to meaning.
The corpus will be tagged using a combination of automatic techniques and manual editing. The automatic techniques include, on the one hand, supervised methods, which are trained on subcorpora that have already been tagged and can then tag the remaining subcorpora, and, on the other hand, unsupervised techniques that rely on other sources, such as the Cornetto database itself. The manual editing of the corpus is also expected to feed back into Cornetto in the form of revisions to the semantic database.