Building a Corpus of Creative Paraphrases

Master's Thesis

Creative language is ubiquitous in everyday life. Although deep learning models are adept at handling many types of language, many aspects of creativity remain challenging for them. This is largely due to a lack of data: creative phenomena such as metaphor, irony, humor, and sarcasm are notoriously difficult to annotate, so existing resources are insufficient. This work explores data collection for creative language, the necessity of good data sources, and methods for overcoming the difficulties of collecting such data.