Uncovering Plagiarism, Authorship and Social Software Misuse at CLEF 2013

UKP participated in the first edition of the Author Profiling Task in the PAN Lab at CLEF 2013.

The author profiling task aims at revealing categorical information about an author rather than his or her exact identity. Such categories include age and gender, but also native country, level of education or other socio-demographic information. Besides its obvious applications in marketing, author profiling can also be beneficial in the educational domain, e.g. in large-scale screenings of pupils, where it can help to identify exceptional talent. In the PAN Challenge task, we examined hundreds of thousands of online documents, e.g. chat lines or blog posts, and show that age and gender profiling are not independent problems. Using a wide range of stylistic and content features and a large number of readability measures, we demonstrate the high predictive power of parts of speech, punctuation and the amount of emotions and slang used in the text, independently of the topic discussed. In terms of accuracy, our system placed 4th among 20 systems in Spanish and 15th among 21 systems in English.
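To give a rough idea of what topic-independent stylistic features can look like, the following is a minimal Python sketch. It is not the system described in the paper: the function name `stylistic_features`, the tiny `SLANG` lexicon and the emoticon pattern are hypothetical stand-ins chosen for illustration, emoticon counting is only a crude proxy for the amount of emotions in a text, and part-of-speech features are left out because they would require an external tagger.

```python
import re
import string

# Hypothetical mini-lexicon for illustration only; not the resource used in the paper.
SLANG = {"lol", "omg", "gonna", "wanna", "u", "thx"}

# Assumed, deliberately simple emoticon pattern such as :) ;-) =D :P
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")


def stylistic_features(text: str) -> dict:
    """Compute a few topic-independent style features for one document."""
    # Lowercased alphabetic tokens (apostrophes kept, e.g. "don't").
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n_chars = max(len(text), 1)      # guard against empty input
    n_tokens = max(len(tokens), 1)
    n_punct = sum(ch in string.punctuation for ch in text)
    return {
        "punct_ratio": n_punct / n_chars,
        "emoticons_per_token": len(EMOTICON_RE.findall(text)) / n_tokens,
        "slang_ratio": sum(t in SLANG for t in tokens) / n_tokens,
        "avg_token_len": sum(len(t) for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    print(stylistic_features("omg u r gonna love this :) lol!!"))
```

Feature dictionaries of this kind could then be passed to any standard classifier. Because none of the values depend on what the text is about, they illustrate the sense in which such features are independent of the topic discussed.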

A detailed description of our system is available in the following paper:

Can We Hide in the Web? Large Scale Simultaneous Age and Gender Author Profiling in Social Media – Notebook for PAN at CLEF 2013

Lucie Flekova and Iryna Gurevych

In: CLEF 2013 Labs and Workshops – Notebook Papers, p. to appear, September 2013.