On Tuesday, February 9th, 2016, Prof. Dr. Ansgar Scherp (University of Kiel, Germany) will give a guest lecture at 11:30 in S2|02 B002 (Hochschulstr. 10).
Title: Knowledge Discovery in Social Media and Scientific Digital Libraries
The talk presents selected results of our research in the area of text and data mining in social media and scientific literature. (1) First, we consider the classification of microblog postings such as tweets on Twitter. Typically, the classification results are evaluated against a gold standard, which is either the hashtags assigned by the tweets’ authors or manual annotations. We claim that there are fundamental differences between these two kinds of gold standard, and we conducted an experiment in which 163 participants manually classified tweets from ten topics. Our results show that human annotators are more likely to classify tweets like other human annotators than like the tweets’ authors (i.e., the hashtags). This may influence the evaluation of classification methods like LDA, and we argue that researchers should reflect on the kind of gold standard used when interpreting their results. (2) Second, we present a framework for semantic document annotation that aims to compare existing as well as new annotation strategies. For entity detection, we compare semantic taxonomies, trigrams, RAKE, and LDA. For concept activation, we cover a set of statistical, hierarchy-based, and graph-based methods. The strategies are evaluated over 100,000 manually labeled scientific documents from economics, politics, and computer science. (3) Finally, we present a processing pipeline for extracting text of varying size, rotation, color, and emphasis from scholarly figures. The pipeline needs no training and makes no assumptions about the characteristics of the scholarly figures. We conducted a preliminary evaluation with 121 figures from a broad range of illustration types.
Ansgar Scherp is Professor for Knowledge Discovery with the Institute of Computer Science, Faculty of Engineering at Kiel University, Germany, and has been associated with the Leibniz Information Centre for Economics in Kiel since January 2014. From August 2012 to December 2013, he was Juniorprofessor for Media Informatics and a member of the Research Group on Data and Web Science at the University of Mannheim. From April 2013, he was also an associated professor with the Institute for Enterprise Systems (InES) in Mannheim. Prior to that, he worked as Juniorprofessor for Semantic Web at the University of Koblenz-Landau in the Institute for Information Systems Research from April 2011 and led the focus group on Interactive and Multimedia Web at the Institute for Web Science and Technologies (WeST) at the same university from May 2008. He studied computer science at the University of Oldenburg, Germany, and completed his PhD there with distinction in 2006 with the thesis “A Component Framework for Personalized Multimedia Applications”. Afterwards, Mr. Scherp was an EU Marie Curie Fellow with Prof. Ramesh Jain at the Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA in Los Angeles from November 2006 to October 2007. Subsequently, he led the University of Koblenz-Landau's activities in the EU Integrated Project WeKnowIt from 2008 to 2011. There, he led the work packages on knowledge management and mass intelligence and was a member of the project management board and steering board committee. Mr. Scherp has also served as scientific leader of the EU project SocialSensor, in which the University of Koblenz-Landau led the work package on user modeling and presentation.
In December 2011, he received his Venia Legendi (Habilitation) from the University of Koblenz-Landau, Germany, with the thesis “Semantic Media Management: Process Innovation along the Value Chain of Media Companies” (in German).