LKE/KDSL Guest Lecture: Denilson Barbosa

2014/08/25 by

Denilson Barbosa, associate professor at the University of Alberta, will be visiting on Monday, 1 September 2014. He will give a guest lecture at 11:00 in S2|02 C110.

Title: Discerning Intelligence from Text at the UofA

Abstract: This talk covers two projects at the University of Alberta related to Information Extraction from text.

In the first part, I will give a brief overview of some open-domain relation extraction tools we have developed and then present in more detail our approach to Entity Linking (EL); relation extraction and entity linking are two crucial steps in Information Extraction from text. Entity Linking consists of linking mentions of named entities in a document to their referent entities in an existing Knowledge Base. Current approaches rely on lexical and statistical features, which are rich for popular entities but sparse for unpopular ones. As a result, these methods favour entities with high popularity and perform poorly on less popular ones. We developed a novel EL method based on a unified semantic representation of entities and documents: the probability distribution of entities being visited during a random walk on an entity graph. Our approach provides a fine-grained representation in the high-dimensional space of Wikipedia entities, which can overcome the feature sparsity that affects most current approaches.
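The random-walk representation described above can be illustrated with a minimal sketch. It assumes a random walk with restart (personalized PageRank, computed by power iteration) on a toy entity graph, a restart probability of 0.15, and cosine similarity for comparing a candidate entity's distribution with the document's; the abstract does not specify any of these choices, and all function names, the graph, and the entity identifiers are hypothetical. The document's distribution is obtained by restarting the walk at entities already identified in its context.

```python
import math


def random_walk_with_restart(graph, seeds, restart=0.15, iters=200, tol=1e-12):
    """Visit probabilities of a random walk that restarts at `seeds`.

    graph: dict mapping every node to a list of its neighbours.
    seeds: dict mapping seed nodes to non-negative restart weights.
    """
    total = sum(seeds.values())
    restart_vec = {n: seeds.get(n, 0.0) / total for n in graph}
    prob = dict(restart_vec)
    for _ in range(iters):
        # With probability `restart`, jump back to the seed distribution.
        nxt = {n: restart * restart_vec[n] for n in graph}
        for node, p in prob.items():
            nbrs = graph[node]
            if nbrs:
                # Otherwise move to a uniformly chosen neighbour.
                share = (1 - restart) * p / len(nbrs)
                for nb in nbrs:
                    nxt[nb] += share
            else:
                # Dangling node: return its mass to the seed distribution.
                for n in graph:
                    nxt[n] += (1 - restart) * p * restart_vec[n]
        delta = sum(abs(nxt[n] - prob[n]) for n in graph)
        prob = nxt
        if delta < tol:
            break
    return prob


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_mention(graph, candidates, context_entities):
    """Pick the candidate whose walk distribution best matches the document's."""
    doc = random_walk_with_restart(graph, {e: 1.0 for e in context_entities})
    scores = {
        c: cosine(random_walk_with_restart(graph, {c: 1.0}), doc)
        for c in candidates
    }
    return max(scores, key=scores.get), scores


# Toy entity graph (hypothetical): two unrelated entities named "Jordan".
graph = {
    "Michael_Jordan": ["Chicago_Bulls", "NBA"],
    "Chicago_Bulls": ["Michael_Jordan", "NBA"],
    "NBA": ["Chicago_Bulls", "Michael_Jordan"],
    "Michael_I._Jordan": ["Machine_learning", "UC_Berkeley"],
    "Machine_learning": ["Michael_I._Jordan", "UC_Berkeley"],
    "UC_Berkeley": ["Michael_I._Jordan", "Machine_learning"],
}
best, scores = link_mention(
    graph, ["Michael_Jordan", "Michael_I._Jordan"], ["Chicago_Bulls", "NBA"]
)
```

Because the two candidates lie in disconnected parts of this toy graph, only the basketball player's distribution overlaps with the document's. On a Wikipedia-scale graph the distributions would overlap partially, so the score ranks candidates rather than separating them cleanly.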

The second part covers an effective method for detecting controversial content in Wikipedia, motivated by the goal of improving its (currently manual) editorial process. Controversy arises from disagreement among editors over time and affects many popular topics, such as religion, history, and politics, making this manual process inadequate and error-prone. As it turns out, disagreement, bias, and conflict are expressed quite differently in Wikipedia than in other social media, posing new challenges and rendering previous work ineffective. On the other hand, part of the social process of editing articles in Wikipedia is captured in the edit history, opening the door to novel approaches. We describe a controversy model that builds on the interaction history of the editors, not only predicting controversy but also shedding light on the social processes that cause articles to become controversial. We inspect the collaboration history of every pair of editors collaborating on an article to infer their attitude towards one another, resulting in a social network that captures part of the editorial process. We derive network features rooted in social theories and apply a classifier to detect controversy.
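As an illustration of what a network feature "rooted in social theories" might look like, the sketch below computes structural-balance statistics over a signed editor network, in which a +1 edge marks a positive attitude between two editors and -1 a negative one. Structural balance is an assumption here: the abstract does not name the theories used, and the sketch takes the signed edges as given rather than inferring them from edit history. The function name and feature names are hypothetical; the resulting feature dictionary would be one possible input to the classifier step.

```python
from collections import defaultdict
from itertools import combinations


def balance_features(edges):
    """Structural-balance statistics for a signed, undirected editor network.

    edges: iterable of (editor_u, editor_v, sign) with sign +1 or -1.
    """
    sign = {}
    adj = defaultdict(set)
    for u, v, s in edges:
        sign[frozenset((u, v))] = s
        adj[u].add(v)
        adj[v].add(u)

    balanced = unbalanced = 0
    # Brute-force triangle enumeration: O(n^3) in editors, fine for a sketch.
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            product = (sign[frozenset((u, v))]
                       * sign[frozenset((u, w))]
                       * sign[frozenset((v, w))])
            # Balance theory: a triangle is balanced when its sign product is positive.
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1

    triangles = balanced + unbalanced
    positive = sum(1 for s in sign.values() if s > 0)
    return {
        "positive_edge_ratio": positive / len(sign) if sign else 0.0,
        "triangles": triangles,
        "balanced_triangle_ratio": balanced / triangles if triangles else 0.0,
    }


# Hypothetical network for one article: A and B get along, C clashes with both.
feats = balance_features([
    ("A", "B", 1), ("B", "C", -1), ("A", "C", -1), ("C", "D", 1), ("B", "D", 1),
])
```

Here triangle A-B-C (one positive tie, two shared antagonisms) is balanced while B-C-D is not, so half of the triangles are balanced. Treating a low balanced ratio as a signal of controversy is an assumption of this sketch, not a claim about the model presented in the talk.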

Biography: Denilson Barbosa is an Associate Professor and the Director of the Science Internship Program at the Department of Computing Science, University of Alberta, currently on sabbatical leave at the Max Planck Institute for Informatics in Saarbrücken, Germany. He received a PhD from the University of Toronto (2005), working on Web data management.

He has worked on databases, the Web, and natural language processing, with recent emphasis on information extraction from semi-structured and unstructured data. He was a principal investigator and the Leader of the Data Quality Theme of the NSERC Business Intelligence Network. He received the CS Distinguished Teaching Award (2010), an Alberta Ingenuity New Faculty Award, an IBM Faculty Award, and the Best Paper Award at the 2010 IEEE International Conference on Data Engineering. His undergraduate students received a Best Undergraduate Poster Award at the 2012 ACM SIGMOD Conference.