Text Analytics

Text Analytics: Large-Scale Knowledge Bases and Knowledge Graphs on the Web

Course Content

Update: Registration via Moodle is now open!

Type a query about an artwork such as “Mona Lisa” into Google, and it will show you a description of the work, the artist, the date it was painted, the painting techniques, etc. How does this work? What technology and models are behind it? This research seminar addresses these and related questions.

The availability of large-scale knowledge bases has given rise to completely new search approaches that leverage the associated semantics. For instance, search engines like Google now go far beyond finding pages that match query terms. Thanks to its Knowledge Graph technology, Google is able to discover “entities” and facts about them.

The explosive growth of information available on the Web has motivated research on automatic methods for extracting world and domain knowledge from Web resources and storing it in specialized stores called knowledge bases. Thanks to advances in open-domain information extraction, natural language processing (NLP), and machine learning, large knowledge bases (KBs) with millions of facts about our world have become available. Notable efforts in this direction include approaches based on Wikipedia, such as YAGO, DBpedia, Freebase, and most recently Wikidata. Other approaches handle the entire Web, e.g., NELL and PROSPERA, while others extract knowledge for specific domains, e.g., Rexa. Additionally, there is growing interest in building large-scale lexical semantic resources (LSRs) by linking a wide range of such scattered resources to increase their coverage. Notable endeavors in this direction are the large-scale sense-linked resources UBY and BabelNet. The wide availability of such resources has opened the door for applications that exploit the associated semantics, as illustrated by the Google Knowledge Graph example above.

This seminar highlights novel methods for the automated construction of knowledge bases and large-scale LSRs and investigates related applications. We will analyze the most recent approaches in depth and discuss their strengths, limitations, and possible improvements. Furthermore, we will discuss areas of innovative applications.

Participants will be introduced to the foundations of the available knowledge bases and LSRs, the relevant NLP and information extraction methods, and current applications.

Organization

Lecture: Thursday 13:30-15:10, Room S105/22

The first class will be held on October 15th, 2015.

Additional material will be distributed via the Moodle eLearning platform. The required passcode will be announced during the first lecture.

Literature

  • Suchanek, Fabian M., Gjergji Kasneci, and Gerhard Weikum. “Yago: a core of semantic knowledge.” Proceedings of the 16th international conference on World Wide Web. ACM, 2007.
  • Bizer, Christian, et al. “DBpedia-A crystallization point for the Web of Data.” Web Semantics: Science, Services and Agents on the World Wide Web 7.3 (2009): 154-165.
  • Vrandečić, Denny, and Markus Krötzsch. “Wikidata: a free collaborative knowledgebase.” Communications of the ACM 57.10 (2014): 78-85.
  • Mitchell, Tom. “Never-ending learning.” Carnegie Mellon University, Pittsburgh, PA, 2010.
  • Suchanek, Fabian, et al. “Advances in automated knowledge base construction.” SIGMOD Record, March 2013.
  • Nakashole, Ndapandula, Martin Theobald, and Gerhard Weikum. “Scalable knowledge harvesting with high precision and high recall.” Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
  • Navigli, Roberto, and Simone Paolo Ponzetto. “BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network.” Artificial Intelligence 193 (2012): 217-250.
  • Gurevych, Iryna, et al. “Uby: A large-scale unified lexical-semantic resource based on LMF.” Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2012, pp. 580-590.

Timetable

The first sessions will feature introductory lectures on knowledge bases, large-scale LSRs, and NLP. The program for the remainder of the seminar will be determined by the number of participants and the topics to be discussed.

Date Lecture
15.10.15 Welcome
22.10.15 Introduction to NLP
29.10.15 Introduction to Knowledge Bases
05.11.15 Knowledge Base Construction
12.11.15 Entity Linking
19.11.15 Relation Extraction I
26.11.15 Relation Extraction II
03.12.15 Knowledge Base Population
10.12.15 Knowledge Base Alignment
17.12.15 Applications
07.01.16 Applications
14.01.16 Wrap up

Expectations

Each student is expected to

  • attend the seminar sessions and actively contribute to the discussion in the seminar
  • prepare a presentation on a topic/tool relevant for the seminar
  • give this presentation and answer questions from the audience
  • prepare a term paper on the topic/tool

Teaching Staff

  • Dr. Hatem Mousselly Sergieh
  • Prof. Dr. Iryna Gurevych

We do not have fixed office hours. Please contact us by email if you need an appointment.