Natural Language Processing and the Web
Lecture and practice class

This lecture will present natural language processing (NLP) methods to automatically process large amounts of unstructured text from the web and analyze the use of web data as a resource for other NLP tasks.


  • Lecture: Thursday 09:50-11:30, Online (Zoom) starting October 19th
  • Practice class: Thursday 16:15-17:55, Online (Zoom) starting October 28th
  • Exam
    • Date/Time: To be determined
    • Room: To be determined
  • Registration: TUCaN
  • Moodle Course: NLP4Web 21/22
    • Link: NLP4Web 21/22
    • (no key/password necessary)
    • Lecture and exercise material will be provided in the associated Moodle course. Some lectures and additional discussions will take place live, and their recordings will be made available afterwards. Further details are also available in the Moodle course.
  • Requirements
    • To pass the course, each student must take the written exam at the end of the semester.
    • There will also be a practice class with programming exercises in Python. These exercises can contribute to your overall grade.

Teaching Staff

  • Prof. Dr. Iryna Gurevych
  • Dr. Thomas Arnold
  • Max Glockner

We currently do not have fixed office hours, so please contact us by email to arrange an appointment.

Course content

Search engines, spelling correction, automatic question answering, translation: the Web is both an application area and a valuable resource for many useful, everyday applications. This lecture will present Natural Language Processing (NLP) methods to automatically explore the World Wide Web, perform web mining and gain insights into open research problems. In our practice sessions, we introduce state-of-the-art NLP toolkits and work on functional NLP projects.

Processing of unstructured web content

  • Introduction
  • NLP basics – tokenization, part-of-speech tagging, chunking, stemming, lemmatization, syntactic and semantic analysis
  • Web content and its characteristics, web genre identification

NLP applications for the web

  • Information retrieval – introduction to the basics
  • Web information retrieval – natural language interfaces for web information retrieval
  • Crowdsourcing
  • Argument Mining
  • Question answering (QA): Factoid QA, Knowledge Base QA, Community QA

Literature

  • Daniel Jurafsky, James H. Martin, Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition, 2019
  • Jacob Eisenstein, Introduction to Natural Language Processing, 2018
  • W. Bruce Croft, Donald Metzler, Trevor Strohman, Search Engines. Information Retrieval in Practice, 2015