Automated extraction of real-world knowledge from books and its usage in intelligent recommendation systems
For e-book recommendation systems, it can be very helpful to know the answers to high-level content questions that readers may have, for example “What is the main hero like?”, “Is the story complicated?” or “Is the book suitable for children?”. The idea of this project is to leverage real-world knowledge resources so that a machine learning system can estimate the answers to such questions. To reach this goal, the initial research focuses on identifying suitable approaches for integrating semantic knowledge into text classification algorithms.
- Create tools that help acquire additional knowledge from book content
- Develop methods for retrieving real-world knowledge from text, e.g. information about the main characters and their relations
- Investigate novel models to assess similarity of books based on the new knowledge available
- As a final step, such information could be integrated into a live recommendation system
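To make the goal of retrieving information about characters and their relations more concrete, the sketch below shows one very simple baseline: counting how often two characters are mentioned in the same sentence. This is an illustrative assumption, not the project's method; the function name `character_cooccurrence`, the regex-based sentence splitting, and the hand-supplied character list are all hypothetical simplifications (a real system would need named-entity recognition and coreference resolution).

```python
import re
from collections import Counter
from itertools import combinations


def character_cooccurrence(text: str, characters: list[str]) -> Counter:
    """Count how often each pair of characters is mentioned in the same sentence.

    Hypothetical baseline for illustration only: names are matched literally,
    so pronouns and aliases ("her", "Mr. Darcy") are not resolved.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pairs = Counter()
    for sentence in sentences:
        # Characters mentioned in this sentence (whole-word match),
        # sorted so that each pair always gets the same key.
        present = sorted(
            name for name in characters
            if re.search(rf"\b{re.escape(name)}\b", sentence)
        )
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


text = ("Elizabeth met Darcy at the ball. Darcy ignored her. "
        "Jane and Elizabeth talked about Bingley. Bingley danced with Jane.")
print(character_cooccurrence(text, ["Elizabeth", "Darcy", "Jane", "Bingley"]))
```

Pair counts like these could serve as crude edge weights of a character network, which is one possible (assumed) input for comparing books by their character structure.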
Our research contributions comprise three consecutive steps:
1. In the first step, our system will extract semantic information from books (e.g., the actions of the people involved or the relations between them). The methods will benefit from large lexical-semantic resources such as Wikipedia, combined with information from the book data.
2. This information is then used as input to the classification system, which predicts characteristics of complex text (for example: How dark is the personality of the main character? How many parallel narratives does the book contain?).
3. These predictions form the basis of the recommendation system for the end user (the book reader). Instead of relying only on superficial characteristics such as book titles and user behavior, we use deep content understanding to model possible user preferences.
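Step 3 can be illustrated with a minimal content-based recommender, assuming each book has already been mapped by step 2 to a vector of predicted characteristics. The feature names (`protagonist_darkness`, `parallel_narratives`, `child_suitable`), the book titles, and all scores below are hypothetical placeholders, and cosine similarity against an averaged user profile is a standard technique chosen for illustration, not necessarily the model used in the project.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse feature vectors stored as dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(liked: list[str], catalog: dict[str, dict[str, float]], k: int = 2) -> list[str]:
    """Return the k unread books whose predicted characteristics best match the reader."""
    # User profile = mean of the predicted characteristics of the liked books.
    profile: dict[str, float] = {}
    for title in liked:
        for feature, value in catalog[title].items():
            profile[feature] = profile.get(feature, 0.0) + value / len(liked)
    # Rank the remaining books by cosine similarity to the profile.
    candidates = [t for t in catalog if t not in liked]
    candidates.sort(key=lambda t: cosine(profile, catalog[t]), reverse=True)
    return candidates[:k]


# Hypothetical step-2 predictions on a 0..1 scale (illustrative values only).
catalog = {
    "Book A": {"protagonist_darkness": 0.9, "parallel_narratives": 0.7, "child_suitable": 0.1},
    "Book B": {"protagonist_darkness": 0.8, "parallel_narratives": 0.6, "child_suitable": 0.0},
    "Book C": {"protagonist_darkness": 0.1, "parallel_narratives": 0.1, "child_suitable": 0.9},
    "Book D": {"protagonist_darkness": 0.7, "parallel_narratives": 0.2, "child_suitable": 0.2},
}
print(recommend(["Book A"], catalog))
```

For a reader who liked "Book A", the dark, multi-narrative "Book B" ranks above the lighter "Book D", while the child-oriented "Book C" comes last. The point of the design is that the ranking depends on predicted content characteristics rather than on titles or co-purchase behavior.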
- Prof. Dr. Iryna Gurevych, Principal Investigator
- Lucie Flekova, Doctoral Researcher
The results of the research were successfully used in the VisADoc project.