The continually growing volume of data is driving a paradigm shift in science, industry, and society: instead of coding all information processing steps manually, we research data-driven approaches that autonomously recognize structures in data and use them to extend their knowledge. Managing huge amounts of data and extracting relevant information from them are among the central challenges of today’s society. Humanities research in particular can profit from data-driven approaches to interactively identify and validate new hypotheses in digital data collections.
Machine Learning (ML)
Classification, regression, and clustering are the three classic tasks in machine learning. Motivated by the human brain, artificial multi-layered neural networks learn a classifier from single attribute-value tables. However, classic machine learning approaches (including deep learning) cannot be applied directly if the data spans multiple tables or entire relational databases. Intelligent databases employ statistical relational learning to efficiently generate hypotheses that consider not only uncertainty, but also context and links between multiple relations. They cover many aspects of artificial intelligence, ranging from deductive reasoning to machine learning and optimal decision making. Recent work focuses predominantly on the accuracy of the predictions. But for many applications (e.g., in the humanities), we need to interpret the patterns learned from the data, so that we can understand, explain, and justify the automatic predictions. Such interpretable models are a major quality factor in establishing trust in the learned models. We aim at learning interpretable rule-based, preference-based, or (deep) probabilistic graphical models and evaluating them in multiple applications. We put a particular focus on techniques for interacting dynamically with learning algorithms that can also be used by laypeople.
Natural Language Processing (NLP)
A major challenge in natural language processing is the preparation of textual data and the automatic adaptation of methods to varying domains, genres, user groups, and languages. Flexible representation, processing, and presentation of language-related data play a crucial role in many contexts, since textual data is very heterogeneous (e.g., the genre mix on the Web) and consumed in many different ways (e.g., language learners have very different feedback needs than journalists). We closely cooperate with application partners to formalize language-related tasks and research data-driven methods to solve them. Cross-lingual and language-independent methods are particularly important in an increasingly globalized world. Textual data found in the humanities is especially demanding and requires innovative and robust approaches that are able to learn general methods from just a few data points that can still be analyzed and validated by experts. Additionally, we need interpretable models that allow lay users to understand what a model has learned and why it decided in a certain way, as well as to interactively correct errors in the model.
Data Management (DM)
To enable laypeople to use data-driven methods, we need to automatically assist the entire data science process, from creating and cleaning the data to learning a model and efficiently using and interacting with the learned model. Such Systems ML approaches in particular enable user–model interactions to explore new patterns based on appropriate visualizations or to explain what a model has learned. But intelligent approaches can also learn from the user interaction by understanding what was important or which predictions were wrong. A major challenge of interactive systems is the efficient storage and processing of large heterogeneous data.
Data Science is among the fastest-growing areas of computer science. Technische Universität Darmstadt conducts excellent foundational research in machine learning, natural language processing, and data management, as well as in their manifold applications in the humanities and social sciences at the universities of Darmstadt, Frankfurt, and Mainz. Beyond that, the research areas on Computational Engineering and Robotics, Visual Computing, and the Centre for Cognitive Science also depend on Data Science methods.
Participating research groups:
Algorithmics (Prof. Dr. Karsten Weihe)
Data Management (Prof. Dr. Carsten Binnig)
Machine Learning (Prof. Dr. Kristian Kersting)
Ubiquitous Knowledge Processing (Prof. Dr. Iryna Gurevych)