Semantic Information Retrieval (SIR)
Integrating semantic relatedness into information retrieval to overcome the problem of term mismatch between queries and documents.
A frequent problem in information retrieval (IR) is the gap between the vocabulary used to formulate the user's information need (topic) and the vocabulary used in the documents of the collection being queried. One example of this problem is the domain of electronic career guidance, where an IR system helps young people decide which profession to choose by automatically computing a ranked list of professions according to the user's interests. The IR system compares a short essay written by the user with descriptions of professions written by domain experts. Typically, people seeking career advice describe their professional preferences in different words than those used in the professionally prepared descriptions of professions. Therefore, lexical semantic knowledge and soft matching, i.e. matching semantically related terms, should be especially beneficial to such a system.
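The difference between exact and soft matching can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the relatedness table is hand-made (a real system would derive scores from a resource such as a wordnet or Wikipedia), the scoring functions are simple averages rather than the IR models used in the project, and the names `relatedness`, `exact_score`, `soft_score`, and `rank` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical, hand-made relatedness scores in [0, 1].
# A real system would compute these from a lexical semantic resource.
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"cook", "chef"}): 0.9,
    frozenset({"food", "kitchen"}): 0.6,
    frozenset({"computer", "software"}): 0.8,
}


def relatedness(a: str, b: str) -> float:
    """Return 1.0 for identical terms, else the table score (0.0 if unknown)."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def exact_score(query: List[str], doc: List[str]) -> float:
    """Fraction of query terms that literally occur in the document."""
    if not query:
        return 0.0
    doc_terms = set(doc)
    return sum(1.0 for term in query if term in doc_terms) / len(query)


def soft_score(query: List[str], doc: List[str]) -> float:
    """Average, over query terms, of the best relatedness to any document term."""
    if not query:
        return 0.0
    best = [max((relatedness(q, d) for d in doc), default=0.0) for q in query]
    return sum(best) / len(query)


def rank(query: List[str], docs: Dict[str, List[str]],
         scorer: Callable[[List[str], List[str]], float]) -> List[str]:
    """Return document names sorted by descending score (stable for ties)."""
    return sorted(docs, key=lambda name: scorer(query, docs[name]), reverse=True)


# Toy data: a user's essay and two expert-written profession descriptions.
essay = ["cook", "food"]
descriptions = {
    "programmer": ["software", "computer", "code"],
    "chef": ["chef", "kitchen", "restaurant"],
}

print(exact_score(essay, descriptions["chef"]))  # no literal overlap
print(soft_score(essay, descriptions["chef"]))   # related terms still match
print(rank(essay, descriptions, soft_score))
```

With exact matching, the essay shares no term with either description, so the two professions cannot be told apart. With soft matching, "cook" and "food" are credited for their relatedness to "chef" and "kitchen", which moves the chef description to the top of the ranking.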
Improve the performance of IR on domain-specific document collections:
- increase recall (by closing the vocabulary gap)
- increase precision (especially for the first 10 ranks)
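The two measures named in these goals can be computed as in the following sketch. It uses the standard textbook definitions of recall and precision at rank k; the function names and the toy ranking are assumptions for illustration and are not taken from the project's evaluation setup.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of all relevant items that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


# Hypothetical ranking of professions and a hypothetical gold standard.
ranking = ["chef", "baker", "programmer", "waiter"]
gold = {"chef", "waiter", "butcher"}

print(precision_at_k(ranking, gold, k=2))  # 1 of the top 2 is relevant
print(recall(ranking, gold))               # 2 of the 3 relevant items were retrieved
```

Closing the vocabulary gap mainly affects recall, because relevant documents that share no literal term with the query can now be retrieved at all. Precision at the first 10 ranks reflects what a user actually sees in a short result list.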
- Integrating semantic relatedness into IR models
- Combining linguistic knowledge sources, e.g. German wordnet, and Web 2.0 knowledge sources, e.g. Wikipedia, to achieve broad coverage
- Darmstadt Knowledge Processing Repository: UIMA components for NLP, IR, and semantic relatedness measures.
- Wikipedia API & Wiktionary API: Programmatic access to locally stored Wikipedia and Wiktionary data.
- Dextract: Software for semantic relatedness experiments.
In 2006 the SIR project team offered a Seminar on Unstructured Information Management at the University of Tübingen.
The Division of Computational Linguistics at the University of Tübingen is a co-applicant of the SIR project. Its research focuses on the further development of the GermaNet ontology using the BERUFEnet corpus.
In cooperation with the German Federal Agency for Employment (Bundesagentur für Arbeit), we employ semantic information retrieval algorithms to implement electronic career guidance. Given a natural language essay written by the person seeking advice, relevant professions are found based on their natural language descriptions.
This project is funded by Deutsche Forschungsgemeinschaft (German Research Foundation).
- Dr. Iryna Gurevych, Principal Investigator
- Prof. Dr. Max Mühlhäuser, Principal Investigator
- Christof Müller, Project Coordinator
- Torsten Zesch, Doctoral Researcher