Unstructured Information Management: Theory and Applications

Abstract

In recent years, Unstructured Information Management has developed from a subject of academic research into a mature, industrial-strength technology. It encompasses a wide range of applications, such as:

  • Semantic search
  • Discovery of trends, patterns and relationships from data
  • Information extraction
  • Advanced question answering applications

In 2005, IBM released the Unstructured Information Management Architecture (UIMA), a software architecture supporting the development, integration, and deployment of unstructured information management applications. UIMA specifies an architecture and framework for developing text analysis engines, collection processing engines, and advanced search capabilities.
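To make the idea of chained analysis engines concrete, here is a minimal, self-contained Java sketch. It does not use the UIMA API: `AnnotatedText`, `Span`, `Annotator`, and the two annotators are hypothetical names invented for illustration, and the sentence and token rules are deliberately naive. It only shows the general pattern of independent analysis steps adding stand-off annotations to one shared structure, where a later step builds on the output of an earlier one.

```java
import java.util.ArrayList;
import java.util.List;

/** Runs a sequence of annotators over one text (hypothetical sketch, not the UIMA API). */
class PipelineSketch {
    static AnnotatedText run(String text, Annotator... steps) {
        AnnotatedText doc = new AnnotatedText(text);
        for (Annotator step : steps) {
            step.process(doc); // each step sees the annotations added so far
        }
        return doc;
    }

    public static void main(String[] args) {
        AnnotatedText doc = run("UIMA was released by IBM. It supports text analysis.",
                new SentenceAnnotator(), new TokenAnnotator());
        for (Span s : doc.spans) {
            System.out.println(s.type + " [" + s.begin + "," + s.end + ") " + doc.covered(s));
        }
    }
}

/** A stand-off annotation: a typed character range over the text. */
class Span {
    final String type;
    final int begin;
    final int end; // exclusive

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

/** The shared analysis structure: the original text plus all annotations. */
class AnnotatedText {
    final String text;
    final List<Span> spans = new ArrayList<Span>();

    AnnotatedText(String text) {
        this.text = text;
    }

    String covered(Span s) {
        return text.substring(s.begin, s.end);
    }

    /** Returns a copy of all annotations of the given type. */
    List<Span> ofType(String type) {
        List<Span> result = new ArrayList<Span>();
        for (Span s : spans) {
            if (s.type.equals(type)) {
                result.add(s);
            }
        }
        return result;
    }
}

/** One analysis step. */
interface Annotator {
    void process(AnnotatedText doc);
}

/** Naive sentence splitter: a sentence ends at '.', '!' or '?'. */
class SentenceAnnotator implements Annotator {
    public void process(AnnotatedText doc) {
        int start = 0;
        for (int i = 0; i < doc.text.length(); i++) {
            char c = doc.text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                doc.spans.add(new Span("Sentence", start, i + 1));
                start = i + 1;
                // Skip whitespace so the next sentence starts at its first character.
                while (start < doc.text.length()
                        && Character.isWhitespace(doc.text.charAt(start))) {
                    start++;
                }
            }
        }
        if (start < doc.text.length()) {
            doc.spans.add(new Span("Sentence", start, doc.text.length()));
        }
    }
}

/** Naive whitespace tokenizer that works inside the existing Sentence annotations. */
class TokenAnnotator implements Annotator {
    public void process(AnnotatedText doc) {
        for (Span sentence : doc.ofType("Sentence")) {
            int i = sentence.begin;
            while (i < sentence.end) {
                while (i < sentence.end && Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                int begin = i;
                while (i < sentence.end && !Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                if (i > begin) {
                    doc.spans.add(new Span("Token", begin, i));
                }
            }
        }
    }
}
```

Because the tokenizer reads the `Sentence` annotations, the order of the steps matters: running it before the sentence splitter would produce no tokens. In a real UIMA application the shared structure, the annotation types, and the wiring of components are provided by the framework rather than written by hand.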

In this seminar, the students will learn:

  • Theoretical foundations of unstructured information management
  • The Unstructured Information Management Architecture (UIMA; IBM, 2005)
  • The steps involved in building a UIMA-based application
  • Combination of UIMA modules and deployment of the final application

Furthermore, the students will get hands-on experience in practical classes. In a student project, they will design and implement a specific unstructured information management application.

Course dates

  • 20.10.2006
  • 21.10.2006
  • 17.11.2006 9:30-11:00 & 12:00-13:30
  • 18.11.2006 11:00-12:30 & 13:30-15:00
  • 15.12.2006
  • 16.12.2006

Prerequisites

  • Programming skills in Java

Literature

  • IBM Systems Journal. Unstructured Information Management. Volume 43, Number 3, 2004.
  • Design and implementation of the UIMA Common Analysis System. T. Götz and O. Suhre. IBM Systems Journal, Volume 43, Number 3, 2004, Unstructured Information Management.
  • Create a UIMA component Web service, Part 1: Create a UIMA application using Eclipse. Nicholas Chase.
  • UIMA Java Framework on Sourceforge

Registration

Registration is closed.

Basic Topics – Students

Wrapping a Parser (BitPar)

Tokenizer / Sentence Splitter – Anne Brock/Laura Kassner

Annotate Wikipedia articles – Desislava Zhekova

Integrate Wikipedia & GermaNet as UIMA resources – Heike Johannsen, Miguel Hormazabal, Ljubomir Zlatkov

Visualize annotations

Extended Topics – Students

Named Entity Recognition – Aleksandar Savkov, Jonathan Khoo/Sladjana Pavlovic

Word Sense Disambiguation (WSD) – Bela Usabaev

Sentiment Detection – Maria Tchalakova

Explicit Semantic Analysis (Gabrilovich & Markovitch) – Teodora Toncheva

Blogparser – Niels Ott/Ramon Ziai

FAQ

I see that there are two versions available from IBM: 1.4 and 2.0 beta. Which should we use?

A:

The two versions are not very different and should be compatible in almost all respects.

Further details can be found in the section on backward compatibility in

dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference_2.0.pdf

So, it's up to you.

1.4 is likely to contain fewer bugs and it is installed in the SFS computer pool.

2.0 beta has some new features that can make your life easier depending on your task.

Take a look at the new “flow controller”.

I installed Eclipse, Java and UIMA, but it does not work anyway. What should I do?

A:

That's a difficult one. Try to remove Eclipse, Java and UIMA, then reinstall them in the following order:

  • Java 5.0 V9
  • Eclipse 3.2.1
  • UIMA 2.0.1 beta

If you have different Java versions installed, make sure that Eclipse uses version 5.0.
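One way to check which Java version your programs actually run on is to print the standard `java.version` and `java.home` system properties. The small class below is only a sketch: the class name `JavaVersionCheck` and the helper `isJava5` are made up for illustration, and the check relies on the fact that Java 5.0 reports its version internally as `1.5.0_xx`. Run from inside Eclipse, it shows the JRE that Eclipse uses to launch your application, which is presumably the setting that matters here.

```java
/** Prints the Java version and installation directory of the running JVM. */
class JavaVersionCheck {

    /** Java 5.0 identifies itself as "1.5.0_xx" in the java.version property. */
    static boolean isJava5(String version) {
        return version.startsWith("1.5.");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println(isJava5(version)
                ? "This JVM is Java 5.0."
                : "This JVM is NOT Java 5.0.");
    }
}
```

If the output names a different version, the `java.home` path tells you which installation was picked up, so you can adjust the JRE selected in Eclipse accordingly.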