Cross-lingual Link Discovery at NTCIR-9

UKP participated in the Cross-lingual Link Discovery Task (CrossLink) at the 9th NTCIR Workshop (NTCIR-9) held on 6–9 December 2011 at the National Center of Sciences in Tokyo, Japan.

Cross-lingual Link Discovery (CLLD) is the task of discovering potential links between documents in different languages. At NTCIR-9, the task was to find valid anchor texts in a new English Wikipedia page and retrieve the corresponding target Wikipedia pages in Chinese, Japanese, and Korean. The UKP team developed a CLLD framework consisting of four subtasks: anchor selection, anchor ranking, anchor translation, and target discovery. For anchor selection, anchor ranking, and target discovery, we largely used state-of-the-art monolingual approaches previously developed at our lab. For anchor translation, we used a translation resource constructed from Wikipedia and explored a number of methods that are widely used for short phrase translation.
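
To illustrate how the four subtasks fit together, here is a minimal toy sketch of the pipeline. It is not the UKP system: the function names, the tiny lookup tables, and the link-probability ranking score are all hypothetical stand-ins chosen only to make the data flow concrete.

```python
# Toy illustration of a four-stage CLLD pipeline.
# All data and helper names below are hypothetical, not the UKP system's.

# How often each phrase occurs as link text vs. how often it occurs at all.
LINK_COUNTS = {"machine translation": 8, "wikipedia": 5, "page": 1}
TEXT_COUNTS = {"machine translation": 10, "wikipedia": 10, "page": 50}

# Stand-in for a Wikipedia-derived English-to-Chinese translation resource.
TRANSLATIONS = {"machine translation": "机器翻译", "wikipedia": "维基百科"}

# Titles assumed to exist in the target-language Wikipedia.
TARGET_TITLES = {"机器翻译", "维基百科"}


def select_anchors(text):
    """Anchor selection: keep known phrases that occur in the text."""
    lowered = text.lower()
    return [phrase for phrase in LINK_COUNTS if phrase in lowered]


def rank_anchors(anchors):
    """Anchor ranking: order candidates by a simple link-probability score."""
    return sorted(
        anchors,
        key=lambda a: LINK_COUNTS[a] / TEXT_COUNTS[a],
        reverse=True,
    )


def translate_anchor(anchor):
    """Anchor translation: dictionary lookup; None if no translation is known."""
    return TRANSLATIONS.get(anchor)


def discover_target(translated):
    """Target discovery: accept the translation only if a page with that title exists."""
    return translated if translated in TARGET_TITLES else None


def discover_links(text):
    """Run the four stages and return (anchor, target title) pairs in rank order."""
    links = []
    for anchor in rank_anchors(select_anchors(text)):
        target = discover_target(translate_anchor(anchor))
        if target is not None:
            links.append((anchor, target))
    return links


if __name__ == "__main__":
    print(discover_links("Machine translation is covered on this Wikipedia page."))
```

In this sketch, "page" is selected as a candidate but is dropped at the translation stage because the toy dictionary has no entry for it, which shows why anchor translation quality directly limits which links can be discovered.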

Our formal runs were highly competitive with other participants’ systems. Measured by Mean Average Precision (MAP), our system ranked first in the English-to-Chinese and English-to-Korean evaluations for File-to-File with manual assessment and Anchor-to-File with Wikipedia ground-truth assessment.
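
For readers unfamiliar with the measure, the following is a minimal sketch of the textbook definition of Mean Average Precision. It assumes each topic has a ranked list of retrieved items and a set of relevant items; the function names are ours, and the official CrossLink evaluation tool may differ in details such as rank cut-offs.

```python
def average_precision(ranked, relevant):
    """Average precision for one topic.

    ranked:   list of retrieved items, best first (assumed free of duplicates)
    relevant: set of items judged relevant for the topic
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Dividing by the number of relevant items penalizes relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over a list of (ranked, relevant) topic pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a topic with relevant items {a, c} and the ranking [a, b, c] has average precision (1/1 + 2/3) / 2, since the relevant items appear at ranks 1 and 3.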

Details of the task and full results for all systems can be found in the task proceedings. Our system is described in detail in the following paper:

  • Jungi Kim and Iryna Gurevych. UKP at CrossLink: Anchor Text Translation for Cross-lingual Link Discovery. In Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, pages 487–494, December 2011.