"Trust, but Verify! Better Entity Linking through Automatic Verification" by Benjamin Heinzerling, Michael Strube, and Chin-Yew Lin has been accepted at EACL 2017.
Building on work done during a research internship at Microsoft Research Asia, the paper introduces a method for automatically verifying the results produced by an entity linking system, i.e., deciding whether a given entity mention has been linked to the correct entry in a knowledge base such as Wikipedia.
The authors propose three knowledge-rich coherence measures that take into account geographic, temporal, and entity type information, and show that they can be used effectively to improve entity linking results.
For example, consider this beginning of a news article:
DUBLIN 1996-08-31 Result of the Tattersalls Breeders Stakes , a race for two-year-olds run over six furlongs at The Curragh…
While this article is clearly about a horse race in Ireland, current entity linking systems will happily link the entity mention "Breeders Stakes" to the Wikipedia entry about a Canadian horse race of the same name, because that entry is the best available match: the Irish horse race doesn't have a Wikipedia entry.
For computational reasons, current systems do not consider geographic, temporal, or other aspects of coherence directly. Instead, they optimize a generic notion of coherence based on semantic relatedness. In cases like the example above, this can be misleading: even though the Canadian horse race shows high semantic relatedness to Tattersalls (a racehorse auctioneer based in the UK and Ireland) and The Curragh (a famous horse racing track in Ireland), its low geographic coherence with the rest of the article is evidence of a potential error.
The proposed automatic verification method avoids the computational issues limiting current systems and allows incorporating knowledge-rich aspects such as geographic coherence, which would be infeasible otherwise.
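To make the idea of geographic coherence concrete, here is a minimal sketch of how such a check might look for the example above. It is not the measure defined in the paper: the function names, the 500 km threshold, and the use of mean great-circle distance are all assumptions made for illustration, and the coordinates are rough values for Dublin, The Curragh, and the Toronto area (where the Canadian race is assumed to be located).

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def mean_distance_km(candidate, context):
    """Average distance from a candidate entity to the other located entities."""
    return sum(haversine_km(candidate, c) for c in context) / len(context)

def is_geographically_coherent(candidate, context, max_km=500.0):
    """Hypothetical check: accept a link only if the candidate lies near its context.

    The 500 km default is an arbitrary illustrative threshold.
    """
    return mean_distance_km(candidate, context) <= max_km

# Approximate coordinates, assumed for illustration only.
DUBLIN = (53.35, -6.26)
THE_CURRAGH = (53.15, -6.80)
TORONTO_AREA = (43.70, -79.60)  # assumed location of the Canadian race

context = [DUBLIN, THE_CURRAGH]
print(is_geographically_coherent(TORONTO_AREA, context))  # Canadian candidate
print(is_geographically_coherent(THE_CURRAGH, [DUBLIN]))  # an Irish location
```

In this sketch the Canadian candidate is flagged because it lies thousands of kilometres from the other entities in the article, whereas an Irish location passes. A check of this kind runs once over the finished output of a linker rather than inside its search over candidate assignments.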