WikiMwe: a Multiword Expression Resource from Wikipedia

S. Hartmann, G. Szarvas, and I. Gurevych (2011):

Mining Multiword Terms from Wikipedia, in M.T. Pazienza & A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, pp. 226-258, Hershey, PA, USA: IGI Global.

Resource Download

  • WikiMwe v1.0 English (2011-08-18, .tar.bz2, 42MB)
  • WikiMwe DTD (2011-08-18, .dtd, 1.2KB)
  • Evaluation data: gold standard of 2,500 WikiMwe terms (2011-06-25, .csv, 124KB)
  • The complete resource download is available here.

(Coming soon: WikiNe, a named entity resource extracted from Wikipedia)

WikiMwe is derived from Wikipedia and is therefore available under the Creative Commons Attribution-ShareAlike License (CC BY-SA).


WikiMwe is a large resource of English multiword expressions mined from Wikipedia. It contains over 350,000 multiword units of two to four words, including

  • technical terminology,
  • non-compositional multiword expressions, and
  • collocations.

Each entry includes part-of-speech (POS) and frequency information as well as a pointwise mutual information (PMI) score. Many entries also carry definitional and category information, which makes the resource easier to use in both theoretical (semantic similarity, domain disambiguation) and applied (terminology extraction) NLP research.
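For readers unfamiliar with PMI, the following is a minimal sketch of the commonly used definition for a two-word unit, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). It is an illustration only: the exact variant used to score WikiMwe entries, in particular for units of three and four words, is not stated on this page, and the function name and counts below are hypothetical.

```python
import math


def pmi(pair_count, first_count, second_count, total):
    """Pointwise mutual information (in bits) of a two-word unit.

    All arguments are raw corpus counts; `total` is the number of
    observations the counts were taken from. This is the textbook
    definition, not necessarily the scoring used in WikiMwe.
    """
    if min(pair_count, first_count, second_count, total) <= 0:
        raise ValueError("all counts must be positive")
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    # Ratio of the observed joint probability to the probability
    # expected if the two words occurred independently.
    return math.log2(p_pair / (p_first * p_second))


# Made-up counts, purely for illustration.
score = pmi(pair_count=50, first_count=1000, second_count=200, total=1_000_000)
print(round(score, 2))
```

A score well above zero indicates that the two words co-occur more often than independence would predict, which is why PMI is a common signal for collocations and multiword terms.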

Coming Soon

We are currently working on WikiMwe resources for other languages (starting with German) and on the development of a software package for the language-independent extraction of multiword expressions from Wikipedia. We will make these resources available in the future.

Please contact me if you have any questions regarding the resource: Silvana Hartmann.