DKPro Text Classification

DKPro TC (Text Classification) is a UIMA-based text classification framework built on top of DKPro Core, DKPro Lab, and several machine learning frameworks (e.g. the Weka Machine Learning Toolkit). It is intended to facilitate supervised machine learning experiments with any kind of textual data.

DKPro TC comes with:

  • getting-started example code in Java and Groovy for standard text collections, e.g. the Reuters-21578 Text Categorization corpus
  • many generic feature extractors, e.g. n-grams and POS tags
  • convenient parameter optimization capabilities
  • comprehensive reporting with support for many standard performance measures
  • support for single-label and multi-label classification as well as pair-wise document classification

Downloads

DKPro TC on GitHub