Educational Text Analytics

Educational Text Analytics: Automatically Grading Text Responses


In this project, we worked towards automatically grading text responses using text analytics. We participated in the 2013 SemEval challenge on textual entailment, applied automatic grading methods to a novel corpus of children's essays, and published a survey article on the state of the art in short answer grading. Each of these activities is described below.

This work is related to the work on language learning exercises.


Textual Entailment Challenge

The International Workshop on Semantic Evaluation (SemEval) provides a forum for researchers to compare the performance of semantic evaluation systems against one another. At SemEval 2013, a competition of particular relevance is the “Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge” for grading free text questions comprising explanations and definitions. The main task involves categorizing a one- or two-sentence answer as either correct or incorrect by comparing it with reference answers. Other variants of the task introduce additional categories including “incomplete”, “contradictory”, and “irrelevant”. A final task explores notions of partial entailment for detecting whether specific concepts are present in the student answers. Our research in this challenge will utilize the text similarity framework from the DKPro library of natural language processing components developed by the Ubiquitous Knowledge Processing Lab. We will also integrate the BIUTEE textual entailment system from the Natural Language Processing Lab at Bar-Ilan University. These groups are collaborating on a combined submission.
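To make the main task concrete, the following is a minimal sketch of grading by similarity to reference answers. It is an illustration only and is not the DKPro or BIUTEE approach: the function names, the Jaccard token-overlap measure, and the threshold of 0.5 are all assumptions chosen for brevity.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def grade(student_answer, reference_answers, threshold=0.5):
    """Label an answer 'correct' if it is similar enough to any reference.

    The threshold is a hypothetical value; a real system would tune it
    (or learn a classifier) on annotated training data.
    """
    student = tokens(student_answer)
    best = max(
        (jaccard(student, tokens(ref)) for ref in reference_answers),
        default=0.0,
    )
    return "correct" if best >= threshold else "incorrect"


references = ["The bulb lights up because the circuit is closed"]
print(grade("The bulb lights because the circuit is closed", references))
print(grade("Batteries are heavy", references))
```

A surface overlap measure like this cannot distinguish a contradictory answer from a correct one that shares the same words, which is one reason the challenge pairs similarity features with textual entailment.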

Child Essay Grading

A corpus of 645 first-grade essays written by children has been collated by the Primary School Pedagogy and Didactics group at the University of Bamberg as part of the DFG project “Narrative Schreibkompetenz in Klasse 1” (NaSch1, English title: “Narrative Writing Skills in the First Class”). The essays are responses to the book “Lucy rettet Mama Kroko” (Doucet & Wilsdorf, 2005, English title: “Lucy rescues mother crocodile”), in which the students write a letter to the crocodile from Lucy's point of view. This corpus provides a novel challenge for automatic grading because the generally short answers are raw in form: students of this age are encouraged to sound out the letters of a word when writing rather than use conventional spelling. In addition, the texts are graded according to 40 criteria including formal criteria (e.g., word mapping, letter mapping, and word count), linguistic criteria (e.g., syntax, vocabulary, and coherence), content criteria (e.g., topics, originality, and understanding), and structural criteria (e.g., salutation, body sections, and closing). It is envisaged that this corpus will form the basis of empirical work for automatic grading research in the future.

Survey on Short Answer Grading

Automatic grading is needed for efficient and consistent assessment of classroom exercises and college admission tests, especially at large scale. Automatic grading methods can be applied to many types of questions, as represented by the graphic below. For multiple-choice questions requiring recognition from a list, automatic grading is a solved problem, as only a single correct response to each question needs to be considered. Today there is momentum in automatic grading for free text questions, which require a deeper understanding of the material because the student must recall rather than select the answer.

Short answers differ from essays with respect to automatic grading in three ways: (1) short answers have less content; (2) they focus on closed-ended rather than open-ended questions; and (3) their assessment tends to focus more on content than on style. A current survey paper collaboration with the Web Technology and Information Systems group at Bauhaus-Universität Weimar has identified short answer grading as an active research field with around 20 systems described in the literature in the last decade or so. The analysis focuses on four research dimensions (data, method, evaluation measure, and result) with the aim of publishing recommendations for the field as a whole in a survey paper.



  • Prof. Dr. Iryna Gurevych
  • Dr. Oliver Ferschke

Former project members

  • Dr. Steven Burrows