Predicting and Manipulating Exercise Difficulty for Language Learning


In an increasingly globalized labor market, knowledge of one or more foreign languages is more relevant than ever. Due to increased mobility, multilingual skills are also required for private communication, as friendships stretch across geographical and linguistic borders.

At the same time, learners find that basic foreign language skills deteriorate quickly if they are not practiced and extended on a regular basis. However, the static schedule of conventional language courses is often not compatible with learners' variable working conditions and lifestyles. Therefore, many learners turn to online portals for self-directed learning. These portals are becoming increasingly popular, although the provided contents are rather inflexible and limited. So far, adaptive technologies that individually adjust contents to the learners' proficiency level, their speed of progress, and their learning style are at an early stage of development. In order to generate adaptive exercises with varying difficulty, we need to be able to measure difficulty automatically. In this project, we have developed measures for predicting and manipulating the difficulty of texts, words, and exercises for language learners.

Text Difficulty

Text difficulty is commonly approximated by the concept of readability. The readability of a text is a measure of its complexity: higher readability scores indicate that a text can be comprehended more easily. A manual is expected to have a higher level of readability than a philosophical thesis, for example. Traditional approaches to measuring readability only take the average sentence length and the average word length into account. This is insufficient because it ignores other factors such as lexical-semantic difficulty (e.g. choice of words), syntactic difficulty (e.g. grammatical constructions), and discourse difficulty (e.g. cohesion and coherence). In addition, most readability approaches focus on native speakers of English.
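The traditional approach can be illustrated with the classic Flesch Reading Ease formula, which combines exactly these two surface features. The sketch below (using a crude vowel-group heuristic for syllable counting) shows why such measures are blind to word choice, grammar, and coherence:

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups. Real readability tools use
    # pronunciation dictionaries or proper syllabifiers.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Classic Flesch Reading Ease: higher scores mean easier text.
    Only average sentence length and average syllables per word enter
    the formula; lexical, syntactic, and discourse difficulty are ignored."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    asl = len(words) / len(sentences)                           # avg sentence length
    asw = sum(count_syllables(w) for w in words) / len(words)   # avg syllables/word
    return 206.835 - 1.015 * asl - 84.6 * asw

print(round(flesch_reading_ease("The cat sat on the mat. It was warm."), 1))  # ≈ 117.7
```

Any text with the same sentence and word lengths receives the same score, regardless of how rare its words are or how convoluted its grammar is.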

We discuss the range of readability features and their applicability to language learners (L2 readability) and to other languages in:

  • Lisa Beinborn, Torsten Zesch and Iryna Gurevych: ‘Towards fine-grained readability measures for self-directed language learning’, in: Proceedings of the Swedish Language Technology Conference: Workshop on NLP for CALL, Vol. 80 (2), pp. 11–19, Lund, Sweden, October 2012.
  • Lisa Beinborn, Torsten Zesch and Iryna Gurevych: ‘Readability for foreign language learning: The importance of cognates’, in: International Journal of Applied Linguistics, Vol. 165 (2), pp. 136–162, 2014.

Most of the discussed features have been implemented and are available in dkpro-tc-readability.

Word Difficulty

For language learners, the difficulty of single words can lead to severe comprehension problems. The process of learning the basic syntactic structures of an L2 can be considered to be more or less completed at a certain point, but vocabulary acquisition is a continuous process that remains important even for advanced learners. However, assessing whether a learner knows a word is not trivial because word knowledge consists of many factors such as knowledge of the spoken form, the written form, grammatical behavior, collocation behavior, and many others. In this project, we focused on two aspects that are particularly relevant for language learners: cognateness and spelling difficulty.

1. Cognates


Cognates are words that are similar in different languages, e.g. “elegance” in English and “Eleganz” in German. These words can be particularly helpful for language learners when attempting to comprehend an unknown text. Even if a learner has never seen a foreign word before, she might be able to guess the meaning due to the similarity to words in another language. A list of cognate pairs would thus constitute an important resource for automated exercise generation.

Cognates often follow regular production patterns, e.g. the pairs “ignorance-Ignoranz”, “tolerance-Toleranz” and “redundance-Redundanz” are similar to the “elegance-Eleganz” example. Such regularities enable the application and modification of methods from the field of statistical machine translation for cognate production.
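Such a suffix regularity can be sketched with a few hand-written rules. The rule list and function below are illustrative placeholders only; the actual approach learns character-level transformations with statistical machine translation rather than relying on fixed rules:

```python
# Illustrative English -> German suffix rules; a character-based SMT model
# learns such correspondences automatically from known cognate pairs.
SUFFIX_RULES = [
    ("ance", "anz"),   # elegance  -> Eleganz
    ("ence", "enz"),   # reference -> Referenz
    ("ity", "ität"),   # activity  -> Aktivität
]

def produce_cognate(english_word):
    for en_suffix, de_suffix in SUFFIX_RULES:
        if english_word.endswith(en_suffix):
            stem = english_word[: -len(en_suffix)]
            return (stem + de_suffix).capitalize()  # German nouns are capitalized
    return None  # no rule applies

print(produce_cognate("ignorance"))  # -> Ignoranz
print(produce_cognate("tolerance"))  # -> Toleranz
```

A learned model generalizes far beyond such hand-picked suffixes and can weigh competing transformations against each other.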

Workflow for cognate production:

This approach is described in:

  • Lisa Beinborn, Torsten Zesch and Iryna Gurevych: ‘Cognate Production using Character-based Machine Translation’, in: Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP), pp. 883–891, Nagoya, Japan, October 2013.

Data and models can be found here.

2. Spelling Difficulty

Cognates facilitate comprehension, but they often lead to spelling errors. We developed an approach to predict the spelling difficulty of words based on word familiarity features and phonetic features. We evaluated the approach on spelling errors extracted from learner corpora and found that the L1 of the learners has a strong influence on spelling difficulty.
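As a hedged sketch, features from these two groups might look as follows. The feature names and the `pronounced_length` input are illustrative stand-ins, not the project's actual feature set:

```python
import math

def spelling_features(word, corpus_frequency, pronounced_length):
    """Toy feature vector for spelling difficulty prediction, covering
    the two groups used in the project: word familiarity and phonetics."""
    return {
        # Familiarity: rarer words are usually harder to spell.
        "log_frequency": math.log(corpus_frequency + 1),
        # Length in letters: longer words offer more error positions.
        "length": len(word),
        # Phonetic: a gap between written and spoken length hints at
        # silent letters and irregular grapheme-phoneme mappings.
        "grapheme_phoneme_diff": len(word) - pronounced_length,
    }

# "through" has 7 letters but only 3 phonemes -- a classic spelling trap.
print(spelling_features("through", corpus_frequency=50000, pronounced_length=3))
```

A classifier trained on such features, together with L1-specific information, can then estimate how error-prone a word is for a given learner population.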

The approach is described in:

  • Lisa Beinborn, Torsten Zesch and Iryna Gurevych: ‘Predicting the Spelling Difficulty of Words for Language Learners’, in: Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications held in conjunction with NAACL 2016, to appear, San Diego, California, USA, June 2016.

The data is available here.

Exercise Difficulty

Text and word difficulty have a strong influence on the difficulty of text-based exercises. In addition to these content factors, the format of the exercise also plays a role. In this project, we developed a model to predict the difficulty of text-completion exercises. A text-completion exercise is a text in which some words have been completely or partially replaced by a gap. In order to solve a text-completion exercise, the learner needs to fill in the gaps. Exercises differ with respect to the gap format, which influences candidate ambiguity, and the deletion rate, which influences item dependencies.
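For illustration, the standard C-test format deletes the second half of every second word. This minimal sketch omits details of real C-tests, such as leaving the first sentence intact:

```python
def make_c_test(text):
    """Turn a text into a C-test: delete the second half of every
    second word (for odd-length words, the longer half is kept)."""
    gapped, solutions = [], []
    for i, word in enumerate(text.split()):
        if i % 2 == 1 and len(word) > 1:       # gap every second word
            keep = (len(word) + 1) // 2        # keep first half, rounded up
            gapped.append(word[:keep] + "_" * (len(word) - keep))
            solutions.append(word)
        else:
            gapped.append(word)
    return " ".join(gapped), solutions

gapped, solutions = make_c_test("Language learners fill the missing letters")
print(gapped)  # -> Language lear____ fill th_ missing lett___
```

With this fixed deletion rate of 1/2, neighboring gaps can interact: a learner who fails one gap loses context for solving the next, which is exactly the item-dependency effect mentioned above.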

Difficulty model:

This difficulty model has been applied to C-tests, X-tests and cloze tests in English, French and German. The approach and the results can be found in:

  • Lisa Beinborn, Torsten Zesch and Iryna Gurevych: ‘Predicting the Difficulty of Language Proficiency Tests’, in: Transactions of the Association for Computational Linguistics (TACL), Vol. 2 (1), pp. 517–529, November 2014.
  • Lisa Beinborn, Torsten Zesch and Iryna Gurevych: ‘Candidate Evaluation Strategies for Improved Difficulty Prediction of Language Tests’, in: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications held in conjunction with NAACL 2015, pp. 1–11, Denver, Colorado, USA, June 2015.

The data and models can be found here.

Manipulating Difficulty

Based on the difficulty prediction approaches, we can manipulate the difficulty of exercises. In this project, we evaluated two manipulation directions: content selection and distractor substitution.

1. Content selection

We provide a web demo for our difficulty prediction approach that allows test designers to instantly estimate the difficulty of a C-test for a chosen text. In a second step, we applied this approach to a text corpus to select appropriate texts for exercises. An evaluation with human experts shows that automatic content selection can be a useful tool, but topic preferences should be taken into account. The generated C-tests and the evaluation results can be found here.

2. Distractor manipulation

The difficulty of cloze exercises is mainly determined by the choice of the distractors. We analyzed whether our evaluation strategies for candidate ambiguity can be used to find substitutions for distractors that increase or decrease the difficulty of the exercise. In addition, we provide a substitution dataset that contains noun synonyms extracted from the lexical resource Uby, enriched with cognateness and spelling difficulty information. The dataset can be found here.
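A minimal sketch of distractor substitution, assuming a synonym lookup and a context-fit score. The `plausibility` values below are toy numbers; the project scores candidates with its candidate evaluation strategies and draws synonyms from Uby:

```python
def substitute_distractor(distractors, synonyms, plausibility, harder=True):
    """Replace each distractor with the synonym whose context-fit score
    is highest (more ambiguous -> harder item) or lowest (-> easier item).
    `plausibility` maps a word to a toy context-fit score in [0, 1]."""
    pick = max if harder else min
    result = []
    for d in distractors:
        candidates = synonyms.get(d, []) + [d]  # keep the original as fallback
        result.append(pick(candidates, key=lambda w: plausibility.get(w, 0.0)))
    return result

# Toy data: a more plausible distractor makes the cloze item harder.
synonyms = {"car": ["automobile", "vehicle"]}
plausibility = {"car": 0.4, "automobile": 0.2, "vehicle": 0.7}
print(substitute_distractor(["car"], synonyms, plausibility, harder=True))   # ['vehicle']
print(substitute_distractor(["car"], synonyms, plausibility, harder=False))  # ['automobile']
```

Keeping the original distractor in the candidate set guarantees that the substitution never moves the difficulty in the wrong direction.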



  • Prof. Dr. Iryna Gurevych
  • Prof. Dr. Torsten Zesch
  • Lisa Beinborn