Text as a Product

Text as an Instance of a Language System

Contrastive comparison of non-canonical grammatical constructions between English and German


Descriptions of natural language grammars tend to focus on the canonical constructions of a language, yet actual usage also displays constructions that are marked in various ways and thus deviate from the canonical form. The inventory of all permissible constructions provides insight into how the possibilities offered by the language system are exploited in actual language use. Non-canonical constructions remain understudied, partly because they are less frequent than canonical constructions and partly because their use depends on context, which further restricts the amount of evidence available.

Non-canonical constructions are language-specific and thus determined by the range of possibilities offered by the language system. Typological features of a language, such as freedom of word order, must therefore be assumed to affect the range of constructions that are marked and deviate from the canonical form.

Corpora provide the necessary empirical foundation for exploring the properties of text instances and of classes of text that realize a spectrum of varieties, such as functional, regional and social varieties. Corpora are equally indispensable for investigating the language system, especially for describing and modelling grammatical and lexical phenomena, the relations between them, and the relations between usage and system.


The project aims to validate the hypothesis that natural language grammars constitute systems of constructions centered on a set of canonical constructions of a particular language, complemented by a set of peripheral non-canonical constructions. The initial hypotheses are:

  • Each sentence structure is associated with a distinctive domain of linguistic function.
  • Canonical sentence structures are associated with more versatile domains of function than non-canonical constructions.
  • Canonical and non-canonical constructions together form different systems of construction for different languages.


  • Collect a broad spectrum of corpora in order to find a sufficient number of instances of non-canonical structures
  • Merge corpora into a uniform resource
  • Describe structures such as inversion, extraposition and cleft sentences in English, and their equivalents in German, using patterns over automatically identifiable features such as part-of-speech tags and parses
  • Extract structures from the corpora
  • Compare non-canonical structures vs. canonical structures within a language
  • Compare non-canonical structures between English and German
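
To illustrate what a pattern over automatically identifiable features might look like, the following is a minimal sketch in Python, not the project's actual tooling. It assumes that sentences are already tokenized and tagged with Penn Treebank part-of-speech tags, and it searches for candidates of one English non-canonical structure, the it-cleft ("It was John who broke the window"). The function name `find_it_cleft`, the word lists and the `max_gap` parameter are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

# A token is a (word form, part-of-speech tag) pair.
# Assumption: tags follow the Penn Treebank tagset.
Token = Tuple[str, str]

# Illustrative word lists, not an exhaustive inventory.
BE_FORMS = {"is", "was", "'s"}
CLEFT_LINKERS = {"that", "who", "which"}


def find_it_cleft(tokens: List[Token], max_gap: int = 6) -> Optional[Tuple[int, int]]:
    """Return (index of 'it', index of linker) for the first it-cleft
    candidate in the sentence, or None if no candidate is found.

    Pattern: 'it'/PRP, a form of 'be', a clefted constituent of
    1..max_gap tokens, then 'that', 'who' or 'which'.
    """
    for i in range(len(tokens) - 1):
        word, tag = tokens[i]
        next_word, next_tag = tokens[i + 1]
        if word.lower() != "it" or tag != "PRP":
            continue
        if next_word.lower() not in BE_FORMS or not next_tag.startswith("VB"):
            continue
        # The clefted constituent starts at i + 2 and has at least one token,
        # so the linker can appear at i + 3 at the earliest.
        last = min(i + 2 + max_gap, len(tokens) - 1)
        for j in range(i + 3, last + 1):
            if tokens[j][0].lower() in CLEFT_LINKERS:
                return (i, j)
    return None


cleft = [("It", "PRP"), ("was", "VBD"), ("John", "NNP"), ("who", "WP"),
         ("broke", "VBD"), ("the", "DT"), ("window", "NN"), (".", ".")]
canonical = [("John", "NNP"), ("broke", "VBD"), ("the", "DT"),
             ("window", "NN"), (".", ".")]

print(find_it_cleft(cleft))      # (0, 3)
print(find_it_cleft(canonical))  # None
```

A surface pattern of this kind over-generates: extraposition such as "It was clear that he left" matches the same token sequence, so the hits are candidates to be filtered further, for example with parse information or by manual inspection.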


  • Technische Universität Darmstadt
  • Goethe Universität Frankfurt am Main


  • Prof. Dr. Iryna Gurevych, Principal Investigator, UKP Lab, TU Darmstadt
  • Prof. Dr. Gert Webelhuth, Principal Investigator, IEAS, Goethe Universität Frankfurt am Main
  • Dr. Sabine Bartsch, Principal Investigator
  • Daniela Schröder, Project Staff (Frankfurt)
  • Janina Radó (Frankfurt)
  • Pia Weber, Project Staff (Frankfurt)
  • Richard Eckart de Castilho, Project Staff (Darmstadt)
  • Erik-Lân Do Dinh, Student Assistant (Darmstadt)


The LOEWE Research Center “Digital Humanities” is funded by the Hessian excellence program “Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz” (LOEWE).