Medical students document their diagnostic reasoning about a (virtual) patient in a diagnostic essay. To give them automatic, individual feedback on this reasoning, the content of these essays needs to be assessed automatically. Medical terms mentioned in the essay play a key role in this assessment, in particular diagnoses, medical tests, and test results (including symptoms observed during the examination of the patient or reported by the patient).
Supervised machine learning algorithms for automatically identifying medical terms would require large numbers of annotated essays, which are costly to create because the annotation requires medical experts. Furthermore, no existing datasets of German diagnostic essays are available that could be leveraged. The goal of this thesis is therefore to investigate unsupervised machine learning approaches for the automatic identification of medical terms in diagnostic essays.