Visualising Complex Data in Education

Motivation

The project aims to support educational information and educational research in assessing complex data.

Goal

The project aims to support educational information and educational research in assessing complex data. The focus is on processing natural language data, which occurs in many forms in the field of education, such as free-text questions in studies or publications.

Method

Methodologically, the project is based on visual data analysis, in which automatic processes are closely combined with visualisation techniques. Interactive user interfaces enable users to steer the automatic procedures and to generate and test hypotheses. Human and machine are thus linked, which makes it possible to address questions that can be handled neither purely automatically nor purely manually.

Visual exploration of scientific literature regarding discriminating topics

This part is based on the idea of creating new ways of accessing scientific document collections by enabling users to compare different collections of publications with regard to their similarities and differences. To this end, the topics featured in the documents are determined automatically. A subsequent analysis identifies which of these topics are characteristic of individual subgroups of the collection, and how the contents of such documents can be distinguished from the rest of the collection. Visualising the findings of the automatic procedure opens up new perspectives on the data and can reveal correlations and differences that have so far remained hidden.
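The idea of topics that are characteristic of a subgroup can be illustrated with a minimal sketch. It assumes that topics have already been assigned to each document and uses a deliberately simple score, the share of documents mentioning a topic inside the subgroup minus the share outside it. The function name, the data layout and the score are assumptions made for illustration and are not taken from the project.

```python
from collections import Counter


def discriminating_topics(docs, top_n=2):
    """Rank topics by how strongly they set each subgroup apart.

    docs: list of (group, set_of_topics) pairs (hypothetical layout).
    Returns {group: [(topic, score), ...]} where
    score = share of documents inside the group that carry the topic
            minus the share of documents outside the group that carry it.
    """
    groups = sorted({group for group, _ in docs})
    result = {}
    for group in groups:
        inside = [topics for g, topics in docs if g == group]
        outside = [topics for g, topics in docs if g != group]
        in_counts = Counter(t for topics in inside for t in topics)
        out_counts = Counter(t for topics in outside for t in topics)
        scores = {}
        for topic in set(in_counts) | set(out_counts):
            in_share = in_counts[topic] / len(inside)
            out_share = out_counts[topic] / len(outside) if outside else 0.0
            scores[topic] = in_share - out_share
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[group] = ranked[:top_n]
    return result


if __name__ == "__main__":
    sample = [
        ("A", {"assessment", "literacy"}),
        ("A", {"assessment", "motivation"}),
        ("B", {"literacy", "e-learning"}),
        ("B", {"e-learning", "motivation"}),
    ]
    for group, topics in discriminating_topics(sample).items():
        print(group, topics)
```

A score near 1 marks a topic found almost only in the subgroup, a score near 0 marks a topic shared with the rest of the collection. A real analysis would probably use a statistically grounded measure in place of this difference of shares.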

Exploiting knowledge from keywords and search queries

Indexing resources is a central task in the field of information and documentation. Keywords that have been methodically assigned by experts can be used to extract domain knowledge about semantic relations between terms. Such knowledge can subsequently be used to support users in their search, for example by suggesting related documents that might also be of interest in the context of a search query. Furthermore, the procedures can be applied to search queries themselves in order to gain a better understanding of user interests, which in turn can help to further optimise information portals.

Fully automated analyses are difficult to conduct, particularly when it is not yet clear what exactly is being sought in the data. Hence, the project aims to develop visual analysis tools that allow for interactive exploration of the data collection. Term networks representing relations among terms will be built from significant co-occurrences. Further visual attributes such as “colour of node” or “size of node”, strength of links and background shading can be used to encode metadata such as term frequency or the strength of relations between terms.
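As a rough sketch of how such a term network could be derived, the following example counts how often two keywords are assigned to the same record and keeps only pairs whose pointwise mutual information (PMI) exceeds a threshold. The choice of PMI as the significance measure, the thresholds and all names are assumptions made for illustration; the text does not specify which measure the project uses.

```python
import math
from collections import Counter
from itertools import combinations


def term_network(records, min_count=2, min_pmi=0.0):
    """Build a term network from keyword co-occurrences.

    records: list of keyword lists, one per document or query (hypothetical).
    Returns (term_freq, edges):
      term_freq maps each term to the number of records containing it
                (could drive the node size),
      edges maps (term_a, term_b) to the PMI of the pair
                (could drive the link strength).
    """
    n = len(records)
    term_freq = Counter(t for record in records for t in set(record))
    pair_freq = Counter(
        pair
        for record in records
        for pair in combinations(sorted(set(record)), 2)
    )
    edges = {}
    for (a, b), joint in pair_freq.items():
        if joint < min_count:
            continue  # too rare to be considered reliable
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        pmi = math.log2(joint * n / (term_freq[a] * term_freq[b]))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return term_freq, edges


if __name__ == "__main__":
    sample = [
        ["school", "teacher"],
        ["school", "teacher", "curriculum"],
        ["university", "lecture"],
        ["university", "lecture", "curriculum"],
    ]
    freq, links = term_network(sample)
    for pair, weight in sorted(links.items()):
        print(pair, round(weight, 2))
```

The two return values correspond to the visual attributes mentioned above: term frequency can be mapped to node size and the pair score to link strength.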

Analysis of texts regarding semantic text attributes

Answering a question often requires exploring texts not only at the level of terms (e.g. in the case of a search query) but also at the level of higher-level text attributes. Examples of such attributes are the readability of a text, the reliability of websites or the appropriateness of contents for a specific age group of readers.

The German Research Foundation funds the project “Feature-based Visualization and Analysis of Natural Language Documents (VisADoc)”, which, in co-operation with the University of Constance (Prof. Daniel A. Keim), investigates how such text characteristics can be made measurable.

The analysis of plots in fictional texts is another example. Here, the focus is on investigating the dynamics of relationships among the characters in a story. Visualisations of global networks of relations are generally based on static graphs, which do not provide any insight into how a network evolves over the course of a text. UKP-DIPF will therefore develop advanced visualisation techniques that enrich adjacency matrices with pixel-based techniques, making the development of a social network visible in great detail.
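The data behind such a time-aware adjacency matrix can be sketched as follows. The example splits a text into consecutive segments and records, for every pair of characters, how often both are mentioned in the same sentence of each segment. Each matrix cell then holds a short time series that a pixel-based display could render as a row of coloured pixels. Sentence-level co-mention as a proxy for a relationship, the naive substring matching of names, and all identifiers are assumptions made for illustration only.

```python
from itertools import combinations


def relation_timelines(sentences, characters, window=2):
    """Count character co-mentions per text segment.

    sentences:  list of sentence strings in reading order.
    characters: iterable of character names (matched as plain substrings).
    window:     number of sentences per segment.
    Returns {(name_a, name_b): [count in segment 0, count in segment 1, ...]}.
    """
    names = sorted(characters)
    segments = [
        sentences[i:i + window] for i in range(0, len(sentences), window)
    ]
    timelines = {pair: [0] * len(segments) for pair in combinations(names, 2)}
    for index, segment in enumerate(segments):
        for sentence in segment:
            present = [name for name in names if name in sentence]
            # 'present' keeps the sorted order, so every pair is a valid key.
            for pair in combinations(present, 2):
                timelines[pair][index] += 1
    return timelines


if __name__ == "__main__":
    story = [
        "Anna meets Ben.",
        "Ben leaves.",
        "Anna writes to Clara.",
        "Clara and Anna meet Ben.",
    ]
    for pair, series in relation_timelines(story, {"Anna", "Ben", "Clara"}).items():
        print(pair, series)
```

Unlike a single static graph, the per-segment series shows when a relationship appears, intensifies or fades.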

Partners

Data Analysis and Visualization Group, Dept. Computer Science, University of Constance

People

  • Prof. Dr. Iryna Gurevych
  • Dr. Daniela Oelke