Mastering chess better with AlphaVile

PhD students Johannes Czech and Jannis Blüml are researching artificial intelligence (AI) and chess

2024/10/21

Johannes Czech and Jannis Blüml work at the Artificial Intelligence and Machine Learning Lab (AIML Lab) at TU Darmstadt. Together with Professor Kristian Kersting, head of the AIML Lab, and Hedinn Steingrimsson, a scientist specialising in AI reliability at Rice University in Houston, Texas, and at the Safe System 2 Foundation, they published a research paper on artificial intelligence (AI) in the context of chess in August. This week they are presenting their paper at ECAI, one of the major European conferences for AI research, in Santiago de Compostela.

Chess game against the CrazyAra bot on the chess demonstrator

Johannes Czech's research focuses on improving AlphaZero, a self-learning computer program that plays chess, go and shogi. Jannis Blüml studies how information can be optimally represented. In their current paper, 'Representation Matters for Mastering Chess: Improved Feature Representation in AlphaZero Outperforms Switching to Transformers', they investigate to what extent the so-called transformer architecture is suitable for AlphaZero, using chess as an example. They also evaluated the influence of changes to the input and output representations.

The Darmstadt team worked with Hedinn Steingrimsson. His research and expertise lie in the field of powerful, reliable and trustworthy modern AI architectures based on neural networks, including neural hybrid architectures such as AlphaVile. The new AlphaVile architecture is a flagship of the successful collaboration. Steingrimsson, who has been a chess grandmaster since 2007, also contributed his many years of experience in chess to the project. Chess has several advantages for basic AI research: on the one hand, the game is complex enough to be interesting, but on the other hand it is simple enough that the moves can be verified. “Chess is deterministic by nature. When you make a move in chess, you know exactly what will happen,” says Steingrimsson.

Additional information significantly improves playing strength

Blüml describes their approach as follows: “If you want to teach an AI to play chess, the first question is what such an AI looks like, i.e. which neural network architecture is used. Research is currently focusing on transformers, the architecture behind large language models like ChatGPT that hold a great deal of knowledge. On the other hand, there is the question of what information is given to the architecture.” Czech adds: “Transformer models are often used in the language domain, and they can also be very large. Our motivation was to see how these models work with chess. And we found that the classical transformer architecture did not perform any better. The architecture was not designed with chess inference in mind, i.e. with how fast the output of the neural network is generated.”

Their next approach: more efficient transformer modules mixed with special convolutional networks. These convolutional networks, which are very common in image processing, are often used in chess to recognise certain patterns. “We have therefore developed a hybrid solution that combines the advantages of convolutional networks for pattern recognition with the capabilities of transformers for long-term planning,” says Steingrimsson. They also modified the input representation to make it more efficient for the network.
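
To illustrate the idea, the following is a minimal, hypothetical sketch of such a hybrid building block in PyTorch: a residual convolution for recognising local piece patterns on the 8x8 board, followed by self-attention that lets every square attend to every other square. The module name, layer sizes and layout are illustrative assumptions, not the actual AlphaVile implementation.

```python
# Hypothetical sketch of a hybrid block: convolutions for local pattern
# recognition followed by self-attention for long-range interactions.
# Layer sizes and structure are illustrative, not the AlphaVile configuration.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, channels: int = 256, heads: int = 8):
        super().__init__()
        # Convolutional part: captures local piece patterns on the 8x8 board
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        # Transformer part: every square can attend to every other square
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, 8, 8)
        x = x + self.conv(x)                    # residual convolution
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (batch, 64 squares, channels)
        attn_out, _ = self.attn(seq, seq, seq)  # global self-attention
        seq = self.norm(seq + attn_out)         # residual connection + norm
        return seq.transpose(1, 2).reshape(b, c, h, w)
```

The design choice mirrors the quoted motivation: the cheap convolution handles local board patterns, while the attention layer provides the global view associated with longer-term planning.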

“There are certain types of information that the network would otherwise have to calculate on its own. For example, counting how many pawns White has at the moment and how many pawns Black has at the moment. Then it would subtract and see that Black has two more pawns. Such global information can also be provided to the network directly, giving it more capacity for later computations. The network doesn't have to extract the information on its own,” explains Czech. “We have given the network everything that we, as chess players, consider important. It's just a few more zeros and ones,” says Blüml. “We noticed that this additional information helped a lot in terms of playing strength.”
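
As an illustration of this kind of input representation, the sketch below encodes two such global features, the pawn-count difference and the side to move, as constant-valued 8x8 planes using the python-chess library. The choice of features and the plane layout are hypothetical; the actual AlphaZero/AlphaVile input encoding may differ.

```python
import numpy as np
import chess  # python-chess library

def global_feature_planes(board: chess.Board) -> np.ndarray:
    """Encode a few hand-picked global features as constant 8x8 planes.

    Hypothetical example: the pawn-count difference is information the
    network would otherwise have to compute for itself.
    """
    white_pawns = len(board.pieces(chess.PAWN, chess.WHITE))
    black_pawns = len(board.pieces(chess.PAWN, chess.BLACK))
    pawn_diff = white_pawns - black_pawns  # e.g. -2 if Black is two pawns up

    planes = np.stack([
        np.full((8, 8), pawn_diff, dtype=np.float32),          # material info
        np.full((8, 8), float(board.turn), dtype=np.float32),  # side to move
    ])
    # These planes would be appended to the usual piece-placement planes
    # before being fed into the network.
    return planes
```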

Transformer architectures are resource-intensive

The researchers therefore do not fully share the view that the search for features no longer plays a role in the representations of deep neural networks. “With our research we want to show that the search for an improved architecture is important. Feature engineering, i.e. the study of which features or characteristics are important and how they can be represented, should not be neglected,” concludes Blüml. “In the field of artificial intelligence, we are currently seeing two directions. On the one hand, the huge transformer architectures such as OpenAI's ChatGPT and Google's Gemini, which are expensive and consume a lot of energy. And on the other hand, hybrid architectures like our AlphaVile. In these, we try to combine the positive features of transformers and convolutional networks to make them powerful, computationally efficient and fast,” Steingrimsson describes, and Blüml adds: “Each architecture has its strengths and weaknesses, and the transformer models are not a panacea for all problems. It is important to match both the architecture and the problem representation to the problem.”

Further information

Technical paper ‘Representation Matters for Mastering Chess: Improved Feature Representation in AlphaZero Outperforms Switching to Transformers’

The AIML lab aims to make computers learn as much about the world as humans can, as quickly and flexibly as possible. This raises many fascinating scientific problems. To investigate these, the scientists are developing new methods of machine learning (ML) and artificial intelligence.

The European Conference on Artificial Intelligence (ECAI) will take place from 19 to 24 October 2024 in Santiago de Compostela.

The Safe System 2 Foundation focuses on the socially responsible use of AI. It contributes to the development of safer AI models with System 2 capabilities, including improved logical reasoning and long-term planning tasks.