Learn more efficiently and save resources

Research team develops method for accelerating reinforcement learning

2025/12/05 by Claudia Staub

Robots can learn to perform tasks. However, this learning process often requires large amounts of data and computing time. Researchers at TU Darmstadt have now developed an algorithm that works efficiently even with complex tasks. The research is part of the Cluster of Excellence “Reasonable Artificial Intelligence (RAI)”.

Picture: Yasemin Sevincli

Like humans, robots can learn through trial and error. They experiment and receive feedback: correct decisions lead to a reward and incorrect ones lead to a punishment. This enables them to develop a strategy that maximises rewards, leading to continuous improvement. This method is known as reinforcement learning and enables robots to learn to solve tasks independently.
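The trial-and-error loop described above can be illustrated with a deliberately tiny example. The sketch below is not the researchers' method; it is a textbook "multi-armed bandit" with an epsilon-greedy strategy, and every name and number in it (`run_bandit`, the success probabilities, the exploration rate) is an assumption chosen for illustration. An agent repeatedly picks one of three actions, receives +1 as a reward or -1 as a punishment, and gradually learns which action pays off most often.

```python
import random


def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action yields the highest average reward."""
    rng = random.Random(seed)
    n_actions = len(success_probs)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried

    for _ in range(steps):
        # Experiment occasionally; otherwise pick the best-looking action.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Feedback: +1 is a reward, -1 is a punishment.
        reward = 1.0 if rng.random() < success_probs[action] else -1.0

        # Move the running average towards the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical task: the third action succeeds most often (80 percent).
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

Even this toy problem needs thousands of interactions before the estimates settle, which hints at why real robot tasks can require millions.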

One disadvantage of this type of learning is that the system must collect a very large number of interactions to learn from, often several million. This is time-consuming and expensive, causes wear and tear on the robots, and prevents them from being used to solve complex tasks.

Presentation at AI conference

Researchers led by Daniel Palenicek from the Intelligent Autonomous Systems (IAS) group in the Department of Computer Science at TU Darmstadt have developed an algorithm that stabilises and accelerates the complex training process. They achieved this by resolving a common problem known as ‘loss of plasticity’. Much like a person who has become ‘stuck’ in a certain way of thinking and can no longer absorb new information, an AI system that is trained intensively on its early experiences can become ‘resistant to learning’ and unable to learn from new data.

To counteract this and preserve the AI's ability to learn, the researchers combined two different normalisation methods: batch normalisation and weight normalisation. Together, these regulate and stabilise training, help to maintain the ability to learn and, ultimately, significantly increase data efficiency across a wide range of tasks.
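To give a rough idea of what the two kinds of normalisation do, here is a minimal sketch in plain Python. It shows the generic textbook operations only; the article does not say how the researchers apply them inside their algorithm, so the function names, the unit target norm and the example numbers are all assumptions made for illustration. Batch normalisation rescales each input feature across a batch of samples, while weight normalisation (in the simple form assumed here) rescales a layer's weight vector to a fixed length.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Standardise each feature to zero mean and unit variance across the batch."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(n_features)]
        for row in batch
    ]


def weight_normalize(weights, target_norm=1.0):
    """Rescale a weight vector to a fixed length (assumed form of weight normalisation)."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("cannot normalise an all-zero weight vector")
    return [target_norm * w / norm for w in weights]


def normalized_linear(batch, weights):
    """One linear unit that uses both normalisations together."""
    x_hat = batch_normalize(batch)
    w_hat = weight_normalize(weights)
    return [sum(x * w for x, w in zip(row, w_hat)) for row in x_hat]


# Hypothetical data: two features on very different scales, and large weights.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
weights = [30.0, 40.0]
outputs = normalized_linear(batch, weights)
```

Because both the inputs and the weights are kept on a controlled scale, the outputs stay in a moderate range however large the raw values grow, which is the kind of regulating effect the paragraph above describes.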

Reducing the data and interactions required for reinforcement learning is an essential part of the work of the RAI Cluster of Excellence. Researchers there are developing a new generation of AI systems based, among other things, on the sensible use of resources and continuous improvement. ‘We are trying to reduce the amount of data required through the design of our algorithms,’ says Palenicek. ‘This saves interactions with the real system as well as time, computing power and, ultimately, energy and CO2.’

The study ‘Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization’ will be presented on 5 December at the renowned Conference on Neural Information Processing Systems (NeurIPS) in San Diego (USA).

Publication

Daniel Palenicek, Florian Vogt, Joe Watson, Jan Peters: “Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization”, in: Advances in Neural Information Processing Systems 38 (NeurIPS 2025)

About RAI

The RAI Cluster of Excellence, led by TU Darmstadt, is dedicated to developing a new generation of AI systems based on the sensible use of resources, data protection and continuous improvement. Across four research areas, multidisciplinary teams are working to shape the future of AI.