Deep Neural Networks
The inherent parallelism in the training and deployment of neural networks makes them a prime candidate for parallel computing. Our research in this area targets both the optimization of neural networks and their use as a tool to understand and improve the performance of arbitrary (parallel) programs. Recently, we started developing tuning methods for deep neural networks, mainly targeting non-functional requirements such as inference speed, energy consumption, and network size under given accuracy constraints. Specific projects aim at efficient design-space exploration methods and at the programmability, efficiency, and performance portability of low-level network operations. At the same time, we are using neural networks to support compiler optimization and to improve our automatic performance-modeling tool chain.
- Arya Mazaheri, Johannes Schulte, Matthew Moskewicz, Felix Wolf, Ali Jannesari: Enhancing the Programmability and Performance Portability of GPU Tensor Operations. In Proc. of the 25th Euro-Par Conference, Göttingen, Germany, volume 11725 of Lecture Notes in Computer Science, pages 213–226, Springer, August 2019 (best paper award).