How can tensor computation runtimes (originally designed for AI) accelerate large-scale OLAP queries over fast RDMA networks and NVMe storage?
2025/11/12
In our new paper, co-authored by Jigao Luo, Nils Boeschen, Muhammad El-Hindi, and Carsten Binnig, we introduce PystachIO, a query engine built on PyTorch. PystachIO leverages existing networking and storage primitives and optimizes them for high-bandwidth processing of distributed analytical queries over large datasets. The paper presents our insights on overlapping computation with communication, minimizing synchronization points, and reducing intermediate data sizes to alleviate memory pressure.
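To give a rough feel for what overlapping computation with communication means, here is a minimal, generic sketch in plain Python. It is not taken from the paper and does not use PystachIO or PyTorch: the function names `fetch_chunks` and `overlapped_sum`, the simulated transfer delay, and the chunk sizes are all assumptions made for illustration. A background thread stands in for a network or storage read, and a small bounded queue limits how many chunks are in flight, which also caps intermediate memory.

```python
import queue
import threading
import time


def fetch_chunks(num_chunks, chunk_size, out_queue):
    """Producer: hypothetical stand-in for a network or storage read."""
    for i in range(num_chunks):
        time.sleep(0.01)  # simulated transfer latency (assumption)
        start = i * chunk_size
        out_queue.put(list(range(start, start + chunk_size)))
    out_queue.put(None)  # sentinel: no more chunks


def overlapped_sum(num_chunks=8, chunk_size=1000, depth=2):
    """Aggregate chunks while the next ones are still being fetched."""
    # Bounded queue: at most `depth` chunks are buffered at once.
    buffers = queue.Queue(maxsize=depth)
    producer = threading.Thread(
        target=fetch_chunks, args=(num_chunks, chunk_size, buffers)
    )
    producer.start()

    total = 0
    while True:
        chunk = buffers.get()
        if chunk is None:
            break
        # Compute on this chunk while the producer fetches ahead.
        total += sum(chunk)
    producer.join()
    return total


if __name__ == "__main__":
    print(overlapped_sum())
```

The only synchronization between the two sides is the queue hand-off, and the `depth` parameter trades prefetch distance against buffered memory. A real engine would replace the sleeping producer with actual RDMA or NVMe reads and the Python `sum` with tensor operators.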