Copyright Notice:
The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Publications of SPCL
L. Gianinazzi, T. Ben-Nun, M. Besta, S. Ashkboos, Y. Baumann, P. Luczynski, T. Hoefler: | ||
Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures (In Proceedings of the 39th IEEE International Parallel and Distributed Processing Symposium (IPDPS'25), presented in Milano, Italy, IEEE Press, Jun. 2025) AbstractSpatial dataflow architectures, characterized by large arrays of processing elements communicating over a spatially localized on-chip network, offer unprecedented parallelism but introduce unique challenges for algorithm design. Unlike traditional shared-memory parallel systems, these architectures rely on local interconnects, where the distance between elements directly impacts communication efficiency. This necessitates new algorithmic approaches that balance parallelism and spatial locality. Fundamental operations like Parallel Scans, Rank Selection, and Sorting, which form the backbone of many algorithms—including those in graph neural networks and scientific computing—must be carefully adapted to minimize communication overhead. To address these challenges, we adopt the Spatial Computer Model, which quantifies communication costs through two key metrics: energy, representing the total distance traveled by messages (a proxy for network load), and depth, indicating the largest chain of dependent messages (critical for parallelism). In this work, we present the first energy-optimal algorithms for parallel scans, rank selection, and sorting within this model, achieving poly-logarithmic depth while maintaining tight upper and lower bounds on energy and distance. We demonstrate the applicability of these algorithms to the critical problem of sparse matrix-vector multiplication, which is central to scientific workloads and machine learning models. Our results lay the groundwork for designing energy-efficient and scalable algorithms on spatial dataflow architectures, highlighting the potential for further exploration of sparse algorithms and neural networks optimized for these systems.Documentsdownload article:![]() download slides: ![]() | ||
BibTeX | ||
|