Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Publications of SPCL

D. Klocke, C. Frauen, J. Frederik Engels, D. Alexeev, R. Redler, R. Schnur, H. Haak, L. Kornblueh, N. Brueggemann, F. Chegini, M. Roemmer, L. Hoffmann, S. Griessbach, M. Bode, J. Coles, M. Gila, W. Sawyer, A. Calotoiu, Y. Budanaz, P. Mazumder, M. Copik, B. Weber, A. Herten, H. Bockelmann, T. Hoefler, C. Hohenegger, B. Stevens:

Computing the Full Earth System at 1km Resolution

(In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'25), presented in St. Louis, MO, USA, Nov. 2025)
Gordon Bell Prize for Climate Modelling

Abstract

The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. Heterogeneity in instruction sets, specialized kernel requirements for different data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations complicate performance tuning. Manual optimization is resource-intensive, while existing automatic approaches often rely on complex hardware-specific heuristics and uninterpretable intermediate representations, hindering performance portability. We introduce PerfLLM, a novel automatic optimization methodology leveraging Large Language Models (LLMs) and Reinforcement Learning (RL). Central to it is PerfDojo, an environment that frames optimization as an RL game over a human-readable, mathematically inspired code representation whose transformations guarantee semantic validity. This allows effective optimization without prior hardware knowledge, facilitating both human analysis and RL agent training. We demonstrate PerfLLM's ability to achieve significant performance gains across diverse CPU (x86, Arm, RISC-V) and GPU architectures.
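The core framing in the abstract, treating code optimization as an RL game in which every action is a semantics-preserving transformation, can be sketched as a minimal environment. All class and method names below are illustrative assumptions for a toy setting, not the paper's actual API or representation:

```python
# Hypothetical sketch: code optimization as an RL environment.
# State: a program in some representation; actions: transformations that
# preserve semantics by construction; reward: measured relative speedup.
# Names and the toy cost model are assumptions, not the PerfDojo API.

class OptimizationEnv:
    def __init__(self, program, transformations, benchmark):
        self.initial = program
        self.transformations = transformations  # each maps program -> program
        self.benchmark = benchmark              # maps program -> runtime cost
        self.reset()

    def reset(self):
        # Start each episode from the unoptimized program.
        self.program = self.initial
        self.baseline = self.benchmark(self.program)
        return self.program

    def step(self, action_idx):
        # Validity is guaranteed because only legal, semantics-preserving
        # rewrites are exposed as actions; the agent cannot break the code.
        self.program = self.transformations[action_idx](self.program)
        runtime = self.benchmark(self.program)
        reward = self.baseline / runtime - 1.0  # relative speedup as reward
        return self.program, reward


# Toy usage: "programs" are lists of numbers, "runtime" grows with length,
# and the single transformation removes redundant zero entries.
env = OptimizationEnv(
    program=[3, 0, 2, 0, 1],
    transformations=[lambda p: [x for x in p if x != 0]],
    benchmark=lambda p: len(p) + 1,
)
state = env.reset()
state, reward = env.step(0)  # shorter program -> positive reward
```

An RL agent (or an LLM proposing actions) would then learn a policy over such transformation sequences, which is why a human-readable representation helps: both the agent's trajectory and the resulting program remain inspectable.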

Documents

download article:
access preprint on arxiv:

BibTeX

@inproceedings{2025gbclimate,
  author={Daniel Klocke and Claudia Frauen and Jan Frederik Engels and Dmitry Alexeev and Rene Redler and Reiner Schnur and Helmuth Haak and Luis Kornblueh and Nils Brueggemann and Fatemeh Chegini and Manoel Roemmer and Lars Hoffmann and Sabine Griessbach and Mathis Bode and Jonathan Coles and Miguel Gila and William Sawyer and Alexandru Calotoiu and Yakup Budanaz and Pratyai Mazumder and Marcin Copik and Benjamin Weber and Andreas Herten and Hendryk Bockelmann and Torsten Hoefler and Cathy Hohenegger and Bjorn Stevens},
  title={{Computing the Full Earth System at 1km Resolution}},
  year={2025},
  month={11},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'25)},
  location={St. Louis, MO, USA},
  doi={10.1145/3712285.3771789},
}