ATLAHS Network Simulator Toolchain



Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators either rely heavily on synthetic microbenchmarks or focus narrowly on specific domains, limiting their ability to provide comprehensive performance insights. In this work, we introduce ATLAHS (Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage), a flexible, extensible, and open-source toolchain designed to trace real-world applications and accurately simulate their workloads. ATLAHS uses the GOAL format [1] to model the communication and computation patterns of AI, HPC, and distributed storage applications.
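
To make the format concrete, the sketch below is a minimal, hand-written schedule in the textual GOAL syntax introduced with LogGOPSim [2]; schedules generated by the ATLAHS tracers have the same structure but are much larger. Each rank's workload is expressed as a dependency graph of send, recv, and calc operations:

    num_ranks 2

    rank 0 {
      l1: send 8b to 1
      l2: recv 8b from 1
      l2 requires l1
    }

    rank 1 {
      l1: recv 8b from 0
      l2: calc 1000
      l3: send 8b to 0
      l2 requires l1
      l3 requires l2
    }

Here rank 1 receives 8 bytes from rank 0, performs 1000 units of local computation (calc), and then sends a reply; each "requires" clause is a happens-before dependency between two operations on the same rank. The simulator replays this graph under its network model to produce end-to-end timings.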

Source Code

The source code of the entire ATLAHS toolchain is available on GitHub: https://github.com/spcl/atlahs.

Trace Collection

Along with the toolchain, we also release a set of application traces collected from large-scale systems. The traces can be accessed at this link and are provided in both the raw and GOAL formats. The raw traces for HPC applications were collected with liballprof [2], a profiling library that records the MPI calls issued by an application; the raw traces for AI applications were collected with NVIDIA Nsight Systems and are stored as nsys-report files. A sketch of the kind of MPI traffic that liballprof captures is shown below, followed by a detailed description of the systems and applications used for trace collection.
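
For context, the minimal MPI ping-pong below is a hand-written illustration (not part of the released traces or the toolchain) of the kind of communication liballprof intercepts via MPI's profiling (PMPI) interface; the send/recv pairs and the compute time between calls are what the ATLAHS converters turn into send, recv, and calc operations in a GOAL schedule:

    /* Minimal MPI ping-pong; compile with an MPI compiler wrapper such as mpicc. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        double buf = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2) {
            if (rank == 0) {
                /* Traced as a send followed by a matching receive. */
                MPI_Send(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                buf += 1.0;  /* local work between MPI calls shows up as a calc interval */
                MPI_Send(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }

Running such a program with liballprof enabled (e.g., by linking or preloading the library when launching the job) produces raw per-rank traces, which the toolchain then converts into the GOAL format described above.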

1. AI Applications

The traces were collected from a system with the following configuration:
  • System: CSCS Alps
  • Number of nodes: 2,688
  • Node configuration: 4 NVIDIA Grace Hopper Superchips
  • Network topology: Dragonfly
  • The applications and their configurations are as follows:
    Application           Configuration
    DLRM                  4 GPUs, 4 Nodes
    Llama 7B              16 GPUs, 4 Nodes
                          64 GPUs, 16 Nodes
                          128 GPUs, 32 Nodes
    Llama 70B             256 GPUs, 64 Nodes
    MoE (Mistral) 8x7B    64 GPUs, 16 Nodes
    MoE 8x13B             128 GPUs, 32 Nodes
    MoE 8x70B             256 GPUs, 64 Nodes

2. HPC Applications

The traces were collected from a system with the following configuration:
  • System: CSCS Fat Tree Test-bed Cluster
  • Number of nodes: 188
  • Node configuration: 20-core Intel Xeon E5-2660 v2 CPU, 32 GB DDR3 RAM, ConnectX-3 56 Gbit/s NIC, CentOS 7.3
  • Network topology: Fat Tree (18 Mellanox SX6036 switches)
  • The applications and their configurations are as follows:
    Application    Configuration
    CloverLeaf     128 Procs, 8 Nodes
    HPCG           128 Procs, 8 Nodes
                   512 Procs, 32 Nodes
                   1024 Procs, 64 Nodes
    LULESH         128 Procs, 8 Nodes
                   432 Procs, 27 Nodes
                   1024 Procs, 64 Nodes
    LAMMPS         128 Procs, 8 Nodes
                   512 Procs, 32 Nodes
                   1024 Procs, 64 Nodes
    ICON           128 Procs, 8 Nodes
                   512 Procs, 32 Nodes
                   1024 Procs, 64 Nodes
    OpenMX         128 Procs, 8 Nodes
                   512 Procs, 32 Nodes

References

[1] T. Hoefler, C. Siebert, A. Lumsdaine: Group Operation Assembly Language - A Flexible Way to Express Collective Communication. In ICPP-2009, The 38th International Conference on Parallel Processing, Vienna, Austria, IEEE, ISBN: 978-0-7695-3802-0, Sep. 2009.

[2] T. Hoefler, T. Schneider, A. Lumsdaine: LogGOPSim - Simulating Large-Scale Applications in the LogGOPS Model. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (LSAP'10), Chicago, Illinois, pages 597-604, ACM, ISBN: 978-1-60558-942-8, Jun. 2010.