ATLAHS Network Simulator Toolchain

Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators rely heavily on synthetic microbenchmarks or narrowly focus on specific domains, limiting their ability to provide comprehensive performance insights. In this work, we introduce ATLAHS (Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage), a flexible, extensible, and open-source toolchain designed to trace real-world applications and accurately simulate their workloads. ATLAHS leverages the GOAL format [1] to model communication and computation patterns in AI, HPC, and distributed storage applications.
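As an illustration, below is a minimal, hand-written sketch of a two-rank ping-pong schedule in GOAL's textual form. The operation and dependency syntax follows the GOAL specification [1], but this example is schematic: real ATLAHS schedules are generated automatically from traces and are far larger, and the exact grammar accepted by the toolchain may differ in detail.

```
num_ranks 2

rank 0 {
  l1: calc 1000
  l2: send 4096b to 1
  l3: recv 4096b from 1
  l2 requires l1
}

rank 1 {
  l1: recv 4096b from 0
  l2: send 4096b to 0
  l2 requires l1
}
```

Each labeled operation is a computation (calc, in abstract time units), a send, or a receive (message sizes in bytes), and requires edges express dependencies between operations. This is how the format captures the interleaving of communication and computation recorded from a real application.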
Source Code
The source code of the entire ATLAHS toolchain is available on GitHub:
https://github.com/spcl/atlahs.
Trace Collection
Along with the toolchain, we also release a set of application traces collected from large-scale systems. The traces can be accessed at this link and are provided in both raw and GOAL formats. The raw traces for HPC applications are collected using liballprof [2], a profiling library that records the MPI operations issued by an application (a minimal example of the kind of MPI program it can trace is sketched below). The raw traces for AI applications are collected using Nsight Systems and are stored as .nsys-rep report files. A detailed description of the systems and applications used for trace collection is given in the sections below.
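To make the HPC trace-collection path concrete, here is a minimal MPI ping-pong in C of the kind liballprof can trace. The program is purely illustrative and not part of the released trace set; compiling it with mpicc and running it under the profiler records the send/receive pattern, which can then be converted to GOAL.

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal two-rank ping-pong: rank 0 sends a 4 KiB buffer to rank 1,
 * which echoes it back. A profiling library such as liballprof
 * intercepts these MPI calls to record the communication pattern. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char buf[4096] = {0};
    const int tag = 0;

    if (size >= 2) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, tag, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0: ping-pong complete\n");
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, tag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, tag, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}
```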
1. AI Applications
The traces are collected on a system with the following configuration:
- System: CSCS Alps
- Number of nodes: 2,688
- Node configuration: 4 NVIDIA Grace Hopper Superchips
- Network topology: Dragonfly
The applications and their configurations are as follows:
| App | Configuration |
| --- | --- |
| DLRM | 4 GPUs, 4 Nodes |
| Llama 7B | 16 GPUs, 4 Nodes |
| | 64 GPUs, 16 Nodes |
| | 128 GPUs, 32 Nodes |
| Llama 70B | 256 GPUs, 64 Nodes |
| MoE (Mistral) 8x7B | 64 GPUs, 16 Nodes |
| MoE 8x13B | 128 GPUs, 32 Nodes |
| MoE 8x70B | 256 GPUs, 64 Nodes |
2. HPC Applications
The traces are collected on a system with the following configuration:
- System: CSCS Fat Tree Test-bed Cluster
- Number of nodes: 188
- Node configuration: 20-core Intel Xeon E5-2660 v2 CPU, 32 GB DDR3 RAM, ConnectX-3 56 Gbit/s NIC, CentOS 7.3
- Network topology: Fat Tree (18 Mellanox SX6036 switches)
The applications and their configurations are as follows:
| Application | Configuration |
| --- | --- |
| CloverLeaf | 128 Procs, 8 Nodes |
| HPCG | 128 Procs, 8 Nodes |
| | 512 Procs, 32 Nodes |
| | 1024 Procs, 64 Nodes |
| LULESH | 128 Procs, 8 Nodes |
| | 432 Procs, 27 Nodes |
| | 1024 Procs, 64 Nodes |
| LAMMPS | 128 Procs, 8 Nodes |
| | 512 Procs, 32 Nodes |
| | 1024 Procs, 64 Nodes |
| ICON | 128 Procs, 8 Nodes |
| | 512 Procs, 32 Nodes |
| | 1024 Procs, 64 Nodes |
| OpenMX | 128 Procs, 8 Nodes |
| | 512 Procs, 32 Nodes |
References
[1] T. Hoefler, C. Siebert, and A. Lumsdaine. "Group Operation Assembly Language - A Flexible Notation to Express Collective Communication." In Proceedings of the 38th International Conference on Parallel Processing (ICPP), 2009.
[2] T. Hoefler, T. Schneider, and A. Lumsdaine. "LogGOPSim - Simulating Large-Scale Applications in the LogGOPS Model." In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC), 2010.