Netgauge - Effective Bisection Bandwidth Measurement

Description

The ebb pattern in Netgauge approximately measures the effective bisection bandwidth as defined in [1]. The benchmark generates several random bisection patterns, performs bisection communication on each pattern, and then reports the average bandwidth.
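
To make the idea of a random bisection pattern concrete, here is a minimal Python sketch. It is not Netgauge's implementation: the function name `random_bisection_pattern` is hypothetical, and it assumes that a bisection pattern splits the ranks into two equal halves at random and pairs each rank in one half with exactly one rank in the other half.

```python
import random


def random_bisection_pattern(num_procs, rng):
    """Return (rank_a, rank_b) pairs for one random bisection of num_procs ranks.

    Assumption (not taken from the Netgauge source): the ranks are split into
    two random halves and every rank in one half is paired with exactly one
    rank in the other half.
    """
    if num_procs % 2 != 0:
        raise ValueError("a bisection needs an even number of processes")
    ranks = list(range(num_procs))
    rng.shuffle(ranks)  # a random permutation gives a random split and matching
    half = num_procs // 2
    return list(zip(ranks[:half], ranks[half:]))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the example is reproducible
    for rank_a, rank_b in random_bisection_pattern(8, rng):
        print(f"rank {rank_a} <-> rank {rank_b}")
```

Each call yields one pattern. A benchmark in this style would time the communication of all pairs at once and repeat the step for many patterns.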

General Usage

The benchmarks use Netgauge's high-performance timers for different architectures. Users should make sure that the configure script detected the timer correctly and that it works reliably (e.g., no CPU frequency scaling).

General help:
$ mpirun -n 1 ./netgauge -x ebb --help

Benchmarking the effective bisection bandwidth on 64 InfiniBand nodes (large messages are typical for bandwidth measurements, but the benchmark also allows small messages):
$ mpirun -n 64 ./netgauge -s 1048576-1048576 -x ebb -r 10
# Info:   (0): Netgauge v2.2 MPI enabled (P=64) (./netgauge -s 1048576-1048576 -x ebb -r 10 )
# initializing x86-64 timer (takes some seconds)
size: 1048576, round 0: num: 64 average: 65525.545105 us (320.051057 MiB/s)
size: 1048576, round 1: num: 64 average: 65419.781957 us (320.568479 MiB/s)
size: 1048576, round 2: num: 64 average: 65292.660184 us (321.192611 MiB/s)
size: 1048576, round 3: num: 64 average: 67542.892781 us (310.491884 MiB/s)
size: 1048576, round 4: num: 64 average: 68092.770270 us (307.984532 MiB/s)
size: 1048576, round 5: num: 64 average: 63865.466484 us (328.370263 MiB/s)
size: 1048576, round 6: num: 64 average: 63695.839034 us (329.244741 MiB/s)
size: 1048576, round 7: num: 64 average: 65396.951463 us (320.680392 MiB/s)
size: 1048576, round 8: num: 64 average: 74078.957820 us (283.096855 MiB/s)
size: 1048576, round 9: num: 64 average: 69648.529044 us (301.104995 MiB/s)
# Info:   (0): ---- bucket data ----
size: 1048576 54673.559333 (383.577002 MiB/s): 640
size: 1048576 num: 640 average: 66855.939414 (313.682228 MiB/s)
The last line indicates an average bandwidth of 313 MiB/s, which is the effective bisection bandwidth. A real measurement would of course require many more patterns, e.g., 100,000. This output can be used to compute statistics (e.g., using R). Netgauge also supports simple built-in statistics that count the observed bandwidths in buckets. This is enabled by supplying the --buckets parameter (-b in the example below):
$ mpirun -n 64 ./netgauge -s 1048576-1048576 -x ebb -r 1000 -b 50
... (output as before)
size: 1048576, round 999: num: 64 average: 66695.577922 us (314.436439 MB/s)
# Info:   (0): ---- bucket data ----
size: 1048576 54543.972000 (384.488317 MiB/s): 41834
size: 1048576 62528.967800 (335.388872 MiB/s): 11706
size: 1048576 70513.963600 (297.409462 MiB/s): 4769
size: 1048576 78498.959400 (267.156662 MiB/s): 1802
size: 1048576 86483.955200 (242.490297 MiB/s): 2570
size: 1048576 94468.951000 (221.993785 MiB/s): 751
size: 1048576 102453.946800 (204.692163 MiB/s): 253
size: 1048576 110438.942600 (189.892437 MiB/s): 255
size: 1048576 118423.938400 (177.088520 MiB/s): 24
size: 1048576 126408.934200 (165.902198 MiB/s): 36
size: 1048576 num: 64000 average: 62274.659839 (336.758483 MiB/s)
We see that 41834 of the 64000 (P*runs) connections achieved about 384 MiB/s, while 36 connections were heavily congested and achieved only about 165 MiB/s. The relative effective bisection bandwidth can be determined by repeating the experiment with 2 processes and dividing the 64-process result by the 2-process result. The 2-process run resulted in 379 MiB/s on our system, so the relative effective bisection bandwidth on 64 nodes is 336/379 = 0.89. For InfiniBand systems, you can also use our ORCS simulator to simulate the effective bisection bandwidth.
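
The post-processing described above can also be scripted. The following Python sketch parses the per-round lines and bucket lines, then computes simple statistics and the relative effective bisection bandwidth. It is not part of Netgauge: the regular expressions are assumptions derived only from the output shown on this page, and the helper names (`parse_ebb_output`, `bucket_shares`, `relative_ebb`) are hypothetical.

```python
import re
import statistics

# Line formats assumed from the output printed on this page.
ROUND_RE = re.compile(
    r"size: (\d+), round (\d+): num: (\d+) average: ([\d.]+) us \(([\d.]+) Mi?B/s\)"
)
BUCKET_RE = re.compile(r"size: (\d+) ([\d.]+) \(([\d.]+) Mi?B/s\): (\d+)")


def parse_ebb_output(text):
    """Return (rounds, buckets): rounds as (time_us, bandwidth), buckets as (bandwidth, count)."""
    rounds, buckets = [], []
    for line in text.splitlines():
        line = line.strip()
        m = ROUND_RE.match(line)
        if m:
            rounds.append((float(m.group(4)), float(m.group(5))))
            continue
        m = BUCKET_RE.match(line)
        if m:
            buckets.append((float(m.group(3)), int(m.group(4))))
    return rounds, buckets


def bucket_shares(buckets):
    """Return (bandwidth, fraction of all connections) for each bucket."""
    total = sum(count for _, count in buckets)
    return [(bw, count / total) for bw, count in buckets]


def relative_ebb(measured_bw, reference_bw):
    """Ratio of the P-process result to the 2-process reference result."""
    return measured_bw / reference_bw


# Output of the 10-round run shown earlier on this page.
SAMPLE = """\
size: 1048576, round 0: num: 64 average: 65525.545105 us (320.051057 MiB/s)
size: 1048576, round 1: num: 64 average: 65419.781957 us (320.568479 MiB/s)
size: 1048576, round 2: num: 64 average: 65292.660184 us (321.192611 MiB/s)
size: 1048576, round 3: num: 64 average: 67542.892781 us (310.491884 MiB/s)
size: 1048576, round 4: num: 64 average: 68092.770270 us (307.984532 MiB/s)
size: 1048576, round 5: num: 64 average: 63865.466484 us (328.370263 MiB/s)
size: 1048576, round 6: num: 64 average: 63695.839034 us (329.244741 MiB/s)
size: 1048576, round 7: num: 64 average: 65396.951463 us (320.680392 MiB/s)
size: 1048576, round 8: num: 64 average: 74078.957820 us (283.096855 MiB/s)
size: 1048576, round 9: num: 64 average: 69648.529044 us (301.104995 MiB/s)
# Info:   (0): ---- bucket data ----
size: 1048576 54673.559333 (383.577002 MiB/s): 640
size: 1048576 num: 640 average: 66855.939414 (313.682228 MiB/s)
"""

if __name__ == "__main__":
    rounds, buckets = parse_ebb_output(SAMPLE)
    times = [t for t, _ in rounds]
    bandwidths = [bw for _, bw in rounds]
    print(f"rounds parsed:      {len(rounds)}")
    print(f"mean round time:    {statistics.mean(times):.3f} us")
    print(f"min/max bandwidth:  {min(bandwidths):.2f} / {max(bandwidths):.2f}")
    print(f"bandwidth std dev:  {statistics.stdev(bandwidths):.2f}")
    for bw, share in bucket_shares(buckets):
        print(f"bucket {bw:.2f}: {share:.1%} of connections")
    # Values from the text above: 64-process average and 2-process reference.
    print(f"relative ebb:       {relative_ebb(336.758483, 379.0):.2f}")
```

The summary line printed by the benchmark matches the mean of the per-round times, so the arithmetic mean of the per-round bandwidths can differ slightly from the reported average bandwidth.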

References

[1] T. Hoefler, T. Schneider, A. Lumsdaine: Multistage Switches are not Crossbars: Effects of Static Routing in High-Performance Networks. In Proceedings of the 2008 IEEE International Conference on Cluster Computing (Cluster'08), Tsukuba, Japan, IEEE Computer Society, ISSN: 1552-5244, ISBN: 978-1-4244-2640, Oct. 2008 (acceptance rate 30%, 28/92).