Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Publications of SPCL

T. Bonato, A. Kabbani, D. De Sensi, R. Pan, Y. Le, C. Raiciu, M. Handley, T. Schneider, N. Blach, A. Ghalayini, D. Alves, M. Papamichael, A. Caulfield, T. Hoefler:

FASTFLOW: Flexible Adaptive Congestion Control for High-Performance Datacenters

(arXiv:2404.01630. Sep. 2024)

Abstract

The increasing demand of machine learning (ML) workloads in datacenters places significant stress on current congestion control (CC) algorithms, many of which struggle to maintain performance at scale. These workloads generate bursty, synchronized traffic that requires both rapid response and fairness across flows. Unfortunately, existing CC algorithms that rely heavily on delay as a primary congestion signal often fail to react quickly enough and do not consistently ensure fairness. In this paper, we propose FASTFLOW, a streamlined sender-based CC algorithm that integrates delay, ECN signals, and optional packet trimming to achieve precise, real-time adjustments to congestion windows. Central to FASTFLOW is the QuickAdapt mechanism, which provides accurate bandwidth estimation at the receiver, enabling faster reactions to network conditions. We also show that FASTFLOW can effectively enhance receiver-based algorithms such as EQDS by improving their ability to manage in-network congestion. Our evaluation reveals that FASTFLOW outperforms cutting-edge solutions, including EQDS, Swift, BBR, and MPRDMA, delivering up to 50% performance improvements in modern datacenter networks.

BibTeX

@article{bonato2024fastflow,
  author={Tommaso Bonato and Abdul Kabbani and Daniele De Sensi and Rong Pan and Yanfang Le and Costin Raiciu and Mark Handley and Timo Schneider and Nils Blach and Ahmad Ghalayini and Daniel Alves and Michael Papamichael and Adrian Caulfield and Torsten Hoefler},
  title={{FASTFLOW: Flexible Adaptive Congestion Control for High-Performance Datacenters}},
  journal={arXiv:2404.01630},
  year={2024},
  month={09},
  doi={10.48550/arXiv.2404.01630},
}