Pathways: Asynchronous Distributed Dataflow for ML

By Paul Barham et al.
Published on June 10, 2022

Table of Contents

ABSTRACT
INTRODUCTION
DESIGN MOTIVATION
PATHWAYS PROGRAMMING MODEL
PATHWAYS SYSTEM ARCHITECTURE
Resource Manager
Client

Summary

Pathways is a system for distributed machine learning workloads, designed around the capabilities that future ML research is expected to need. It represents computations as a sharded dataflow graph of asynchronous operators, which lets it schedule work efficiently across thousands of accelerators while coordinating the data transfers between them. This design supports exploration of new systems and research ideas without sacrificing high performance on current models.

Pathways adopts a single-controller model, which makes complex parallelism patterns easier to express. The paper examines the limitations of existing distributed ML systems and shows how Pathways addresses them by combining the flexibility of single-controller frameworks with the performance of multi-controller systems. The architecture includes a Resource Manager responsible for centralized device management and a client interface through which users express computations over virtual devices. The Pathways programming model supports JAX and TensorFlow, so users can run unmodified code on large distributed systems. Overall, Pathways aims to provide the capabilities needed to support future ML workloads efficiently and effectively.
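To make the programming model concrete, here is a minimal sketch of the kind of unmodified JAX code the summary refers to. It is not the Pathways API: the device list and the way the program would be dispatched to remote accelerators are assumptions for illustration, using only standard JAX calls on whatever local devices are visible.

```python
# Hypothetical sketch (not the actual Pathways API): ordinary single-controller
# JAX code of the kind the summary says can run unmodified. On Pathways, the
# client would trace this program and the system would dispatch it to virtual
# devices handed out by the resource manager; here we just use local devices.
import jax
import jax.numpy as jnp

devices = jax.local_devices()  # stand-in for Pathways-managed virtual devices

@jax.pmap
def scaled_sum(x):
    # One replica of this SPMD computation runs per device.
    return jnp.sum(x) * 2.0

# One data shard per device; the client issues a single logical call.
batch = jnp.arange(len(devices) * 4, dtype=jnp.float32).reshape(len(devices), 4)
print(scaled_sum(batch))  # one partial result per device
```

The point of the single-controller design is that the program above is written once, from one client, and the system takes care of placing and coordinating the per-device work.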