Scaling TensorFlow to 300 Million Predictions Per Second

By Jan Hartman and Davorin Kopič
Published on Sept. 20, 2021

Table of Contents

1. Introduction
2. Motivation
3. Challenges
3.1 Implementation
3.2 Serving
3.3 Optimizations
4. Conclusion

Summary

This document describes the process of transitioning machine learning models to the TensorFlow framework at a large scale. The key challenges included compute resources, prediction latency, and training throughput. By implementing autobatching on the serving side, CPU usage was halved while latencies stayed within acceptable bounds. Further optimizations improved model speed, such as switching to a binary data format, optimizing algorithms, and choosing the right optimizer. The authors, Jan Hartman and Davorin Kopič, are experts in data science and machine learning, with backgrounds in computer science and artificial intelligence.
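The server-side autobatching described above matches the batching support built into TensorFlow Serving, which is enabled with the `--enable_batching` flag and tuned through a batching parameters file. A minimal sketch of such a file follows; the specific values are illustrative assumptions, not the settings reported in the article:

```
# batching.conf -- TensorFlow Serving batching parameters (protobuf text format)
# Values below are example assumptions for illustration only.
max_batch_size { value: 128 }        # cap on requests merged into one batch
batch_timeout_micros { value: 2000 } # max wait before a partial batch is run
num_batch_threads { value: 8 }       # parallelism for processing batches
max_enqueued_batches { value: 100 }  # backpressure limit on queued batches
```

The server would then be launched with something like `tensorflow_model_server --enable_batching --batching_parameters_file=batching.conf ...`. Larger batches amortize per-request overhead on the CPU, which is how batching can cut CPU usage, at the cost of the time a request spends waiting for its batch to fill.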