This topic describes how to perform distributed training based on the distributed communication framework gRPC++.

To support larger-scale training and deliver better performance, gRPC++ applies multiple optimization techniques to reduce end-to-end (E2E) communication latency and increase server throughput. These techniques include a Shared-Nothing architecture, a busy-polling mechanism, user-mode zero-copy, and Send/Recv fusion. In typical business scenarios, the training performance of gRPC++ is several times that of native TensorFlow.

Enable gRPC++-based distributed training

To use gRPC++ for distributed training, set the protocol parameter to "grpc++" when you create the tf.train.Server:
import tensorflow as tf

# ps_hosts and worker_hosts are lists of "host:port" strings;
# FLAGS.job_name and FLAGS.task_index identify this process.
cluster = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})

server = tf.train.Server(cluster,
                         job_name=FLAGS.job_name,
                         task_index=FLAGS.task_index,
                         protocol="grpc++")
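In practice, the ps_hosts and worker_hosts lists are usually parsed from comma-separated command-line flags before the cluster spec is built. A minimal sketch of that step, with a hypothetical helper name and hypothetical host addresses:

```python
# Hedged sketch: build the cluster dictionary that is passed to
# tf.train.ClusterSpec from comma-separated host flags.
# parse_cluster and the addresses below are illustrative, not part
# of the gRPC++ API.
def parse_cluster(ps_hosts_flag, worker_hosts_flag):
    """Split comma-separated "host:port" strings into a cluster dict."""
    return {
        "ps": ps_hosts_flag.split(","),
        "worker": worker_hosts_flag.split(","),
    }

cluster_def = parse_cluster("10.0.0.1:2222",
                            "10.0.0.2:2222,10.0.0.3:2222")
# cluster_def == {"ps": ["10.0.0.1:2222"],
#                 "worker": ["10.0.0.2:2222", "10.0.0.3:2222"]}
```

Each process in the cluster is then started with its own job_name ("ps" or "worker") and task_index, and all processes share the same cluster definition.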