TensorFlow Concise Tutorial

TensorFlow - Distributed Computing

This chapter shows how to get started with distributed TensorFlow. The aim is to help developers understand the basic distributed TF concepts that come up repeatedly, such as TF servers. We will use a Jupyter Notebook to try out distributed TensorFlow. The steps below walk through a distributed computing setup with TensorFlow −

Step 1 − Import the modules required for distributed computing −

import tensorflow as tf

Step 2 − Create a TensorFlow cluster with one node. Let this node be responsible for a job named "worker" that runs one task at localhost:2222.

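# Describe a cluster with a single job, "worker", that has one task.
cluster_spec = tf.train.ClusterSpec({'worker': ['localhost:2222']})
# Start an in-process server for that task.
server = tf.train.Server(cluster_spec)
# The address that a session uses to connect to this server.
server.target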

The above script generates the following output −

'grpc://localhost:2222'
The server is currently running.
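The target string shown above is the task's address with a grpc:// prefix. As a minimal sketch in plain Python (no TensorFlow required), the mapping from a cluster dictionary to per-task targets might look like the following. The helper name task_targets is hypothetical and not part of the TensorFlow API, and the grpc:// prefix is assumed from the output above.

```python
def task_targets(cluster):
    """Map (job name, task index) to a 'grpc://host:port' target string.

    `cluster` has the same shape as the dict passed to ClusterSpec:
    job name -> list of 'host:port' addresses, one per task.
    Illustrative helper only; this is not a TensorFlow function.
    """
    targets = {}
    for job, addresses in cluster.items():
        for index, address in enumerate(addresses):
            targets[(job, index)] = "grpc://" + address
    return targets


cluster = {"worker": ["localhost:2222"]}
targets = task_targets(cluster)
print(targets[("worker", 0)])
```

With more than one address in the list, each address would become its own task index under the same job.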

Step 3 − The server configuration for this session can be inspected by executing the following command −

server.server_def

The above command generates the following output −

cluster {
   job {
      name: "worker"
      tasks {
         value: "localhost:2222"
      }
   }
}
job_name: "worker"
protocol: "grpc"

Step 4 − Launch a TensorFlow session that uses the server as its execution engine. Then use TensorFlow to create a local server, and use lsof to find out the location of the server.

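# Connect a session to the running server (TensorFlow 1.x API).
sess = tf.Session(target = server.target)
# Also start a standalone local server, under its own name so `server` is kept.
local_server = tf.train.Server.create_local_server()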
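The step mentions lsof without showing a command; one common form is `lsof -i :2222`, which lists processes using that port. As a rough alternative from inside the notebook, the sketch below checks whether anything accepts TCP connections on a given port, using only the standard library. The function name is_listening is hypothetical, and the check only confirms that some process is listening, not that it is a TensorFlow server.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# 2222 is the worker port used in Step 2.
print(is_listening("localhost", 2222))
```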

Step 5 − List the devices available in this session, then close the session.

devices = sess.list_devices()
for d in devices:
   print(d.name)
sess.close()

The above command generates the following output −

/job:worker/replica:0/task:0/device:CPU:0
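The device name encodes the job, replica, task, and device in a single path-like string. As a small sketch, assuming names always follow the pattern shown in the output above, it can be split into its fields with plain Python. The function parse_device_name is a hypothetical helper, not a TensorFlow API.

```python
def parse_device_name(name):
    """Split '/job:worker/replica:0/task:0/device:CPU:0' into its fields."""
    fields = {}
    for part in name.strip("/").split("/"):
        # Split on the first colon only, so 'device:CPU:0' keeps 'CPU:0' whole.
        key, _, value = part.partition(":")
        fields[key] = value
    # The device field has the form 'TYPE:index', for example 'CPU:0'.
    device_type, _, device_index = fields["device"].partition(":")
    return {
        "job": fields["job"],
        "replica": int(fields["replica"]),
        "task": int(fields["task"]),
        "device_type": device_type,
        "device_index": int(device_index),
    }


info = parse_device_name("/job:worker/replica:0/task:0/device:CPU:0")
print(info["job"], info["task"], info["device_type"])  # prints: worker 0 CPU
```

Here the job and task index match the "worker" job and its single task defined in Step 2.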