What can you do with underpowered VPSes? Try having them team up.
I. Install Ray on every VPS
apt-get install python3-pip
pip3 install -U ray
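Before moving on, it can help to confirm on each VPS that the install actually worked. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for this example and is not part of Ray.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this on every node; all nodes should ideally report the same Ray version.
version = installed_version("ray")
print("ray", version if version else "is not installed on this node")
```

Mismatched Ray versions between nodes are a common reason for a worker failing to join, so comparing the printed versions across machines is a cheap first check.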
II. Deploy the cluster
Pick one VPS to act as the head node
ray start --head --port=6379
Output when Ray starts successfully on the head node:
Local node IP: IP-of-head-node
2020-12-23 16:12:46,527 INFO services.py:1092 -- View the Ray dashboard at http://localhost:8265
--------------------
Ray runtime started.
--------------------
Next steps
To connect to this Ray runtime from another node, run
ray start --address='IP-of-head-node:6379' --redis-password='5241590000000000'
Alternatively, use the following Python code:
import ray
ray.init(address='auto', _redis_password='5241590000000000')
If connection fails, check your firewall settings and network configuration.
To terminate the Ray runtime, run
ray stop
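Use the remaining VPSes as worker nodes
Following the hint printed by the head node above, run this on every worker node (remember to replace IP-of-head-node with the head node's actual IP):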
ray start --address='IP-of-head-node:6379' --redis-password='5241590000000000'
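Copying the join command by hand onto several workers is easy to get wrong. As a rough sketch, the command can be pulled out of the head node's startup message and the placeholder filled in programmatically. The function name extract_join_command and the address 192.0.2.10 are hypothetical and used only for illustration; the sample text is taken from the startup message shown earlier.

```python
import re

# Sample text copied from the head node's startup message.
HEAD_OUTPUT = """\
To connect to this Ray runtime from another node, run
  ray start --address='IP-of-head-node:6379' --redis-password='5241590000000000'
"""


def extract_join_command(head_output, head_ip=None):
    """Return the first `ray start --address=...` line, optionally filling in the head IP."""
    match = re.search(r"^\s*(ray start --address=.*)$", head_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'ray start --address=' line found")
    command = match.group(1).strip()
    if head_ip is not None:
        # Replace the placeholder with the real head-node address.
        command = command.replace("IP-of-head-node", head_ip)
    return command


# 192.0.2.10 is a documentation-only address standing in for the real head IP.
print(extract_join_command(HEAD_OUTPUT, head_ip="192.0.2.10"))
```

The printed string can then be pasted into each worker's shell or passed to whatever remote-execution tool you already use.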
III. Test whether the worker nodes have joined the head node to form a cluster
Create a new Python script on the head node
vim ray_test.py
import ray
import time

# ray.init(address="auto")
ray.init(address='auto', _redis_password='5241590000000000')


@ray.remote
def f():
    time.sleep(0.01)
    return ray._private.services.get_node_ip_address()


if __name__ == "__main__":
    ips = set(ray.get([f.remote() for _ in range(1000)]))
    print(ips)
    print(len(ips))
The f function here is modified from the official example. The unmodified version raises this error:
AttributeError: module 'ray' has no attribute 'services'
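The location of get_node_ip_address has moved between Ray releases, which is what triggers the AttributeError. One way to cope is to try several candidate locations in order. The sketch below is an assumption-laden illustration: resolve_first is a made-up helper, and the candidate list assumes the function is reachable as ray.util.get_node_ip_address on newer releases, as ray._private.services.get_node_ip_address on the version used in this post, or as ray.services.get_node_ip_address on older ones. It is demonstrated on a stand-in object so it runs without a cluster.

```python
from types import SimpleNamespace

# Candidate attribute paths, newest layout first (assumed; check your Ray version).
CANDIDATE_PATHS = (
    "util.get_node_ip_address",
    "_private.services.get_node_ip_address",
    "services.get_node_ip_address",
)


def resolve_first(root, dotted_paths=CANDIDATE_PATHS):
    """Return the first attribute reachable from `root` along any of the dotted paths."""
    for path in dotted_paths:
        obj = root
        try:
            for part in path.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # this layout does not exist here, try the next one
        return obj
    raise AttributeError("none of the candidate paths exist: %r" % (dotted_paths,))


# Stand-in for the `ray` module that only has the `_private.services` layout.
fake_ray = SimpleNamespace(
    _private=SimpleNamespace(
        services=SimpleNamespace(get_node_ip_address=lambda: "10.0.0.1")
    )
)
print(resolve_first(fake_ray)())
```

In the test script, one would call resolve_first(ray)() inside f() instead of hard-coding a single path.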
Run the script on the head node
python3 ray_test.py
Output:
2020-12-23 16:34:03,700 INFO worker.py:651 -- Connecting to existing Ray cluster at address: 192.3.231.zz:6379
{'67.198.228.xx', '104.168.89.yy', '192.3.231.zz'}
3
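This shows that the head node connected to the worker nodes, that f() also ran on the workers, and that each call returned the IP of the node it ran on. The output lists three VPSes in the cluster: one head node and two worker nodes.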
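The set of IPs only shows which nodes took part, not how the 1000 tasks were spread across them. A small variation, sketched here, keeps the full list and counts tasks per node. The names tasks_per_node and collect_ips are made up for this example, collect_ips reuses the same ray.init arguments and f() body as the script in this post, and the sample addresses are invented.

```python
from collections import Counter


def tasks_per_node(ips):
    """Count how many tasks reported each node IP."""
    return Counter(ips)


def collect_ips(num_tasks=1000):
    """Run the test task on the cluster and return one IP per task (needs a live cluster)."""
    import time
    import ray  # imported lazily so tasks_per_node() works without Ray installed

    ray.init(address="auto", _redis_password="5241590000000000")

    @ray.remote
    def f():
        time.sleep(0.01)
        return ray._private.services.get_node_ip_address()

    return ray.get([f.remote() for _ in range(num_tasks)])


# Offline demonstration with made-up addresses.
sample = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
for ip, count in tasks_per_node(sample).most_common():
    print(ip, count)
```

On the head node, tasks_per_node(collect_ips()) would give the real distribution; a node with a very low count may be short on CPU or slow to reach.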
IV. Postscript
The cluster is now up. Next, try running some small programs that make a lot of network requests.
References:
1. Official documentation
2. Ray Distributed AI Framework Curriculum Offered on the Intel® AI DevCloud
3. Modern Parallel and Distributed Python: A Quick Tutorial on Ray
