Hyperparameter Optimization Tools Roundup (4): Ray.tune

Edge Computing

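Homepage: https://github.com/ray-project/ray/tree/master/python/ray/tune

Features: integrates many hyperparameter optimization methods, runs on the Ray distributed computing framework, and scales well. Ray is a large framework with many components, so this post covers only the part concerned with hyperparameter optimization, Ray.tune.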

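Tune architecture overview

Tune accepts a user-defined Python function or class and evaluates it against hyperparameter configurations drawn from the search space. Each evaluation of one configuration is called a Trial, and Tune can run multiple Trials in parallel. Configurations can be generated by Tune itself or obtained from a user-specified search algorithm, and Trials are scheduled and managed by Schedulers.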
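To make these terms concrete, here is a minimal pure-Python sketch of the flow described above. It does not use Ray at all: `grid_configs`, `objective`, and `run_trials` are hypothetical names chosen for illustration, the objective is a toy stand-in for a training function, and the trials run one after another rather than in parallel.

```python
import itertools

def grid_configs(space):
    """Expand a dict of candidate lists into one config per combination."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def objective(config):
    """Toy stand-in for a training function: best at lr=0.01, momentum=0.9."""
    return -abs(config["lr"] - 0.01) - abs(config["momentum"] - 0.9)

def run_trials(trainable, configs):
    """Evaluate each config as one trial and collect (config, score) pairs."""
    return [(config, trainable(config)) for config in configs]

space = {"lr": [0.001, 0.01, 0.1], "momentum": [0.5, 0.9]}
results = run_trials(objective, grid_configs(space))
best_config, best_score = max(results, key=lambda pair: pair[1])
print(len(results), best_config)
```

In Tune, the role of `grid_configs` is played by the built-in variant generation or a search algorithm, and a Scheduler decides when each trial runs or stops.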

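Highlights:

Run a multi-node distributed hyperparameter search in fewer than 10 lines of code.
Works with any existing machine learning framework, including PyTorch, XGBoost, MXNet, and Keras.
Visualize results with TensorBoard.
Choose among scalable state-of-the-art algorithms such as Population Based Training (PBT), Vizier's Median Stopping Rule, and HyperBand/ASHA.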
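HyperBand and ASHA are both built around successive halving: train every candidate briefly, keep the best fraction, and give the survivors a larger budget. The sketch below shows only that core idea in plain Python. It is not Tune's implementation; `partial_score` is a hypothetical toy learning curve, and the loop is synchronous, whereas ASHA promotes trials asynchronously.

```python
import math

def partial_score(config, budget):
    """Toy learning curve: improves with budget, best near lr=0.01."""
    quality = 1.0 - abs(math.log10(config["lr"]) + 2.0) / 3.0
    return quality * (1.0 - 1.0 / (1.0 + budget))

def successive_halving(configs, min_budget=1, eta=3):
    """Score all configs at a small budget, keep the top 1/eta, then repeat
    with eta times the budget until a single config remains."""
    survivors, budget = list(configs), min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors,
                        key=lambda c: partial_score(c, budget),
                        reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

# Six candidate learning rates: 1e-5, 1e-4, ..., 1
candidates = [{"lr": 10.0 ** e} for e in range(-5, 1)]
best = successive_halving(candidates)
print(best)
```

Most of the compute goes to the few configurations that survive the early rounds, which is why these schedulers scale to large search spaces.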

Usage and configuration

Usage:

https://ray.readthedocs.io/en/latest/tune.html

Example: tuning a CNN with a small grid search, based on PyTorch

import torch.optim as optim
from ray import tune
from ray.tune.examples.mnist_pytorch import get_data_loaders, ConvNet, train, test

# Define the trainable: build the model and optimizer, then train with the sampled config
def train_mnist(config):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        tune.track.log(mean_accuracy=acc)

# Call ray.tune to run the search; the search space is passed through config
analysis = tune.run(
    train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})

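# Print the best configuration
print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))
# Get a dataframe for analyzing the experiment results
df = analysis.dataframe()
# Analyze with TensorBoard by running this shell command (not Python):
#   tensorboard --logdir ~/ray_results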

Getting started with distributed usage

Supported systems: macOS and Linux

GPU support: yes

Built on:

Ray
Any existing machine learning framework, including PyTorch, XGBoost, MXNet, and Keras

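Optimization algorithms

Advanced algorithms such as HyperBand (trains models only minimally to determine the effect of hyperparameters)
AsyncHyperBand
Population Based Training (PBT; trains and optimizes a population of networks at the same time while sharing hyperparameters)
Nevergrad
Bayesian optimization and Ax (Ax is a Bayesian optimization platform from the PyTorch ecosystem)
HyperOpt search (Tree-structured Parzen Estimators) and the median stopping rule (stop a trial whose performance falls below the median)
Grid Search and Random Search (these use only the default search space and variant generation process; every other algorithm must be set up separately)
BayesOpt
Scikit-Optimize
BOHB (Bayesian Optimization HyperBand)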
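As an illustration of the median stopping rule from the list above, here is a simplified pure-Python sketch. It assumes higher scores are better and compares a trial's best score so far with the median of the other trials' running averages at the same step. `should_stop` and `grace_period` are hypothetical names, and this is not Tune's own scheduler code.

```python
from statistics import mean, median

def should_stop(trial_history, other_histories, grace_period=2):
    """Return True if the trial's best score so far is below the median of
    the other trials' running averages at the same step."""
    step = len(trial_history)
    if step < grace_period:
        return False  # too early to judge this trial
    running_means = [mean(h[:step]) for h in other_histories if len(h) >= step]
    if not running_means:
        return False  # nothing to compare against yet
    return max(trial_history) < median(running_means)

others = [[0.50, 0.70, 0.80], [0.40, 0.60, 0.75], [0.55, 0.72, 0.85]]
print(should_stop([0.20, 0.30], others))  # a weak trial
print(should_stop([0.60, 0.78], others))  # a strong trial
```

The grace period keeps a trial from being stopped on its first noisy measurement.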

Scope: Deep Learning, Reinforcement Learning

Distributed computing: the Ray distributed computing framework (ray.remote runs work in remote processes)

Distributed computing

Tune Distributed Experiments

Local cluster

cluster_name: local-default
provider:
    type: local
    head_ip: YOUR_HEAD_NODE_HOSTNAME
    worker_ips: [WORKER_NODE_1_HOSTNAME, WORKER_NODE_2_HOSTNAME, ... ]
auth: {ssh_user: YOUR_USERNAME, ssh_private_key: ~/.ssh/id_rsa}
## Typically for local clusters, min_workers == max_workers.
min_workers: 3
max_workers: 3
setup_commands:  # Set up each node.
    - pip install ray torch torchvision tabulate tensorboard

Cloud cluster

cluster_name: tune-default
provider: {type: aws, region: us-west-2}
auth: {ssh_user: ubuntu}
min_workers: 3
max_workers: 3
# Deep Learning AMI (Ubuntu) Version 21.0
head_node: {InstanceType: c5.xlarge, ImageId: ami-0b294f219d14e6a82}
worker_nodes: {InstanceType: c5.xlarge, ImageId: ami-0b294f219d14e6a82}
setup_commands: # Set up each node.
    - pip install ray torch torchvision tabulate tensorboard

Defining the search space

import numpy as np
from ray import tune

# TensorFlow example: define each hyperparameter and its search space
hyperparameter_space = {
    "lr": tune.loguniform(0.001, 0.1),
    "dense_1": tune.uniform(2, 128),
    "dense_2": tune.uniform(2, 128),
}

# PyTorch example: draw one random configuration with NumPy
def generate_hyperparameters():
    return {
        "learning_rate": 10 ** np.random.uniform(-5, 1),
        "batch_size": np.random.randint(1, 100),
        "momentum": np.random.uniform(0, 1),
    }

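Both search-space snippets sample the learning rate on a log scale: tune.loguniform draws values whose logarithm is uniformly distributed, and 10**np.random.uniform(-5, 1) does the same thing in base 10. The sketch below reproduces that idea with only the standard library; `sample_loguniform` is a hypothetical helper, not part of Ray.

```python
import math
import random

def sample_loguniform(low, high, rng=random):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_loguniform(0.001, 0.1, rng) for _ in range(10_000)]

# 0.01 is the geometric midpoint of the range, so roughly half of the
# samples should fall below it.
below_midpoint = sum(s < 0.01 for s in samples) / len(samples)
print(min(samples), max(samples), below_midpoint)
```

A plain uniform draw over the same range would place only about 9% of the samples below 0.01, which is why log-scale sampling is the usual choice for learning rates.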
Output examples

1. Printing results directly

Taking the Keras-based MNIST tuning results as an example:

The hyperparameter space is:

config = {
    "threads": 2,
    "lr": tune.sample_from(lambda spec: np.random.uniform(0.001, 0.1)),
    "momentum": tune.sample_from(lambda spec: np.random.uniform(0.1, 0.9)),
    "hidden": tune.sample_from(lambda spec: np.random.randint(32, 512)),
}

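10 Trials on a machine with 24 CPUs and 8 GPUs, with 3 CPUs + 1 GPU assigned to each Trial; 6 Trials can train in parallel. The results are as follows:

The training results show that the 3rd Trial has the highest accuracy, 0.992. The results are written to the log directory (Result logdir: /root/ray_results/exp) and can later be loaded into TensorBoard for analysis.

The detailed results of each Trial are also printed (taking the eighth Trial as an example):

Taking the PyTorch-based MNIST tuning results as an example:

3 Trials; the hyperparameter space is: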

config = {"lr": tune.grid_search([0.001, 0.01, 0.1])}

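2. Analyzing results:

The output can be loaded into TensorBoard, which shows the hyperparameters each Trial used during training and how each Trial's loss changes over time.

Output is written to ~/ray_results/tune_iris (example)
and can be read with TensorBoard.

More example code

Using GPUs in Ray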

https://ray.readthedocs.io/en/latest/using-ray-with-gpus.html#gpu-support

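1. Basic GPU usage in Ray

When ray.init() is called, Ray initializes itself and automatically detects the number of available GPUs (it normally detects all available resources, including CPUs and memory), so a framework such as TensorFlow can then find and use them. The count can also be set explicitly with ray.init(num_gpus=N) or ray start --num-gpus=N; num_cpus, memory, object_store_memory and similar options likewise limit the CPUs and memory available to the whole program. The effect is similar to CUDA_VISIBLE_DEVICES.

For example, on a machine with 12 CPUs and 12 GPUs:

After ray.init(), the program can use 12 CPUs and 12 GPUs in total.

After ray.init(num_cpus=6, num_gpus=3), the program can use 6 CPUs and 3 GPUs in total.

After ray.init(num_cpus=15, num_gpus=3), the program fails because the request exceeds what the machine has.

2. Specifying GPU and other resources per Trial (example program based on TensorFlow and Keras)

Pass resources_per_trial={} to tune.run() to specify the resources each Trial occupies: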

tune.run(
        num_samples=10,
        resources_per_trial={
            "cpu": 3,
            "gpu": 1,
        },
        )

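Here num_samples is the number of samples drawn from the search space, i.e. the total number of Trials. As long as the machine has free resources, several Trials run in parallel. With the settings above and 12 GPUs and 12 CPUs available in total, at most 4 Trials run at once, since the CPUs are the limiting resource (12 / 3 = 4); the remaining Trials wait in the queue.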
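The arithmetic behind that limit can be written down directly: the number of Trials that fit at once is set by whichever resource runs out first. A small sketch, with `max_concurrent_trials` as a hypothetical helper (Tune does this bookkeeping internally):

```python
def max_concurrent_trials(available, per_trial):
    """Largest number of trials that fit at once, limited by the scarcest
    resource. Resources a trial does not request are ignored."""
    return min(
        int(available[name] // amount)
        for name, amount in per_trial.items()
        if amount > 0
    )

available = {"cpu": 12, "gpu": 12}
per_trial = {"cpu": 3, "gpu": 1}
print(max_concurrent_trials(available, per_trial))  # CPUs allow 4, GPUs allow 12
```

Raising the per-trial GPU count to 4 would make the GPUs the bottleneck instead, allowing 3 Trials at a time.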

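3. Using GPUs from Ray remote functions (not reproduced here, for lack of a suitable remote environment)

Ray lets a remote function declare its GPU requirement in the ray.remote() decorator: @ray.remote(num_gpus=1)

Inside the remote function, a call to ray.get_gpu_ids() returns a list of integers indicating which GPUs the function is allowed to use. Calling ray.get_gpu_ids() is usually unnecessary, because Ray sets the CUDA_VISIBLE_DEVICES environment variable automatically. In practice Ray only reserves the GPU; the computation itself still has to use it, for example through the GPU build of TensorFlow.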
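Because the assignment is communicated through CUDA_VISIBLE_DEVICES, code running inside a task can also inspect that variable directly. The sketch below does so without Ray; `visible_gpu_ids` is a hypothetical helper, and the ids are kept as strings because the variable may hold device UUIDs as well as indices.

```python
import os

def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device ids.

    Returns None when the variable is unset (no restriction applied) and
    an empty list when it is set but empty (no GPU visible)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]

print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,2"}))
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": ""}))
print(visible_gpu_ids({}))
```

Calling `visible_gpu_ids()` with no argument reads the real environment of the current process.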

An example of splitting one GPU with ray.remote():

import ray
import time

ray.init(num_cpus=4, num_gpus=1)

@ray.remote(num_gpus=0.25)
def f():
    time.sleep(1)

# The four tasks created here can execute concurrently.
ray.get([f.remote() for _ in range(4)])

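The idea is that when several tasks share one GPU, each task must be limited by hand so that its GPU memory stays within the configured fraction; TensorFlow, for one, has settings that cap its GPU memory usage.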
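The fraction passed to num_gpus is a scheduling quantity: tasks are admitted as long as their fractions add up to no more than the GPU count, and nothing beyond that is enforced on the device. The toy admission check below illustrates that bookkeeping only. `admit_tasks` is a hypothetical function, not Ray's scheduler, and it performs no actual memory limiting.

```python
def admit_tasks(requests, gpu_capacity=1.0):
    """Greedily admit tasks while the sum of their GPU fractions fits.

    requests is a list of (task_name, gpu_fraction) pairs; the result is
    the names that can run now and the names that must wait."""
    running, waiting, used = [], [], 0.0
    for name, fraction in requests:
        if used + fraction <= gpu_capacity + 1e-9:  # tolerate float rounding
            running.append(name)
            used += fraction
        else:
            waiting.append(name)
    return running, waiting

# Five tasks asking for a quarter of a GPU each, with one GPU available.
requests = [(f"task{i}", 0.25) for i in range(5)]
running, waiting = admit_tasks(requests)
print(running, waiting)
```

Four quarter-GPU tasks fill the single GPU, so the fifth has to wait, which matches the four concurrent tasks in the Ray example.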

Edited on 2023-03-09 01:37
