This topic describes how to use the AdagradDecay optimizer for ultra-large-scale training.
Warning
GPU servers in the public cloud will soon go out of warranty and be taken offline. You can continue to submit CPU-based TensorFlow tasks. To train models on GPUs, submit your tasks through DLC instead. For details, see Create a training task.
Background information
Ultra-large-scale models are typically trained on more than one billion samples, and continuous incremental training often runs for longer than a month. Because the Adagrad accumulator grows monotonically, the effective learning rate shrinks toward zero over such long runs. To address this, PAI-TF provides the AdagradDecay optimizer.
Enable the AdagradDecay optimizer
To use the AdagradDecay optimizer for ultra-large-scale training, define a tf.train.AdagradDecayOptimizer. It is used in the same way as a native TensorFlow optimizer; a usage sketch follows the definition below. The optimizer is defined as follows.
class AdagradDecayOptimizer(optimizer.Optimizer):
  """Optimizer that implements the Adagrad algorithm with accumulator decay.

  Unlike the original Adagrad algorithm, AdagradDecay decays the accumulator
  at a given step interval by a given rate, so that the accumulator does not
  grow to infinity.
  """

  def __init__(self,
               learning_rate,
               global_step,
               initial_accumulator_value=0.1,
               accumulator_decay_step=100000,
               accumulator_decay_rate=0.9,
               use_locking=False,
               name="AdagradDecay"):
    """Construct a new AdagradDecay optimizer.

    Args:
      learning_rate: A `Tensor` or a floating point value. The learning rate.
      global_step: Global step variable, used for calculating t % T.
      initial_accumulator_value: A floating point value. Starting and baseline
        value for the accumulators; must be positive. The accumulators will
        never fall below this value.
      accumulator_decay_step: Each time global_step reaches a multiple of
        accumulator_decay_step, the accumulator is decayed by
        accumulator_decay_rate: accumulator *= accumulator_decay_rate.
      accumulator_decay_rate: The decay rate described above.
      use_locking: If `True`, use locks for update operations.
      name: Optional name prefix for the operations created when applying
        gradients. Defaults to "AdagradDecay".

    Raises:
      ValueError: If `initial_accumulator_value`, `accumulator_decay_step`,
        or `accumulator_decay_rate` is invalid.
    """