# Project window. Using Optimizer on the CONFIG tab

Posted by Yoshiyuki Kobayashi

### 1 Specifying the name of the network used for optimization

Set Network to the name of the network created on the EDIT tab.

### 2 Specifying the name of the dataset used for optimization

Set Dataset to the name of the dataset loaded on the DATASET tab.

### 3 Specifying the parameter update method

From the Config list, select Optimizer.

Select an updater from the following (“Adam” is used by default).

| Updater | Update expression | Reference |
| --- | --- | --- |
| Adadelta | $$g_t \leftarrow \Delta w_t\\ E[g^2]_t \leftarrow \gamma E[g^2]_{t-1} + \left(1 - \gamma \right) g_t^2\\ u_t \leftarrow \frac{\sqrt{E[u^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} g_t\\ E[u^2]_t \leftarrow \gamma E[u^2]_{t-1} + \left(1 - \gamma \right) u_t^2\\ w_{t+1} \leftarrow w_t - \eta u_t$$ | Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. https://arxiv.org/abs/1212.5701 |
| Adagrad | $$g_t \leftarrow \Delta w_t\\ G_t \leftarrow G_{t-1} + g_t^2\\ w_{t+1} \leftarrow w_t - \frac{\eta}{\sqrt{G_t} + \epsilon} g_t$$ | John Duchi, Elad Hazan and Yoram Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf |
| Adam | $$g_t \leftarrow \Delta w_t\\ m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t \leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} \leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon}$$ | Kingma and Ba. Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980 |
| Adamax | $$g_t \leftarrow \Delta w_t\\ m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t \leftarrow \max\left(\beta_2 v_{t-1}, \lvert g_t \rvert\right)\\ w_{t+1} \leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{v_t + \epsilon}$$ | Kingma and Ba. Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980 |
| Momentum | $$v_t \leftarrow \gamma v_{t-1} + \eta \Delta w_t\\ w_{t+1} \leftarrow w_t - v_t$$ | Ning Qian. On the momentum term in gradient descent learning algorithms. http://www.columbia.edu/~nq6/publications/momentum.pdf |
| Nag | $$v_t \leftarrow \gamma v_{t-1} - \eta \Delta w_t\\ w_{t+1} \leftarrow w_t - \gamma v_{t-1} + \left(1 + \gamma \right) v_t$$ | Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. |
| RMSprop | $$g_t \leftarrow \Delta w_t\\ v_t \leftarrow \gamma v_{t-1} + \left(1 - \gamma \right) g_t^2\\ w_{t+1} \leftarrow w_t - \eta \frac{g_t}{\sqrt{v_t} + \epsilon}$$ | Geoff Hinton. Lecture 6a: Overview of mini-batch gradient descent. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf |
| Sgd | $$w_{t+1} \leftarrow w_t - \eta \Delta w_t$$ | |

In the Adadelta row, $E[\cdot]_t$ denotes a running average and $u_t$ the rescaled update step.

w: Parameter to be updated (Δw is its gradient)

η, α: Learning Rate, Alpha (learning rate)

γ, β1, β2: Momentum or Decay (γ), Beta1 (β1), Beta2 (β2) (decay parameters)

ε: Epsilon (a small value used to prevent division by zero)
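To make the table concrete, the Sgd and Adam rows can be sketched in plain Python for a single scalar parameter. This is only an illustration of the update expressions as written above, not the tool's implementation; the function names and the default values for `lr`, `alpha`, `beta1`, `beta2`, and `eps` are assumptions chosen for the example.

```python
import math


def sgd_step(w, grad, lr=0.01):
    """Sgd row: w_{t+1} = w_t - eta * grad."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam row, for step counter t starting at 1.

    The default hyperparameters are assumed values for this sketch.
    Returns the updated parameter and the updated moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate v_t
    scale = alpha * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    w = w - scale * m / (math.sqrt(v) + eps)
    return w, m, v


def minimize_quadratic_with_sgd(steps, w=1.0, lr=0.1):
    """Minimize the toy loss w**2 (gradient 2*w) with repeated Sgd steps."""
    for _ in range(steps):
        w = sgd_step(w, 2.0 * w, lr=lr)
    return w
```

On the first Adam step the bias-correction factor makes the parameter move by roughly `alpha` regardless of the gradient's magnitude, which is one reason Adam is less sensitive to gradient scale than Sgd.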

### 4 Setting the Weight Decay (L2 regularization) strength

Specify the weight decay coefficient in Weight Decay.
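As a rough illustration of what the coefficient does, the sketch below assumes the common L2 formulation, in which the coefficient times the parameter is added to the gradient before the update. Whether the tool applies it exactly this way is an assumption here; the function names are hypothetical.

```python
def apply_weight_decay(grad, w, weight_decay):
    """Add the L2 penalty gradient (weight_decay * w) to the loss gradient.

    Assumed formulation for this sketch, not confirmed tool behavior.
    """
    return grad + weight_decay * w


def sgd_step_with_decay(w, grad, lr, weight_decay):
    """One Sgd step using the decayed gradient."""
    return w - lr * apply_weight_decay(grad, w, weight_decay)
```

With a zero loss gradient, each step simply shrinks the parameter toward zero by a factor of `1 - lr * weight_decay`, which is where the name "weight decay" comes from.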

### 5 Gradually decaying the learning rate

Specify the factor by which the learning rate decays in Learning Rate Multiplier. Specify the interval between decays, in number of mini-batches, in LR Update Interval (NNabla only). For example, to multiply the learning rate by 0.9999 every mini-batch, set Learning Rate Multiplier to 0.9999 and LR Update Interval to 1. To reduce the learning rate by a factor of 10 every 20 epochs, set Learning Rate Multiplier to 0.1 and LR Update Interval to (number of training data samples ÷ mini-batch size) × 20, that is, the number of mini-batches per epoch × 20.
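The arithmetic for the two settings can be sketched as follows. The helper names are hypothetical, and two points are assumptions made for the example: the number of mini-batches per epoch is taken as the integer quotient of samples by batch size, and the multiplier is applied once each time the mini-batch count reaches a multiple of the interval.

```python
def lr_update_interval_for_epochs(num_samples, batch_size, epochs):
    """LR Update Interval (in mini-batches) that corresponds to `epochs` epochs.

    Assumes a trailing incomplete mini-batch is not counted.
    """
    minibatches_per_epoch = num_samples // batch_size
    return minibatches_per_epoch * epochs


def learning_rate_at(iteration, base_lr, multiplier, update_interval):
    """Learning rate after `iteration` mini-batches under step-wise decay."""
    return base_lr * multiplier ** (iteration // update_interval)


# Hypothetical dataset: 60000 training samples, mini-batches of 64 samples.
interval = lr_update_interval_for_epochs(60000, 64, 20)
lr_before = learning_rate_at(interval - 1, 0.001, 0.1, interval)  # still the base rate
lr_after = learning_rate_at(interval, 0.001, 0.1, interval)       # base rate times 0.1
```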

### 6 Updating parameters once every several mini-batches

Specify the parameter update interval in Update Interval. For example, to compute gradients on four mini-batches of 64 data samples each and then update the parameters once using those four gradients, set Batch Size to 64 and Update Interval to 4.

Notes

In order to perform optimization using multiple training networks, the Update Interval must be set to 1.
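The effect of Update Interval can be sketched as gradient accumulation. In this minimal example the per-mini-batch gradients are summed before a plain Sgd update; summing rather than averaging is an assumption made for the illustration, and the function name is hypothetical.

```python
def train_with_update_interval(w, minibatch_grads, lr, update_interval):
    """Apply one Sgd update per `update_interval` mini-batch gradients.

    Gradients are summed between updates (an assumption for this sketch;
    averaging instead would only rescale the effective learning rate).
    Returns the final parameter and the number of updates performed.
    """
    accumulated = 0.0
    updates = 0
    for i, grad in enumerate(minibatch_grads, start=1):
        accumulated += grad
        if i % update_interval == 0:
            w -= lr * accumulated   # single update from the accumulated gradient
            accumulated = 0.0
            updates += 1
    return w, updates
```

With Batch Size 64 and Update Interval 4, each update in this sketch is driven by 256 data samples while only 64 need to be processed at a time.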

### 7 Adding a new optimizer

Click the hamburger menu or right-click the Config list to open a shortcut menu, and click Add Optimizer.

### 8 Renaming an optimizer

1. Click the hamburger menu or right-click the Config list to open a shortcut menu, and click Rename.
2. Alternatively, on the Config list, double-click the optimizer you want to rename.
3. Type the new name, and press Enter.

### 9 Deleting an optimizer

1. From the Config list, select the optimizer you want to delete.
2. Click the hamburger menu or right-click the Config list to open a shortcut menu, and click Delete.
3. Alternatively, press Delete on the keyboard.