# Coordinate Descent Algorithm

(Redirected from coordinate descent)

## References

### 2018

• (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/coordinate_descent Retrieved:2018-3-11.
• Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size. Coordinate descent is applicable in both differentiable and derivative-free contexts.

### 2018b

• (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Coordinate_descent#Description Retrieved:2018-3-11.
• Coordinate descent is based on the idea that the minimization of a multivariable function $F(\mathbf{x})$ can be achieved by minimizing it along one direction at a time, i.e., solving univariate (or at least much simpler) optimization problems in a loop.[1] In the simplest case of cyclic coordinate descent, one cyclically iterates through the directions, one at a time, minimizing the objective function with respect to each coordinate direction at a time. That is, starting with initial variable values : $\mathbf{x}^0 = (x^0_1, \ldots, x^0_n)$ ,

round $k+1$ defines $\mathbf{x}^{k+1}$ from $\mathbf{x}^k$ by iteratively solving the single variable optimization problems : $x^{k+1}_i = \underset{y\in\mathbb R}{\operatorname{arg\,min}}\; f(x^{k+1}_1, \dots, x^{k+1}_{i-1}, y, x^k_{i+1}, \dots, x^k_n)$ [2]

for each variable $x_i$ of $\mathbf{x}$ , for $i$ from 1 to $n$ .

Thus, one begins with an initial guess $\mathbf{x}^0$ for a local minimum of $F$ , and gets a sequence $\mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2, \dots$ iteratively.

By doing line search in each iteration, one automatically has : $F(\mathbf{x}^0)\ge F(\mathbf{x}^1)\ge F(\mathbf{x}^2)\ge \dots.$ It can be shown that this sequence has similar convergence properties as steepest descent. No improvement after one cycle of line search along coordinate directions implies a stationary point is reached.

This process is illustrated below.

### 2017a

1. Cite error: Invalid <ref> tag; no text was provided for refs named wright