### HKopt: Nonparametric Bayesian Quasi-Newton Optimization

**Paper:** [pdf] [BibTeX]

Hennig, P. &
Kiefel, M.: *Quasi-Newton methods: a new direction*

Proceedings of the 29th International Conference on Machine
Learning (ICML), Edinburgh, June 2012 (Please cite this work
when you use this algorithm)

**Code:** After download, refer to the README.txt file

HK_opt_1.0.zip

Quasi-Newton algorithms are arguably the most popular class of nonlinear numerical optimization methods, used widely in numerical applications well beyond machine learning. Their defining property is that they iteratively build estimators of the Hessian of the objective function from observations of its gradient, at each iteration searching for a local minimum along a line-search direction that estimates the eponymous Newton-Raphson search direction. The most widely known members of this family are the BFGS [1-4] and L-BFGS [5] algorithms. Others are Broyden's method [6], the SR1 formula [7,8], and the DFP formula [7,9].
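To make the iterative structure concrete, here is a minimal sketch of the classical BFGS update in Python/NumPy (an illustration of the standard textbook scheme, not the HKopt algorithm or the released MATLAB code): maintain an inverse-Hessian estimate, step along the resulting quasi-Newton direction, and refine the estimate from each observed gradient difference.

```python
import numpy as np

def bfgs(f, grad, x0, iters=200, tol=1e-8):
    """Textbook BFGS sketch: B estimates the inverse Hessian; each
    secant pair (s, y) refines it via the BFGS inverse update."""
    x = x0.astype(float)
    n = x.size
    B = np.eye(n)                    # initial inverse-Hessian estimate
    g = grad(x)
    for _ in range(iters):
        p = -B @ g                   # quasi-Newton search direction
        t = 1.0                      # crude Armijo backtracking line search
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        s = t * p                    # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                # observed change in gradient
        sy = s @ y
        if sy > 1e-12:               # curvature condition keeps B pos. definite
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            B = V @ B @ V.T + rho * np.outer(s, s)   # BFGS inverse update
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x

# usage: minimize the Rosenbrock function from the standard start point
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x_star = bfgs(f, grad, np.array([-1.2, 1.0]))
```

The line search here is deliberately simplistic; practical implementations use Wolfe-condition line searches so that the curvature condition holds by construction.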

A probabilistic analysis reveals that the popular quasi-Newton algorithms can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates certain aspects of the classic algorithms; for example, it becomes clear that they deliberately weaken prior knowledge (such as the symmetry of the Hessian) in return for lower computational cost. In a second step, this new understanding also points the way to a novel, nonparametric quasi-Newton method, which makes more efficient use of the available information while remaining within a multiplicative constant of the computational cost of its predecessors.
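The regression viewpoint can be illustrated with a toy example (a deliberately simplified sketch, not the paper's nonparametric construction): each secant pair (s, y) with y = Bs is a linear observation of the Hessian B, so B can be estimated by regression on a stack of such pairs. Classical updates and the paper's method correspond to different, more structured priors on B; here we use plain unregularized least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B_true = A @ A.T + n * np.eye(n)    # a fixed SPD matrix standing in for the Hessian

# each secant pair (s, y = B s) is a noiseless linear observation of B
S = rng.standard_normal((n, 8))     # 8 step directions as columns
Y = B_true @ S                      # corresponding gradient differences

# least-squares estimate of B from the stacked observations: solve B S = Y,
# i.e. S.T B.T = Y.T, one regression problem per row of B
B_hat_T, *_ = np.linalg.lstsq(S.T, Y.T, rcond=None)
B_hat = B_hat_T.T
```

Note that this naive estimator ignores the symmetry of B; encoding such prior knowledge (or dropping it for cheaper updates) is exactly the trade-off the probabilistic interpretation makes explicit.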

Along with the paper on this work, we are also publishing a MATLAB implementation of this new algorithm. Its interface is modelled on, and borrows heavily from, the minimize.m implementation by Carl Rasmussen. Details on the code can be found in the accompanying README.txt.

*This is a research-grade implementation of a novel optimization algorithm. While there is considerable theoretical understanding suggesting that this is a very good nonlinear optimization method, the numerical implementation is nontrivial, and it is in any case hard to design a general-purpose optimizer that works on most optimization problems. We strongly encourage you to try out our algorithm and read our paper, but we do not guarantee that this code will always work. As the research and development process continues, we will update this algorithm as new insights become available. If you think you have found an interesting failure case of this algorithm, consider writing to us, ideally with a simple reduced test case.*

[1] Broyden, C.G.: *A new double-rank minimization
algorithm.* Notices of the American Mathematical Society 16:670, 1969

[2] Fletcher, R.: *A new approach to variable metric
algorithms.* The Computer Journal 13(3):317-322, 1970

[3] Goldfarb, D.: *A family of variable metric updates
derived by variational means*. Mathematics of Computation
24(109):23-26, 1970

[4] Shanno, D.F.: *Conditioning of quasi-Newton methods for
function minimization*. Mathematics of Computation
24(111):647-656, 1970

[5] Nocedal, J.: *Updating quasi-Newton matrices with
limited storage.* Mathematics of Computation, 35 (151):
773-782, 1980

[6] Broyden, C.G.: *A class of methods for solving nonlinear
simultaneous equations.* Mathematics of Computation
19(92):577-593, 1965

[7] Davidon, W.C.: *Variable metric method for optimization.*
Tech. Report, Argonne National Laboratory, Ill., 1959

[8] Broyden, C.G.: *Quasi-Newton methods and their
application to function minimization.* Mathematics of
Computation 21:368-381, 1967

[9] Fletcher, R. & Powell, M.J.D.: *A rapidly convergent
descent method for minimization*. The Computer Journal
6(2):163-168, 1963