Probabilistic-Optimization.org

HKopt: Nonparametric Bayesian Quasi-Newton Optimization

Paper: [pdf] [BibTeX]
Hennig, P. & Kiefel, M.: Quasi-Newton methods: a new direction
Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, June 2012 (Please cite this work when you use this algorithm)
Code: After download, refer to the README.txt file
HK_opt_1.0.zip

Quasi-Newton algorithms are arguably the most popular class of nonlinear numerical optimization methods, used widely in numerical applications not just in machine learning. Their defining property is that they iteratively build estimators for the Hessian of the objective function, from observations of its gradient, at each iteration searching for a local minimum along a line search direction that is an estimate of the eponymous Newton-Raphson search direction. The most widely known members of this family are the BFGS [1-4], and L-BFGS [5] algorithms. Others are Broyden's method [6], the SR1 formula [7,8], and the DFP formula [8,9].

A probabilistic analysis reveals that the popular quasi-Newton algorithms can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates certain aspects of the classic algorithms; for example it becomes clear that they deliberately weaken prior knowledge (such as the symmetry of the Hessian) in return for lower computational cost. In a second step, this new understanding also lights the way to a novel, nonparametric quasi-Newton method, which is able to make more efficient use of available information while remaining within a multiplicative constant of the computational cost of its predecessors.

Along with the paper on this work, we are also publishing a matlab implementation of this new algorithm. Its interface is modelled on, and borrows heavily from, the minimize.m implementation by Carl Rasmussen. Details on the code can be found in an accompanying readme.txt

This is a research-grade implementation of a novel optimization algorithm. While there is considerable theoretical understanding suggesting that this is very good nonlinear optimization method, the numerical implementation is nontrivial. At any rate, it is hard to design a general optimizer that works on most optimization problems. We strongly encourage you to try out our algorithm and read our paper. But we do not guarantee that this code will always work. As the research and development process continues, we will update this algorithm as new insights become available. If you think you have found an interesting fail case of this algorithm, consider writing us, ideally with a simple reduction case.

[1] Broyden, C.G.: A new double-rank minimization algorithm. Notices American Math. Soc. 16:670, 1969

[2] Fletcher, R.: A new approach to variable metric algorithms. The Computer Journal 13(3):317, 1970

[3] Goldfarb, D.: A family of variable metric updates derived by variational means. Mathematics of Computation 24(109):23-26, 1970

[4] Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation 24(111):657-656, 1970

[5] Nocedal, J.: Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35 (151): 773-782, 1980

[6] Broyden, C.G.: A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation 19(92):577-593, 1965

[7] Davidon, W.C.: Variable metric method for optimization. Tech. Report, Argonne National Laboratories, Ill. 1959

[8] Broyden, C.G.: Quasi-Newton methods and their application to function minimization. Mathematics of Computation 21(368):45, 1967

[9] Fletcher, R. & Powell, M.J.D.: A rapidly convergent descent method for minimization. The Computer Journal 6(2):163-168, 1963

Probabilistic Optimization

information efficient search and optimization

Algorithms & Publications

People

HKopt: Nonparametric Bayesian Quasi-Newton Optimization