Optimisation for Machine Learning

The module will cover a range of theoretical ideas from convex optimisation in high-dimensional spaces, with particular emphasis on those that find application in modern ML. These include Legendre duality, Lagrange multipliers, and quasi-Newton methods such as BFGS and L-BFGS. The focus will then shift to stochastic optimisation, in particular stochastic gradient descent and mini-batch gradient descent with momentum, as well as Nesterov's accelerated gradient method. Popular practical stochastic methods will also be covered, including Adagrad, RMSProp, Adadelta, and Adam. The module will then touch on more advanced topics such as weight perturbation, conjugate exponential families and stochastic variational inference, and optimisation on Riemannian manifolds.
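
To give a flavour of the stochastic methods mentioned above, the following is a minimal, self-contained sketch of two of the update rules (momentum and Adam) applied to a toy quadratic objective. It is illustrative only and not taken from the module's material; the function names, the toy objective, and all hyperparameter values are assumptions chosen for clarity.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One step of mini-batch gradient descent with (heavy-ball) momentum."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: exponential moving averages of the gradient and its
    elementwise square, with bias correction for the early iterations t = 1, 2, ..."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)                     # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                     # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimise f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
velocity = np.zeros_like(w)
for _ in range(200):
    grad = w                                       # gradient of the toy objective
    w, velocity = sgd_momentum_step(w, grad, velocity)
print(w)                                           # close to the minimiser at the origin
```

In practice a stochastic gradient estimated from a mini-batch of data would replace the exact gradient used here; the update rules themselves are unchanged.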

Upon completion of the module the student will be able to:

  • summarise and critically assess current trends in optimisation for ML, and identify where the relevant methods are applied;
  • formulate and interpret the mathematical theory of convex optimisation in the context of ML;
  • select and design appropriate methods for a particular optimisation problem in ML;
  • apply relevant optimisation methods to practical ML problems, and explain observed outcomes.