Stochastic Gradient Descent
- 11:30AM at REC 307
- Prof. Raghu Pasupathy, Purdue University
- Guang Lin
Stochastic Gradient Descent (SGD), also known as stochastic approximation, refers to a class of simple iterative schemes for solving stochastic optimization and root-finding problems. The identifying feature of SGD is that, much as in gradient descent for deterministic optimization, each successive iterate in the recursion is obtained by adding an appropriately scaled gradient estimate to the prior iterate. Owing to several factors, SGD has become the leading method for solving optimization problems arising in large-scale machine learning and "big data" contexts such as classification and regression. This talk covers the basics of SGD with an emphasis on modern developments. The talk starts with examples where SGD is applicable, and then details important flavors of SGD and reported convergence rate calculations. I will present some numerical examples to aid intuition.
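
As a rough illustration of the recursion described above, the following minimal sketch applies SGD to a toy problem: minimizing E[(x - Z)^2]/2, where Z is normally distributed with mean MU. The minimizer is x = MU, and x - z is an unbiased gradient estimate from a single sample z. The function names, the toy objective, the 1/k step size, and the value of MU are assumptions chosen for illustration and are not taken from the talk.

```python
import random

def sgd(grad_estimate, x0, num_iters, step=lambda k: 1.0 / k, seed=0):
    """Run the basic SGD recursion x_{k+1} = x_k - a_k * G_k.

    grad_estimate(x, rng) returns a noisy gradient estimate at x.
    step(k) returns the step size a_k (assumed here to be 1/k).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = x0
    for k in range(1, num_iters + 1):
        g = grad_estimate(x, rng)   # noisy gradient estimate at the current iterate
        x = x - step(k) * g         # add the (negatively) scaled estimate
    return x

MU = 3.0  # hypothetical true mean; the minimizer of the toy objective

def noisy_grad(x, rng):
    # One sample z ~ N(MU, 1) gives the unbiased gradient estimate x - z.
    z = rng.gauss(MU, 1.0)
    return x - z

x_final = sgd(noisy_grad, x0=0.0, num_iters=10000)
print(f"final iterate: {x_final:.4f} (target {MU})")
```

With the 1/k step size, each iterate in this toy problem equals the running average of the samples drawn so far, so the final iterate should land close to MU. Other step-size rules can be passed through the `step` argument.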