Non-asymptotic Analysis of Momentum for Quadratic Functions
(Polyak’s) Momentum is one of the most popular optimization techniques. When minimizing $f(x)$, gradient descent with momentum is simply
\[x_{t+1}\gets x_t - \eta\nabla f(x_t) + \gamma(x_t-x_{t-1}).\]It is well known that when $f$ is quadratic, i.e. when $f(x)=\frac{1}{2}x^TAx-b^Tx$, gradient descent with momentum converges linearly assuming that $A$ is positive definite. In this case the algorithm is a linear fixed point iteration, so one only needs to analyze the spectrum of the iteration matrix. For instance see the analysis in this and this lecture note.
SGD, Growth Condition, and Finite Sums
Recently, I came across this paper by Vaswani et al. and some other related work showing the linear convergence of SGD. The only additional assumption is, essentially “zero training loss”. This differs so much from the $\tilde\Theta(1/T)$ rate I learnt in class, that I decide it’s worth writing a blog post about.
Calculating the analemma
日行迹始末
(计算日行迹也算是我少年时期的梦想:我在初中时就想定量地解出曲线,在高一时曾经尝试过未果;最终我在高三时又想起了这个问题并轻松地完成了计算。这篇文章基于我高三时的计算以及大一时写的一篇微信推送)
日行迹(analemma) 指的是太阳一周年中每天同一时刻的位置所连成的轨迹。由于计时系统是和平太阳(假象的匀速运动的太阳)的运动对应的,日行迹也反映着真太阳与平太阳的偏差。
Hello world
Hello world!
This is the very first post of this little site. Hope that it will become helpful for its owner and its readers, if any.