The idea behind
Gradient Descent is simple. Say we have a scalar-valued function
$$f:\mathbb{R}^n\to \mathbb{R}$$
and we are standing at a point $x$ in the domain $\mathbb{R}^n$, about to take a step of size $\delta$ in the
direction
of some unit vector $v$.
When we want to
maximize the output, we go in the direction of the gradient, $v=\nabla f(x)/\lVert\nabla f(x)\rVert$.
Similarly, if we want to
minimize the output, we go in the opposite direction,
$v=-\nabla f(x)/\lVert\nabla f(x)\rVert$. For the mathematical justification, see my notes at
🖇️.
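
The update rule above can be sketched in a few lines of code. This is a minimal illustration, not a production optimizer: the function names, the fixed step size, and the quadratic test function are all choices made here for the example.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=100):
    """Minimize a function by repeatedly stepping against its gradient.

    grad  -- a function returning the gradient of f at a point
    x0    -- the starting point in R^n
    step  -- the step size (delta in the text), kept constant here
    iters -- number of steps to take
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Move opposite the gradient to decrease f.
        x = x - step * grad(x)
    return x

# Example: f(x) = ||x||^2 has gradient 2x and its minimum at the origin.
grad_f = lambda x: 2 * x
x_min = gradient_descent(grad_f, [3.0, -4.0])
```

Flipping the sign of the update (adding `step * grad(x)` instead of subtracting) turns this into gradient *ascent*, which maximizes the output instead.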