I understand that SGD is used for large data sets, where the full gradient (a sum over all samples) is approximated by the gradient of a randomly chosen sample. My question: suppose I have a single function of multiple variables, say d of them, i.e. effectively just one sample. Could one still apply stochastic gradient descent to that function? I ask because one of the homework problems in Gilbert Strang's data science course asks you to compute a single step of gradient descent for a function of two variables, and it explicitly says full gradient descent, not stochastic. I wonder why?
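For context, here is a minimal sketch of what "a single step of full gradient descent for a function of two variables" looks like. The function f(x, y) = x^2 + 2y^2, the starting point, and the learning rate are my own illustrative choices, not taken from Strang's homework. Note that with a single objective function there is no sum over samples to subsample, so the "stochastic" variant has nothing to randomize over and reduces to the full gradient step:

```python
# One step of full gradient descent on an assumed example
# function f(x, y) = x**2 + 2*y**2 (hypothetical, for illustration).

def grad_f(x, y):
    # Analytic gradient of f: (df/dx, df/dy) = (2x, 4y)
    return (2.0 * x, 4.0 * y)

def gd_step(x, y, lr=0.1):
    # Single full-gradient update: move against the gradient.
    gx, gy = grad_f(x, y)
    return (x - lr * gx, y - lr * gy)

x1, y1 = gd_step(1.0, 1.0)
print(x1, y1)  # one step taken from the point (1, 1)
```

Starting from (1, 1) with learning rate 0.1, the gradient is (2, 4), so the update lands at (0.8, 0.6).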