Chain Rule of Calculus
If g is differentiable at x and f is differentiable at g(x), then the composite function F = f ∘ g defined by F(x) = f(g(x)) is differentiable at x and F′ is given by: F′(x) = f′(g(x)) · g′(x)
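A minimal sketch in plain Python (the choices g(x) = x² and f(u) = sin(u) are hypothetical examples, not from the notes) that checks the chain rule against a finite-difference approximation:

```python
import math

# Hypothetical example: g(x) = x**2, f(u) = sin(u), so F(x) = f(g(x)) = sin(x**2).
def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def F_prime(x):
    # Chain rule: F'(x) = f'(g(x)) * g'(x)
    return f_prime(g(x)) * g_prime(x)

x = 1.3
h = 1e-6
numerical = (f(g(x + h)) - f(g(x - h))) / (2 * h)  # central finite difference
print(F_prime(x), numerical)  # the two values should agree closely
```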
Accumulation
- Forward Accumulation
- Backward Accumulation (backprop)
Two different orders in which to accumulate the derivatives.
Forward accumulation takes more steps. Backward accumulation only requires the derivatives of the loss function to get the results (see the sketch below).
During the calculation we only care about the local node, not the entire network.
Accumulation is not learning.
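A minimal sketch (plain Python, a hypothetical three-node graph x → a → b → z, not from the notes) contrasting the two accumulation orders:

```python
import math

# Toy computational graph: x -> a = x**2 -> b = sin(a) -> z = 3*b

# Forward accumulation: propagate d(node)/dx alongside each value, input to output.
def forward_accumulation(x):
    a, da_dx = x ** 2, 2 * x
    b, db_dx = math.sin(a), math.cos(a) * da_dx
    z, dz_dx = 3 * b, 3 * db_dx
    return z, dz_dx

# Backward accumulation (backprop): run the forward pass first, then propagate
# dz/d(node) from the output back toward the input, using only local derivatives.
def backward_accumulation(x):
    a = x ** 2
    b = math.sin(a)
    z = 3 * b
    dz_db = 3.0                    # local derivative of z = 3*b
    dz_da = dz_db * math.cos(a)    # local derivative of b = sin(a)
    dz_dx = dz_da * 2 * x          # local derivative of a = x**2
    return z, dz_dx

print(forward_accumulation(0.7))   # same value and gradient,
print(backward_accumulation(0.7))  # computed in opposite orders
```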
(Figure: backward propagation visualization)
Generalization to Vectors
Vectors follow the same logic: the scalar derivatives in the chain rule become Jacobians, and gradients are accumulated in the same way.
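A minimal sketch (NumPy, a hypothetical linear layer y = W·x with a squared-sum loss, not from the notes) showing that the vector chain rule multiplies the upstream gradient by the Jacobian transpose:

```python
import numpy as np

# Hypothetical vector example: y = W @ x (vector-valued), z = sum(y**2) (scalar loss).
# The Jacobian of y w.r.t. x is W, so the chain rule gives grad_x z = W.T @ grad_y z.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)

y = W @ x
grad_y = 2 * y            # dz/dy for z = sum(y**2)
grad_x = W.T @ grad_y     # Jacobian transpose times upstream gradient

# Check the first component against a finite-difference approximation of dz/dx_0.
h = 1e-6
x_h = x.copy()
x_h[0] += h
z, z_h = np.sum(y ** 2), np.sum((W @ x_h) ** 2)
print(grad_x[0], (z_h - z) / h)  # should agree closely
```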
Backpropagation methods
- Symbol-to-number differentiation – takes a computational graph and a set of numerical values (for the inputs to the graph) and returns a set of numerical values that describe the gradient at those input values. Relevant libs: Torch and Caffe.
- Symbol-to-symbol differentiation – takes a computational graph and returns a graph with additional nodes that provide a symbolic description of the desired derivatives. Relevant libs: Theano and TensorFlow. (Both styles are sketched below.)
Disabling eager execution in TensorFlow makes it use symbol-to-symbol differentiation.
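A minimal sketch contrasting the two styles (assuming torch and tensorflow are installed, with TensorFlow 2.x using the v1 compatibility API): PyTorch's backward() immediately produces gradient numbers, while tf.gradients in graph mode adds a symbolic derivative node that only yields numbers when the graph is run.

```python
# Symbol-to-number style (PyTorch): calling backward() returns numerical gradients.
import torch

x = torch.tensor(2.0, requires_grad=True)
z = torch.sin(x ** 2)
z.backward()
print(x.grad)  # a concrete number: cos(x**2) * 2x evaluated at x = 2.0

# Symbol-to-symbol style (TensorFlow graph mode): tf.gradients adds derivative
# nodes to the graph; numbers only appear when the graph is run in a session.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
x_sym = tf.compat.v1.placeholder(tf.float32, shape=())
z_sym = tf.sin(x_sym ** 2)
grad_sym = tf.gradients(z_sym, x_sym)[0]  # a new symbolic node, not a number

with tf.compat.v1.Session() as sess:
    print(sess.run(grad_sym, feed_dict={x_sym: 2.0}))
```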