SMOKE-Y Fullstack Robotics
Interpreting loss functions:
X, Y = [BATCH_SIZE, FEATURES]
len(X) = BATCH_SIZE
The 1/len(X) factor ensures the error doesn't depend on the batch size; that dependence is induced by the sum function
present in most loss functions.
MSE(X, Y) = (1/len(X)) * sum((X-Y) ** 2)
(X-Y) ** 2 grows very large when X and Y differ greatly. Squaring also removes the sign, so you can pass X and Y
to the function in either order.
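A minimal sketch of the MSE definition above, in plain Python (no framework assumed), treating X and Y as flat lists of scalars:

```python
def mse(x, y):
    """Mean squared error over a batch of scalar predictions."""
    assert len(x) == len(y)
    # 1/len(x) removes the dependence on batch size introduced by the sum
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Only the last pair differs, by 1, so the loss is 1/3
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
# Squaring makes the loss symmetric in its arguments
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]) == mse([1.0, 2.0, 4.0], [1.0, 2.0, 3.0]))
```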
NLL(X) = -1 * (1/len(X)) * sum(ln(Q(X_i))), where Q(X_i) is the model's predicted probability for the correct label
Surprise(Q(x)) = -ln(Q(x)) goes to +inf as Q(x) -> 0 and is 0 when Q(x) = 1. Hence, when we pass the probability the model
predicted for the correct label, we get a high value when it is close to 0. We must pass a probability (0 -> 1), so the model's
output must first go through softmax.
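A sketch of NLL as described above, assuming raw model outputs (logits) that are converted to probabilities with softmax; the `softmax` and `nll` names here are illustrative helpers, not a particular library's API:

```python
import math

def softmax(logits):
    # subtract the max for numerical stability; result sums to 1
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nll(batch_logits, labels):
    # average of Surprise(Q(correct label)) = -ln(Q(correct label)) over the batch
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        q = softmax(logits)
        total += -math.log(q[label])
    return total / len(batch_logits)

# Confident and correct -> low surprise; confident and wrong -> high surprise
print(nll([[5.0, 0.0]], [0]))
print(nll([[5.0, 0.0]], [1]))
```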
KL(X) = sum(P(X) * ln(P(X)/Q(X)))
Entropy(X) = -1 * sum(P(X) * ln(P(X))) = sum(P(X) * -ln(P(X))) = sum(P(X) * Surprise(P(X))). Entropy is a weighted sum of
surprise. If surprising events happen often in a system, the system has high entropy.
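The "weighted sum of surprise" reading can be checked directly: a uniform distribution (every outcome equally surprising) maximizes entropy, while a near-certain one minimizes it. A minimal sketch over a discrete distribution given as a list of probabilities:

```python
import math

def surprise(p):
    return -math.log(p)

def entropy(dist):
    # expected surprise under the distribution itself; skip zero-probability events
    return sum(p * surprise(p) for p in dist if p > 0)

# Fair coin: both outcomes equally surprising -> maximal entropy ln(2)
print(entropy([0.5, 0.5]))
# Heavily biased coin: the common outcome is barely surprising -> low entropy
print(entropy([0.99, 0.01]))
```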
Cross Entropy(X) = -1 * sum(P(X) * ln(Q(X))) = sum(P(X) * Surprise(Q(X))): the average surprise you will experience while
observing events drawn from distribution P, while believing in Q.
If we expand the ln in KL(X), we get KL = Cross Entropy - Entropy. So KL measures only the extra surprise caused by the incorrect model Q, because the entropy inherent to the system itself is subtracted away from the cross entropy.
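The expansion above can be verified numerically. A sketch with two made-up discrete distributions (p as the true distribution, q as the model's belief; both are illustrative values, not from the notes):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # true distribution of the system
q = [0.5, 0.3, 0.2]   # the (incorrect) model's belief

# Expanding the ln: KL(P || Q) = Cross Entropy(P, Q) - Entropy(P)
assert abs(kl(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12
# KL is zero only when the model matches the system exactly
assert kl(p, p) == 0.0
print(kl(p, q))
```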