SMOKE-Y

Fullstack Robotics


loss_fn

Interpreting loss functions:


X, Y: tensors of shape [BATCH_SIZE, FEATURES]
len(X) = BATCH_SIZE

Mean Squared Error

MSE(X, Y) = (1/len(X)) * sum((X - Y) ** 2)
(X - Y) ** 2 blows up when the difference between the X and Y vectors is large. Squaring also removes the sign, so you can pass X and Y to the function in either order.
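A minimal numpy sketch of the formula above (the batch values are made up for illustration):

```python
import numpy as np

# Toy batch: 4 samples, 3 features (values are illustrative).
X = np.array([[1.0, 2.0, 3.0],
              [0.5, 1.5, 2.5],
              [2.0, 2.0, 2.0],
              [1.0, 0.0, 1.0]])
Y = np.zeros_like(X)

def mse(X, Y):
    # Sum of squared differences, averaged over the batch dimension.
    # Squaring removes the sign, so mse(X, Y) == mse(Y, X).
    return ((X - Y) ** 2).sum() / len(X)

print(mse(X, Y))  # → 9.1875
```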

Negative Log-Likelihood Loss

NLL(X) = -1 * (1/len(X)) * sum(ln(Q(X_i))), where X_i is the correct label
Surprise(Q(x)) = -ln(Q(x)) goes to +inf as Q(x) -> 0 and is 0 when Q(x) = 1. Hence, when we pass in the probability the model predicted for the correct label, we get a high loss when that probability is close to 0. The inputs must be probabilities (0 -> 1), so the model's logits must first go through softmax.
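A sketch of NLL in numpy, assuming raw logits as input (the batch and labels are hypothetical):

```python
import numpy as np

def softmax(logits):
    # Shift by the row max for numerical stability; each row sums to 1.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels):
    # Q(X_i): the predicted probability of the correct label per sample.
    Q = softmax(logits)
    correct = Q[np.arange(len(Q)), labels]
    # -ln of a probability near 0 is huge; near 1 it is ~0.
    return -np.log(correct).mean()

# Hypothetical 2-sample batch over 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 3.0, 0.2]])
labels = np.array([0, 1])  # correct class per sample
print(nll(logits, labels))
```

Pointing the labels at the low-probability classes instead drives the loss up, which is the behaviour the surprise interpretation predicts.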

KL-Divergence

KL(P || Q) = sum(P(X) * ln(P(X)/Q(X)))
Entropy(X) = -1 * sum(P(X) * ln(P(X))) = sum(P(X) * Surprise(P(X))). Entropy is a surprise sum weighted by probability: if surprising events happen often in a system, the system has high entropy.
CrossEntropy(X) = -1 * sum(P(X) * ln(Q(X))) = sum(P(X) * Surprise(Q(X))): the average surprise you get by observing distribution P while believing in Q.
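A quick numeric sketch of both quantities (the distributions P and Q are made up):

```python
import numpy as np

P = np.array([0.7, 0.2, 0.1])  # the system's true distribution (hypothetical)
Q = np.array([0.5, 0.3, 0.2])  # the model's belief (hypothetical)

entropy = (P * -np.log(P)).sum()        # average surprise under P itself
cross_entropy = (P * -np.log(Q)).sum()  # average surprise believing Q, observing P

print(entropy, cross_entropy)
```

Cross entropy is never below entropy (Gibbs' inequality); the two are equal only when Q matches P exactly.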

If we expand the ln in the KL formula, we get KL = CrossEntropy - Entropy: KL measures the extra surprise caused by believing in the incorrect model Q, since the entropy inherent to the system itself is subtracted away from the cross entropy.
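The identity KL = CrossEntropy - Entropy can be checked numerically with toy distributions (the values of P and Q are made up):

```python
import numpy as np

P = np.array([0.7, 0.2, 0.1])  # true distribution (hypothetical)
Q = np.array([0.5, 0.3, 0.2])  # model's belief (hypothetical)

kl = (P * np.log(P / Q)).sum()
entropy = -(P * np.log(P)).sum()
cross_entropy = -(P * np.log(Q)).sum()

# Expanding ln(P/Q) = ln(P) - ln(Q) gives KL = CrossEntropy - Entropy.
print(kl, cross_entropy - entropy)
```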