SMOKE-Y Fullstack Robotics
Interpreting loss functions:
X, Y = [BATCH_SIZE, FEATURES]
len(X) = BATCH_SIZE
The 1/len(X) factor ensures the error doesn't depend on the batch size; that dependence is induced by the sum function
present in most loss functions.
MSE(X, Y) = (1/len(X)) * sum((X-Y) ** 2)
(X-Y) ** 2 grows very large when X and Y differ greatly. Squaring also removes the sign, so you can pass X and Y
to the function in either order.
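A minimal sketch of the MSE definition above, in plain Python (no framework assumed), treating X and Y as flat lists of scalars:

```python
def mse(x, y):
    """Mean squared error over a batch of scalar predictions."""
    assert len(x) == len(y)
    # 1/len(x) removes the dependence on batch size introduced by the sum
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Only the last pair differs, by 1, so the loss is 1/3
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
# Squaring makes the loss symmetric in its arguments
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]) == mse([1.0, 2.0, 4.0], [1.0, 2.0, 3.0]))
```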
NLL(X) = -1 * (1/len(X)) * sum(ln(Q(X_i))), where Q(X_i) is the model's predicted probability for the correct label
Surprise(Q(x)) = -ln(Q(x)) goes to +inf as Q(x) -> 0 and is 0 when Q(x) = 1. Hence, when we pass the probability the model
predicted for the correct label, we get a high value when it is close to 0. We must pass a probability (0 -> 1), so the model's
output must first go through softmax.
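A sketch of NLL as described above, assuming raw model outputs (logits) that are converted to probabilities with softmax; the `softmax` and `nll` names here are illustrative helpers, not a particular library's API:

```python
import math

def softmax(logits):
    # subtract the max for numerical stability; result sums to 1
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nll(batch_logits, labels):
    # average of Surprise(Q(correct label)) = -ln(Q(correct label)) over the batch
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        q = softmax(logits)
        total += -math.log(q[label])
    return total / len(batch_logits)

# Confident and correct -> low surprise; confident and wrong -> high surprise
print(nll([[5.0, 0.0]], [0]))
print(nll([[5.0, 0.0]], [1]))
```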
KL(X) = sum(P(X) * ln(P(X)/Q(X)))
Entropy(X) = -1 * sum(P(X) * ln(P(X))) = sum(P(X) * -ln(P(X))) = sum(P(X) * Surprise(P(X))). Entropy is a weighted sum of
surprise. If surprising events happen often in a system, the system has high entropy.
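The "weighted sum of surprise" reading can be checked directly: a uniform distribution (every outcome equally surprising) maximizes entropy, while a near-certain one minimizes it. A minimal sketch over a discrete distribution given as a list of probabilities:

```python
import math

def surprise(p):
    return -math.log(p)

def entropy(dist):
    # expected surprise under the distribution itself; skip zero-probability events
    return sum(p * surprise(p) for p in dist if p > 0)

# Fair coin: both outcomes equally surprising -> maximal entropy ln(2)
print(entropy([0.5, 0.5]))
# Heavily biased coin: the common outcome is barely surprising -> low entropy
print(entropy([0.99, 0.01]))
```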
Cross Entropy(X) = -1 * sum(P(X) * ln(Q(X))) = sum(P(X) * Surprise(Q(X))): the average surprise you will experience while
observing events drawn from distribution P, while believing in Q.
If we expand the ln in KL(X), we get KL = Cross Entropy - Entropy. So KL measures only the extra surprise caused by the incorrect model Q, because the entropy inherent to the system itself is subtracted away from the cross entropy.
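The expansion above can be verified numerically. A sketch with two made-up discrete distributions (p as the true distribution, q as the model's belief; both are illustrative values, not from the notes):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # true distribution of the system
q = [0.5, 0.3, 0.2]   # the (incorrect) model's belief

# Expanding the ln: KL(P || Q) = Cross Entropy(P, Q) - Entropy(P)
assert abs(kl(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12
# KL is zero only when the model matches the system exactly
assert kl(p, p) == 0.0
print(kl(p, q))
```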