What are the differences between all these cross-entropy losses in Keras and TensorFlow?

There is just one cross (Shannon) entropy, defined as: H(P||Q) = - SUM_i P(X=i) log Q(X=i). In machine learning usage, P is the actual (ground truth) distribution, and Q is the predicted distribution. All the functions you listed are just helper functions which accept different ways to represent P and Q. There are basically 3 …
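
To make the point about representations concrete, here is a minimal sketch in plain Python rather than the Keras or TensorFlow helpers themselves. The function names `cross_entropy` and `sparse_cross_entropy` and the example values are illustrative assumptions. It evaluates the formula above once with P given as a one-hot vector and once with P given as an integer class index, and both give the same number.

```python
import math

def cross_entropy(p, q):
    """-sum_i P(X=i) * log Q(X=i), skipping terms where P(X=i) is 0."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def sparse_cross_entropy(label, q):
    """Same quantity when P is one-hot and given only as the true class index."""
    return -math.log(q[label])

q = [0.7, 0.2, 0.1]          # predicted distribution Q (made-up values)
p_one_hot = [1.0, 0.0, 0.0]  # ground truth P as a one-hot vector
label = 0                    # ground truth P as an integer class index

dense = cross_entropy(p_one_hot, q)
sparse = sparse_cross_entropy(label, q)
```

Both `dense` and `sparse` equal -log(0.7) here; only the encoding of P differs.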

Cost function training target versus accuracy desired goal

How can we train a neural network so that it ends up maximizing classification accuracy? I’m asking for a way to get a continuous proxy function that’s closer to the accuracy. To start with, the loss function used today for classification tasks in (deep) neural nets was not invented with them, but it goes back …

NaN loss when training regression network

Regression with neural networks is hard to get working because the output is unbounded, so you are especially prone to the exploding gradients problem (the likely cause of the NaNs). Historically, one key solution to exploding gradients was to reduce the learning rate, but with the advent of per-parameter adaptive learning rate algorithms like Adam, …
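
As a toy illustration of how an unbounded regression target and a too-large step size can end in NaN, the sketch below fits a single slope by plain gradient descent on mean squared error. It is not a neural network and not Keras code; `fit_slope`, the data, and both learning rates are assumptions chosen for the demonstration.

```python
import math

def fit_slope(xs, ys, learning_rate, steps):
    """Fit y = w * x by plain gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w = w - learning_rate * grad
    return w

xs = [10.0, 20.0, 30.0]     # unscaled inputs (made-up values)
ys = [3.0 * x for x in xs]  # targets generated with true slope 3

# Each step overshoots by more than it corrects, so w overflows and becomes NaN.
diverged = fit_slope(xs, ys, learning_rate=0.01, steps=1000)

# A ten times smaller step shrinks the error every iteration.
converged = fit_slope(xs, ys, learning_rate=0.001, steps=1000)
```

With these values `math.isnan(diverged)` is true, while `converged` ends up at the true slope of 3.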

What function defines accuracy in Keras when the loss is mean squared error (MSE)?

There are at least two separate issues with your question. The first one should be clear by now from the comments by Dr. Snoopy and the other answer: accuracy is meaningless in a regression problem, such as yours; see also the comment by patyork in this Keras thread. For good or bad, the fact is …
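
A small sketch of why accuracy says little about a regression model: the predictions below are very close to the targets, so the MSE is tiny, yet none of them matches exactly. `exact_match_accuracy` is a hypothetical helper written for this illustration, not the function Keras uses, and the numbers are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) * (t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the target exactly (hypothetical helper)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

y_true = [1.50, 2.25, 3.75]  # continuous regression targets
y_pred = [1.51, 2.24, 3.76]  # predictions off by 0.01 each

mse = mean_squared_error(y_true, y_pred)    # small: the fit is good
acc = exact_match_accuracy(y_true, y_pred)  # 0.0: no exact matches
```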