# Logistic (Log) Loss Function


A Logistic (Log) Loss Function is a convex loss function that is defined as the negative log-likelihood of a logistic model that returns predicted probabilities for its training data.
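
For a single training example with label $y \in \{0,1\}$ and predicted probability $p$, this corresponds to the familiar per-example form

$$\ell(y, p) = -\bigl[\, y \log p + (1-y)\log(1-p) \,\bigr],$$

which is the negative logarithm of the Bernoulli likelihood $p^{y}(1-p)^{1-y}$ that the model assigns to the observed label.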

## References

### 2021a

• (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss Retrieved:2021-3-7.
• The logistic loss function can be generated using (2) and Table-I as follows:

$$\begin{align}
\phi(v) &= C\left[f^{-1}(v)\right] + \left(1 - f^{-1}(v)\right)\, C'\left[f^{-1}(v)\right] \\
&= \frac{1}{\log(2)}\left[\frac{-e^v}{1+e^v}\log\frac{e^v}{1+e^v} - \left(1-\frac{e^v}{1+e^v}\right)\log\left(1-\frac{e^v}{1+e^v}\right)\right] + \left(1-\frac{e^v}{1+e^v}\right)\left[\frac{-1}{\log(2)}\log\left(\frac{\frac{e^v}{1+e^v}}{1-\frac{e^v}{1+e^v}}\right)\right] \\
&= \frac{1}{\log(2)}\log\left(1+e^{-v}\right).
\end{align}$$

The logistic loss is convex and grows linearly for negative values, which makes it less sensitive to outliers. The logistic loss is used in the LogitBoost algorithm.
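
The closed form above can be checked numerically: for a positive example, the base-2 negative log-likelihood of the predicted probability $\sigma(v) = e^v/(1+e^v)$ equals $\frac{1}{\log(2)}\log(1+e^{-v})$. A minimal sketch (function names are illustrative, not from the quoted source):

```python
import numpy as np

def logistic_loss(v):
    """Margin-based logistic loss: (1/log 2) * log(1 + exp(-v))."""
    return np.log1p(np.exp(-v)) / np.log(2)

def neg_log2_likelihood_positive(v):
    """-log2 of the probability sigma(v) = e^v / (1 + e^v) assigned to the positive class."""
    p = 1.0 / (1.0 + np.exp(-v))
    return -np.log2(p)

v = np.linspace(-5.0, 5.0, 11)
print(np.allclose(logistic_loss(v), neg_log2_likelihood_positive(v)))  # True
```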

The minimizer of $I[f]$ for the logistic loss function can be directly found from equation (1) as:

$$f^*_\text{Logistic} = \log\left(\frac{\eta}{1-\eta}\right) = \log\left(\frac{p(1\mid x)}{1-p(1\mid x)}\right).$$

This function is undefined when $p(1\mid x)=1$ or $p(1\mid x)=0$ (tending toward $\infty$ and $-\infty$ respectively), but predicts a smooth curve which grows when $p(1\mid x)$ increases and equals $0$ when $p(1\mid x)=0.5$.
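
As an illustration of this minimizer (a sketch, not part of the quoted source), the optimal score is simply the log-odds of $p(1\mid x)$: it is $0$ at $0.5$ and diverges as the probability approaches $0$ or $1$:

```python
import numpy as np

def optimal_score(eta):
    """Log-odds of eta = p(1|x): the pointwise minimizer of the logistic risk."""
    return np.log(eta / (1.0 - eta))

for eta in [0.1, 0.5, 0.9]:
    print(eta, optimal_score(eta))
# 0.1 -> -2.197..., 0.5 -> 0.0, 0.9 -> 2.197...
```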

It is easy to check that the logistic loss and the binary cross-entropy loss (log loss) are in fact the same (up to a multiplicative constant $\frac{1}{\log(2)}$). The cross-entropy loss is closely related to the Kullback–Leibler divergence between the empirical distribution and the predicted distribution. The cross-entropy loss is ubiquitous in modern deep neural networks.
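
The KL relationship can also be checked numerically: since $H(q,p) = H(q) + D_{\mathrm{KL}}(q \,\|\, p)$, minimizing the cross entropy with respect to the predicted distribution $p$ is equivalent to minimizing the KL divergence to the empirical distribution $q$. A small illustrative check (not from the quoted source):

```python
import numpy as np

q = np.array([0.0, 1.0])   # empirical (one-hot) label distribution
p = np.array([0.3, 0.7])   # predicted distribution

def cross_entropy(q, p):
    mask = q > 0             # convention: 0 * log(.) contributes 0
    return -np.sum(q[mask] * np.log(p[mask]))

def entropy(q):
    mask = q > 0
    return -np.sum(q[mask] * np.log(q[mask]))

def kl_divergence(q, p):
    mask = q > 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

# H(q, p) == H(q) + KL(q || p)
print(np.isclose(cross_entropy(q, p), entropy(q) + kl_divergence(q, p)))  # True
```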

### 2017b

```python
from math import log

def log_loss(predicted, target):
    if len(predicted) != len(target):
        print('lengths not equal!')
        return
    target = [float(x) for x in target]  # make sure all values are floats
    # clip predictions into (0, 1) to avoid log(0)
    predicted = [min(max(x, 1e-15), 1 - 1e-15) for x in predicted]
    return -(1.0 / len(target)) * sum(
        target[i] * log(predicted[i]) + (1.0 - target[i]) * log(1.0 - predicted[i])
        for i in range(len(target)))

if __name__ == '__main__':  # if you run at the command line as 'python utils.py'
    actual = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1]
    pred = [0.24160452, 0.41107934, 0.37063768, 0.48732519, 0.88929869,
            0.60626423, 0.09678324, 0.38135864, 0.20463064, 0.21945892]
    print(log_loss(pred, actual))
```
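
For reference, this implementation should agree (up to floating-point tolerance) with scikit-learn's log_loss metric; the comparison below is an illustrative check, assuming scikit-learn is available (note the swapped argument order):

```python
from sklearn.metrics import log_loss as sk_log_loss

actual = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1]
pred = [0.24160452, 0.41107934, 0.37063768, 0.48732519, 0.88929869,
        0.60626423, 0.09678324, 0.38135864, 0.20463064, 0.21945892]

print(log_loss(pred, actual))      # implementation above: (predicted, target)
print(sk_log_loss(actual, pred))   # scikit-learn: (y_true, y_pred)
```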


### 2016

```python
import numpy as np

# Note: mvmean is a column-wise mean helper defined elsewhere in the source
# library; it averages over the sample axis (see the usage sketch below).

def log_loss(solution, prediction, task='binary.classification'):
    """Log loss for binary and multiclass."""
    [sample_num, label_num] = solution.shape
    eps = 1e-15

    pred = np.copy(prediction)  # work on a copy so the caller's prediction array is not modified
    sol = np.copy(solution)
    if (task == 'multiclass.classification') and (label_num > 1):
        # Make sure the lines add up to one for multi-class classification
        norma = np.sum(prediction, axis=1)
        for k in range(sample_num):
            pred[k, :] /= np.maximum(norma[k], eps)
        # Make sure there is a single label active per line for multi-class classification
        # For the base prediction, this solution is ridiculous in the multi-label case

    # Bounding of predictions to avoid log(0), 1/0, ...
    pred = np.minimum(1 - eps, np.maximum(eps, pred))
    # Compute the log loss
    pos_class_log_loss = -mvmean(sol * np.log(pred), axis=0)
    if (task != 'multiclass.classification') or (label_num == 1):
        # The multi-label case is a bunch of binary problems.
        # The second class is the negative class for each column.
        neg_class_log_loss = -mvmean((1 - sol) * np.log(1 - pred), axis=0)
        log_loss = pos_class_log_loss + neg_class_log_loss
        # Each column is an independent problem, so we average.
        # The probabilities in one line do not add up to one.
        # log_loss = mvmean(log_loss)
        # print('binary {}'.format(log_loss))
        # In the multi-label case, the right thing is to AVERAGE, not sum.
        # We return all the scores so we can normalize correctly later on.
    else:
        # For the multiclass case the probabilities in one line add up to one.
        log_loss = pos_class_log_loss
        # We sum the contributions of the columns.
        log_loss = np.sum(log_loss)
        # print('multiclass {}'.format(log_loss))
    return log_loss
```
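
A minimal smoke test of the function above, with illustrative inputs; mvmean is stood in for here with a plain column-wise mean, matching how it is used in the code:

```python
import numpy as np

# Stand-in for the source library's mvmean (a column-wise mean over samples).
def mvmean(a, axis=0):
    return np.mean(a, axis=axis)

# Binary case: a single column of {0, 1} targets and predicted probabilities.
sol_bin = np.array([[0.0], [1.0], [1.0], [0.0]])
pred_bin = np.array([[0.1], [0.8], [0.6], [0.3]])
print(log_loss(sol_bin, pred_bin, task='binary.classification'))

# Multiclass case: one-hot targets and rows of class probabilities.
sol_mc = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
pred_mc = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])
print(log_loss(sol_mc, pred_mc, task='multiclass.classification'))
```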