The following is taken from a paper:

"Since a human can simultaneously perform multiple actions (e.g., sit and drink), our output layer consists of binary sigmoid classifiers for multilabel action classification (i.e. the predicted action classes do not compete). The training objective is to minimize the binary cross entropy losses between the ground-truth action labels and the scores predicted by the model."
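
To make the quoted setup concrete, here is a minimal pure-Python sketch of independent sigmoid outputs trained with binary cross entropy. It is not the paper's code. The class names (sit, drink, run), the example scores, the function names, and the choice to average the per-class losses (rather than sum them) are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_bce(logits, labels, eps=1e-12):
    # One binary cross entropy term per class; each class is scored
    # independently, so the classes do not compete as they would in a softmax.
    # Averaging over classes is an assumption; a sum works equally well.
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)

# Hypothetical raw scores for the classes [sit, drink, run].
logits = [2.0, 1.5, -3.0]
labels = [1, 1, 0]  # the person is sitting and drinking, not running

probs = [sigmoid(z) for z in logits]
predicted = [int(p > 0.5) for p in probs]  # threshold each class separately
loss = multilabel_bce(logits, labels)
```

Because each class has its own sigmoid, the probabilities need not sum to 1, so both "sit" and "drink" can be predicted at the same time. With a softmax output, raising the score of one class would necessarily lower the others.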
