Each input has shape (100, 104, 1), and every value is either 0 or 1. This is a multi-label classification problem: the output must be a 104-bit vector, one bit per class, and a single input can belong to several classes at once, so one-hot encoding isn't an option. I used a sequential neural network with ReLU activations in the hidden layers and 104 sigmoid units in the output layer, trained with binary cross-entropy as the loss function. During training the loss went down to quite low values, but at test time the model predicts the same values for every input.
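One way to check whether a low binary cross-entropy actually means anything is to compare it against a predictor that ignores the input entirely. The sketch below is a minimal, standard-library-only illustration, not the model from the question: the helper names, the synthetic labels, and the 3% positive rate are all assumptions made for the example. If only a few of the 104 bits are set per sample (which the question does not state), a constant output can already reach a low loss, and a trained loss near that baseline would be consistent with identical predictions for every input.

```python
import math
import random


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all samples and all label bits."""
    total, count = 0.0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def label_frequencies(y_true):
    """Fraction of samples in which each label bit is set."""
    n = len(y_true)
    return [sum(column) / n for column in zip(*y_true)]


def constant_baseline_loss(y_true):
    """Loss of a predictor that outputs the per-class frequency for every sample."""
    freqs = label_frequencies(y_true)
    return binary_cross_entropy(y_true, [freqs] * len(y_true))


def predictions_are_constant(y_pred, tol=1e-6):
    """True if every predicted row matches the first row within tol."""
    first = y_pred[0]
    return all(abs(p - q) <= tol for row in y_pred for p, q in zip(row, first))


# Synthetic sparse labels (hypothetical): 104 classes, about 3% of bits set.
random.seed(0)
NUM_SAMPLES, NUM_CLASSES, POSITIVE_RATE = 1000, 104, 0.03
labels = [
    [1 if random.random() < POSITIVE_RATE else 0 for _ in range(NUM_CLASSES)]
    for _ in range(NUM_SAMPLES)
]

baseline = constant_baseline_loss(labels)
near_zero_loss = binary_cross_entropy(labels, [[0.01] * NUM_CLASSES] * NUM_SAMPLES)
print(f"constant-frequency baseline loss: {baseline:.4f}")
print(f"always-predict-0.01 loss:         {near_zero_loss:.4f}")
```

To apply the same check to a real model, pass the training labels to `constant_baseline_loss` and the test-set outputs to `predictions_are_constant`, then compare the final training loss with the baseline.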