1) The test set is not representative of the training distribution, e.g. it is unbalanced or covers only the easy classes (in the case of supervised problems).
2) If you use stochastic regularization techniques such as dropout, which are usually turned off during evaluation, then it is normal to obtain a smoother, less noisy test loss curve that can dip below the training loss at some points (see the sketch after this list).
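To illustrate point 2, here is a minimal PyTorch sketch (the toy model, data, and dropout rate are all invented for the example) that evaluates the same weights on the same batch twice, once in train mode with dropout active and once in eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model with a dropout layer between two linear layers.
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # active in train mode, identity in eval mode
    nn.Linear(64, 1),
)
loss_fn = nn.MSELoss()

# Random stand-in data; in practice this would be a real batch.
x = torch.randn(256, 10)
y = torch.randn(256, 1)

model.train()  # dropout on: noisier predictions
with torch.no_grad():
    train_mode_loss = loss_fn(model(x), y).item()

model.eval()   # dropout off: the deterministic "test-time" network
with torch.no_grad():
    eval_mode_loss = loss_fn(model(x), y).item()

print(f"train-mode loss: {train_mode_loss:.4f}")
print(f"eval-mode  loss: {eval_mode_loss:.4f}")
```

Because dropout injects variance into the predictions while leaving their expectation roughly unchanged, the train-mode loss will usually be the higher of the two on any given seed, which is exactly the asymmetry you see between the two curves.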
I'm also assuming you're evaluating the test loss on the entire test set; otherwise it is natural to get individual batches of test data on which your model performs better than on the current training batch.
Some tools report the training loss as a running average over the epoch, but compute the validation loss only at the end of the epoch. Since the model improves during the epoch, the averaged training loss includes the earlier, worse batches, so the validation loss can come out lower.
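To make that concrete, here is a tiny sketch (all numbers invented) of how the two reporting conventions diverge even with no dropout and perfectly matched data:

```python
# Per-batch training loss over one epoch; the model improves as it trains.
train_batch_losses = [2.0, 1.5, 1.1, 0.9, 0.8]

# Many training loops report the mean of those values as "train loss",
# which mixes in the early, pre-update batches:
epoch_train_loss = sum(train_batch_losses) / len(train_batch_losses)

# Validation runs once, with the *final* weights of the epoch, so a
# plausible validation loss is close to the last training batches:
end_of_epoch_val_loss = 0.85

print(f"reported train loss: {epoch_train_loss:.2f}")       # 1.26
print(f"reported val loss:   {end_of_epoch_val_loss:.2f}")  # 0.85
```

If this is the cause, shifting the training curve back by roughly half an epoch, or re-evaluating the training loss with the end-of-epoch weights, should bring the two curves back into agreement.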