I'm not sure what you mean by 'combine CNN/LSTM with stemming', but if you have limited training data and either a reasonable stemmer or a dictionary of stems, you could add input features to the network that mark where a stem starts and ends, as recognized by the stemmer. This could give learning a boost when the stemmer covers more cases than your training data does, while the CNN/RNN stays flexible enough to adapt to cases the stemmer does not handle.
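
To make the idea concrete, here is a minimal sketch of how such features could be built for a character-level model. It is only an illustration under assumptions: `toy_stem`, `SUFFIXES`, `ALPHABET`, `stem_boundary_flags` and `char_features` are hypothetical names, and the suffix-stripping function is a stand-in for whatever stemmer or stem dictionary you actually have.

```python
from typing import Callable, List

# Toy suffix list, purely illustrative; a real stemmer would replace this.
SUFFIXES = ("ing", "ed", "es", "s")
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip one known suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_boundary_flags(
    word: str, stemmer: Callable[[str], str] = toy_stem
) -> List[List[int]]:
    """Return one [is_stem_start, is_stem_end] pair per character of `word`."""
    flags = [[0, 0] for _ in word]
    stem = stemmer(word)
    start = word.find(stem) if stem else -1
    if start == -1:
        # The stemmer rewrote the word (or returned nothing): leave flags at zero.
        return flags
    flags[start][0] = 1
    flags[start + len(stem) - 1][1] = 1
    return flags


def char_features(
    word: str, stemmer: Callable[[str], str] = toy_stem
) -> List[List[int]]:
    """One-hot character encoding with the two stem flags appended per position."""
    rows = []
    for ch, pair in zip(word, stem_boundary_flags(word, stemmer)):
        one_hot = [0] * len(ALPHABET)
        idx = ALPHABET.find(ch)
        if idx != -1:  # characters outside the alphabet stay all-zero
            one_hot[idx] = 1
        rows.append(one_hot + pair)
    return rows
```

For `walking`, the toy stemmer returns `walk`, so the start flag is set on `w` and the end flag on `k`. Each row has 26 one-hot entries plus the 2 flags, and the sequence of rows would be the per-character input to the CNN or LSTM. If your stemmer rewrites the word rather than returning a substring of it, the flags stay at zero and the network falls back on the characters alone.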