Concept in DL: https://pathmind.com/wiki/neural-network
Text recognition Example: https://au.mathworks.com/help/deeplearning/examples/create-simple-deep-learning-network-for-classification.html;jsessionid=7018848c980ee0578db631fd32eb
You did not mention whether this is sensor-based or camera-based action recognition. In sensor-based action recognition, several factors affect accuracy, such as the sample rate, the window size, and feature extraction. With deep learning, a good first step is to tune the sample rate and the window size. A higher sample rate and a larger window generally improve accuracy, but they also use more resources. During training, you can also try different neural network parameters, such as the number of hidden layers and the number of neurons per layer, and optimize the weights with different optimization techniques (Genetic Algorithm, Particle Swarm Optimization, etc.). These parameter optimization techniques can be tried for both kinds of recognition (image-based and sensor-based).
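
To make the sample rate and window size trade-off concrete, here is a minimal sketch of how a sensor stream could be cut into fixed-length windows before it is fed to a network. The function name sliding_windows, the 50% overlap, and the synthetic 3-axis signal are assumptions for illustration only, not something from your setup.

```python
import numpy as np

def sliding_windows(signal, sample_rate_hz, window_seconds, overlap=0.5):
    """Split a (n_samples, n_channels) array into overlapping windows.

    Hypothetical helper: the name and the default overlap are assumptions.
    Returns an array of shape (n_windows, window_len, n_channels).
    """
    window_len = int(round(sample_rate_hz * window_seconds))
    if window_len < 1:
        raise ValueError("window is shorter than one sample")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of two consecutive windows.
    step = max(1, int(round(window_len * (1.0 - overlap))))
    starts = range(0, signal.shape[0] - window_len + 1, step)
    if len(starts) == 0:
        # Signal is shorter than one window: return an empty batch.
        return np.empty((0, window_len, signal.shape[1]))
    return np.stack([signal[s:s + window_len] for s in starts])

# Synthetic stand-in for 10 s of 3-axis accelerometer data at 50 Hz.
rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 3))

windows = sliding_windows(signal, sample_rate_hz=50, window_seconds=2.0)
print(windows.shape)  # (9, 100, 3)
```

Each window holds sample_rate_hz * window_seconds samples per channel, so raising either value gives the network more input per example and increases memory use and training time accordingly. You can loop over a few candidate values and compare validation accuracy to pick a setting.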