I have found many papers applying the Deep Deterministic Policy Gradient (DDPG) algorithm that implement a critic neural network (NN) architecture where the action vector skips the first layer. That is, the state vector is fed into the first layer, but the actions are connected directly to the second layer of the critic NN.
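For concreteness, here is a minimal sketch of what I mean, written in PyTorch (the layer sizes and names are just illustrative, not taken from any particular paper's code):

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Critic Q(s, a) where the action skips the first layer."""
    def __init__(self, state_dim, action_dim, hidden1=400, hidden2=300):
        super().__init__()
        # First layer sees the state only.
        self.fc1 = nn.Linear(state_dim, hidden1)
        # Second layer receives the first-layer features concatenated with the action.
        self.fc2 = nn.Linear(hidden1 + action_dim, hidden2)
        self.q_out = nn.Linear(hidden2, 1)

    def forward(self, state, action):
        x = torch.relu(self.fc1(state))
        x = torch.cat([x, action], dim=-1)  # action enters only here
        x = torch.relu(self.fc2(x))
        return self.q_out(x)
```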

In fact, the original DDPG paper ("Continuous Control with Deep Reinforcement Learning", Lillicrap et al., 2016) does exactly this, but the authors do not explain why.

So... why is this done? What are the advantages of this architecture?

Thanks in advance.
