I highly recommend the PhD thesis by Gilles Louppe, creator of the Random Forest implementation in scikit-learn: "Understanding Random Forests: From Theory to Practice". He also maintains a GitHub repository for his thesis, where you can find additional material such as code examples.
According to this, Random Forests can be seen as a general framework for building ensembles of Decision Trees. The base trees could be, for instance, ID3, CART, or C4.5.
CART is the norm, but RF is generic and can be built on top of any decision tree algorithm. To add to this, whichever tree algorithm is used must be modified to incorporate the feature randomisation done at each node split: typically only $\sqrt{m}$ or $\log_2 m$ features, where $m$ is the total number of features, are drawn at random at each node split, and only these candidates are assessed when choosing the split point. Also, the trees are grown deep, without pruning.
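As a minimal sketch of these settings, here is how they appear in scikit-learn's `RandomForestClassifier` (whose base trees are CART-style); the toy dataset is just for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data, purely illustrative.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# max_features="sqrt" draws sqrt(m) candidate features at each split;
# "log2" would draw log2(m). max_depth=None grows each tree out fully,
# i.e. no pruning.
rf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",
    max_depth=None,
    random_state=0,
)
rf.fit(X, y)
print(rf.score(X, y))
```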
In my experience, the specifics of the tree learning algorithm don't matter very much when you do bagging or random forests. The split criterion already doesn't matter much for a single tree, and once you move to ensembles, tree pruning is usually unnecessary.
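To see this for yourself, a quick (hypothetical) experiment is to cross-validate the same forest with different split criteria; the scores typically come out very close:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# Swap only the split criterion and compare cross-validated accuracy.
for criterion in ("gini", "entropy"):
    rf = RandomForestClassifier(n_estimators=200, criterion=criterion, random_state=0)
    scores = cross_val_score(rf, X, y, cv=5)
    print(criterion, scores.mean())
```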