There are over 1200 gene families. Why is it not straightforward to align these genes and trace them back to a consensus "first gene"?
Let's forget the chemistry for a moment and assume that duplication, divergence, deletion, hybridization, etc. have all occurred as transformations of code.
If we align the most closely related gene families, we can get an estimated consensus for each super-super-family. However, other factors such as shuffling and frame shifts make it more difficult to work back to the original consensus at the beginning of life.
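To make the idea concrete, here is a minimal sketch of what I mean by a "consensus of consensuses". It assumes the sequences are already aligned to equal length and uses a plain majority vote per column. The toy sequences and the names `consensus`, `family_a`, and `family_b` are made up for illustration and are not real gene data.

```python
from collections import Counter

def consensus(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Columns with no single most common character are reported as 'N'.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        top, n = counts.most_common(1)[0]
        # Collect every character that shares the highest count.
        ties = [c for c, k in counts.items() if k == n]
        out.append(top if len(ties) == 1 else "N")
    return "".join(out)

# Hypothetical toy families, already aligned (not real sequences).
family_a = ["ATGGCC", "ATGGCA", "ATGACC"]
family_b = ["ATGTCC", "ATCTCC", "ATGTCG"]

cons_a = consensus(family_a)                # "ATGGCC"
cons_b = consensus(family_b)                # "ATGTCC"
super_cons = consensus([cons_a, cons_b])    # "ATGNCC"
print(cons_a, cons_b, super_cons)
```

Even in this toy case, the fourth position of the higher-level consensus is already ambiguous (`N`), because the two families disagree and there is nothing to break the tie. The sketch also depends entirely on the columns lining up, which is exactly what shuffling and frame shifts would break.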
If the theory of RNA as the first replicator is correct, then there has to be some ancestor linking RNA-exclusive genes and protein-coding genes.
To make my question clearer: what is the real challenge here? Is it that we do not have enough data or gene sequences of ancestral origin? Or is it not yet computationally feasible?
https://www.genenames.org/cgi-bin/genefamilies/