We have been analyzing an immunoglobulin locus in a veterinary species for which the exact locus structure is not available. From four individual animals we have X distinct sequences in total, including both paralogous sequences and allelic variants, and we have the corresponding count for each individual animal. We can of course also say that Y percent of the sequences are found in all four animals, Z percent in only three, and so on.
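To make the shape of the data concrete, here is a minimal sketch of how the counts described above could be tabulated. The animal names, sequence IDs, and the helper `sharing_distribution` are made up for illustration and are not our real data.

```python
from collections import Counter

# Hypothetical data: animal ID -> set of distinct sequence IDs seen in that animal.
sequences_by_animal = {
    "animal_1": {"s1", "s2", "s3", "s4"},
    "animal_2": {"s1", "s2", "s5"},
    "animal_3": {"s1", "s3", "s5", "s6"},
    "animal_4": {"s1", "s2", "s3"},
}


def sharing_distribution(by_animal):
    """Return {k: percent of distinct sequences found in exactly k animals}."""
    # For each sequence, count how many animals carry it.
    carriers = Counter(seq for seqs in by_animal.values() for seq in seqs)
    total = len(carriers)
    # Count how many sequences are carried by exactly k animals.
    per_k = Counter(carriers.values())
    return {k: 100.0 * per_k.get(k, 0) / total
            for k in range(len(by_animal), 0, -1)}


# "X": number of distinct sequences over all animals.
total_distinct = len(set().union(*sequences_by_animal.values()))
# Number of distinct sequences in each individual animal.
per_animal_counts = {a: len(s) for a, s in sequences_by_animal.items()}
# "Y", "Z", ...: percent of sequences found in 4, 3, 2, 1 animals.
distribution = sharing_distribution(sequences_by_animal)

print(total_distinct, per_animal_counts, distribution)
```

The three quantities printed at the end correspond to X, the per-animal counts, and the Y/Z percentages mentioned above.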
Immunoglobulins are tricky in that two allelic variants of one gene sometimes differ more from each other than two paralogous genes do, so one can't really tell from sequence similarity alone whether two sequences originate from one locus or from two.
My question is: can I estimate from these data how many loci there are, and how many allelic variants there are per locus on average? This is probably ridiculously easy for anyone with decent probability maths, but I'm just a poor biochemist with little mathematical or genetics background...
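For what it's worth, the only constraint that seems to follow directly is a lower bound, and only under assumptions that may not hold here: the animals are diploid, every animal carries the same set of loci (no copy-number variation between haplotypes), and every distinct sequence is a real allele rather than a sequencing artifact. Then one animal can show at most 2 sequences per locus, and four animals together at most 8. A rough sketch, with made-up counts and a hypothetical helper `min_loci`:

```python
import math


def min_loci(per_animal_counts, total_distinct, ploidy=2):
    """Lower bound on the number of loci.

    Assumes (not verified for our species): each animal carries at most
    `ploidy` alleles per locus, and all animals share the same loci.
    """
    n_animals = len(per_animal_counts)
    # One animal cannot show more than `ploidy` sequences per locus.
    bound_single = math.ceil(max(per_animal_counts.values()) / ploidy)
    # All animals together cannot show more than ploidy * n_animals per locus.
    bound_pooled = math.ceil(total_distinct / (ploidy * n_animals))
    return max(bound_single, bound_pooled)


# Made-up numbers, for illustration only.
counts = {"animal_1": 7, "animal_2": 5, "animal_3": 6, "animal_4": 4}
x_total = 12

lower = min_loci(counts, x_total)
# If there are at least `lower` loci, the average number of alleles
# per locus is at most x_total / lower.
max_avg_alleles = x_total / lower

print(lower, max_avg_alleles)
```

This only brackets the answer; it does not give the point estimate I am hoping for, which presumably also needs the sharing percentages.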
I'm happy to provide additional details by email (you may email me at [email protected]).