There are many different formats for storing genetic data.
Suppose we have 100 datasets, each with 1,000 individuals genotyped at 500K SNPs (7-8M after imputation). Each dataset may have its own set of SNPs.
What is the optimal format to store such data? What is the fastest tool to merge them and produce PCA plots?
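To make the setup concrete, here is a toy sketch of what I mean by merging datasets that do not share the same SNPs. It assumes that "merge" means keeping only the SNPs present in every dataset (an intersection), which is just one possible choice. The function name `merge_on_shared_snps`, the in-memory layout (a list of SNP IDs plus a NumPy genotype matrix per dataset), and the `rs` IDs are all hypothetical and chosen for illustration only; real data at this scale would not fit this naive representation.

```python
import numpy as np


def merge_on_shared_snps(datasets):
    """Stack individuals from several datasets, keeping only shared SNPs.

    datasets: list of (snp_ids, genotypes) pairs, where genotypes has
    shape (n_individuals, n_snps) and columns follow the order of snp_ids.
    (Hypothetical layout, for illustration only.)
    """
    # SNPs present in every dataset, in a fixed (sorted) order.
    shared = sorted(set.intersection(*(set(ids) for ids, _ in datasets)))

    blocks = []
    for ids, geno in datasets:
        # Map each SNP ID to its column in this dataset.
        column_of = {snp: j for j, snp in enumerate(ids)}
        cols = [column_of[snp] for snp in shared]
        blocks.append(geno[:, cols])

    # Rows are individuals from all datasets; columns are the shared SNPs.
    return shared, np.vstack(blocks)


# Two toy datasets with partially overlapping SNP sets,
# genotypes coded as 0/1/2 copies of an allele.
rng = np.random.default_rng(0)
a = (["rs1", "rs2", "rs3"], rng.integers(0, 3, size=(4, 3)))
b = (["rs2", "rs3", "rs4"], rng.integers(0, 3, size=(5, 3)))

shared, merged = merge_on_shared_snps([a, b])
print(shared, merged.shape)  # ['rs2', 'rs3'] (9, 2)
```

The PCA would then be run on the merged individuals-by-SNPs matrix; I am asking which format and tool make this step fast at the real scale described above.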