Hello everybody!
I'm working on a candidate gene association study, and I'm planning to use PLINK for it. My samples have been sequenced, but the report only gave me the list of SNPs for each individual in an XLS file. Because PLINK requires PED and MAP files, I am trying to prepare both from scratch. However, I have some questions.
1. Regarding the MAP file: I have only 10 SNPs in my study. I already have the marker IDs, so I need to fill in the chromosome, genetic distance, and physical position for each one. The chromosome and physical position are easy to find in well-established browsers (e.g. UCSC or NCBI). What about the genetic distance? Taking rs501192 and rs2043211 as examples, is it correct to fill in the lines as follows?
11 rs501192 0 105029658
19 rs2043211 0 48234449
where 11/19 are the chromosomes, rs501192/rs2043211 are the marker IDs, 0 the genetic distance, and 105029658/48234449 the physical positions.
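In case it helps to see what I mean, this is a minimal sketch of how I would generate those MAP lines in Python. The `markers` list, the `map_lines` helper, and the output name `study.map` are just names I made up for illustration, and the genetic distance of 0 is the value I am asking about, not something I have confirmed.

```python
# Hypothetical marker table copied from the two examples above:
# (chromosome, marker ID, physical position in base pairs).
markers = [
    ("11", "rs501192", 105029658),
    ("19", "rs2043211", 48234449),
]


def map_lines(markers, genetic_distance=0):
    """Build one MAP line per marker: chromosome, ID, genetic distance, position."""
    return [
        f"{chrom} {snp} {genetic_distance} {pos}"
        for chrom, snp, pos in markers
    ]


lines = map_lines(markers)

# Write the lines to a file; "study.map" is an assumed file name.
with open("study.map", "w") as handle:
    handle.write("\n".join(lines) + "\n")

for line in lines:
    print(line)
```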
2. Regarding the PED file: as I described, the sequencing provider only gave me the individual SNPs for each sample. Because my sample consists of "unrelated" individuals, the family, father, and mother IDs should all be zero, right? Can I use any individual ID? My samples are labelled with numeric/alphanumeric codes. Can I use them unchanged, or would you recommend a particular labelling scheme? Regarding the phenotype, can I use any pair of numbers? Here is an example line with two SNPs (rs501192, rs2043211):
0 U0019 0 0 1 1 C T T T
where 0 is the family ID, U0019 is the individual ID, 0 and 0 are the father/mother IDs, 1 is the sex (male), 1 is the phenotype (severe case), C T is the pair of alleles for rs501192, and T T is the pair of alleles for rs2043211.
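Again only as a sketch of what I have in mind, this is how I would assemble one PED line per sample in Python. The `ped_line` helper is a hypothetical name of my own, the values are copied from the example line above (including the family ID and phenotype codes I am unsure about), and I am assuming the genotypes have to be listed in the same order as the markers in the MAP file.

```python
def ped_line(family_id, individual_id, father_id, mother_id, sex, phenotype, genotypes):
    """Join the six leading PED columns with the allele pairs, space-separated.

    `genotypes` is a list of (allele1, allele2) tuples, one per marker,
    assumed to be in the same order as the MAP file.
    """
    alleles = []
    for allele1, allele2 in genotypes:
        alleles.extend([allele1, allele2])
    fields = [family_id, individual_id, father_id, mother_id, sex, phenotype] + alleles
    return " ".join(str(field) for field in fields)


# Values taken from the example above; the codes themselves are what I am asking about.
line = ped_line("0", "U0019", "0", "0", 1, 1, [("C", "T"), ("T", "T")])
print(line)
```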
I hope you can help me soon! Have a nice day!
Kind regards
B