I have been trying to interpret the BCFtools roh output for a single member of a small family. My aim is to find homozygous regions with high confidence. I ran the command with mostly default settings: bcftools roh --AF-dflt 0.4 file.vcf. A block of the output is:
ROH sample1 chr1 26373756 1 25.2
ROH sample1 chr1 26375332 1 23.8
ROH sample1 chr1 26375577 1 23.8
ROH sample1 chr1 26375819 1 23.8
ROH sample1 chr1 26377071 1 24.4
ROH sample1 chr1 26377595 1 23.7
ROH sample1 chr1 26377880 1 23.7
ROH sample1 chr1 26378002 1 25.0
ROH sample1 chr1 26380805 1 25.0
ROH sample1 chr1 26381851 1 21.5
ROH sample1 chr1 26386297 1 21.1
ROH sample1 chr1 26399230 1 16.8
ROH sample1 chr1 26455275 1 9.8
ROH sample1 chr1 26455277 0 51.7
ROH sample1 chr1 26470060 0 51.8
ROH sample1 chr1 26481589 0 48.3
ROH sample1 chr1 26485291 0 80.0
ROH sample1 chr1 26486092 0 99.0
ROH sample1 chr1 26488019 0 85.1
ROH sample1 chr1 26489720 0 99.0
ROH sample1 chr1 26491018 0 99.0
ROH sample1 chr1 26492987 0 79.8
ROH sample1 chr1 26496383 0 99.0
ROH sample1 chr1 26496651 0 99.0
ROH sample1 chr1 26496870 0 99.0
ROH sample1 chr1 26497340 0 99.0
ROH sample1 chr1 26504196 0 84.3
ROH sample1 chr1 26505265 0 99.0
ROH sample1 chr1 26506753 0 99.0
ROH sample1 chr1 26506790 0 99.0
ROH sample1 chr1 26508293 0 99.0
ROH sample1 chr1 26510336 0 99.0
ROH sample1 chr1 26510940 0 99.0
ROH sample1 chr1 26511139 0 99.0
ROH sample1 chr1 26511700 0 99.0
My questions are very basic.
Can anyone help me understand this output? What does the 6th column signify? And how can I convert these per-site rows into ranges (start and end positions) of homozygous regions? Can I write a script with some simple logic to extract the region information?
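To make the second question concrete, here is a rough sketch of the kind of script I have in mind. It rests on assumptions I have not confirmed: that the 5th column is the per-site state (1 = inside a homozygous run, 0 = outside) and that the 6th column is some quality score. The function name roh_regions and the min_quality parameter are my own inventions, not part of BCFtools.

```python
def roh_regions(lines, min_quality=0.0):
    """Collapse per-site rows into regions.

    Returns tuples of (sample, chrom, start, end, n_sites, mean_quality).
    ASSUMPTION: column 5 is the state (1 = in a homozygous run) and
    column 6 is a quality score; neither is confirmed in this post.
    """
    regions = []
    current = None  # the run that is currently open, if any

    def close(run):
        regions.append((run["sample"], run["chrom"], run["start"], run["end"],
                        run["n"], run["qsum"] / run["n"]))

    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "ROH":
            continue  # skip headers, comments, and blank lines
        sample, chrom = fields[1], fields[2]
        pos, state, qual = int(fields[3]), int(fields[4]), float(fields[5])

        in_run = state == 1 and qual >= min_quality
        same_block = (current is not None
                      and current["sample"] == sample
                      and current["chrom"] == chrom)

        if in_run and same_block:
            # Extend the open run to this site.
            current["end"] = pos
            current["n"] += 1
            current["qsum"] += qual
        else:
            # Either the run ended or the sample/chromosome changed.
            if current is not None:
                close(current)
            current = ({"sample": sample, "chrom": chrom, "start": pos,
                        "end": pos, "n": 1, "qsum": qual} if in_run else None)

    if current is not None:
        close(current)
    return regions


if __name__ == "__main__":
    # A few rows copied from the output block above.
    example = [
        "ROH sample1 chr1 26386297 1 21.1",
        "ROH sample1 chr1 26399230 1 16.8",
        "ROH sample1 chr1 26455275 1 9.8",
        "ROH sample1 chr1 26455277 0 51.7",
    ]
    for region in roh_regions(example):
        print(region)
```

Is this the right way to read the columns, and would a quality cutoff such as min_quality be a sensible way to keep only high-confidence regions?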
Another problem I faced concerns calculating alternate allele frequencies on the fly. I have a collection of 110 VCF files from our population that I want to use for this calculation. I have tried a couple of options, but none of them worked. The software seems to require a single VCF file for the whole population. So if I combine the files with the VCFtools merge option, will that work? What I have done so far is list the available VCFs in a file and pass that file with the -e option, but I am getting an error. The reason I want to use these 110 VCFs is to see whether a population-specific bias exists.
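To be clear about what I mean by the alternate allele frequency at a site, here is a minimal sketch of the arithmetic only. It assumes the genotypes of all 110 samples at one site are already available as GT strings (for example from a merged VCF); the function name alt_allele_frequency is hypothetical, and I am not claiming this is how BCFtools computes the value internally.

```python
import re


def alt_allele_frequency(genotypes):
    """Return the ALT allele frequency for one site, or None if nothing was called.

    genotypes: iterable of GT strings such as "0/1", "1|1", or "./.".
    ASSUMPTION: any non-reference allele (1, 2, ...) is counted as ALT,
    and missing alleles (".") are ignored.
    """
    alt = 0
    total = 0
    for gt in genotypes:
        # Split on both unphased ("/") and phased ("|") separators.
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing call, not counted
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None


if __name__ == "__main__":
    # Four hypothetical samples at a single site.
    print(alt_allele_frequency(["0/0", "0/1", "1/1", "./."]))
```

If the population-derived frequencies differ a lot from the default of 0.4 that I used above, that is the kind of bias I would like to detect.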