I am trying to intersect methylation data with a promoter file (promoter.bed). I made this promoter file from the UCSC RefSeq genes and gene prediction BED file. A sample of it looks like this:

1 chr1 134212401 134213001 NM_001195025 0 +

2 chr1 134212401 134213001 NM_028778 0 +

3 chr1 8349519 8350119 NM_001290392 0 -

4 chr1 8349519 8350119 NM_001290390 0 -

5 chr1 8349519 8350119 NM_027671 0 -

6 chr1 33510352 33510952 NM_008922 0 -

7 chr1 25124020 25124620 NM_175642 0 -

8 chr1 8349519 8350119 NM_001290393 0 -

9 chr1 58714664 58715264 NM_175370 0 -

10 chr1 75482099 75482699 NM_178884 0 -

11 chr1 125228414 125229014 NM_199021 0 -

12 chr1 109050237 109050837 NM_198680 0 -

13 chr1 75492612 75493212 NM_001310679 0 -

14 chr1 167718511 167719111 NM_001113391 0 +

15 chr1 184494837 184495437 NM_130890 0 +

16 chr1 176102606 176103206 NM_011465 0 +

17 chr1 167718511 167719111 NM_031162 0 +

18 chr1 167718511 167719111 NM_001113393 0 +

19 chr1 167718511 167719111 NM_001113392 0 +

20 chr1 167718511 167719111 NR_103716 0 +

After the intersection, I calculated the average methylation for each unique ID.
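To make the averaging step concrete, here is a minimal Python sketch of it. The row layout (ID, chromosome, start, end, methylation value), the function name `average_by_key`, and the sample values are assumptions for illustration only, not the actual script or data. It groups on the ID together with its coordinates, because the same ID can appear with more than one promoter interval.

```python
from collections import defaultdict

def average_by_key(rows):
    """Average methylation per (id, chr, start, end) key.

    `rows` is assumed to be an iterable of
    (gene_id, chrom, start, end, methylation) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for gene_id, chrom, start, end, meth in rows:
        acc = sums[(gene_id, chrom, start, end)]
        acc[0] += meth
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Hypothetical intersect output: one row per methylation site in a promoter.
rows = [
    ("NR_002841", "chr6", 47604619, 47605219, 100.0),
    ("NR_002841", "chr6", 47604619, 47605219, 80.0),
    ("NR_002841", "chr6", 47621924, 47622524, 100.0),
]
averages = average_by_key(rows)
for key, mean in sorted(averages.items()):
    print(key, mean)
```

Grouping on the ID alone would instead pool all intervals of one ID into a single average, so the choice of key matters for everything downstream.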

When I merge the files to get a single file with the IDs and one methylation column per input file, duplicated rows appear after record number 7725, as follows:

gene chr start end file1 file2

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47621924 47622524 100 100

I don't know why this happens. The merge function I am using worked well for merging up to 10 files, but with more files I run into this problem.

Interestingly, the duplicated regions also include rows with the same ID but different start and end sites:

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47621924 47622524 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47608940 47609540 100 100

NR_002841 chr6 47608940 47609540 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47604619 47605219 100 100

NR_002841 chr6 47604619 47605219 100 100
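One possible cause, sketched below in plain Python, is that the merge key is not unique: if an ID occurs several times in each file (for example with different start and end sites), a merge on the ID alone pairs every occurrence on one side with every occurrence on the other, and the row count multiplies with each additional file. This is only a guess at what the merge function does. The `merge` helper, the table layout, and the values are hypothetical stand-ins, not the real function or data.

```python
from collections import defaultdict

def merge(left, right, key_len):
    """Inner join on the first `key_len` fields, appending right's last field.

    Rows are tuples laid out as (id, chr, start, end, value, ...).
    key_len=1 joins on the ID only; key_len=4 joins on ID plus coordinates.
    """
    index = defaultdict(list)
    for row in right:
        index[row[:key_len]].append(row[-1])
    merged = []
    for row in left:
        # Every matching row on the right produces one output row.
        for value in index.get(row[:key_len], []):
            merged.append(row + (value,))
    return merged

# Hypothetical per-file tables: the same ID with two promoter intervals.
file1 = [
    ("NR_002841", "chr6", 47604619, 47605219, 100.0),
    ("NR_002841", "chr6", 47621924, 47622524, 100.0),
]
file2 = list(file1)
file3 = list(file1)

by_id = merge(file1, file2, key_len=1)     # 2 x 2 = 4 rows
by_id3 = merge(by_id, file3, key_len=1)    # 4 x 2 = 8 rows
by_full = merge(file1, file2, key_len=4)   # 2 rows, one per interval
by_full3 = merge(by_full, file3, key_len=4)

print(len(by_id), len(by_id3), len(by_full), len(by_full3))
```

If this is what is happening, merging on the ID plus chr, start and end (or removing duplicate keys from each file before merging) should keep one row per interval regardless of how many files are combined.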
