I am trying to intersect methylation data with promoter.bed. I made this promoter file from the UCSC RefSeq gene and gene prediction BED file. It looks like this:
1 chr1 134212401 134213001 NM_001195025 0 +
2 chr1 134212401 134213001 NM_028778 0 +
3 chr1 8349519 8350119 NM_001290392 0 -
4 chr1 8349519 8350119 NM_001290390 0 -
5 chr1 8349519 8350119 NM_027671 0 -
6 chr1 33510352 33510952 NM_008922 0 -
7 chr1 25124020 25124620 NM_175642 0 -
8 chr1 8349519 8350119 NM_001290393 0 -
9 chr1 58714664 58715264 NM_175370 0 -
10 chr1 75482099 75482699 NM_178884 0 -
11 chr1 125228414 125229014 NM_199021 0 -
12 chr1 109050237 109050837 NM_198680 0 -
13 chr1 75492612 75493212 NM_001310679 0 -
14 chr1 167718511 167719111 NM_001113391 0 +
15 chr1 184494837 184495437 NM_130890 0 +
16 chr1 176102606 176103206 NM_011465 0 +
17 chr1 167718511 167719111 NM_031162 0 +
18 chr1 167718511 167719111 NM_001113393 0 +
19 chr1 167718511 167719111 NM_001113392 0 +
20 chr1 167718511 167719111 NR_103716 0 +
After the intersection, I calculated the average methylation for each unique ID.
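As a minimal sketch of that averaging step (not my actual script): the function name `mean_methylation`, the row layout, and the methylation values are all assumptions for illustration. It groups by ID plus coordinates rather than by ID alone, because the same ID can appear with more than one promoter region.

```python
from collections import defaultdict

def mean_methylation(rows):
    """Average methylation per (id, chrom, start, end) key.

    rows: iterable of (gene_id, chrom, start, end, methylation_percent).
    The layout is hypothetical; adapt it to the real intersect output.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for gene_id, chrom, start, end, meth in rows:
        key = (gene_id, chrom, start, end)
        sums[key][0] += meth
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Made-up methylation values, used only to show the grouping.
rows = [
    ("NR_002841", "chr6", 47604619, 47605219, 100.0),
    ("NR_002841", "chr6", 47604619, 47605219, 50.0),
    ("NR_002841", "chr6", 47621924, 47622524, 100.0),
]
averages = mean_methylation(rows)
for key, value in averages.items():
    print(key, value)
```

Grouping this way gives exactly one averaged row per promoter region, which matters later if the per-file tables are joined on the same key.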
When I merge the files to get a single file containing the IDs and the methylation column from each file, duplicated rows appear after record number 7725, as follows:
gene chr str end file1 file2
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47621924 47622524 100 100
I don't know why this happens. The merge function I am using worked well up to the merging of 10 files, but beyond that I get this problem.
Interestingly, the duplicated regions also include rows with the same ID but different start and end sites:
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47621924 47622524 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47608940 47609540 100 100
NR_002841 chr6 47608940 47609540 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47604619 47605219 100 100
NR_002841 chr6 47604619 47605219 100 100
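One possible cause, not confirmed for this data, is that the merge key is not unique within each input file. A key-based join pairs every matching row on the left with every matching row on the right, so repeated keys multiply with each additional file. The sketch below illustrates this with plain Python. The helpers `inner_merge` and `one_row_per_key` are hypothetical stand-ins, not the merge function I actually used.

```python
def inner_merge(left, right):
    """Join two lists of (key, value) pairs on key, like an inner join."""
    merged = []
    for lkey, lval in left:
        for rkey, rval in right:
            if lkey == rkey:
                merged.append((lkey, lval, rval))
    return merged

def one_row_per_key(table):
    """Keep only the first row seen for each key (assumes repeats are identical)."""
    seen = {}
    for key, value in table:
        seen.setdefault(key, value)
    return list(seen.items())

key = ("NR_002841", "chr6", 47604619, 47605219)
file1 = [(key, 100), (key, 100)]  # the same region listed twice
file2 = [(key, 100), (key, 100)]

raw = inner_merge(file1, file2)  # 2 x 2 matching rows
clean = inner_merge(one_row_per_key(file1), one_row_per_key(file2))
print(len(raw), len(clean))  # prints "4 1"
```

If this is what is happening, collapsing each file to one row per ID and coordinate set before merging should keep the row count stable no matter how many files are merged.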