Dear colleagues,
I am writing to ask for your help in evaluating the results of my research on DNA comparison and analysis. I am not an expert in genetic engineering and would appreciate expert feedback on my work.
The following tasks were performed as part of the study:
DNA comparison was performed for segments of influenza A viruses (H1N1 and H3N2), with the results presented in the bestmatch.json file. An example of an element-wise comparison is provided in the pa_pb1.json file. The quality of each match is expressed as a weight, for example "w":0.472249629.
A search was conducted for identical segments in the DNA sequence. The original file is HA.seq. The results are presented in the following format: [{length, number of variations of this length, number of occurrences of these variations in the original, larger sequence}].
[ {2, 25, 403380}, {3, 18, 114124}, {4, 16, 31748}, {5, 16, 7710}, {6, 16, 2893}, {7, 14, 685}, {8, 3, 282}, {9, 5, 137}, {10, 3, 3}, {11, 4, 5}, {12, 2, 2}, {13, 5, 135}, {14, 5, 6}, {15, 4, 132}, {16, 5, 6}, {17, 5, 134}, {18, 5, 6}, {19, 4, 132}, {20, 5, 134}, {21, 4, 5}, {22, 5, 132}, {23, 3, 130}, {24, 3, 130}, {25, 2, 2}, {26, 3, 129}, {27, 3, 129}, {28, 3, 129}, {29, 2, 128}, {30, 1, 127}, {50, 16, 32} ]
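For readers who want to check figures of this kind independently, here is a minimal Python sketch of one possible reading of the format above: for each length, count the substrings that occur more than once, and the total number of their occurrences. This is not the KnoDL method, and it is an assumption that "variations" means distinct repeated substrings, so it will not necessarily reproduce the numbers listed. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def repeated_kmers(seq, k):
    """Return (number of distinct substrings of length k that occur more
    than once, total number of occurrences of those substrings).

    Assumption: this is only one interpretation of the result format
    described in the post, not the author's actual procedure.
    """
    # Count every overlapping substring of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only the substrings seen at least twice.
    repeated = {kmer: n for kmer, n in counts.items() if n > 1}
    return len(repeated), sum(repeated.values())


def repeat_table(seq, lengths):
    """Build rows of (length, distinct repeats, total occurrences)."""
    return [(k, *repeated_kmers(seq, k)) for k in lengths]


# Toy sequence for illustration only; it is not taken from HA.seq.
toy = "ACGTACGTTTACG"
for row in repeat_table(toy, range(2, 5)):
    print(row)
```

Overlapping occurrences are counted here; a different counting rule (non-overlapping matches, or matches across several sequences in the file) would give different totals, which is one of the points worth confirming with the original method.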
The DNA sequence was divided into "words." The results are presented in the HA_seq.json file.
I would be grateful if someone from the ResearchGate community could provide their professional insight into my results and assist in their analysis.
The source data was obtained from:
https://www.kaggle.com/datasets/premlert/influenza-a-h1n1-h3n2-segment
The result files can be downloaded via the following link:
https://disk.yandex.ru/d/uB29TVo67Hdmzw
During the study, I used our data processing technology, KnoDL, which does not require knowledge of data structure, machine learning, or neural network technologies. All operations took an average of 1-2 minutes on a personal laptop.
Sincerely,
Dmitriy Pospelov