How to compare and make a tree of short and very diverse sequences?

02 February 2023

Good day! I have a list of about 10,000 unique DNA sequences, each 10-20 bp long.

I want to find out whether they could have evolved from one or several ancestral sequences, or whether they emerged independently.

Some of the sequences share similar motifs and can be aligned, but others cannot be aligned at all, so I can't simply perform an MSA and build a tree: the distance matrix contains many NAs.

I've tried principal component analysis (PCA) on k-mer frequencies (k = 1-4), but it gives me nothing: the frequencies form one dense cloud of points, and PC1 explains only ~4% of the variance.
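To make the k-mer setup concrete, here is a minimal sketch of how such a frequency vector could be built for k = 1-4, using only the Python standard library. The function names and the example sequence are hypothetical, not taken from the question. It also reports what fraction of the features is non-zero: a 20 bp sequence contains at most 17 overlapping 4-mers out of 256 possible, so each profile is mostly zeros, which may be one reason the PCA looks uninformative.

```python
from itertools import product


def kmer_frequencies(seq, ks=(1, 2, 3, 4), alphabet="ACGT"):
    """Return a flat list of k-mer frequencies for every k in ks."""
    seq = seq.upper()
    vector = []
    for k in ks:
        # All possible k-mers in a fixed order, so vectors are comparable.
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            window = seq[i:i + k]
            if window in counts:  # skip windows containing N or other symbols
                counts[window] += 1
        total = sum(counts.values())
        # Normalise within each k so that each block sums to 1.
        vector.extend(counts[m] / total if total else 0.0 for m in kmers)
    return vector


def nonzero_fraction(vector):
    """Fraction of features that are non-zero."""
    return sum(1 for v in vector if v > 0) / len(vector)


example = "ACGTTGCAAGGCTTACGATC"  # hypothetical 20 bp sequence
profile = kmer_frequencies(example)
print(len(profile))  # 4 + 16 + 64 + 256 = 340 features
print(round(nonzero_fraction(profile), 3))
```

Running this on a handful of the real sequences would show how sparse the profiles are before any dimensionality reduction is applied.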

I also found that the universalmotif R package can perform a similar analysis using compare_motifs(), so I converted each sequence into its own sequence motif. However, when I tried it on a small subset of the data, the algorithm behaved very strangely on a list of motifs each built from a single sequence. Different methods give the same result (I added the tree to the question): dissimilar sequences are placed close together, rather than the sequences that are in some way similar.

Karol Szafranski

You could use the k-mer analysis (k = 3 or 4 seems appropriate) to compute a similarity matrix. That would be the counterpart of the distance matrix, which, as you say, is quite incomplete.
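One way this suggestion could be sketched is shown below. It is an illustration under assumptions, not necessarily what the answer had in mind: cosine similarity is just one possible choice of measure, and the function names and example sequences are hypothetical. Each sequence is reduced to its 3-mer counts, and every pair gets a defined similarity value, so the matrix has no NAs even for sequences that cannot be aligned.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity of two k-mer count profiles (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(seqs, k=3):
    """Symmetric matrix of pairwise k-mer cosine similarities."""
    profiles = [kmer_counts(s, k) for s in seqs]
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim = cosine_similarity(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = sim
    return matrix


# Hypothetical sequences: the first two share most 3-mers, the third shares none.
seqs = ["ACGTACGTAC", "ACGTACGTTC", "GGGCCCGGGC"]
sim = similarity_matrix(seqs)
for row in sim:
    print([round(value, 3) for value in row])
```

If a distance is needed for clustering or tree building, 1 - similarity is a common conversion. For ~10,000 sequences this pure-Python loop evaluates about 50 million pairs, so a vectorised implementation would likely be preferable in practice.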
