How do you create a SNP alignment (FASTA) from a combined VCF without gaps at positions where there is no VCF information?

Abhijeet Singh

Since you already have a FASTA file, I would recommend simply combining your reference with your FASTA sequences and running a codon-aware alignment, or any alignment of your choice.

After that, edit the alignment manually in AliView, so that the result is curated and checked by hand. You could write a script to do this, but that takes time, and you would still need to check that the script really does what it should.

It's just a suggestion; if you find a script, you can use that instead.

https://ormbunkar.se/aliview/

Shannon Ormond

Hi Ana,

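When merging single-sample VCF files into a combined VCF, you can set the genotype of a sample with no call to the reference allele using the '-R' flag of VCFtools vcf-merge. The flag takes the genotype string to write, for example 0 for haploid samples or 0/0 for diploid samples:

vcf-merge -R 0 input1.vcf.gz input2.vcf.gz > merged_output.vcf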

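I have used this bash script to create a FASTA file from the merged VCF:

for samp in input1 input2
do
    # FASTA header line for this sample
    printf '>%s\n' "$samp"
    # concatenate this sample's called alleles across all sites
    bcftools query -s "$samp" -f '[%TGT]' merged_output.vcf
    printf '\n'
done > alignment.fa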

I got this script from:

https://github.com/samtools/bcftools/issues/693

However, the output FASTA will contain every reference position only if your VCF files include all positions; otherwise it will be an alignment of variable sites only. You will need a VCF with all sites if you want the alignment to match the reference in length and coordinates, but this may not be necessary depending on what you are doing.
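For readers without bcftools at hand, the same idea can be sketched in plain Python. This is only an illustration under stated assumptions, not a replacement for the tools above: the function names `vcf_to_snp_alignment` and `to_fasta` are made up for this example, genotypes are assumed to be haploid (for diploid calls only the first allele is used), and only sites where every allele is a single A/C/G/T base are kept.

```python
def vcf_to_snp_alignment(vcf_lines):
    """Build {sample: sequence} from the SNP records of a merged VCF.

    Missing genotypes ('.') are filled with the REF base, so the
    resulting alignment has no gaps. Hypothetical helper for illustration.
    """
    samples = []
    seqs = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            seqs = {s: [] for s in samples}
            continue
        ref, alt = fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        # Keep SNP sites only: skips indels, '*' alleles and sites with ALT '.'
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            first = gt.replace("|", "/").split("/")[0]  # first allele only
            seqs[sample].append(ref if first == "." else alleles[int(first)])
    return {s: "".join(bases) for s, bases in seqs.items()}


def to_fasta(alignment):
    """Format {name: sequence} as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Tiny made-up example: two samples, two SNP sites and one indel (skipped)
demo_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t1\t.",
    "chr1\t25\t.\tC\tT,G\t.\tPASS\t.\tGT\t0\t2",
    "chr1\t40\t.\tAT\tA\t.\tPASS\t.\tGT\t1\t0",
]
print(to_fasta(vcf_to_snp_alignment(demo_vcf)), end="")
```

As with the bash loop, the result is an alignment of variable sites only; positions absent from the VCF are not reconstructed.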

Yuriy Babin

1. If you have a Python interpreter installed, open a command line and type:

pip install sequman

2. In the command line, navigate to the directory containing your FASTA files and type:

python -c "from sequman.sequman import fill_gaps; fill_gaps()"

This should create new files with nucleotides inserted from the reference sequence. The new files will have the suffix '_gaps_filled' added to their original names.

The original files will remain intact.

Note that the reference sequence for each file must be the first record in that file.
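To make the gap-filling step concrete, here is a minimal sketch of the general idea in plain Python. It is not sequman's implementation, which has not been checked here; the function name `fill_gaps_from_reference` and the choice of '-' as the gap character are assumptions for illustration. It expects aligned sequences of equal length with the reference as the first record, as described above.

```python
def fill_gaps_from_reference(records, gap_chars="-"):
    """Replace gap characters with the reference base at the same column.

    records: list of (name, sequence) tuples from an alignment; the first
    record is treated as the reference. Returns a new list and leaves the
    input untouched. Hypothetical helper, not part of any package.
    """
    ref_name, ref_seq = records[0]
    filled = [(ref_name, ref_seq)]
    for name, seq in records[1:]:
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not the same length as the reference")
        # Walk both sequences column by column; copy the reference base at gaps
        new_seq = "".join(r if c in gap_chars else c for c, r in zip(seq, ref_seq))
        filled.append((name, new_seq))
    return filled


# Made-up example alignment with the reference as the first record
demo_records = [("ref", "ACGTAC"), ("s1", "A-GT-C"), ("s2", "--GTAA")]
for name, seq in fill_gaps_from_reference(demo_records):
    print(f">{name}\n{seq}")
```

Note that filling gaps this way treats every missing position as reference, which is only appropriate when a gap means "no variant called" rather than "no coverage".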

Ana Valero Rello

Many thanks Yuriy,

I'll try this last option since it seems to perfectly fit my data. I'll give feedback with my results.

Cheers!

Hassan Badrane

This topic is pretty old, but for those who have the same question and land here from Google:

I found this script by Stephan Kamrad to be very useful:

https://github.com/Bahler-Lab/alignment-from-vcf

It's an old script, I believe from 2014, so it requires Python 2.x, as well as pysam (I tried the release pysam-0.16.0.1-py27ha863e18_1) and biopython.

The good thing is that it deals very well with indels, which are included in the FASTA output (so not just the SNPs, as with some other scripts). The bad thing is that you have to specify a contig (you have to run it contig by contig), and you get the alignment of the whole contig sequence, not just the variable sites (although the latter can be preferable in some situations). The other thing I wish it could do is include the Ref sequence, but you can work around this by making a fake VCF file for the Ref, with genotype 0 at every site, and merging it with the samples.

As for the missing data (-) in Ana's question above, the solution suggested by other participants is enough to correct that. For example, with bcftools merge you can use the --missing-to-ref option to fill those gaps, much like what was described for vcftools above. Good luck...
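The fake reference VCF mentioned above could be produced along these lines. This is a rough sketch under assumptions: the function name `make_reference_vcf` and the sample name "REF" are invented for the example, genotypes are written as haploid 0 (use "0/0" for diploid data), and the input header is assumed to already declare the GT FORMAT field. The output would still need to be compressed and indexed before merging with the sample VCFs.

```python
def make_reference_vcf(vcf_lines, sample_name="REF", genotype="0"):
    """Return VCF lines for a single pseudo-sample that is REF at every site.

    Keeps the meta-information lines and the first eight columns of each
    record, then writes one sample column with an all-reference genotype.
    Hypothetical helper for illustration only.
    """
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            out.append(line)  # copy meta-information lines unchanged
        elif line.startswith("#CHROM"):
            fixed = line.split("\t")[:8]  # #CHROM through INFO
            out.append("\t".join(fixed + ["FORMAT", sample_name]))
        else:
            fixed = line.split("\t")[:8]
            out.append("\t".join(fixed + ["GT", genotype]))
    return out


# Made-up merged VCF with two samples
demo_merged = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t1\t.",
    "chr1\t25\t.\tC\tT\t.\tPASS\t.\tGT\t0\t1",
]
print("\n".join(make_reference_vcf(demo_merged)))
```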