I do not think learning C or C++ is necessary at all. You should be familiar with the basic Unix/Linux command line, and knowing some basic shell scripting, Perl, and/or Python is certainly useful. One of the key areas to learn is file manipulation: parsing, searching, and extracting data from files, and converting file formats for downstream analyses. All of this can be done with simple shell, Perl, or Python scripts, and this sort of thing comes up constantly as you work through a complete analysis pipeline.
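For instance, a typical format-conversion task is turning FASTQ reads into FASTA for a tool that only accepts FASTA. Here is a minimal Python sketch, assuming a standard four-line-per-record FASTQ file (the filenames are placeholders):

```python
# Convert a standard 4-line-per-record FASTQ file to FASTA.
# Filenames are placeholders; substitute your own data.
with open("reads.fastq") as fq, open("reads.fasta", "w") as fa:
    for i, line in enumerate(fq):
        if i % 4 == 0:                # header line, e.g. "@read_id ..."
            fa.write(">" + line[1:])  # swap the leading '@' for '>'
        elif i % 4 == 1:              # sequence line
            fa.write(line)
        # the '+' separator and quality lines (i % 4 == 2, 3) are dropped
```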
R is strongly recommended: a host of analysis tools are available in R, and you would be severely limiting your options if you cannot work with it.
Also, look into what software your institution may have site-licensed. Matlab, SAS (and/or JMP Genomics), and so forth can be very useful for analyses, though such statistics tools have their own scripting languages and syntax. If they are available to you under a site license, that often comes with the opportunity for professional instruction in those tools from the vendors themselves.
You may also have access to fully GUI tools such as Partek or Agilent's NGS software packages. These require no command-line knowledge to use, but they are neither simple nor simplistic, and they can have a steep learning curve of their own. Again, if your institution site-licenses such software, you should have access to professional training from the company's support people.
If you are carrying the analysis all the way through from raw reads to final, interpreted results, be sure not to neglect the statistics of NGS data analysis. It is not enough to be able to run code; you have to understand the analyses the code implements so you can make informed choices about which tools to use.
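As a small illustration of why the statistics matter: many NGS comparisons ultimately reduce to a hypothesis test on counts, and you need to know what that test assumes. A hedged Python sketch using SciPy's Fisher's exact test; the counts are purely made-up, illustrative numbers:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table of read counts (illustrative numbers only):
# rows = reads mapping to a gene of interest vs. everything else,
# columns = condition A vs. condition B.
table = [[120, 45],
         [9880, 9955]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")

# Whether this test is even appropriate (Are the margins fixed? Are the
# counts independent? Is there overdispersion between replicates?) is
# exactly the kind of question the software will not answer for you.
```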
You can start with some Linux, Perl/Python, and R. I also suggest this paper: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000589
Agree with Michael Black. There are numerous "bread and butter" tasks (parsing, format conversion, etc.) that are usually project-specific, so the individual analyst winds up doing them herself; good scripting skills (e.g., Perl or Python) are an absolute must. The extremely sophisticated, widely applicable tools (for alignment, variant detection, etc.) are written by experts in low-level, compiled languages. Those areas are pretty far beyond the skill set of the average analyst and sit more deeply within computer science. So again, "power-user" skills are really all that is needed: knowing the tools via their documentation and being able to string them together into a useful "pipeline" using command-line skills and a scripting language (see the sketch below). The second major requirement is some understanding of statistics, especially hypothesis testing, because scientific hypotheses are project-dependent and you will find lots of ready-made tools whose use and results you will have to decide how to interpret. For economies of scale, a language like Perl also has the standard statistical infrastructure available as libraries, so you don't really have to learn lots of different task-specific languages. Hope that's helpful.
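To make the "pipeline" idea concrete, here is a minimal Python sketch that strings two standard command-line tools together. It assumes bwa and samtools are installed and on your PATH; the filenames are placeholders, and this shows the pattern only, not a recommended workflow:

```python
import subprocess

# Placeholder inputs; assumes bwa and samtools are installed on your PATH.
ref, reads, bam = "genome.fa", "reads.fastq", "aligned.sorted.bam"

# Align reads and pipe the SAM output into a coordinate sort. check=True
# raises if the pipeline's exit status is non-zero (which, for a plain
# shell pipe, is the status of the last command).
subprocess.run(f"bwa mem {ref} {reads} | samtools sort -o {bam} -",
               shell=True, check=True)

# Index the sorted BAM so downstream tools can access it by region.
subprocess.run(["samtools", "index", bam], check=True)
```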
Thank you for the information. I know there is no shortcut to knowledge, but having no background in programming is really affecting my progress. Hopefully I'll get a head start with the tips you have given.
I agree with the previous replies that C/C++ won't help you. The best option, in my experience, is to go for R. If you don't take a basic R course, you can learn the basics from the internet (there are very nice YouTube videos), and after that you can easily pick up specific R packages as you need them; there is plenty of help and documentation on the web.
Here are some free software tools that you can use. These are point-and-click tools that don't require any programming knowledge. Let me know if you find them useful.
C/C++ can actually be quite useful, especially if you're concerned about memory allocation. Typically, people recommend Perl/Python/R because that's what other people use and because they are conceptually easier to pick up (R is probably an exception to this rule). In that sense, yes, they are useful to know so that you can understand what other people are doing. But when it comes down to it, most NGS analysis programs are written in C/C++ and distributed as executables (e.g., BLAST, Bowtie, Trinity). So what's really important, IMHO, is that you understand your shell (e.g., bash). Once you understand how your shell works, you can use whatever programming language you like to invoke commands. In fact, I bet you can perform most, if not all, of these analyses with just shell scripting.
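For example, invoking a compiled tool and capturing its output from a scripting language is just a thin wrapper around the shell. A small Python sketch that counts reads in a FASTQ file by calling the standard Unix wc utility (the filename is a placeholder):

```python
import subprocess

# Count lines in a FASTQ file with the standard Unix 'wc' tool;
# the filename is a placeholder for your own data.
result = subprocess.run(["wc", "-l", "reads.fastq"],
                        capture_output=True, text=True, check=True)

n_lines = int(result.stdout.split()[0])
print(f"{n_lines // 4} reads")  # standard FASTQ stores 4 lines per read
```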