It all depends on your budget. Get as much RAM as you can afford and two sets of disks: one set of expensive disks, as fast as you can afford, for calculations, and another set of cheap disks, as large as possible, for storage.
It's for a grant proposal, so as long as we don't go overboard we can price up the best machine. Do you know if it would have to be a Mac, or are there PCs that will do the job?
I would strongly recommend Linux. It's really the only serious way to go about it, although I am starting to wonder what you mean by 'analysis'. You could go for one of the highest-end Dell non-rack-mounted servers, such as the PowerEdge T420. They can be specified in the way I describe; a budget in the region of $10K would not be unreasonable (not taking into account academic discounts). There are plenty of other good brands as well, but avoid Viglen like the plague for this sort of thing (a UK company).
It really does depend on what you mean by "analysis" and "metagenomics".
If this is just single-gene metagenomics, as is common in microbial ecology studies, a good high-end workstation running Unix, such as the one proposed above, would do the job just fine. The one we use for this type of analysis is an HP Z820 with 16 GB of RAM and two 2 TB hard drives.
If you mean analysis of environmental metagenomes, whether from gDNA or from cDNA, it will depend heavily on the size of the dataset you are expecting.
Raquel Tobes raises a good question. Amazon sells cloud computing, for example. It's very nice! But I don't think it is that cheap... Here in Brazil, researchers pool their money to buy one shared computer and use it for different purposes (research in Brazil is sometimes complicated). Correct me if I'm wrong, Raquel?
1 - If you are going to work a lot with metagenomics, I think it is better to have your own computer.
2 - If you will only do some analyses for a single project, for example, I suggest cloud computing, as Raquel said.
We are not talking about supercomputing, are we? (That's a different matter.)
We are working on yeast genome sequencing/assembly and RNA-seq data, and we also work on bacteria. We have a Linux (Ubuntu) computer with 24 GB of RAM and 12 processors. This is very good, but if you work on larger genomes (for example Drosophila or Caenorhabditis), 48 GB of RAM is probably better.
454 gives long reads, but a maximum of about 1 million of them; Illumina gives short reads but in high quantity (up to 300 million). So it depends on what you want to study or sequence. For transcriptomics I would recommend Illumina, while for amplicon sequencing (i.e. microbial community composition studies) 454 is OK and gives a sufficient number of reads. Both platforms can do barcoded sequencing, so you can sequence multiple samples per run.
For analysing metagenomic data:
1. For amplicon sequencing I really recommend mothur (current version 1.28). It works under Windows, if you do not like Linux and scripts. Under Linux, QIIME is the best option; however, you would need an IT specialist to install QIIME properly on your computer.
2. If you plan to denoise your reads, which is necessary especially for 454 sequencing: mothur runs much faster on a personal computer than the QIIME PyroNoise script. However, if you plan GS FLX or Illumina sequencing, computing on a cluster is necessary. Imagine: we assembled 1 million reads on an 8-processor computer and it took a month! When we used a 600-processor cluster, we had the results in 20 minutes! Computing on a cluster saves you a lot of time!
CLOTU works with FASTA files only; you cannot perform quality filtering or denoising with it. Still, CLOTU is an alternative. The choice depends on your taste, bioinformatics skills and scripting ability. Often you need a very specific script, which is not possible with multi-user online programs. Galaxy is also an option... I would recommend CLOTU for amplicon sequencing, for example.
I also believe Linux is better, but if it is not available, just for information: 12 GB of RAM, a 64-bit OS, 2 x 2.7 GHz processors and a 1 TB hard drive also work. We are using CLC Genomics Workbench 5.5.1 for both Illumina and 454 data.
I mostly do de novo assembly of metagenomic Illumina HiSeq data, usually using an entire HiSeq lane (a recent dataset consisted of 175M read pairs, with a maximum 101 bp read length after quality filtering). With this amount of data I need ~200 GB of RAM (sometimes more) when using Velvet for assembly. That said, it depends on the complexity of the metagenome (i.e. the number of unique k-mers). I would recommend FutureGrid (https://portal.futuregrid.org) - an HPC cluster that is free to academic researchers. They have one cluster called delta whose nodes have 12 cores and 192 GB of RAM each, which I've used a lot for assembly. For less memory-intensive tasks like read filtering we have a 2010 Mac Pro 12-core in the lab with 16 GB of RAM, which has been pretty good. If I were building the system again I might replace Mac OS X with Linux, but OS X is alright overall - sometimes a bit of tinkering is required to make things work.
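Since assembly memory scales roughly with the number of unique k-mers, one quick sanity check before booking cluster time is to count distinct k-mers in a subsample of the reads. A minimal Python sketch, assuming a standard 4-line FASTQ file (the file name, k value and subsample size here are placeholders):

```python
# Rough unique k-mer count from a FASTQ subsample, to gauge assembly memory.
# The file name, k and subsample size are illustrative placeholders.
k = 31
max_reads = 100_000  # subsample; scale the count up for a full-lane estimate

kmers = set()
with open("reads_R1.fastq") as fh:
    for i, line in enumerate(fh):
        if i // 4 >= max_reads:
            break
        if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
            seq = line.strip()
            for j in range(len(seq) - k + 1):
                kmers.add(seq[j:j + k])

print(f"{len(kmers)} distinct {k}-mers in the first {max_reads} reads")
```

For a full lane, a dedicated counter such as Jellyfish or KMC is far faster and more memory-efficient, but even a crude count like this indicates whether you are looking at a 16 GB workstation or a 200 GB cluster node.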
I think we'll end up going for Illumina sequencing and will have to go from raw sequences all the way to getting them ready for depositing into databases. I think the metagenome will be fairly complex (it will be from an aerobic digester), but at the moment I don't have a great deal of information about exactly how complex.
I've been thinking about using QIIME through iPlant, and I have some Linux experience, but none of our IT department does. Would it be a better idea to run things like assembly through a computing service provider and then get a machine to do the post-assembly processing locally? Would this make a difference to what I'd need?
If you do 454, it will be more computationally intensive, as denoising the signal may take weeks on a 20-40 core computer.
Illumina does not need denoising, so it is possibly less processor-intensive, but since the number of reads is bigger, you'll need more RAM. I would advise half a terabyte of RAM to play safe...
A Linux-based server would be a better choice. You can also look at the Dell Precision or HP Z820 series of workstations. 256 GB of RAM and a GPU-based system with a Titan X would also be a good choice. Dual 14-core CPUs (28 cores/56 threads) would be perfect.
OK. I am on my way to buying another one too, but as a laptop (I am a consultant in BioIT, so I need flexibility).
Right now I am mainly using a three-year-old MSI Apache Pro GE72 6QD (i7-6700) upgraded to 32 GB of DDR4 at 2166 MHz and an additional SATA-2 SSD, running in dual boot (Ubuntu / Windows).
That's enough for developing and testing code (what better than gamer hardware?). It lets me load onto the SSD a large extract of BAM data (samtools view), plus the full positions (Chr/PosIn/PosOut/name-ref) of all hg19/hg38 exons (PRT and RNA) as a Perl hash in RAM (8 GB).
When parallelizing, though, it maxes out at 3 concurrent instances.
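For illustration, here is a rough Python equivalent of that Perl-hash load, assuming a BED-like tab-separated exon file (chromosome, start, end, name per line; the file name and column layout are assumptions, not the actual format used above):

```python
# Load exon coordinates into an in-memory dict keyed by chromosome,
# analogous to the Perl hash described above. The BED-like input format
# (chrom, start, end, name per tab-separated line) is an assumption.
from collections import defaultdict

exons = defaultdict(list)
with open("hg19_exons.bed") as fh:
    for line in fh:
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        exons[chrom].append((int(start), int(end), name))

# Sort per chromosome so positions can be scanned or bisected quickly.
for chrom in exons:
    exons[chrom].sort()

print(sum(len(v) for v in exons.values()), "exons loaded")
```

Held in RAM this way, the per-entry overhead of a Perl or Python hash pushes the footprint well past the on-disk size, which fits the multi-gigabyte figure described above.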
So I am now looking at an MSI Titan with the following criteria: a hexa-core CPU (e.g. i9-8950), 2 x 512 GB SSDs + a 1 TB 7200 rpm HDD, and a minimum of 64 GB of RAM at 2666 MHz (plus an external HD for archiving... so about 5000 Euros).
I would like to thank all the people who provided suggestions on this topic. I would only add that, after careful analysis, we acquired a Dell PowerEdge T630 with 250 GB of RAM, 32 CPUs and 15 TB of hard disk. Using this server we are able to manage all the analyses we are performing. The most demanding in terms of RAM are metagenomic assembly (we can easily reach 100-200 GB of RAM) and taxonomic analyses based on multiple protein alignments (up to tens of thousands of proteins). In terms of computation, the most demanding task is alignment of short DNA reads against a reference DB. The cost of the workstation was approximately 7000 Euros.
Would you say that an 8-core laptop with 64/128 GB of RAM (Lenovo P1/P53) would suffice for microbiome/metagenomics analyses (e.g. sequence alignments of 100 genomes, etc.)? Or would it be better to get a desktop workstation (Dell, Mac Pro) with more cores and more RAM?
It always depends on the data; however, only a few processes are memory-hungry. As a rough estimate, allow 1 GB of RAM per million reads for mapping and assembly. A minimum of a 4-core processor is required, and the more cores you have, the less time computation will take. As recommended above, you need two hard disks: one SSD for the main computational work and one less expensive disk for storage. For human data such as whole-exome sequencing, an 8-core processor with 32 GB of RAM is enough. Scientists working on plants need different specs, as plant genomes vary a lot and are bigger (as are their transcriptomes); still, a minimum of 8 cores and 32 GB of RAM will be needed, and higher specs are better. Microbial data will require somewhat less memory and processing power, as the genome sizes are much smaller.
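To make that rule of thumb concrete, here is a back-of-the-envelope Python sketch (the 1 GB-per-million-reads figure is just the rough guide above, and the safety factor is an arbitrary pad, not a measured value):

```python
# Back-of-the-envelope RAM estimate from the ~1 GB per million reads
# rule of thumb quoted above; treat the output as a starting point only.
def estimate_ram_gb(n_reads, gb_per_million=1.0, safety_factor=1.5):
    """Rough RAM estimate for mapping/assembly, padded by a safety factor."""
    return n_reads / 1_000_000 * gb_per_million * safety_factor

for n in (10_000_000, 100_000_000, 175_000_000):
    print(f"{n:>12,} reads -> ~{estimate_ram_gb(n):.0f} GB of RAM")
```

For the 175M read-pair dataset mentioned earlier in the thread, this lands in the low hundreds of gigabytes, broadly in line with the ~200 GB reported there for a Velvet assembly, so the rule of thumb is at least the right order of magnitude.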