Sorry, I don't quite get your question, but R has no problem handling large data sets per se; I mean (assuming you're running R on an x64 *nix OS) you don't need to worry about the size of your data as long as you've got enough RAM/VMEM for R to work with it.
R does have limitations. Current builds use libraries that are constrained to 32-bit integers, which means that some indices and vectors are limited to 2^31 - 1 (about 2 billion) elements. It is possible to find that some object (e.g. a data frame) "runs out of space" even when running R on a powerful large-memory computer.
There are ways around this, including packages that create only meta-objects in memory and use HDF5 or NetCDF file storage for very large objects (GenABEL and SNPRelate are examples). In addition, there are generic packages, bigmemory and ff, that can in some instances provide workarounds for the 32-bit integer limitation.
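For instance, here is a minimal sketch with bigmemory (the backing and descriptor file names are made up for illustration) that keeps the data in a file-backed matrix on disk rather than in RAM:

```r
## Minimal sketch: a file-backed matrix that lives on disk, not in RAM.
## The backing/descriptor file names are made up for illustration.
library(bigmemory)

x <- filebacked.big.matrix(nrow = 1e6, ncol = 10, type = "double",
                           backingfile    = "big_x.bin",
                           descriptorfile = "big_x.desc")

x[1, ] <- rnorm(10)   # only the touched blocks are paged into memory
x[1, 1:3]             # indexing works much like an ordinary matrix

## A later session can re-attach the same data without re-reading it:
y <- attach.big.matrix("big_x.desc")
```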
This is not to say that R isn't a wonderful system, just to be clear that there are limitations.
Mr @Tromp: current versions of R don't have these limitations, I think; also, many of these limitations are system-specific. Nonetheless, one can try ?"Memory-limits" in the R REPL to find out more.
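For instance, the documented limits and the 32-bit integer constant can be checked directly from the R prompt:

```r
?"Memory-limits"        # help page describing R's memory and object-size limits
.Machine$integer.max    # largest representable 32-bit signed integer: 2147483647
```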
You are correct, I just checked. I used to run into the problem of objects exceeding vector limits all the time for the reason stated above, and now run into a very similar problem in that data sets exceed RAM. I simply assumed it was due to the same root cause.
Nevertheless, the memory-mapping solutions given above are useful when the data sets exceed available memory. I currently run problems that don't fit into 256 GB of RAM; one can throw more memory at the problem, but this can be quite costly.
Gerard Tromp's answer covers the size limitations of R pretty well. I only want to add that, if need be, there are packages on CRAN that wrap a data frame and remove the limitations. Check out the CRAN Task View on High-Performance and Parallel Computing with R.
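As a minimal sketch of that idea (the file name and chunk size here are made up for illustration), the ff package can hold a data-frame-like object on disk and map it into memory in chunks:

```r
## Read a large CSV into an ffdf: a data-frame-like object whose columns
## live on disk and are memory-mapped in chunks rather than held in RAM.
library(ff)

big_df <- read.csv.ffdf(file = "big_data.csv",   # hypothetical file name
                        header = TRUE,
                        next.rows = 500000)      # rows read per chunk

dim(big_df)      # behaves like a normal data frame for many operations
big_df[1:5, ]    # only the requested rows are pulled into RAM
```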
R is not a good choice when it comes to working with truly large-scale data (many GB), even if you have a quite powerful computer with decent memory. In that case it makes sense to consider Hive on HDFS...
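If one goes that route, queries can still be issued from R, for example over JDBC with the RJDBC package; this is only a rough sketch, and the driver jar path, server address, credentials, and table name are placeholders, not real values:

```r
## Rough sketch: query Hive (running on HDFS) from R via the RJDBC package.
## The jar path, host, credentials, and table name below are placeholders.
library(RJDBC)

drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
            classPath   = "/path/to/hive-jdbc-standalone.jar")

con <- dbConnect(drv, "jdbc:hive2://hive-server:10000/default",
                 "username", "password")

## Let Hive do the heavy lifting; bring back only the aggregated result.
res <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM some_big_table")

dbDisconnect(con)
```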