The reference genome is the human genome as known so far. That includes the sequence, but also the transcriptome, e.g. gene isoforms, and of course much more.
hg38 just came out, but remember that at the moment we know more about hg19 than about hg38. The difference between the two is not much, but Ensembl is more "transcript" focused, and thus has more gene transcripts annotated.
You might also be interested in these versions of the two mentioned; they are prepared for the aligner TopHat and the transcript assembler Cufflinks:
This is a good question. I think that to address your point one would need to know which sequence is the most likely ancestral sequence at each variant location, a type of information we do not have. The current human reference genome is a representative compilation from a number (I think thirteen) of volunteers, but each of these individuals would have variants when matched against the reference. Your point becomes particularly critical when comparing genes or genomic regions that are highly divergent among individuals, such as the major histocompatibility complex.
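To illustrate what "variants when matched against the reference" means, here is a toy sketch in Python. The sequences, the donor names and the function `list_variants` are all made up for illustration, and the sketch assumes the sequences are already aligned and of equal length; real variant calling works on aligned reads and is far more involved.

```python
def list_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Hypothetical toy data: one short "reference" and three invented donors.
reference = "ACGTACGTAC"
donors = {
    "donor_A": "ACGTACGTAC",  # identical to the reference in this stretch
    "donor_B": "ACGAACGTAC",  # one substitution
    "donor_C": "ACGTACCTAT",  # two substitutions
}

for name, sequence in donors.items():
    print(name, list_variants(reference, sequence))
```

Even a donor who contributed to the reference would show up with variants somewhere else in the genome, because the reference is a compilation and not any single person's sequence.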
You have touched on exactly the point, Dr. Albino... If a gene has 5 alleles, for example, what makes a certain allele the reference? Is it its ancestry, or the time at which it was sequenced?
I think you already have your answer, but I will add this short explanation to be sure. The common databases that provide reference genomes for human or other species include:
Ensembl genome browser: http://www.ensembl.org/index.html (this database is quite simple to use; you can search for your gene name after selecting your target species)
UCSC Genome Browser: http://genome.ucsc.edu/ (you can use it in the same way as the previous one)
NCBI Genome browser: http://www.ncbi.nlm.nih.gov/genome
These databases are really worthwhile and can be very helpful.
A reference genome is a known genome that can be used as a reference when we need to identify a newly sequenced, unknown genome that differs from it by some mutations.
If we cluster genomes by their sequence identity/similarity, we could also pick a reference genome for a particular cluster, with some variance between the cluster members.
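One simple way to pick such a per-cluster reference could be to take the member that is, on average, most similar to all the others (the "medoid"). The sketch below is only an illustration of that idea under strong assumptions: the sequences are invented, already aligned and of equal length, and the names `identity` and `pick_representative` are hypothetical helpers, not part of any existing tool.

```python
from itertools import combinations


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two pre-aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def pick_representative(cluster: dict[str, str]) -> str:
    """Return the name of the member with the highest summed identity to the others."""
    totals = {name: 0.0 for name in cluster}
    for (name_1, seq_1), (name_2, seq_2) in combinations(cluster.items(), 2):
        score = identity(seq_1, seq_2)
        totals[name_1] += score
        totals[name_2] += score
    return max(totals, key=totals.get)


# Hypothetical cluster of three short, already aligned sequences.
cluster = {
    "g1": "ACGTACGT",
    "g2": "ACGTACGA",
    "g3": "ACGTTCGA",
}

print(pick_representative(cluster))
```

In this toy cluster g2 differs from each of the other two members at a single position, so it ends up as the representative, while g1 and g3 differ from each other at two positions.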
If the reference genome was derived from 13 people, how was the base chosen at a particular position, given that all 13 of these persons will have variations at a locus?
So which base was selected in the reference genome, and on what basis?