To answer the first question: the *haploid* genome size (1n) of H. sapiens is 3.3 Gb (3.3E9 base pairs).
Genome size is always given as the total amount of DNA contained within one copy of a single genome (1n). The diploid (2n) human cell has a DNA content of 6.6 pg. 1 Gb has a mass of roughly 1 pg (you can calculate this from the average molar mass of a base pair, which is about 660 g/mol). Thus, the diploid cell contains about 6.6 Gb, so the haploid genome must be about 3.3 Gb.
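For anyone who wants to check the mass-to-base-pair step, here is a minimal sketch of the calculation. It assumes the 660 g/mol average quoted above and the standard Avogadro constant; the function names are only illustrative. Note that with exactly 660 g/mol, 1 Gb comes out slightly above 1 pg, so "1 Gb is about 1 pg" is a rounding and the result lands nearer 3.0 Gb than 3.3 Gb.

```python
AVOGADRO = 6.02214076e23   # base pairs per mole
BP_MOLAR_MASS = 660.0      # g/mol, average for one base pair (value quoted above)


def picograms_per_gigabase(bp_molar_mass=BP_MOLAR_MASS):
    """Mass in pg of 1e9 base pairs of double-stranded DNA."""
    grams_per_bp = bp_molar_mass / AVOGADRO
    return grams_per_bp * 1e9 * 1e12  # 1e9 bp, then g -> pg


def gigabases_from_picograms(pg, bp_molar_mass=BP_MOLAR_MASS):
    """Number of Gb corresponding to a DNA mass given in pg."""
    return pg / picograms_per_gigabase(bp_molar_mass)


pg_per_gb = picograms_per_gigabase()
diploid_gb = gigabases_from_picograms(6.6)   # 6.6 pg per diploid cell
haploid_gb = diploid_gb / 2

print(f"1 Gb weighs about {pg_per_gb:.3f} pg")
print(f"diploid: {diploid_gb:.2f} Gb, haploid: {haploid_gb:.2f} Gb")
```

Under these assumptions the script reports about 1.096 pg per Gb and a haploid size of about 3.01 Gb; a slightly lower average base-pair mass moves the answer up.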
I cannot answer the second question. I don't know how the two "sets" would be defined. I am not aware that "haplotypes" are sequenced, if that is what you meant. However, polymorphic regions will be sequenced as such, i.e. if there are different alleles with differing sequences at a particular position, then this position is a polymorphic site, and the result (which bases are there) will be ambiguous.
The reference sequence for the human genome does not represent any single individual human. It was sequenced across all chromosomes from several individuals (at least 13, if I remember correctly). However, it is always represented in databases as the haploid content to avoid unnecessary duplication of content.
In regions of high variability, it can be linked to other databases of human mapped variation (e.g. dbSNP) to overlay the known variability in a gene or sequence region.
The very first release of the public consortium human genome was produced primarily by shotgun sequencing: literally busting up the genomic content of a cell into many smaller fragments, sequencing them, and then assembling them back into their proper order and orientation based on overlapping end regions. That was the single most amazing aspect of the first published human reference genome, as much of that final finished assembly was done by direct human editing.
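To make the "assemble by overlapping ends" idea concrete, here is a toy greedy sketch. It is not the consortium's actual pipeline: it ignores sequencing errors, reverse-complement reads and repeats, and the fragment strings and function names are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest end overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the remaining contigs separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical reads from a short made-up sequence
reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real genomes contain long repeats that make such greedy merging ambiguous, which is one reason so much manual finishing was needed.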
However, to answer your question, yes, the entire cellular genomic content was sequenced. The representation is only for the haploid genome, and also represents consensus regions across the multiple individuals. The whole intent was to provide a general human reference genome, not a unique individual's specific sequence. Since the data is accessed via various database systems, the actual coding orientation and strandedness of a given gene can easily be added to the basic sequence information.
Be aware, too, that there are now many human genomes in existence, as projects since the first human reference have worked to develop detailed databases of variability in the genome across ethnicity, across disease states, across cell types and organs, and over time (developmental stages and life states, as well as historical time, by sequencing individuals from times long past).
Concerning the Human Genome Project sequence, the approach was BAC-based, with mainly two distinct BAC library sources: CalTech (B, C and D segments) and RPCI-11. Each of these libraries was made with DNA from a single donor from an anonymized collection (see http://bacpac.chori.org/library.php?id=7 for RPCI-11, and https://www.ncbi.nlm.nih.gov/clone/library/genomic/12/ , https://www.ncbi.nlm.nih.gov/clone/library/genomic/13/ , https://www.ncbi.nlm.nih.gov/clone/library/genomic/14/ , https://www.ncbi.nlm.nih.gov/clone/library/genomic/15/ and https://www.ncbi.nlm.nih.gov/clone/library/genomic/16/ for details).
BACs were shotgun sequenced and assembled separately before being merged into a global tiling path, and other non-overlapping BAC clones were used to confirm the assembly by comparing restriction maps.
Some chromosomal regions (for example, TCR or Ig loci) had previously been sequenced with YAC approaches, and these parts were directly incorporated into the final tiling path.
A single allele of each region is present in the reference sequence, with local arbitrary selection of the representative sequence, hence there is no guarantee of any homogeneity in the representation of the reference sequence.
For repetitive regions, size estimates and/or ranges of repeat numbers were given when possible.
The genome size assumed when human DNA is used as a standard for flow cytometry is 3.5 pg, or 3.423 Gbp. The assembled 1C (haploid) female genome from Human Genome Assembly GRCh38.p12 contains 3,031,042,417 bp plus an unknown amount of unassembled repeats.
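The 3.5 pg to 3.423 Gbp conversion can be reproduced with a one-line calculation. The sketch below assumes the commonly used factor of 978 Mbp per pg of DNA; that factor is my assumption, not something stated above, but it does give back the quoted figure.

```python
MBP_PER_PG = 978.0  # assumed conversion factor: 1 pg of DNA ~ 978 Mbp


def pg_to_gbp(pg):
    """Convert a DNA mass in picograms to gigabase pairs."""
    return pg * MBP_PER_PG / 1000.0


flow_standard_gbp = pg_to_gbp(3.5)  # the flow cytometry standard quoted above
print(f"3.5 pg is about {flow_standard_gbp:.3f} Gbp")
```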
The amount of unassembled repeats is not known precisely, and it almost certainly varies to an unknown extent between individuals.
So what is the average? A commonly repeated estimate of the unsequenced repeat amount is 4%. If so, the average complete genome is 3.152 Gbp. The estimate of 3.3 Gb suggests roughly twice that amount of unsequenced repeats, while the size estimated by fluorescence methods suggests roughly 3 times that amount.
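The comparison above is simple arithmetic, sketched here for anyone who wants to plug in other estimates. It assumes the 4% is counted on top of the assembled length (that interpretation reproduces the 3.152 Gbp figure); the function names are illustrative only.

```python
ASSEMBLED_BP = 3_031_042_417  # assembled 1C length quoted above (GRCh38.p12)


def complete_size(assembled_bp, unsequenced_fraction):
    """Complete genome size if unsequenced repeats add this fraction on top of the assembly."""
    return assembled_bp * (1 + unsequenced_fraction)


def implied_fraction(estimate_bp, assembled_bp=ASSEMBLED_BP):
    """Unsequenced fraction (relative to the assembly) implied by a total size estimate."""
    return estimate_bp / assembled_bp - 1


size_at_4pct = complete_size(ASSEMBLED_BP, 0.04)
frac_3_3 = implied_fraction(3.3e9)     # the 3.3 Gb estimate
frac_flow = implied_fraction(3.423e9)  # the flow cytometry standard

print(f"4% unsequenced -> {size_at_4pct / 1e9:.3f} Gbp")
print(f"3.3 Gb implies   {frac_3_3:.1%} unsequenced")
print(f"3.423 Gbp implies {frac_flow:.1%} unsequenced")
```

Under these assumptions the implied fractions are about 8.9% and 12.9%, i.e. roughly two and three times the 4% estimate.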
Which is correct? We can expect that each of these will turn out to be correct for someone in the world. The assembled 1C genome for a 45,X (Turner) individual would be 2.95 Gbp, while 1C for a 47,XXX (trisomy X) female would assemble as 3.109 Gbp. The take-home? We may eventually be able to report an average genome size for humans. However, the range will be relatively large.
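The 45,X and 47,XXX figures appear to follow from adding or removing half an X chromosome from the female 1C value. Here is a hedged sketch of that arithmetic. It assumes the 3,031,042,417 bp figure is autosomes plus one X, and it uses 156,040,895 bp for the GRCh38 X chromosome; that length is my assumption and is not given above. The Y chromosome is not handled.

```python
ASSEMBLED_1C_FEMALE_BP = 3_031_042_417  # quoted above; assumed to be autosomes + one X
CHR_X_BP = 156_040_895                  # assumed GRCh38 chrX length, not from the text


def average_1c(x_copies):
    """Average 1C content (half the diploid total) for a karyotype with x_copies of X and no Y."""
    autosomes_1c = ASSEMBLED_1C_FEMALE_BP - CHR_X_BP
    return autosomes_1c + CHR_X_BP * x_copies / 2


turner_gbp = average_1c(1) / 1e9    # 45,X
typical_gbp = average_1c(2) / 1e9   # 46,XX
triple_x_gbp = average_1c(3) / 1e9  # 47,XXX

print(f"45,X: {turner_gbp:.3f}  46,XX: {typical_gbp:.3f}  47,XXX: {triple_x_gbp:.3f} Gbp")
```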