If the human genome were random, it would be easy to answer this with probability alone. In fact, if the human genome were random, a 17 bp read would be unique. But the genome isn't random: it contains many duplicated genes, repetitive elements and simple sequence repeats. Approximately 45% of the genome is repeat sequence.
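The 17 bp figure comes from a simple counting argument. You need 4^k to exceed the number of places a k-mer could sit. Below is a minimal sketch, assuming a ~3.1 Gb haploid genome searched on both strands. The constant `HUMAN_GENOME_BP` and the function `min_unique_k` are illustrative names, not taken from any tool.

```python
# Assumption: ~3.1 Gb haploid human genome (approximate figure).
HUMAN_GENOME_BP = 3_100_000_000


def min_unique_k(genome_len: int, both_strands: bool = True) -> int:
    """Smallest k with 4**k greater than the (approximate) number of k-mer positions.

    In a uniformly random genome, a fixed k-mer of this length is expected to
    occur less than once by chance, so a read of this length would usually be
    unique. The positions count ignores the small (k - 1) edge correction.
    """
    positions = genome_len * (2 if both_strands else 1)
    k = 1
    while 4 ** k <= positions:
        k += 1
    return k


if __name__ == "__main__":
    print(min_unique_k(HUMAN_GENOME_BP))                      # both strands
    print(min_unique_k(HUMAN_GENOME_BP, both_strands=False))  # one strand only
```

With both strands counted, 4^16 (about 4.3 × 10^9) is still below the roughly 6.2 × 10^9 positions, while 4^17 (about 1.7 × 10^10) is above them, which gives 17. Real repeats are what break this estimate.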
I've linked an interesting BMC Bioinformatics article that shows 96-98% of 100 bp reads are unique (depending on how you look at the data) while about 98.5% of 500 bp reads are unique.
The best way is to write a quick script and get an empirical measure. I did this years back, for much smaller values of k, with the hg18 human genome. Essentially, above roughly 50 nt the great majority of k-mers are unique, and most of the non-unique ones fall in repeats. Beyond that, uniqueness creeps slowly toward 100%, approaching it asymptotically.
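Here is a sketch of the kind of quick script described above. It is not the original hg18 script. It assumes "unique" means that a k-mer start position's canonical k-mer occurs exactly once, with canonical meaning the smaller of the k-mer and its reverse complement, so that both strands are counted. Windows containing N are skipped. The function names are hypothetical. A Python `Counter` over an entire 3 Gb genome will not fit in memory, so run this on a single chromosome or a small genome, or use a dedicated k-mer counter for the full thing.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement (uppercase only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so a k-mer and its opposite-strand copy are counted together."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def fraction_unique_positions(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose canonical k-mer occurs exactly once.

    Windows containing anything other than A/C/G/T (e.g. N gaps) are skipped.
    Returns 0.0 if there is no valid window.
    """
    seq = seq.upper()
    kmers = [
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if _VALID.issuperset(seq[i:i + k])
    ]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    unique = sum(1 for km in kmers if counts[km] == 1)
    return unique / len(kmers)


if __name__ == "__main__":
    toy = "GATTACAGATTACA"  # hypothetical toy sequence with one repeated 7-mer
    for k in (3, 5, 7, 9):
        print(k, round(fraction_unique_positions(toy, k), 3))
```

To reproduce the curve, load a chromosome's sequence as one string and call `fraction_unique_positions` over a range of k, for example 16, 20, 30 and 50. The fraction should climb quickly and then flatten out as only repeat-derived k-mers remain.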
Expect 500 bp reads to uniquely place far more Alu and LINE repeats and satellite DNA.
500 bp reads would be much nicer for non-model organism sequencing and joining contigs.
Assuming randomly generated sequence, the probability that a 500 bp read exactly matches another random 500 bp sequence is 1/4^500, while for a 100 bp read it is 1/4^100. Even in a real, non-random genome, the difference between the two cases is very large.
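These numbers are far too small to handle as plain floats; 4.0 ** -600 already underflows to zero. A minimal sketch under the same random-sequence assumption therefore works in log10. It also scales the per-pair probability up to the expected number of chance matches across a genome, assuming ~3.1 Gb searched on both strands. `GENOME_POSITIONS` and both function names are illustrative.

```python
import math

# Assumption: ~3.1 Gb haploid human genome, searched on both strands.
GENOME_POSITIONS = 2 * 3_100_000_000


def log10_match_prob(read_len: int) -> float:
    """log10 of (1/4)**read_len: the chance that two independent, uniformly
    random sequences of this length are identical."""
    return -read_len * math.log10(4)


def log10_expected_chance_hits(read_len: int, positions: int = GENOME_POSITIONS) -> float:
    """log10 of the expected number of chance exact matches for one read across
    all positions of a uniformly random genome (positions * 4**-read_len)."""
    return math.log10(positions) + log10_match_prob(read_len)


if __name__ == "__main__":
    for read_len in (100, 500):
        print(read_len,
              round(log10_match_prob(read_len), 1),
              round(log10_expected_chance_hits(read_len), 1))
```

Under this model the per-pair probabilities are roughly 10^-60 for 100 bp and 10^-301 for 500 bp. Even after multiplying by billions of positions, a chance match stays negligible for either length. That is why, in practice, repeats rather than randomness decide uniqueness.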
From the paper that Paul cited, 200 bp reads appear to provide the best "bang for the alignment buck," with diminishing returns (a long tail) beyond that. With 1000 bp reads you can uniquely identify 99.5% of the genome. If the extrapolation is valid, that number increases to 99.8% with 10 kb reads.
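One way to see the long tail is to turn the quoted percentages into the amount of sequence that stays non-unique. This is a rough sketch, assuming a ~3.1 Gb genome and using only the percentages quoted in this thread. `GENOME_BP`, `QUOTED` and `non_unique_bp` are illustrative names.

```python
# Assumption: ~3.1 Gb haploid human genome.
GENOME_BP = 3_100_000_000

# Uniquely placeable percentages quoted in this thread (read length in bp -> %).
QUOTED = {500: 98.5, 1_000: 99.5, 10_000: 99.8}


def non_unique_bp(percent_unique: float, genome_bp: int = GENOME_BP) -> float:
    """Bases left non-unique when percent_unique of the genome is uniquely placeable."""
    return (100.0 - percent_unique) / 100.0 * genome_bp


if __name__ == "__main__":
    for read_len, pct in QUOTED.items():
        print(f"{read_len:>6} bp reads: ~{non_unique_bp(pct) / 1e6:.1f} Mb still non-unique")
```

On these numbers, going from 1 kb to 10 kb reads, a tenfold increase, only shrinks the leftover from about 15.5 Mb to about 6.2 Mb.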
If read lengths had grown as quickly as sequencing costs have fallen, the genome would be completely mapped by now. Until then, my take-home message is that bioinformaticians have job security for many years to come.
Not many bioinformaticians are concerned with longer contigs or finishing genomes. Typically it's those who work in genome centres or on non-model organisms.
But I agree, bioinformaticians will have good job security for a while yet. Solid bioinformatics skills and knowledge of a scripting language will soon become a standard part of the toolkit for most wet-lab genomics PhD students and postdocs. Even so, the bioinformatician will continue to serve as the specialist. A bit like a statistician, really: plenty of people can do some stats, but you need a statistician for the challenging stuff.