Applying LSA to 500 PDF documents retrieved from Google (for a certain feature), I got low accuracy when I tried to infer the topic of new documents.
Using a phrase to select the 500 PDFs from Google does not mean that they all cover the same topic, or even that they are homogeneous. The first 125 results are more likely to contain your search terms than the next 125. If you are testing on the 501st result from Google, the phrase may occur only once in the whole document, or the document may be an outlier. Try treating the first 125 PDFs as 'seen' (training) documents and the next 125 as 'unseen' (test) documents. Are the topics of all these PDFs poorly classified?
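The seen/unseen experiment can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not in the question: LSA is done with a plain NumPy SVD on raw term counts (real pipelines usually apply TF-IDF weighting first), new documents are "folded in" with d_hat = S_k^{-1} U_k^T d, and the topic of an unseen document is taken from its most cosine-similar seen document, since the question does not say how topics are inferred. The corpus, labels, and helper names (term_doc_matrix, fit_lsa, fold_in, nearest_labels) are hypothetical, and n_seen = 4 stands in for the 125 results.

```python
import numpy as np

def term_doc_matrix(docs, vocab=None):
    """Raw-count term-by-document matrix (one row per term, one column per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    if vocab is None:
        vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            if w in index:  # terms never seen while fitting are dropped
                X[index[w], j] += 1
    return X, vocab

def fit_lsa(X, k):
    """Rank-k truncated SVD X ~ U_k S_k V_k^T; rows of V_k are the seen documents."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T

def fold_in(D, U_k, s_k):
    """Map new document columns into the latent space: d_hat = S_k^{-1} U_k^T d."""
    return (U_k.T @ D).T / s_k

def nearest_labels(queries, refs, ref_labels):
    """Label each query with the topic of its most cosine-similar reference document."""
    q = queries / np.maximum(np.linalg.norm(queries, axis=1, keepdims=True), 1e-12)
    r = refs / np.maximum(np.linalg.norm(refs, axis=1, keepdims=True), 1e-12)
    return [ref_labels[i] for i in np.argmax(q @ r.T, axis=1)]

# Hypothetical toy corpus, listed in search-rank order, with hand-assigned topic labels.
ranked_docs = [
    "engine torque gearbox engine", "pasta sauce garlic pasta",
    "gearbox clutch engine", "garlic basil sauce",
    "engine clutch torque", "basil pasta garlic",
    "torque gearbox", "sauce basil",
]
labels = ["cars", "food"] * 4
n_seen = 4  # plays the role of the first 125 results; the next n_seen are "unseen"

X_seen, vocab = term_doc_matrix(ranked_docs[:n_seen])
X_unseen, _ = term_doc_matrix(ranked_docs[n_seen:2 * n_seen], vocab)
U_k, s_k, V_k = fit_lsa(X_seen, k=2)

predicted = nearest_labels(fold_in(X_unseen, U_k, s_k), V_k, labels[:n_seen])
accuracy = np.mean([p == t for p, t in zip(predicted, labels[n_seen:2 * n_seen])])
print(predicted, accuracy)
```

On real data you would compare this accuracy with the one obtained on documents far down the ranking (such as the 501st result): a large gap points at the document selection rather than at LSA itself.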
When all else fails, doubt your implementation of the algorithm. Check that none of the assumptions of latent semantic analysis are violated. See, e.g., http://en.wikipedia.org/wiki/Latent_semantic_analysis and the references therein.
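One way to start doubting the implementation is to verify that the factorization it produces is internally consistent. The sketch below is an assumption-laden illustration, not a standard tool: check_lsa is a hypothetical helper that takes whatever U_k, s_k and V_k your pipeline returns (terms as rows, documents as columns of X) and tests properties every correct rank-k SVD must satisfy. These checks cover only the linear algebra; they say nothing about LSA's modeling assumptions, which the Wikipedia article discusses.

```python
import numpy as np

def check_lsa(X, U_k, s_k, V_k, tol=1e-8):
    """Self-consistency checks for a rank-k LSA factorization X ~ U_k diag(s_k) V_k^T.

    X is terms x documents; U_k is terms x k; s_k has length k; V_k is documents x k.
    Returns a dict mapping each check's name to True/False.
    """
    k = len(s_k)
    X_k = U_k @ np.diag(s_k) @ V_k.T  # rank-k reconstruction
    return {
        # Singular values must be non-negative and sorted in decreasing order.
        "s_k sorted, non-negative": bool(np.all(s_k >= -tol) and np.all(np.diff(s_k) <= tol)),
        # The retained term and document axes must be orthonormal.
        "U_k orthonormal": bool(np.allclose(U_k.T @ U_k, np.eye(k), atol=tol)),
        "V_k orthonormal": bool(np.allclose(V_k.T @ V_k, np.eye(k), atol=tol)),
        # For the true top-k SVD, ||X - X_k||_F^2 = ||X||_F^2 - sum(s_k^2).
        "Frobenius residual": bool(np.isclose(np.linalg.norm(X - X_k) ** 2,
                                              np.linalg.norm(X) ** 2 - np.sum(s_k ** 2),
                                              atol=1e-6)),
        # Folding the training documents back in must return their own coordinates.
        "fold-in reproduces V_k": bool(np.allclose((U_k.T @ X).T / s_k, V_k, atol=tol)),
    }

rng = np.random.default_rng(0)
X = rng.random((6, 5))  # stand-in term-document matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
good = check_lsa(X, U[:, :k], s[:k], Vt[:k].T)
# A common slip: using eigenvalues of X^T X (s**2) where singular values belong.
buggy = check_lsa(X, U[:, :k], s[:k] ** 2, Vt[:k].T)
print(good)
print(buggy)
```

The buggy call keeps orthonormal axes, so only the residual and fold-in checks catch it; that is why several independent checks are worth running.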