The simple and clear answer is yes. We have applied LSA as a pre-processing step in the automatic processing of patent applications and in patent analysis. Of course, you also need a good classification method for your features. Unfortunately, our publications on this matter are in German.
Very briefly and generally, LSA is a method to reduce the feature space, and in this way semantic groups of documents can be created. Suppose you have a document group D_Informatic that describes computers, databases, etc., and another document group D_animal that describes dogs, cats, etc. For D_Informatic you could have features such as mysql, postgresql, ibm, iphone. For D_animal you could have features such as persian_cat, siamese, bulldog, etc. For this data set, LSA could create, for example, two new features (f1 and f2) that separate the two groups, and in this way you obtain a kind of "semantics": feature f1 encodes D_Informatic and f2 encodes D_animal (it is a kind of generalization process). First, though, you have to define what you mean by "semantic extraction": LSA does not tell you explicitly that, for example, postgresql is a database.
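To make this concrete, here is a minimal sketch (assuming Python with scikit-learn; the toy documents are just illustrative stand-ins for the two groups) of how a rank-2 LSA projects the two groups onto two latent features:

```python
# Minimal sketch (assuming scikit-learn): LSA on a toy two-group corpus.
# The documents below are illustrative stand-ins for D_Informatic / D_animal.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "mysql and postgresql are database systems",       # D_Informatic
    "ibm builds computers and database software",      # D_Informatic
    "the persian cat and the siamese are cat breeds",  # D_animal
    "a bulldog is a dog breed like the terrier",       # D_animal
]

# Bag-of-words term-document features, then a rank-2 truncated SVD:
# the two latent features f1 and f2 should separate the two groups.
X = CountVectorizer().fit_transform(corpus)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

for doc, (f1, f2) in zip(corpus, Z):
    print(f"f1={f1:+.2f}  f2={f2:+.2f}  {doc}")
```

In a setting like the patent example above, the rows of Z (one per document) would then be the features handed to the classifier.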
@Volker Where can I find your papers? I would be interested in reading them; I can read German without problems.
Generally there is something there in distributional semantics and statistical methods like latent semantic analysis, latent Dirichlet allocation, or the co-occurrence method Christian Wartena and I devised (see e.g. http://www.researchgate.net/profile/Christian_Wartena/publication/221466114_Instanced-Based_Mapping_between_Thesauri_and_Folksonomies/links/00b495187d1050d50c000000.pdf; sorry about the stupid typo in the title :-( ), or even older work with Anjo Anjewierden, Robert de Hoog, Lilia Efimova and myself (https://www.researchgate.net/publication/228341304_Detecting_knowledge_flows_in_weblogs).
Generally speaking, what you pick up is the fact that words tend to correlate if they are semantically related, and these correlations can be used as a proxy for the semantics themselves. You do have to realise that the proxy is rather crude: it is based on a bag-of-words model of language, it heavily depends on what documents you put in to learn the correlations (and therefore tends to do rather better when the documents have a clear focus), and finally it is not based on any world knowledge.
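As a rough illustration of that proxy (a hypothetical mini-corpus with made-up counts): two terms that keep occurring in the same documents end up with similar term vectors, regardless of any world knowledge:

```python
# Rough illustration with made-up bag-of-words counts: terms that
# co-occur across the same documents get similar row vectors, which
# is all the "semantics" this kind of model sees.
import numpy as np

terms = ["cat", "dog", "database"]
X = np.array([     # rows: terms, columns: documents
    [2, 1, 0, 0],  # "cat"
    [1, 2, 0, 0],  # "dog"
    [0, 0, 3, 1],  # "database"
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(X[0], X[1]))  # cat vs dog: high, they co-occur
print(cosine(X[0], X[2]))  # cat vs database: 0, they never co-occur
```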
Yes, Latent Semantic Analysis can be used to derive semantic representations from large sets of text. Generally, you can create representations of individual words or of larger units of text (e.g., sentences, paragraphs, or whole documents). Typically one compares the vector representation of one unit of text to another (e.g., the similarity of "dog" to "cat", of a document to a query, or of one document to another). One place to get started if you want more of a feel for what you can do is the website LSA.colorado.edu, which lets you experiment with several semantic spaces. You might also look at Landauer, Foltz & Laham (1998), "An Introduction to Latent Semantic Analysis", for some of the basics.
As others have mentioned, there are considerations, such as which documents or corpus you use to train the system, that can affect the representation, and not all aspects of semantics are accurately captured in an LSA analysis. Your choice of tools depends on what you need to accomplish.
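For instance, a minimal sketch along those lines (assuming Python with scikit-learn; the corpus and query are made up, and this is of course not one of the pre-built spaces on LSA.colorado.edu) builds a small LSA space and compares a query to the documents in it:

```python
# Minimal sketch (assuming scikit-learn): comparing one unit of text to
# another in a small LSA space -- here a short query against documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

corpus = [
    "the dog chased the cat around the yard",
    "cats and dogs are common household pets",
    "the database server stores rows in tables",
    "postgresql is an open source relational database",
]

# TF-IDF weighting followed by a rank-2 SVD gives a small LSA space.
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
doc_vectors = lsa.fit_transform(corpus)

query_vector = lsa.transform(["my pet dog"])
print(cosine_similarity(query_vector, doc_vectors))  # pet docs should score higher
```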
The term-document matrix {X} is decomposed using SVD as follows:

{X} = {W}{S}{P}'

The decomposed matrices are then reduced to the first k dimensions, and the product of the reduced matrices is computed as the rank-k approximation {X̂}. The similarity between two terms is then calculated as the correlation between the corresponding term (row) vectors of {X̂}.
I have a question: can we calculate term similarity from the reduced {W} matrix alone? Does the {W} matrix give the same kind of relations between terms as {X̂}?
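A quick way to see the answer is algebraic: since {X̂}{X̂}' = {W_k}{S_k}²{W_k}', dot products (and hence cosine similarities) between term rows of {X̂} are identical to those between the rows of {W_k} scaled by the singular values {S_k}. The unscaled rows of {W_k} generally give different values, and the exact equivalence holds for cosine similarity rather than for Pearson correlation as used above. A minimal numpy check with a made-up matrix:

```python
# Minimal numpy check (toy, made-up term-document counts): cosine
# similarities between term rows of the rank-k reconstruction X_hat
# equal those between rows of W_k scaled by the singular values.
import numpy as np

X = np.array([
    [2., 1., 0., 0., 1.],
    [1., 2., 0., 1., 0.],
    [0., 0., 3., 1., 0.],
    [0., 1., 2., 2., 0.],
    [1., 0., 0., 2., 2.],
])

W, s, Pt = np.linalg.svd(X, full_matrices=False)  # X = W S P'
k = 2
X_hat = W[:, :k] @ np.diag(s[:k]) @ Pt[:k, :]     # rank-k approximation
T = W[:, :k] * s[:k]                              # term vectors: W_k S_k

def cos_rows(M):
    n = M / np.linalg.norm(M, axis=1, keepdims=True)
    return n @ n.T

print(np.allclose(cos_rows(X_hat), cos_rows(T)))          # True
print(np.allclose(cos_rows(X_hat), cos_rows(W[:, :k])))   # generally False
```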