Let's say I have decomposed a matrix (of terms and documents) as svd = UXV, where U and V are orthogonal matrices. I am not sure how to interpret this in a scatter plot. Explanations in terms of 2 dimensions would be highly appreciated.
Really interesting question, although I'm not sure that the answer can really be limited to a 2D scatterplot. The dimensionality of the output of an SVD calculation is dependent upon the dimensionality of the input. My experience is limited to 3D data where spatial coordinates are input into an SVD and the result is a set of transformations that are inherently spatial as they relate directly to the spatial transformations of the coordinates. In this case, the visualization is directly related to the spatial context of the SVD operation.
In your case, I would suggest that the best visualization depends on the dimensionality of your data and the context in which the SVD is acting. For example, the SVD might be interpreted as "aligning" or "mapping" your data preferentially along the singular vectors, with the singular values indicating how much each direction matters. Therefore, it might be worth investigating the nature of the singular values and the relationship of your data to them.
For me, performing SVD on a data matrix is very close to principal component analysis (PCA), which is well known in statistics. As far as I understand, you have a data matrix X in which perhaps the "documents" are the rows and the "terms" are the columns. Perhaps an element of the matrix is the number of occurrences of a term in a document.
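In such a situation, if you carry out an SVD on the centred X, you obtain a matrix S whose columns are proportional to the PCA scores (each column is scaled by its singular value) and a matrix D containing the PCA loadings (the eigenvectors of X'X). In the notation of the question, S plays the role of U and D that of V, up to transposition.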
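To make the link between SVD and PCA concrete, here is a minimal sketch in Python with NumPy. The small document-term matrix is made up for illustration, and the variable names (scores, loadings) are my own, not anything from the question.

```python
import numpy as np

# Toy document-term count matrix (made-up data):
# rows = documents, columns = terms.
X = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 1],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 1, 0],
], dtype=float)

# Centre each column (term), as PCA requires.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s      # PCA scores: each column of U scaled by its singular value
loadings = Vt.T     # PCA loadings: eigenvectors of Xc.T @ Xc

# The scores are also the centred data projected onto the loadings.
print(np.allclose(scores, Xc @ loadings))

# The squared singular values are the eigenvalues of Xc.T @ Xc.
eigvals = np.linalg.eigvalsh(Xc.T @ Xc)
print(np.allclose(np.sort(s**2), np.sort(eigvals)))
```

Whether you plot U itself or U scaled by the singular values is the standardization choice; scaling by s keeps the distances between documents the same as in the centred data.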
By plotting the columns of S against each other (for example columns 1 and 2), you get a map in which each point is a document. Documents which are close on the map have some kind of similarity. There is no particular reason to assume that the rank of X is 2, so it is relevant and interesting to look at other pairs of columns of S as well.
In the same way, the columns of D can be plotted. Similarity between terms then shows up as proximity on this new map. There are some details on the standardization of S and D that I have not explained here. Please consult a statistics book dealing with PCA.
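The two maps described above could be drawn roughly as follows. This is only a sketch: it assumes NumPy and matplotlib are available, and the data and the document and term labels are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up labels and counts: rows = documents, columns = terms.
docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]
terms = ["apple", "pear", "engine", "wheel"]
X = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 1],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 1, 0],
], dtype=float)

Xc = X - X.mean(axis=0)                      # centre the columns
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

doc_xy = (U * s)[:, :2]                      # first two score columns, one row per document
term_xy = Vt.T[:, :2]                        # first two loading columns, one row per term
share = s**2 / np.sum(s**2)                  # share of total variance per component

fig, (ax_docs, ax_terms) = plt.subplots(1, 2, figsize=(10, 4))

ax_docs.scatter(doc_xy[:, 0], doc_xy[:, 1])
for name, (x, y) in zip(docs, doc_xy):
    ax_docs.annotate(name, (x, y))
ax_docs.set_title("Documents")

ax_terms.scatter(term_xy[:, 0], term_xy[:, 1])
for name, (x, y) in zip(terms, term_xy):
    ax_terms.annotate(name, (x, y))
ax_terms.set_title("Terms")

# Label the axes with the share of variance each component carries.
for ax in (ax_docs, ax_terms):
    ax.set_xlabel(f"component 1 ({share[0]:.0%})")
    ax.set_ylabel(f"component 2 ({share[1]:.0%})")

# Call plt.show() or fig.savefig(...) to actually see the figure.
```

To look at other pairs of components, replace [:, :2] with the column indices you want, for example [:, [1, 2]].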
By the way, if X is really a matrix of occurrences, I would suggest trying a variant of PCA called "correspondence analysis", which is adapted to this kind of data.
When SVD gives you the reduced dimensions, project them back onto the original matrix to get the reduced matrix; then you can use cosine similarity to visualize the similarity between documents.
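One possible reading of this suggestion, sketched in Python with NumPy: keep the first k singular triplets, form the rank-k matrix, and compute cosine similarities between the document rows. The data, the choice k = 2 and all variable names are assumptions made for illustration.

```python
import numpy as np

# Made-up document-term counts: rows = documents, columns = terms.
X = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 1],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                  # number of dimensions kept (an arbitrary choice here)
doc_vecs = U[:, :k] * s[:k]            # documents in the reduced k-dimensional space
X_k = doc_vecs @ Vt[:k, :]             # rank-k approximation of X ("reduced matrix")

# Cosine similarity between documents in the reduced space.
# Because the rows of Vt[:k, :] are orthonormal, this is the same as the
# cosine similarity between the rows of X_k.
norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
unit = doc_vecs / np.maximum(norms, 1e-12)   # guard against zero-length rows
cos_sim = unit @ unit.T

print(np.round(cos_sim, 2))
```

The resulting matrix can be shown as a heatmap, or you can simply plot doc_vecs as a 2D scatter plot; documents pointing in similar directions from the origin have a cosine similarity close to 1.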