Generally, it depends on what role the measure plays in the whole system. In the most typical architecture, where it is used to compare a question to possibly relevant passages, the problem is that they are most likely going to be several times longer than the question. It may affect results, as cosine similarity uses vectors of equal length to represent both elements. An alternative may be Jaccard similarity, which has been designed to measure similarity of sets, ignoring their sizes. What is more, you can easily extend it by taking into account weights of matched elements (e.g. co-occurrence of word "Toronto" is more informative than "to"). Other possible direction would be to concentrate on words of question only, for example by counting what percentage of them occur in tested passage.
However, despite these problems, the cosine similarity is in fact commonly used in QA systems. To definitely judge which one is better in this application one would need to test different measures in a single system - I haven't seen such experiments.
Piotr you may find some related experiments in few of my papers, in particular "Distributed representations for Semantic Matching in non-factoid Question Answering". Hope it can be interesting for you.
To answer the original question: yes cosine similarity can be a suitable measure, but it all depends on how the vectors you calculate the similarity of are constructed. Moreover, for a high accuracy QA system several different measures should be combined.
Conference Paper Exploiting Distributional Semantic Models in Question Answering
Article Playing with knowledge: A virtual player for “Who Wants to B...
Conference Paper Distributed Representations for Semantic Matching in non-fac...
I suggest you to use LSA based text similarity. LSA find text similarity beyond lexical similarity. We used different similarity measure to test text to text similarity among them LSA was the most efficient tool.
Check the following URL for SEMILAR APLI : http://www.semanticsimilarity.org/
Cosine similarity can not detect semantic similarity.
I agree with Safwan. If lexical similarity is enough for your purposes then Cosine is OK, but if you need to take into account semantics Cosine is not useful. At my lab we did some work on semantic similarity using an extension of Jaccard measure. Obviously, to deal with semantics you need a semantic relational representation of the words, for instance a semantic network representation capturing that "user" is a subset of "human".