I'm working on the problem of detecting when a user asks a question we already have an answer for, but phrases it in a way we don't account for. For example, I might have a response for a question like "what is the population of Brazil?", but the user might ask "how many people live in Brazil?" or "what's the number of inhabitants in Brazil?". I would like to be able to classify those questions as equivalent.
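Here is a concrete version of the failure. It is a minimal sketch of a naive word-overlap baseline: cosine similarity over raw word counts, with no stemming and no synonyms. This is an assumed illustration, not necessarily what we run today. The third candidate, about Brazil's capital, is a hypothetical non-equivalent question added for contrast.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes, so "brazil?" -> "brazil".
    return re.findall(r"[a-z']+", text.lower())


def cosine(a: str, b: str) -> float:
    # Cosine similarity between raw term-count vectors (no stemming, no synonyms).
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * cb[term] for term, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


# The question we already have an answer for.
KNOWN = "what is the population of brazil?"

CANDIDATES = [
    "how many people live in brazil?",              # equivalent (paraphrase)
    "what's the number of inhabitants in brazil?",  # equivalent (paraphrase)
    "what is the capital of brazil?",               # hypothetical, NOT equivalent
]

if __name__ == "__main__":
    for question in CANDIDATES:
        # Prints 0.167, 0.463 and 0.833 respectively.
        print(f"{cosine(KNOWN, question):.3f}  {question}")
```

The non-equivalent question scores highest because it shares the most surface words with the stored one. Word-overlap matching therefore ranks it above both real paraphrases.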
Any suggestions for papers or research material I should look into, or hints on what I should try?