What is the best way to test the effectiveness of a new short-text similarity measure for the Arabic language, especially given that there is no corpus dedicated to this task?
You need to use a couple of machine translation tools to produce so-called candidate translations of your short text. Your short text should be treated as the reference translation. Then you can test your measure or compare it to BLEU, METEOR or NIST.
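For reference, here is a minimal sketch of sentence-level BLEU, one of the metrics mentioned above: clipped (modified) n-gram precision combined through a geometric mean and multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, uniform weights and no smoothing, so any missing n-gram order gives a score of 0. It is an illustration only. For reported numbers, an established BLEU toolkit is the safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference sentence BLEU with uniform weights and no smoothing.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipping: a candidate n-gram is credited at most as many times
        # as it occurs in the reference.
        clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

Because short texts rarely contain matching 4-grams, a smaller `max_n` (e.g. `max_n=2`) is often used for them. That is a choice of this sketch, not part of the standard definition.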
Sorry, but I don't understand why I should use machine translation tools, because what I am trying to do is calculate the similarity between a pair of short texts written in the same language. For example, what is the degree of similarity between "United States president" and "Barack Obama"? They don't share any terms, yet their degree of similarity is very high.
What factors are measured, and what does each of your variants represent?
What research goal do you plan to achieve? For example, do you want to compare it with other measures to show whether, and to what degree, it works better or worse for Arabic?
There are measures such as cosine, Jaccard and overlap, which compute similarity from the words the two texts have in common. Since short texts do not provide enough contextual information, we have developed a new method for computing the similarity between a pair of short texts. My question is: what is the best criterion to use to test the effectiveness of our method? (For example, in text categorization we use the F1-measure, which combines precision and recall, to test the effectiveness of a categorization system.)
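To illustrate why word-overlap measures fail on pairs like the one above, here is a minimal sketch of the three baselines mentioned (cosine over term-frequency vectors, Jaccard and the overlap coefficient over word sets). It assumes plain whitespace tokenization with no stemming or normalization, which for Arabic would normally need more care.

```python
import math
from collections import Counter


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over the sets of words."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


pair = ("united states president", "barack obama")
print(jaccard(*pair), overlap(*pair), cosine(*pair))  # 0.0 0.0 0.0
```

All three scores are 0 because the texts share no words, which is exactly the limitation a measure that uses external knowledge or context is meant to address.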
The example of the US President is more like Question Answering, where your system works on a standalone database from which the right information snippet is extracted. It does not have much to do with similarity, because the name of the President depends on the tenure. For the F1-score you need to know which system responses should have been returned, and I wonder how you would obtain that. Although I know nothing about the concept of your measure, I would look for other measures designed specifically for short texts and use them for comparison. That is always the best way to show how much better your measure is than the well-known ones in the field (not, of course, cosine similarity, the Jaccard coefficient, mapping, etc.).
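The point about F1 can be made concrete. Precision and recall only exist once you have a gold set of items that *should* have been returned. The sketch below assumes a hypothetical setup: each text pair has an ID, the similarity measure assigns it a score, pairs scoring at or above a chosen threshold count as "returned as similar", and a hand-labeled set of truly similar pair IDs serves as the gold standard. The pair IDs, scores and the 0.5 threshold are all illustrative.

```python
def returned_pairs(scores, threshold):
    """IDs of pairs the measure judges similar (score >= threshold)."""
    return {pair_id for pair_id, score in scores.items() if score >= threshold}


def precision_recall_f1(returned, relevant):
    """returned: what the system output; relevant: what should have been returned."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)  # correctly returned pairs
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical scores from a similarity measure and hypothetical gold labels.
scores = {"p1": 0.9, "p2": 0.4, "p3": 0.7}
gold_similar = {"p1", "p2"}

returned = returned_pairs(scores, threshold=0.5)  # {"p1", "p3"}
print(precision_recall_f1(returned, gold_similar))  # (0.5, 0.5, 0.5)
```

Without a labeled set like `gold_similar`, the recall denominator is undefined, which is why the missing corpus is the real obstacle to reporting F1.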