Standard datasets include WordSim-353 (see Agirre et al. 2009), SimLex-999 (Hill et al. 2015), and the word pairs used by Chiarello et al. (1990). If you're interested in paraphrases that also cover longer, multi-word phrases, I'd also look at the Paraphrase Database (PPDB, Ganitkevitch et al. 2013).
Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, and Aitor Soroa (2009), A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of HLT-NAACL 2009. Boulder, CO, 19–27.
Christine Chiarello, Curt Burgess, Lorie Richards, and Alma Pollock (1990), Semantic and associative priming in the cerebral hemispheres: Some words do, some words don’t ... sometimes, some places. Brain and Language 38(1):75–104.
Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch (2013), PPDB: The Paraphrase Database. In: Proceedings of NAACL-HLT 2013. Atlanta, GA, 758–764.
Felix Hill, Roi Reichart, and Anna Korhonen (2015), SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics 41(4):665–695.