I meant: which steps should be taken to normalize social media texts, such as error correction, expanding abbreviations into their full forms, and translating shortcuts such as "ICY" into "I see you"?
I will be happy to receive full and detailed answers (links to helpful corpora are also welcome).
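The three steps named in the question can be sketched as a small pipeline: lowercase and tokenize, shorten elongated words, then expand known shortcuts. This is a minimal sketch, assuming a hand-made lookup table (the SHORTCUTS entries are illustrative only, not taken from any published lexicon) and a crude elongation rule. Real spelling correction needs a dictionary on top of this.

```python
import re

# Hypothetical lookup table; a real one would come from a slang/abbreviation lexicon.
SHORTCUTS = {
    "icy": "i see you",
    "u": "you",
    "pls": "please",
}

def squeeze_repeats(token: str) -> str:
    """Reduce runs of 3+ identical characters to 2 ("soooo" -> "soo")."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token)

def normalize(text: str) -> str:
    """Lowercase, split into word tokens (punctuation is dropped),
    squeeze elongations, and expand known shortcuts."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    out = []
    for tok in tokens:
        tok = squeeze_repeats(tok)
        out.append(SHORTCUTS.get(tok, tok))
    return " ".join(out)

print(normalize("ICY!! Soooo glad u came, pls stay"))
# prints: i see you soo glad you came please stay
```

Note that the elongation rule keeps two letters, so "soooo" becomes "soo", not "so"; deciding between "so" and "soo" requires a dictionary lookup.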
I confess that I do not see what you are trying to do, so I still have some trouble with the question.
Errors in typing are universal. Some are consistent and may be useful in author identification. The same is true of certain abbreviations, such as the 'ICY' you mentioned. So if author identification is what interests you, correcting these might be counterproductive.
If you are looking at the frequency and variety of misspellings or abbreviations, altering them again seems counterproductive.
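If the abbreviations themselves are the object of study, you can count them and leave the text untouched. A minimal sketch, assuming a small hypothetical shortcut list: it returns each shortcut's share of all tokens in a set of posts, which could serve as one feature per author.

```python
import re
from collections import Counter

# Hypothetical shortcut list; a real study would use a much larger one.
SHORTCUTS = {"icy", "u", "pls", "lol", "brb"}

def shortcut_profile(posts: list[str]) -> dict[str, float]:
    """Relative frequency of each shortcut among all tokens in the posts.
    The posts are only read and counted, never modified."""
    counts = Counter()
    total = 0
    for post in posts:
        tokens = re.findall(r"[a-z0-9']+", post.lower())
        total += len(tokens)
        counts.update(t for t in tokens if t in SHORTCUTS)
    return {s: counts[s] / total for s in sorted(counts)} if total else {}

print(shortcut_profile(["lol u there?", "brb, lol"]))
```

Computing this profile separately for each author gives directly comparable numbers, since each value is a share of that author's tokens rather than a raw count.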
If you are looking at, say, word variety, it is a matter of personal choice whether to remove misspellings and abbreviations, provided that they make up only a small part of the text. There is a whole branch of statistical theory devoted to the treatment of such matters.
A reasonable alternative would be to first make a list of all the words in the text, locate the misspellings and abbreviations, and then replace them in the text with their correct forms.
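The word-list approach can be sketched in three steps: count the words, suggest the nearest known word for each unknown one, then write the approved fixes back. A minimal sketch, assuming a tiny hypothetical vocabulary (KNOWN) and using the standard library's difflib for string similarity; a real project would load a full dictionary and review the suggestions by hand before applying them.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Hypothetical reference vocabulary; replace with a full dictionary.
KNOWN = ["the", "weather", "is", "really", "nice", "today", "see", "you"]

def word_list(text: str) -> Counter:
    """Step 1: every word in the text with its frequency."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def suggest_fixes(vocab: Counter) -> dict:
    """Step 2: map each unknown word to its closest known word.

    Uses difflib's similarity ratio with a cutoff of 0.8; unknown words
    with no close match (often abbreviations) get no suggestion."""
    fixes = {}
    for word in vocab:
        if word not in KNOWN:
            match = get_close_matches(word, KNOWN, n=1, cutoff=0.8)
            if match:
                fixes[word] = match[0]
    return fixes

def apply_fixes(text: str, fixes: dict) -> str:
    """Step 3: substitute the (reviewed) corrections back into the text."""
    return re.sub(r"[A-Za-z']+",
                  lambda m: fixes.get(m.group().lower(), m.group()),
                  text)

text = "The wether is realy nice today"
fixes = suggest_fixes(word_list(text))
print(fixes)
print(apply_fixes(text, fixes))
```

Abbreviations such as "icy" are unknown words but rarely resemble any dictionary entry, so they receive no suggestion here and need a separate lookup table.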
Both of the last two approaches could be carried out and the results compared.
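To compare the two approaches in a word-variety study, compute the same measure on the raw text and on the corrected text. This sketch uses the type-token ratio (distinct words divided by total words). That measure is my choice, not one given in the question, and the two sentences are invented examples.

```python
import re

def type_token_ratio(text: str) -> float:
    """Distinct words / total words: a simple measure of word variety."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

raw = "u know i see u and i really realy do"
corrected = "you know i see you and i really really do"
print(type_token_ratio(raw), type_token_ratio(corrected))
```

Here the misspelling "realy" counts as an extra distinct word, so the raw text scores 0.8 against 0.7 for the corrected one. The type-token ratio also falls as texts get longer, so only compare samples of similar length.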
It is very hard to make any recommendations without knowing what the data are to be used for.
Concerning texts: I believe that there are some aggregated Twitter feeds on the net that are suitable for study. If not, it should be fairly simple to collect the tweets with a simple programme and aggregate them yourself.
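Once tweets have been collected, aggregating them can be as simple as grouping texts by author. A minimal sketch, assuming your collector writes one JSON object per line with hypothetical keys "user" and "text" (adapt these to whatever your tool actually produces). Collecting the tweets in the first place is not shown, since that depends on the access method available to you.

```python
import json
from collections import defaultdict

def aggregate_by_user(lines):
    """Group tweet texts by author.

    Assumes each line is a JSON object with hypothetical keys "user" and
    "text". Malformed lines and records missing either key are skipped."""
    corpus = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "user" in record and "text" in record:
            corpus[record["user"]].append(record["text"])
    return dict(corpus)

# Usage (file name is hypothetical):
# with open("tweets.jsonl", encoding="utf-8") as f:
#     corpus = aggregate_by_user(f)
```

Grouping by author keeps the data usable for author identification as well as for pooled word counts.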