What are the recommended normalization activities of Social Media Texts such as tweets?

David Mitchell

I think the question should be clarified. What does the author mean by "normalization activities"?

Yaakov HaCohen-Kerner

Dear David,

Thank you for the input.

I meant: which steps should be taken to normalize social media texts, such as error correction, expanding abbreviations into their full forms, and translating shortcuts such as "ICY" into "I see you"?

I will be happy to receive full and detailed answers (links to helpful corpora are also welcome).

Best regards,

Yaakov
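The shortcut expansion mentioned in this post can be sketched as a simple table lookup. This is only a minimal sketch: the table entries (apart from the "ICY" example from the post) and the function name `expand_shortcuts` are assumptions made for illustration, and a real table would have to come from a curated lexicon of social media abbreviations.

```python
import re

# Hypothetical lookup table, for illustration only.
SHORTCUTS = {
    "icy": "I see you",  # example taken from the post above
    "u": "you",          # assumed entry
    "thx": "thanks",     # assumed entry
}

# Whole alphabetic words only, so the "u" inside "u2" is left untouched.
WORD_RE = re.compile(r"\b[A-Za-z']+\b")

def expand_shortcuts(text, table=SHORTCUTS):
    """Replace each whole-word shortcut with its long form (case-insensitive)."""
    def repl(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    return WORD_RE.sub(repl, text)

print(expand_shortcuts("ICY, thx!"))
```

Punctuation and unknown words pass through unchanged, so the step can be run before or after spelling correction.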

Dear Yaakov

I confess that I do not see what you are trying to do, so I still have some trouble with the question.

Errors in typing are universal. Some are consistent and may be useful in author identification. The same holds for the use of some abbreviations, such as the 'ICY' you mentioned. So if it is author identification that interests you, correcting these might be counterproductive.

If you are looking at the frequency and variety of misspellings or of abbreviations, then again altering these seems counterproductive.

If you are looking at (say) word variety, it is a matter of personal choice whether to remove misspellings and abbreviations, provided that these make up only a small part of the text. There is a whole branch of statistical theory devoted to the treatment of such matters.

A reasonable alternative would be first to make a list of all the words in the text, then locate the misspellings and abbreviations and replace them in the text.

The last two suggestions could both be carried out and the results compared.
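As a rough illustration of that comparison, here is a minimal Python sketch. It assumes a tiny stand-in word list (`KNOWN_WORDS`), uses the standard library's `difflib.get_close_matches` as a crude substitute for a real spelling corrector, and takes the type-token ratio as one possible measure of word variety. All names and the sample sentence are hypothetical.

```python
import re
from difflib import get_close_matches

# Tiny stand-in for a real dictionary; an assumption for illustration only.
KNOWN_WORDS = {"i", "might", "see", "you", "tomorrow", "be", "late"}

def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def find_unknown(text, known=KNOWN_WORDS):
    """List the distinct words in the text that are not in the word list."""
    return sorted(set(tokens(text)) - known)

def suggest(word, known=KNOWN_WORDS):
    """Return the closest known word, or the word itself if nothing is close."""
    matches = get_close_matches(word, sorted(known), n=1, cutoff=0.8)
    return matches[0] if matches else word

def normalize(text, known=KNOWN_WORDS):
    """Replace unknown words by their suggestion (case and punctuation are dropped)."""
    return " ".join(t if t in known else suggest(t, known) for t in tokens(text))

def type_token_ratio(text):
    """Distinct words divided by total words: a simple word-variety measure."""
    toks = tokens(text)
    return len(set(toks)) / len(toks) if toks else 0.0

raw = "I maight see you tomorow, I might be late"
print(find_unknown(raw))
print(normalize(raw))
print(type_token_ratio(raw), type_token_ratio(normalize(raw)))
```

Running both measurements on the raw and the normalized text shows how much the misspellings inflate the apparent word variety, which is one way to decide whether normalization matters for a given task.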

It is very hard to make any recommendations without knowing what the data is to be used for.

Concerning texts: I believe that there are some aggregated Twitter feeds on the net suitable for study. If not, it should be fairly simple to collect the tweets with a simple programme and aggregate them yourself.

One web page listing Twitter aggregators is this one:

http://searchengineland.com/tracking-tweets-how-to-twitter-aggregation-tools-22699

There are a lot of others. Ask Google for more.