Dear colleagues,

I need to collect as much data as I can for my PhD research. My research area is health communication: I investigate the role of mediated communication in public health, focusing specifically on the anti-vaccine issue through a comparative study of vaccination messages in Saudi Arabia (KSA) and Australia (AUS). I will focus on one media platform, either Twitter or Facebook.

I have used a scraper tool to collect data from Twitter. I started with some hashtags for KSA and added a few more that I found along the way. I noticed that some hashtags are used for spam. I tried to clean the spam out of the data as much as I could, but some spam tweets may still remain.
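To illustrate the kind of spam cleaning I mean, here is a minimal sketch in Python. It is not the exact procedure of the scraper tool. It assumes each tweet is a dict with a `text` field (a hypothetical layout), and the function names and thresholds (`max_hashtags`, `max_urls`) are illustrative assumptions that would need tuning on real data.

```python
import re

# \w matches Unicode word characters in Python 3, so Arabic hashtags are counted too.
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def looks_like_spam(text, max_hashtags=8, max_urls=2):
    """Heuristic only: flag tweets stuffed with hashtags or links.

    The thresholds are assumptions for illustration, not validated values.
    """
    return (len(HASHTAG_RE.findall(text)) > max_hashtags
            or len(URL_RE.findall(text)) > max_urls)


def clean_tweets(tweets):
    """Drop likely spam and exact duplicate texts, keeping first occurrences."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"].strip()
        if looks_like_spam(text) or text in seen:
            continue
        seen.add(text)
        kept.append(tweet)
    return kept
```

A simple rule like this will miss spam that uses only a few hashtags, which is one reason some spam tweets can remain after cleaning.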

At the same time, I have found some bad news: Facebook and Instagram are banning anti-vaccination content, and it seems that Twitter is starting to do the same. Many of the English hashtags I have tried return very few, poor-quality results, even when I am not focusing on KSA or Australia. You can read about the crackdown at this link:

https://thehill.com/policy/technology/435207-instagram-to-block-anti-vaccine-hashtags-amid-misinformation-crackdown#.XJVaGjePUc4.twitter

As a result, I am facing two problems. First, how can I determine the country of origin when scraping data? Second, how can I translate the data from Arabic to English for analysis? I will use Leximancer, and it does not work with Arabic content.

I need to collect as much data as I can NOW, so do you have any helpful advice on this, please?

Many Thanks
