Dear colleagues,
I need to scrape as much data as I can for my PhD research. My research area is health communication: it investigates the role of mediated communication in public health, focusing specifically on the anti-vaccine issue through a comparative study of vaccination messages in KSA and AUS. I will focus on one media platform, either Twitter or Facebook.
I have used a scraper tool to collect data from Twitter. I started with some hashtags for KSA and then added a few more hashtags that I found. I noticed that some of these hashtags are used for spam. I have tried to clean the spam out of the data as much as I can, but some spam tweets may still remain.
At the same time, I have come across bad news: Facebook and Instagram are banning anti-vaccination content, and it seems that Twitter is starting to do the same. Many of the English hashtags I am trying return very few, poor-quality results, even when I am not restricting the search to KSA or Australia, as you can see in this link:
https://thehill.com/policy/technology/435207-instagram-to-block-anti-vaccine-hashtags-amid-misinformation-crackdown#.XJVaGjePUc4.twitter
Given all this, I am facing two problems. First, how can I determine the country of the data I scrape? Second, how can I translate the data from Arabic to English for analysis? I will use Leximancer, and it does not work with Arabic content.
I need to collect as much data as I can NOW, so do you have any helpful advice on this, please?
Many thanks