Hello everyone,
I need some help using the BeautifulSoup library for web scraping.
I need to extract the text from the webpage http://thehill.com/blogs/blog-briefing-room/365407-sean-diddy-combs-wants-to-buy-carolina-panthers-and-sign-kaepernick
My goal is to extract the text exactly as it appears on the webpage. To do that, I am extracting all the "p" tags and their text, but inside the "p" tags there are "a" tags, which also contain some text.
So my questions are:
1. How do I convert the Unicode text into normal strings that match the text on the webpage? When I extract only the "p" tags, BeautifulSoup returns the text as Unicode, and even the special characters come out as Unicode escapes. How can I convert the extracted Unicode text into normal text?
2. How do I extract the text inside "p" tags that contain "a" tags? I would like to extract the complete text inside each "p" tag, including the text inside nested tags.
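A minimal sketch for question 1, assuming Python 3. It also assumes the "unicoded" text means escapes such as \u2019 or u'...' prefixes, which is how Python 2 displays unicode strings when a list of them is printed. In Python 3, get_text() returns an ordinary str, and printing that str directly shows the characters as they appear on the page. If plain ASCII text is wanted instead, one option is to normalize with the standard-library unicodedata module and then map typographic quotes and dashes to ASCII. ASCII_REPLACEMENTS and to_plain_text are hypothetical names chosen for this example.

```python
import unicodedata

# Hypothetical table mapping common typographic characters to ASCII look-alikes.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def to_plain_text(text):
    """Return scraped text with compatibility characters folded and quotes/dashes made ASCII."""
    # NFKC turns compatibility characters such as the non-breaking space (U+00A0)
    # into their ordinary equivalents; it leaves curly quotes and dashes unchanged.
    text = unicodedata.normalize("NFKC", text)
    # The translation table then swaps the remaining typographic characters for ASCII.
    return text.translate(ASCII_REPLACEMENTS)


# Hypothetical sample of scraped text containing non-ASCII characters.
scraped = "It\u2019s a \u201cquoted\u201d phrase\u00a0with a dash \u2013 here"
print(to_plain_text(scraped))  # It's a "quoted" phrase with a dash - here
```

Drop the translate() step to keep curly quotes and dashes exactly as on the page; NFKC by itself only folds compatibility characters such as non-breaking spaces and ligatures.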
I have tried the following code:
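import requests
from bs4 import BeautifulSoup

html = requests.get("http://thehill.com/blogs/blog-briefing-room/365407-sean-diddy-combs-wants-to-buy-carolina-panthers-and-sign-kaepernick").content
news_soup = BeautifulSoup(html, "html.parser")
a_text = news_soup.find_all('p')  # list of every <p> tag on the page
# find_all('a') returns a list, which has no .string; get_text() collects all nested text instead
y = a_text[1].get_text()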
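A minimal sketch for question 2, assuming BeautifulSoup 4 (the bs4 package). A tag's .string is None when the tag has more than one child, so a <p> that mixes text with an <a> gives None; get_text() instead concatenates every text node underneath the tag, including text inside nested <a> or <b> tags. SAMPLE_HTML is hypothetical markup standing in for the article, and paragraph_texts is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: each <p> mixes plain text with nested <a>/<b> tags.
SAMPLE_HTML = (
    "<div>"
    '<p>First paragraph with a <a href="#">nested link</a> inside it.</p>'
    '<p>Second paragraph with <a href="#">one link</a> and <b>bold</b> text.</p>'
    "</div>"
)


def paragraph_texts(html):
    """Return the full text of every <p>, including text from nested tags."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() walks all descendants and joins their text nodes in document order.
    return [p.get_text() for p in soup.find_all("p")]


first_p = BeautifulSoup(SAMPLE_HTML, "html.parser").find("p")
print(first_p.string)  # None: the <p> has three children (text, <a>, text)
for text in paragraph_texts(SAMPLE_HTML):
    print(text)
```

For the real page, the html bytes from requests.get(...).content can be passed to paragraph_texts, and "\n".join(...) over the result gives one paragraph per line. The page may also contain <p> tags outside the article body (navigation, footers), which this sketch does not filter out.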