Hello Everyone,

I need some help using the BeautifulSoup library for web scraping.

I need to extract the text from the webpage http://thehill.com/blogs/blog-briefing-room/365407-sean-diddy-combs-wants-to-buy-carolina-panthers-and-sign-kaepernick

My goal is to extract the text exactly as it appears on the webpage. To do that I am extracting all the "p" tags and their text, but some of the "p" tags contain "a" tags that also have text.

So my questions are:

1. How do I convert the Unicode strings ("") into normal strings that match the text on the webpage? When I extract only the "p" tags, BeautifulSoup returns the text as Unicode, and even the special characters come out Unicode-encoded. How can I convert the extracted Unicode text into normal text?

2. How do I extract the text inside "p" tags that contain "a" tags? I would like to extract the complete text inside each "p" tag, including the text inside nested tags.

I have tried with the following code:

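import requests
from bs4 import BeautifulSoup

# Download the raw HTML of the article
html = requests.get("http://thehill.com/blogs/blog-briefing-room/365407-sean-diddy-combs-wants-to-buy-carolina-panthers-and-sign-kaepernick").content

news_soup = BeautifulSoup(html, "html.parser")

# Collect every <p> tag on the page
a_text = news_soup.find_all('p')

# This line fails: find_all() returns a list-like ResultSet, which has no .string attribute
y = a_text[1].find_all('a').string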
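
For reference, here is a minimal sketch of one way both questions might be approached. It assumes Python 3 and uses a small made-up HTML snippet (SAMPLE_HTML is hypothetical, not the real article) so that it runs without a network connection. Calling get_text() on a "p" tag returns its own text together with the text of any nested tags such as "a". On Python 3 the result is an ordinary str. On Python 2 the u'' prefix only shows up in the repr of a string (for example when printing a whole list), not when printing the string itself.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; with the real page you would
# pass the bytes returned by requests.get(url).content instead.
SAMPLE_HTML = """
<html><body>
<p>First paragraph.</p>
<p>Sean \u201cDiddy\u201d Combs wants to buy the <a href="/panthers">Carolina Panthers</a> and sign <a href="/kaepernick">Colin Kaepernick</a>.</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
paragraphs = soup.find_all("p")

# get_text() joins the paragraph's own text with the text of nested tags.
texts = [p.get_text() for p in paragraphs]

# find_all() returns a list-like ResultSet, so read each <a> tag individually.
link_texts = [a.get_text() for a in paragraphs[1].find_all("a")]

# Printing one string at a time shows the characters as they appear on the page.
for text in texts:
    print(text)
print(link_texts)
```

If bytes are needed, for example to write to a file opened in binary mode, texts[1].encode("utf-8") converts a string explicitly.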
