Thursday, March 5, 2020

How to remove certain information from a web-scraping function (Beautiful Soup)

I am using BeautifulSoup to scrape this website: https://lawyers.justia.com/lawyer/michael-paul-ehline-85006

I do not want the sponsored listings in my output.

My code:

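# Assumes `soup` is a BeautifulSoup object already built from the page.
for block in soup.find_all("div", class_="block-wrapper"):
    for item in block.find_all("li"):
        # get_text() strips the markup; it stands in for the undefined remove_tags helper
        text = item.get_text(strip=True)
        if text:  # skip empty list items
            print(text)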

Output in Python: [output image]
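
One possible way to leave the sponsored listings out is to skip any block that carries a marker identifying it as sponsored. The sketch below is only an illustration under assumptions: the sample HTML, the class name "sponsored-listings", and the function name list_items_without_sponsored are all hypothetical, and the real page's markup should be inspected to find whatever actually distinguishes the sponsored block.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page's structure and
# class names may differ and should be checked in the browser inspector.
SAMPLE_HTML = """
<div class="block-wrapper">
  <ul>
    <li>Personal Injury</li>
    <li>Car Accidents</li>
  </ul>
</div>
<div class="block-wrapper sponsored-listings">
  <ul>
    <li>Sponsored Lawyer A</li>
  </ul>
</div>
"""


def list_items_without_sponsored(html, sponsored_class="sponsored-listings"):
    """Return the text of each <li> inside block-wrapper divs, skipping sponsored blocks."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for block in soup.find_all("div", class_="block-wrapper"):
        # Skip any block that also carries the (assumed) sponsored class.
        if sponsored_class in block.get("class", []):
            continue
        for li in block.find_all("li"):
            text = li.get_text(strip=True)
            if text:
                items.append(text)
    return items


print(list_items_without_sponsored(SAMPLE_HTML))
```

If the sponsored block is identified by something other than a class (an id, a heading such as "Sponsored Listings", or a parent container), the same skip-before-collecting pattern should still apply with the condition adjusted accordingly.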



