Saturday, April 24, 2021

Extracting links and ad titles from a BS4 filtered list

I'm trying to filter what I extract from a website into two lists: one for the links and one for the titles of the ads those links point to.

Here is my code:

import re as standardre
from operator import contains
import requests
import bs4 as bs


def raccogliannunci():
    listatitoloannuncio = []  # ad titles
    listalinkannuncio = []    # ad links

    # Later the user will type the URL ("Metti il link" = "Enter the link"):
    # r = input("Metti il link: ")
    r = 'https://www.immobiliare.it/vendita-case/roma/parioli-flaminio/?criterio=rilevanza'
    page = requests.get(r)
    soup = bs.BeautifulSoup(page.content, features="html.parser")
    # Every tag whose href attribute contains "annunci"
    lista_annunci = soup.find_all(href=standardre.compile("annunci"))

    for line in lista_annunci:
        # Appends the whole Tag, not just its title or link
        listatitoloannuncio.append(line)

    return listatitoloannuncio

Code explained:

The function raccogliannunci() takes the link (the user will provide it once the program is finished), downloads the page with requests and parses it with BS4. It then finds every tag whose href contains the word "annunci" (Italian for "ads"), and BS4 returns these tags as a list. At this point I'm having trouble filtering the exact information I want into two lists.
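To see what that call actually returns, here is a small offline sketch. The HTML is a hypothetical stand-in for the real page: only the first anchor comes from the snippet in this post, the other two are invented. `find_all(href=re.compile(...))` keeps every tag whose href matches the pattern anywhere in the value (BeautifulSoup searches, it does not require a full match), and each result is a `Tag` object rather than a string, so its attributes can still be read.

```python
import re

import bs4 as bs

# Hypothetical sample HTML: the first <a> is the snippet from the question,
# the other two anchors are invented for illustration.
SAMPLE_HTML = """
<a href="https://www.immobiliare.it/annunci/86565462/" title="Appartamento via Ettore Ximenes, Parioli, Roma" id="link_ad_86565462" data-row-link=""> Appartamento via Ettore Ximenes, Parioli, Roma </a>
<a href="https://www.immobiliare.it/mutui/">Mutui</a>
<a href="https://www.immobiliare.it/annunci/12345678/">Foto</a>
"""

soup = bs.BeautifulSoup(SAMPLE_HTML, features="html.parser")
# Only tags whose href contains "annunci" survive; the "mutui" link is dropped.
lista_annunci = soup.find_all(href=re.compile("annunci"))

print(len(lista_annunci))
for tag in lista_annunci:
    # Each element is a bs4 Tag, so tag.name and tag["href"] are available.
    print(type(tag).__name__, tag.name, tag["href"])
```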

  • In the first list (listatitoloannuncio) I want the title associated with each link. Here is a snippet of the website's HTML:

<a href="https://www.immobiliare.it/annunci/86565462/" title="Appartamento via Ettore Ximenes, Parioli, Roma" id="link_ad_86565462" data-row-link=""> Appartamento via Ettore Ximenes, Parioli, Roma </a>

So here I want to get "Appartamento via Ettore Ximenes, Parioli, Roma", and do the same for every element in the list.

  • For the second list (listalinkannuncio) I want the links themselves. All the links contain the word "annunci", which is why I search for them that way in the first place.
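Both pieces of information are attributes of each `Tag` in `lista_annunci`, so one possible approach is to read them directly: `tag.get("title")` for the title (falling back to `tag.get_text(strip=True)`, the visible text) and `tag["href"]` for the link. The helper name `estrai_titoli_e_link` is hypothetical, and it takes already-downloaded HTML instead of a URL so the sketch runs without network access.

```python
import re

import bs4 as bs


def estrai_titoli_e_link(html):
    """Hypothetical helper: return (titles, links) for every ad anchor in html."""
    listatitoloannuncio = []
    listalinkannuncio = []

    soup = bs.BeautifulSoup(html, features="html.parser")
    for tag in soup.find_all("a", href=re.compile("annunci")):
        # Prefer the title attribute; fall back to the visible text, trimmed.
        titolo = tag.get("title") or tag.get_text(strip=True)
        listatitoloannuncio.append(titolo)
        listalinkannuncio.append(tag["href"])

    return listatitoloannuncio, listalinkannuncio


# The snippet from the question, used as input.
html = (
    '<a href="https://www.immobiliare.it/annunci/86565462/" '
    'title="Appartamento via Ettore Ximenes, Parioli, Roma" '
    'id="link_ad_86565462" data-row-link=""> '
    'Appartamento via Ettore Ximenes, Parioli, Roma </a>'
)
titoli, link = estrai_titoli_e_link(html)
print(titoli)  # ['Appartamento via Ettore Ximenes, Parioli, Roma']
print(link)    # ['https://www.immobiliare.it/annunci/86565462/']
```

Inside raccogliannunci() this could be called as `estrai_titoli_e_link(page.content)`, since BeautifulSoup accepts the raw bytes that `requests` returns.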

I tried this already:

for line in lista_annunci:
    if contains("href", str(line)):
        listatitoloannuncio.append(line)
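One likely reason this attempt does not work: `operator.contains(a, b)` means `b in a`, so `contains("href", str(line))` asks whether the tag's entire HTML is a substring of the four-letter string "href", which is false for every tag. Swapping the arguments would not help either, because `find_all(href=...)` only ever returns tags that have an href, so the test would then be true for all of them. The sketch below, on hypothetical sample HTML (a titled ad link plus an invented untitled link to the same ad), shows both cases and an alternative that filters on the attribute itself, using BeautifulSoup's `title=True` filter to keep only anchors that carry a title.

```python
import re
from operator import contains

import bs4 as bs

# Hypothetical sample HTML: a titled ad link and an untitled image link to the same ad.
SAMPLE_HTML = (
    '<a href="https://www.immobiliare.it/annunci/86565462/" '
    'title="Appartamento via Ettore Ximenes, Parioli, Roma">Appartamento</a>'
    '<a href="https://www.immobiliare.it/annunci/86565462/"><img src="foto.jpg"></a>'
)

soup = bs.BeautifulSoup(SAMPLE_HTML, features="html.parser")
lista_annunci = soup.find_all(href=re.compile("annunci"))

# Original test: contains(a, b) is `b in a`, and no whole tag fits inside "href".
tentativo = [line for line in lista_annunci if contains("href", str(line))]

# Swapped arguments: true for every tag, because all of them have an href.
scambiato = [line for line in lista_annunci if contains(str(line), "href")]

# Filter on the attribute instead: title=True keeps only anchors with a title.
con_titolo = soup.find_all("a", href=re.compile("annunci"), title=True)

print(len(tentativo), len(scambiato), len(con_titolo))  # 0 2 1
```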

Thanks and I really hope you can help me out.
