Wednesday, 23 September 2020

BeautifulSoup4 can't go deep enough to find articles

I just started experimenting with Python and BeautifulSoup.

I want to get the links to articles that are related to a specific city.

Here is the current code:

import requests
from bs4 import BeautifulSoup

city = "london"
result = requests.get('https://www.sample.com/search/index.html?q=' + city)


def main_loop():
    soup = BeautifulSoup(result.content, features="lxml")
    articles = soup.find("div", "oc-articleList")

    print(articles)


if result.status_code == 200:
    main_loop()
else:
    print('error:', result.status_code)

The result is:

<div class="oc-articleList"></div>

The first thing I tried was getting the articles with:

articles = soup.find_all("article")

But it couldn't find anything either.

If you check the site's source code, it looks something like this:

<div class="oc-articleList">
    <article>...</article>
    <article>...</article>
    <article>...</article>
    <article>...</article>
    .
    .
    .
</div>

How can I make BS parse deeper into the DOM?
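
One way to narrow this down, as a minimal sketch: BeautifulSoup's find_all already searches descendants at any depth, so if the markup above were what requests actually downloaded, the articles would be found. The snippet below runs the same lookup against a hypothetical static copy of that markup (SAMPLE_HTML and count_articles are illustrative names, and the article contents are placeholders, not the real page) and against the empty div that was actually printed. If the static copy works and the live response does not, a plausible explanation, though not confirmed here, is that the articles are added after page load (for example by JavaScript) and are simply absent from result.content.

```python
from bs4 import BeautifulSoup

# Hypothetical static copy of the markup shown above; the article
# contents are placeholders, not taken from the real site.
SAMPLE_HTML = """
<div class="oc-articleList">
    <article><a href="/a1">one</a></article>
    <article><a href="/a2">two</a></article>
</div>
"""


def count_articles(html):
    """Return how many <article> tags sit inside div.oc-articleList."""
    # html.parser is the stdlib parser, so lxml is not needed for this check.
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="oc-articleList")
    if container is None:
        return 0
    # find_all is recursive by default, so nesting depth does not matter.
    return len(container.find_all("article"))


static_count = count_articles(SAMPLE_HTML)
# This is the markup the original script actually printed.
empty_count = count_articles('<div class="oc-articleList"></div>')
print(static_count, empty_count)
```

Running the same count_articles on result.text from the real request would show whether the article tags are in the downloaded HTML at all.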



