Thursday, September 3, 2020

Web scraping a database-backed site with Python

I'm looking for the b tags, which is where the data should be. This is the code:

import requests
from bs4 import BeautifulSoup

# Connection to the site:
url_pcd = 'http://sinda.crn.inpe.br/PCD/SITE/novo/site/historico/'
url = url_pcd + 'passo2.php'

data_limits = []

# Request the HTML:
dados = requests.get(url)
soup = BeautifulSoup(dados.text, 'html.parser')

# Extract the text of every b tag:
for option in soup.find_all('b'):
    data_limits.append(option.text)

It returns the page's HTML, but without the b tags that hold the data I want. This is the text I get instead:

Período Disponível:<br/>
<!-- Inicio checagem de dados disponíveis -->
Não encontrado
<!-- Fim verificacao -->
<ul>
<li id="li_1">
<label class="description" for="element_1">Data Inicial: </label>
<span>

Here is what I noticed. The first page has this URL:

http://sinda.crn.inpe.br/PCD/SITE/novo/site/historico/index.php

and this is the URL of the second page:

http://sinda.crn.inpe.br/PCD/SITE/novo/site/historico/passo2.php

If we access this URL directly, the page comes back without the dates and without the b tags. From what I understand of the code, app.py seems to use action.php to request the content of the b tags, but when I try the URL below:

http://sinda.crn.inpe.br/PCD/SITE/novo/site/historico/passo2.php/action.php

It also comes back without the b tags.

Why does this happen? It seems these tags are filled in from a database or API: once we select the PCD, the page loads the data and puts it in the b tags. Do I have to make another request?
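A minimal sketch of what "another request" could look like, assuming (not verified against this site) that index.php contains an ordinary HTML form whose submission target is passo2.php, and that the server only fills in the b tags when it receives that form's fields. The helper names (FormFieldParser, build_payload, fetch_bold_texts) are hypothetical, and the real field names have to be read from the form on index.php. If the page is instead filled in by JavaScript, the request to replay would have to be found in the browser's network tab.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://sinda.crn.inpe.br/PCD/SITE/novo/site/historico/"


class FormFieldParser(HTMLParser):
    """Collect method, action and named fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.method = "get"
        self.action = ""
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.method = (attrs.get("method") or "get").lower()
            self.action = attrs.get("action") or ""
        elif self._in_form and tag in ("input", "select") and attrs.get("name"):
            # Keep the default value; the caller overrides what it needs.
            self.fields.setdefault(attrs["name"], attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_payload(index_html, overrides):
    """Return (method, target URL, form data) for the first form in index_html."""
    parser = FormFieldParser()
    parser.feed(index_html)
    payload = dict(parser.fields)
    payload.update(overrides)
    return parser.method, urljoin(BASE_URL, parser.action), payload


def fetch_bold_texts(overrides):
    """Load index.php, submit its form, and return the text of every b tag."""
    # Imported here so the helpers above can be tried offline.
    import requests
    from bs4 import BeautifulSoup

    # A Session keeps any cookies set by index.php for the second request.
    with requests.Session() as session:
        index = session.get(urljoin(BASE_URL, "index.php"), timeout=30)
        index.raise_for_status()
        method, target, payload = build_payload(index.text, overrides)
        if method == "post":
            reply = session.post(target, data=payload, timeout=30)
        else:
            reply = session.get(target, params=payload, timeout=30)
        reply.raise_for_status()

    soup = BeautifulSoup(reply.text, "html.parser")
    return [b.get_text(strip=True) for b in soup.find_all("b")]
```

It would be called as fetch_bold_texts({"field_name": "value"}), where "field_name" and "value" stand in for whatever the form on index.php uses to select the PCD. An empty list would suggest that the assumption about the form is wrong or that a required field is still missing.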



