Thursday, November 28, 2019

How to scrape multiple pages with a loop?

Here is my code to scrape a single page, but I have 11000 of them to scrape. The only difference between the pages is the id in the URL.

https://www.rlsnet.ru/mkb_index_id_1.htm
https://www.rlsnet.ru/mkb_index_id_2.htm
https://www.rlsnet.ru/mkb_index_id_3.htm
....
https://www.rlsnet.ru/mkb_index_id_11000.htm

How can I loop my code to scrape all 11000 pages? Is it even feasible with that many pages? I could put the URLs into a list and scrape them one by one, but building a list of 11000 URLs by hand would take far too long.

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download a single page and parse it
page_sc = requests.get('https://www.rlsnet.ru/mkb_index_id_1.htm')
soup_sc = BeautifulSoup(page_sc.content, 'html.parser')

# Collect the text of every subcategory link on the page
items_sc = soup_sc.find_all(class_='subcatlist__item')
mkb_names_sc = [item_sc.find(class_='subcatlist__link').get_text() for item_sc in items_sc]

# Save the names to a CSV file
mkb_stuff_sce = pd.DataFrame({'first': mkb_names_sc})
mkb_stuff_sce.to_csv('/Users/gfidarov/Desktop/Python/MKB/mkb.csv')
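
A minimal sketch of how the loop might look, assuming every id from 1 to 11000 follows the same URL pattern and page layout as the example above. The status-code check and the short pause are my own additions, meant to skip ids that do not exist and to avoid hammering the server:

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

all_names = []
for page_id in range(1, 11001):  # ids 1 .. 11000
    url = f'https://www.rlsnet.ru/mkb_index_id_{page_id}.htm'
    page_sc = requests.get(url)
    if page_sc.status_code != 200:
        continue  # skip ids that do not resolve to a page
    soup_sc = BeautifulSoup(page_sc.content, 'html.parser')
    items_sc = soup_sc.find_all(class_='subcatlist__item')
    all_names.extend(
        item_sc.find(class_='subcatlist__link').get_text() for item_sc in items_sc
    )
    time.sleep(0.5)  # small delay between requests

pd.DataFrame({'first': all_names}).to_csv('/Users/gfidarov/Desktop/Python/MKB/mkb.csv')

With 11000 requests this will run for a while, so it may also be worth reusing a requests.Session and writing partial results to disk periodically in case the run is interrupted.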



