Monday, December 2, 2019

Scrape data from a website that loads its data through JavaScript / a JSON array (POST request) using Python

I am trying to scrape the data from this link: https://www.ikh.se/sv/arbetskladsel--skyddsutrustning--skyddsprodukter/brandskydds--och-forsta-forbandsprodukter

I have tried this:

from bs4 import BeautifulSoup
import urllib.request
import csv

# specify the url
urlpage = 'https://www.ikh.se/sysNet/getProductsJSON/getProductsJSONDB.aspx?sua=1&lang=2&navid=19277994'

# query the website and return the html to the variable 'page'
page = urllib.request.urlopen(urlpage)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
tag = soup.find('div', attrs={'class': 'dnsCell'})
if tag is None:
    # the div is not in the downloaded markup (e.g. it is built later by JavaScript)
    print('no div.dnsCell found in the response')
else:
    text = ''.join(tag.stripped_strings)
    print(text)
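The URL in the snippet points at `getProductsJSONDB.aspx`, which, judging by its name, probably returns JSON rather than HTML; if so, there is no `dnsCell` div to find and `soup.find` returns `None`. One quick check is to look at the `Content-Type` header and try decoding the body as JSON before reaching for BeautifulSoup. This is a minimal sketch, assuming the endpoint answers a plain GET and sends UTF-8; the helper names `decode_body` and `probe_url` are hypothetical.

```python
import json
import urllib.request


def decode_body(body: bytes, content_type: str):
    """Return ('json', data) if the body parses as JSON, else ('html', text).

    The Content-Type header is only a hint: some servers label JSON as
    text/html, so we also try json.loads when the body starts with { or [.
    """
    # utf-8-sig also strips a leading byte-order mark, which json.loads rejects
    text = body.decode("utf-8-sig", errors="replace")
    stripped = text.lstrip()
    if "json" in content_type.lower() or stripped.startswith(("{", "[")):
        try:
            return "json", json.loads(stripped)
        except json.JSONDecodeError:
            pass  # not valid JSON after all; treat it as HTML
    return "html", text


def probe_url(url: str):
    """Fetch url (GET) and report whether it returned JSON or HTML."""
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
        return decode_body(response.read(), content_type)
```

Calling `kind, data = probe_url(urlpage)` and printing `kind` shows which case applies; if it is `"json"`, `data` is already a Python list or dict and BeautifulSoup is not needed at all.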

But I only get the HTML DOM; the product list elements are missing. I guess the product list is built from a JSON array requested from https://www.ikh.se/sysNet/getProductsJSON/getProductsJSONDB.aspx?sua=2&lang=2&navid=11994180, but I am not sure how the product list is loaded. Am I right or wrong? I want to scrape all the product details from this site and export them to Excel.
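If the product list really is filled in by JavaScript from that JSON URL, a common approach is to skip the HTML entirely, request the JSON directly, and write the records to a file Excel can open. The sketch below rests on assumptions the site does not confirm: that the endpoint accepts a POST to the same URL (with an empty or caller-supplied form body), that the response is JSON with the products stored as a list of objects somewhere inside it, and that a CSV file (which Excel opens directly) is acceptable. The helper names `fetch_json`, `find_records` and `write_csv` are hypothetical; the browser's developer tools (Network tab, filtered to XHR/Fetch) show the real method, form fields and response shape to plug in.

```python
import csv
import json
import urllib.parse
import urllib.request


def fetch_json(url, form_fields=None):
    """POST form_fields (URL-encoded) to url and parse the JSON response.

    Passing data= to Request makes urllib send a POST. The field names are
    assumptions; copy the real ones from the browser's Network tab.
    """
    data = urllib.parse.urlencode(form_fields or {}).encode("ascii")
    request = urllib.request.Request(
        url, data=data, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8-sig"))


def find_records(payload):
    """Return the first non-empty list of dicts found in payload (depth-first).

    Searching avoids hard-coding a key name, since the real response
    structure is unknown here.
    """
    if isinstance(payload, list) and payload and all(
            isinstance(item, dict) for item in payload):
        return payload
    if isinstance(payload, dict):
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        children = []
    for child in children:
        found = find_records(child)
        if found:
            return found
    return []


def write_csv(records, path):
    """Write records to CSV with one column per key seen in any record."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    # utf-8-sig writes a BOM so Excel recognises UTF-8 (Swedish å, ä, ö)
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
```

With the `sua=2` URL from the question, usage would look like `write_csv(find_records(fetch_json(url)), "products.csv")`. Missing keys become empty cells, and nested values are written as their Python string form; a native `.xlsx` would need a third-party library such as openpyxl or pandas instead of `csv`.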



