I am trying to crawl the product title and other information from a website. Products are grouped by category, and each category is split into pages that show a product list. I click a product in the list, crawl its information (title, etc.), then go back to the list and click the next product, repeating this until I reach a specific page and category.
This is my logic.
The problem is inside the `for i in range(0,20):` loop. After `driver.get(url)`, the values from `html = driver.page_source` and `soup = BeautifulSoup(html, 'html.parser')` still hold the page source of the product list, not of the product I clicked. I tried reading the current URL and loading it again, but the source still contains the previous page's content.

I need help with this.
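The URL built by `get_search_page_url` carries the board parameters after `#`, so they belong to the URL fragment. Browsers do not send the fragment to the server, and navigating to a URL that differs from the current one only in its fragment (or to the current URL itself) is normally handled as an in-page jump rather than a new page load. That is a plausible reason why loading `driver.current_url` again brings back nothing new; it also suggests, as an unverified assumption about this site, that the board content is rendered by JavaScript from the fragment. A minimal standard-library sketch of how such a URL splits:

```python
from urllib.parse import parse_qs, urlsplit


def get_search_page_url(category, page):
    # Same URL template as in the question
    return ('https://www.missycoupons.com/zero/board.php'
            '#id=hotdeals&category={}&page={}'.format(category, page))


parts = urlsplit(get_search_page_url(1, 1))
print(parts.path)                # /zero/board.php  (the only path the server is asked for)
print(repr(parts.query))         # ''  (there is no "?" query string at all)
print(parse_qs(parts.fragment))  # {'id': ['hotdeals'], 'category': ['1'], 'page': ['1']}
```

Every category and page therefore maps to the same server resource, `/zero/board.php`; only client-side code can tell them apart.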
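```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def get_search_page_url(category, page):
    # The board parameters follow '#', i.e. they are part of the URL fragment
    return ('https://www.missycoupons.com/zero/board.php#id=hotdeals&category={}&page={}'
            .format(category, page))


def get_prod_items(prod_items):
    # One [title] row per matched element; '' when the inner selector finds nothing
    prod_data = []
    for prod_item in prod_items:
        try:
            title = prod_item.select('div.rp-list-table-row.normal.post')[0].text.strip()
        except IndexError:
            title = ''
        prod_data.append([title])
    return prod_data


#####
driver = webdriver.Chrome(service=Service('C:/chromedriver.exe'))
driver.implicitly_wait(10)

prod_data_total = []
for category in range(1, 2):
    for page in range(1, 2):
        url = get_search_page_url(category, page)
        driver.get(url)
        time.sleep(15)
        for i in range(0, 20):
            # Re-find the links on every pass; references from before driver.back() may be stale
            driver.find_elements(By.CSS_SELECTOR, "div.rp-list-table-cell.board-list.mc-l-subject>a")[i].click()
            url = driver.current_url
            driver.get(url)
            # Problem: page_source here still contains the product list, not the product page
            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')
            prod_items = soup.select('div#mc_view_title')
            prod_item_list = get_prod_items(prod_items)
            prod_data_total = prod_data_total + prod_item_list
            driver.back()
            time.sleep(5)
```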