Tuesday, 24 September 2019

How to pull a product link from a customer profile page on Amazon

I'm trying to get the product link from a customer's profile page using R's rvest package.

I've referenced various questions on Stack Overflow, including this one ("could not read webpage with read_html using rvest package from r"), but each time I try something, I'm not able to return the correct result.

For example, on this profile page:

https://www.amazon.com/gp/profile/amzn1.account.AETT6GZORFV55BFNOAVFDIJ75QYQ/ref=cm_cr_dp_d_gw_tr?ie=UTF8

I'd like to return this link, with the end goal of extracting the product ID, B01A51S9Y2:

https://www.amazon.com/Amagabeli-Stainless-Chainmail-Scrubber-Pre-Seasoned/dp/B01A51S9Y2?ref=pf_vv_at_pdctrvw_dp
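
Once the link is in hand, extracting the ID should be the easy part. A minimal sketch, assuming the product ID is always the 10-character uppercase alphanumeric token that follows `/dp/` (the variable names `product_url` and `product_id` are only illustrative):

```r
library(stringr)

# The target link from the profile page, copied from above
product_url <- "https://www.amazon.com/Amagabeli-Stainless-Chainmail-Scrubber-Pre-Seasoned/dp/B01A51S9Y2?ref=pf_vv_at_pdctrvw_dp"

# str_match() returns a matrix: column 1 is the full match,
# column 2 is the first capture group (the ID itself)
product_id <- str_match(product_url, "/dp/([A-Z0-9]{10})")[, 2]

product_id
```

If the URL has no `/dp/` segment, `str_match()` gives `NA` rather than an error, which makes it easy to filter out non-product links later.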

library(dplyr)
library(rvest)
library(stringr)
library(httr)

# fetch the profile page, sending a custom user-agent header
url <- 'https://www.amazon.com/gp/profile/amzn1.account.AETT6GZORFV55BFNOAVFDIJ75QYQ/ref=cm_cr_dp_d_gw_tr?ie=UTF8'
x <- GET(url, add_headers('user-agent' = 'test'))
page <- read_html(x)

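# select the product link by its full class attribute and read its href
# (html_text() would return the link text, not the URL)
page %>%
  html_nodes("[class='a-link-normal profile-at-product-box-link a-text-normal']") %>%
  html_attr("href")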

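# I ran a test to see whether I could find the href anywhere on the page, with no luck.
# Note: html_text() returns only text content, so attribute values such as href are not searched here.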

test <- page %>%
  html_nodes("#a-page") %>%
  html_text()

grepl("B01A51S9Y2",test)

I've spent hours trying different html_nodes() selectors to extract the href, with no luck. Any help would be appreciated.

Most of my attempts return:

character(0)
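
To rule out a selector problem, here is a sketch that runs against a hard-coded snippet instead of the live page. The markup in `snippet` is hypothetical: a cut-down copy of what I see in the browser's inspector, not what `GET()` actually returns.

```r
library(rvest)

# Hypothetical markup mimicking the element shown in the browser inspector
snippet <- '<div id="a-page">
  <a class="a-link-normal profile-at-product-box-link a-text-normal"
     href="https://www.amazon.com/Amagabeli-Stainless-Chainmail-Scrubber-Pre-Seasoned/dp/B01A51S9Y2?ref=pf_vv_at_pdctrvw_dp">Product</a>
</div>'

# Same kind of selection as above, but reading the href attribute
links <- read_html(snippet) %>%
  html_nodes(".profile-at-product-box-link") %>%
  html_attr("href")

links
```

Since this finds the link on static markup, an empty result from the same selector on `page` suggests the anchor is not in the HTML that `GET()` received. That may mean the product boxes are added by JavaScript after the page loads, but I have not confirmed it.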


