Sunday, 3 November 2019

R: cannot find the number of pages of a website when web scraping

I want to get the number of pages from a website. I tried to do it as in a tutorial, using this function:

library(rvest)

get_last_page <- function(html){

  pages_data <- html %>%
    # The '.' indicates the class
    html_nodes('.pagination-page') %>%
    # Extract the raw text as a character vector
    html_text()

  # The second-to-last button holds the last page number
  pages_data[(length(pages_data)-1)] %>%
    # Take the raw string
    unname() %>%
    # Convert to number
    as.numeric()
}
first_page <- read_html(url)
(latest_page_number <- get_last_page(first_page))

For the website

url <-'http://www.trustpilot.com/review/www.amazon.com'

it works fine. When I tried it with

url <-'https://energybase.ru/en/oil-gas-field/index'

I got integer(0).
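A likely reason for the empty result, sketched under the assumption that the second page simply has no elements with the class `pagination-page`: `html_nodes()` then selects nothing, and indexing an empty vector with `length - 1` (that is, `-1`) still yields an empty vector. The HTML string below is an invented stand-in, not the real energybase.ru markup.

```r
library(rvest)

# Invented stand-in for a page whose pager does not use the
# 'pagination-page' class (assumption, not the real site's markup)
page <- read_html('<ul class="pagination"><li><a data-page="1">1</a></li></ul>')

pages_data <- page %>%
  html_nodes(".pagination-page") %>%  # matches no elements here
  html_text()

length(pages_data)  # 0: nothing was selected

# length(pages_data) - 1 is -1; dropping the first element of an
# empty vector still leaves an empty vector
last <- as.numeric(pages_data[length(pages_data) - 1])
length(last)        # 0: a zero-length result instead of a page number
```

So the zero-length output comes from the selector, not from the number conversion; the first thing to check is which class or attribute the target site actually uses on its pagination links.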

I changed

html_nodes('.pagination-page') 

to

html_nodes('.html_nodes('data-page')') 

and it failed. How can I change my code to make it work?
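One possible direction, as a minimal sketch: the attempt above nests an R call inside the selector string, and the nested single quotes make that line invalid R; even as a string it would not be valid CSS. In CSS, elements that have a given attribute are selected with `[data-page]`, and rvest's `html_attr()` reads that attribute's value. This assumes (not verified against energybase.ru) that the site's pagination links carry a numeric `data-page` attribute; the function name `get_last_page_attr` and the sample HTML are hypothetical.

```r
library(rvest)

# Hypothetical helper: assumes pagination links carry a numeric
# `data-page` attribute (not verified against the real site)
get_last_page_attr <- function(html) {
  pages <- html %>%
    # '[data-page]' is a CSS attribute selector: any element that
    # has a data-page attribute, whatever its class
    html_nodes("[data-page]") %>%
    # Read the attribute value rather than the visible text
    html_attr("data-page") %>%
    as.numeric()

  # Drop values that could not be converted to numbers
  pages <- pages[!is.na(pages)]

  # Return NA instead of a zero-length vector when nothing matched
  if (length(pages) == 0) {
    return(NA_real_)
  }
  # "Next"/"last" buttons may repeat a page index; max() is
  # indifferent to their position, unlike second-to-last indexing
  max(pages)
}

# Invented sample markup standing in for the real page
sample_html <- read_html('
<ul class="pagination">
  <li><a href="?page=1" data-page="1">1</a></li>
  <li><a href="?page=2" data-page="2">2</a></li>
  <li><a href="?page=3" data-page="3">3</a></li>
  <li class="next"><a href="?page=2" data-page="2">&raquo;</a></li>
</ul>')

get_last_page_attr(sample_html)
```

To try it on the real site, pass `read_html(url)` instead of `sample_html`. Two caveats: some pagination widgets number `data-page` from 0, so compare the result with the visible link text; and if the pager is generated by JavaScript after the page loads, `read_html()` will not see it at all.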
