Wednesday, December 19, 2018

Unable to scrape the threads using rvest in R

I need to scrape the threads and replies from the following website:

https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page2&pp=200&sort=lastpost&order=desc&daysprune=-1

I tried this code:

library(rvest)

url <- "https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page1&pp=200&sort=lastpost&order=desc&daysprune=-1"
N_pages <- 5
A <- NULL
D <- NULL

for (j in 1:N_pages) {
  review <- read_html(paste0(url, j))
  # thread titles and thread-starter labels, selectors found with SelectorGadget
  threads <- cbind(review %>% html_nodes(".threadtitle") %>% html_text())
  author <- cbind(review %>% html_nodes(".label") %>% html_text())
  X <- rbind(A, threads, author)
  x <- as.data.frame(X)
}
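One likely reason every iteration fetches the same listing: `paste0(url, j)` appends the page number after `daysprune=-1` (producing `...daysprune=-11`, `...daysprune=-12`, and so on), so the path still says `page1`. A minimal sketch, assuming the forum paginates through the `/page<j>` path segment as the URL at the top of the question suggests, that puts the number where the page is actually selected (`base`, `query` and `page_urls` are illustrative names):

```r
# URL split around the page number; the "/page<j>" segment is taken from the
# URL quoted in the question, not from any documented forum API
base  <- "https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page"
query <- "&pp=200&sort=lastpost&order=desc&daysprune=-1"

N_pages <- 5

# paste0() is vectorised, so this builds all page URLs in one call
page_urls <- paste0(base, seq_len(N_pages), query)

# same form as the page-2 URL quoted at the top of the question
page_urls[2]
```

Inside the loop, each page would then be read with `read_html(page_urls[j])`.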

Problem: I used SelectorGadget to find the CSS selectors. However, when I run the code, I do not get the required results.

Output I get:

  V1
1 Title/thread Starter
2 Sticky: ****Please use the search****
3 Sticky: **** The Official Atlas SUV DIY/FAQ thread****
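The single `V1` column comes from `rbind(A, threads, author)`: `rbind()` stacks the two one-column matrices vertically, so titles and labels end up in the same column. In addition, `A` stays `NULL` and `X` is overwritten on every iteration, so only the last page survives. The sketch below reproduces this with hypothetical toy values, then shows one common alternative: one data frame per page with the fields side by side, collected in a list and bound once after the loop.

```r
# Hypothetical values standing in for one page of scraped text
threads <- c("Thread A", "Thread B")
author  <- c("user1, 01-02-2018", "user2, 03-04-2018")

# What the loop in the question does: rbind() stacks the two one-column
# matrices on top of each other, so titles and labels share one column
X <- rbind(NULL, cbind(threads), cbind(author))
dim(X)  # 4 rows, 1 column

# Instead, build one data frame per page with the fields as columns
page1 <- data.frame(Threads = threads, Author = author,
                    stringsAsFactors = FALSE)
page2 <- data.frame(Threads = "Thread C", Author = "user3, 05-06-2018",
                    stringsAsFactors = FALSE)

# Store them in a list (pages[[j]] <- data.frame(...) inside the real loop)
pages <- list(page1, page2)

# Bind them once, after the loop has finished
all_threads <- do.call(rbind, pages)
dim(all_threads)  # 3 rows, 2 columns
```

If `threads` and `author` come back with different lengths for a real page, that usually means the two selectors do not match exactly one node per thread.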

Required output:

Threads  Author      Replies
Text     name, date  text
Text     name, date  text
Text     name, date  text
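To get one row per thread with Threads, Author and Replies aligned, one approach is to select each thread's container element first and then look up every field inside it with `html_node()` (singular), which returns exactly one result per container (`NA` when a field is missing). The markup below is hypothetical and only loosely modelled on a vBulletin thread list; the class names `threadbit`, `title`, `label` and `threadstats` are assumptions that must be checked against the real page source (for example with SelectorGadget or the browser's inspector). The sketch parses a literal HTML string so it runs offline.

```r
library(rvest)

# Hypothetical markup; class names are assumptions, not the forum's real ones
html <- '
<ol id="threads">
  <li class="threadbit">
    <a class="title" href="#">First thread</a>
    <span class="label">Started by alice, 12-01-2018</span>
    <ul class="threadstats"><li>Replies: 1,204</li></ul>
  </li>
  <li class="threadbit">
    <a class="title" href="#">Second thread</a>
    <span class="label">Started by bob, 12-05-2018</span>
    <ul class="threadstats"><li>Replies: 7</li></ul>
  </li>
</ol>'

page <- read_html(html)

# One node per thread; every field is then looked up inside each thread,
# so the three columns always have the same length
rows <- page %>% html_nodes("li.threadbit")

scraped <- data.frame(
  Threads = rows %>% html_node("a.title") %>% html_text(trim = TRUE),
  Author  = rows %>% html_node(".label") %>% html_text(trim = TRUE),
  Replies = rows %>% html_node(".threadstats li") %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)

# Keep only the digits of "Replies: 1,204" and convert to an integer
scraped$Replies <- as.integer(gsub("[^0-9]", "", scraped$Replies))

scraped
```

For the real forum, `read_html(html)` would be replaced by `read_html()` on each page URL, and the per-page data frames combined with `do.call(rbind, ...)`.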

How do I scrape these threads? Should I use rvest, or do I need to go through an API/JSON? How should I go about it?



