I need to scrape the threads and replies from the following website:
I tried this code:
library(rvest)
# `url` is the forum's base page URL (not shown here); the page number is appended to it
N_pages <- 5
A <- NULL
D <- NULL
for (j in 1:N_pages) {
  review <- read_html(paste0(url, j))
  threads <- cbind(review %>% html_nodes(".threadtitle") %>% html_text())
  author <- cbind(review %>% html_nodes(".label") %>% html_text())
  X <- rbind(A, threads, author)
  x <- as.data.frame(X)
}
Problem: I used SelectorGadget to find the CSS selectors for the elements I need. However, when I run the code, I do not get the required results.
Output I get:
V1
1 Title/thread Starter
2 Sticky: ****Please use the search****
3 Sticky: **** The Official Atlas SUV DIY/FAQ thread****
Required output:
Threads    Author        Replies
Text       name, date    text
Text       name, date    text
Text       name, date    text
How do I scrape these threads? Should I use rvest, or go through an API/JSON? How do I know which approach to use?
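One possible way to get one row per thread with rvest is to select each thread's container first and then look up the title and author inside it, so the columns stay aligned even when a field is missing. The following is only a minimal sketch: the inline HTML, the `.threadbit` container class, the `Link` column, and the helper name `parse_threads` are all assumptions for illustration, not taken from the real site.

```r
library(rvest)

# Hypothetical markup standing in for one forum page. The container class
# ".threadbit" and the sample values are assumptions, not the real site's HTML.
page <- read_html('
<ol>
  <li class="threadbit">
    <a class="threadtitle" href="thread-1.html">First thread</a>
    <span class="label">alice, 01-02-2018</span>
  </li>
  <li class="threadbit">
    <a class="threadtitle" href="thread-2.html">Second thread</a>
  </li>
</ol>')

# Hypothetical helper: build one data-frame row per thread container.
parse_threads <- function(page) {
  rows <- html_nodes(page, ".threadbit")
  data.frame(
    # html_node() (singular) returns exactly one result per row,
    # and yields NA when the element is missing from that row.
    Threads = rows %>% html_node(".threadtitle") %>% html_text(trim = TRUE),
    Author  = rows %>% html_node(".label") %>% html_text(trim = TRUE),
    Link    = rows %>% html_node(".threadtitle") %>% html_attr("href"),
    stringsAsFactors = FALSE
  )
}

threads <- parse_threads(page)
```

For several pages, the same helper could be applied to each `read_html(paste0(url, j))` result and the pieces combined with `do.call(rbind, ...)`. If replies are only shown on each thread's own page, the `Link` column would be the place to start a second round of requests.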