I am creating a web scraper that gathers the full text of an article. So far I have not been able to grab the HTML that contains the full text of the article. The text should later be written to a CSV file, with all of the text in a single row.
My output is currently blank.
My program is below:
library(rvest)
library(RCurl)
library(XML)
library(stringr)
#for Fulltext to read pdf
####install.packages("pdftools")
library(pdftools)
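# Extract the article text from the parsed page
fullText <- function(parsedDocument){
  parsedDocument %>%
    html_nodes("a.article-body") %>%  # matches only <a> elements with class "article-body"
    html_text()                       # last value is returned; no trailing pipe into return()
}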
#main function; takes the article's DOI URL as input
testFullText <- function(DOIurl){
parsedDocument <- read_html(DOIurl)
DNAresearch <- data.frame()
allData <- data.frame("Full Text" = fullText(parsedDocument), stringsAsFactors = FALSE)
DNAresearch <- rbind(DNAresearch, allData)
write.csv(DNAresearch, "DNAresearch.csv", row.names = FALSE)
}
testFullText("https://doi.org/10.1093/dnares/dsm026")
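One way to narrow down a blank result is to test the selector against a small piece of HTML before running it on the live page. The sketch below is only an illustration: `extractFullText` is a hypothetical helper, the sample markup and the `div.article-body p` selector are assumptions, and the real class names on the article page have not been verified. It shows that a selector which matches nothing yields an empty result, and how several paragraphs can be collapsed into one string so the text fits in a single row.

```r
library(rvest)

# Hypothetical helper: collapse the text of every node matching `selector`
# into a single string, or return NA if nothing matched.
extractFullText <- function(parsedDocument, selector) {
  paragraphs <- parsedDocument %>%
    html_nodes(selector) %>%
    html_text(trim = TRUE)
  # An empty vector means the selector matched no nodes in the document
  if (length(paragraphs) == 0) {
    warning("No nodes matched selector: ", selector)
    return(NA_character_)
  }
  paste(paragraphs, collapse = " ")
}

# Stand-in HTML (an assumption, not the markup of the real article page)
sampleHtml <- '<html><body><div class="article-body"><p>First paragraph.</p><p>Second paragraph.</p></div><a href="#">link</a></body></html>'
sampleDoc <- read_html(sampleHtml)

# Matches the two <p> elements inside the div
textOneRow <- extractFullText(sampleDoc, "div.article-body p")

# Matches nothing: there is no <a> element with class "article-body"
noMatch <- suppressWarnings(extractFullText(sampleDoc, "a.article-body"))

# One row, one column, ready for write.csv()
result <- data.frame("Full Text" = textOneRow, stringsAsFactors = FALSE)
```

If the helper returns NA on the real page, inspecting the page source in a browser to find the element that actually wraps the article text is usually the next step.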