I want to scrape the news articles in my Google alerts RSS feed. This is my code in R using the XML package:
install.packages("XML")
library(XML)
doc1<xmlTreeParse("http://ift.tt/1GQgWv0")
file<-xmlRoot(doc1)
src<-xpathApply(file[5]$entry,"\\entry")
This is what I think the problem is:
-
The xmlRoot() function creates a list of 24 elements. If there was only 1 element, the xpathApply() function will be able to detect the nodes, as in this example here: http://ift.tt/1zDxd9h
-
The urls for the news articles that I am looking for are hidden in a mess of html code.
I would greatly appreciate if anyone could help me with this issue, or give alternative approaches to the problem. Thank you.
Aucun commentaire:
Enregistrer un commentaire