Wednesday, May 6, 2015

Unable to find node in xmlRoot Google RSS feed in R

I want to scrape the news articles in my Google Alerts RSS feed. This is my R code, which uses the XML package:

install.packages("XML") 
library(XML)

doc1<-xmlTreeParse("http://ift.tt/1GQgWv0")
file<-xmlRoot(doc1)
src<-xpathApply(file[5]$entry,"\\entry")

This is what I think the problem is:

  1. The xmlRoot() function returns a list of 24 elements. If there were only one element, the xpathApply() function would be able to detect the nodes, as in this example: http://ift.tt/1zDxd9h

  2. The URLs of the news articles that I am looking for are buried in a mess of HTML code.

I would greatly appreciate it if anyone could help me with this issue or suggest alternative approaches to the problem. Thank you.
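
One possible direction, as a minimal sketch rather than a confirmed fix: XPath functions in the XML package operate on internal nodes, which xmlParse() produces and xmlTreeParse() does not by default, and XPath paths start with "//" rather than "\\". The sketch also assumes that the feed is Atom (so its elements sit in the http://www.w3.org/2005/Atom namespace) and that each entry's link wraps the article address in a url= query parameter. The inline feed_xml string and the example.com addresses are made-up stand-ins for the real feed.

```r
library(XML)

# Hypothetical stand-in for the real feed: a tiny Atom document
# shaped the way a Google Alerts feed is assumed to look.
feed_xml <- '<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Google Alert - example</title>
  <entry>
    <title type="html">First story</title>
    <link href="https://www.google.com/url?rct=j&amp;url=http://example.com/a&amp;ct=ga"/>
  </entry>
  <entry>
    <title type="html">Second story</title>
    <link href="https://www.google.com/url?rct=j&amp;url=http://example.com/b&amp;ct=ga"/>
  </entry>
</feed>'

# xmlParse() returns an internal document, which the XPath functions require.
doc <- xmlParse(feed_xml, asText = TRUE)

# The Atom elements live in a default namespace, so bind it to a prefix
# ("a" is an arbitrary choice) and use that prefix in the XPath expression.
ns <- c(a = "http://www.w3.org/2005/Atom")

# Collect the href attribute of every <link> inside every <entry>.
hrefs <- xpathSApply(doc, "//a:entry/a:link", xmlGetAttr, "href",
                     namespaces = ns)

# Assumption: the article address is the value of the url= query parameter.
article_urls <- sub(".*[?&]url=([^&]*).*", "\\1", hrefs)
print(article_urls)
```

For the live feed, the same steps should apply after replacing the inline string with the downloaded feed contents. If the addresses turn out to be percent-encoded, URLdecode() from the utils package can be applied to each one.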



