I've been using BeautifulSoup to create a webscraper. I know that I can do something like:
h4Results = soup.find_all(class_='top')[0].find_all('h4')
to find all the h4 elements inside the first element with class "top".
However, how would I scrape the value of VARIABLE_TO_SCRAPE from the JavaScript below (assuming this is the page source):
<script type="text/javascript">
var data= {
id: 123456,
startTimeMs: 1480816800000,
VARIABLE_TO_SCRAPE: scrapeMe
};
</script>
Is my best bet to write some sort of regex and search for it that way? If so, can someone help me with that, as I'm very unfamiliar with regexes? I tried something like:
soup.body.findAll(text=re.compile(r'\AVARIABLE_TO_SCRAPE'), limit=1)
But I just get an empty list, [].
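For reference, here is a minimal sketch of the regex approach, not a definitive solution. One likely reason the attempt above finds nothing is that `\A` only matches at the very start of a string, and the script text starts with `var data= {`, not with the variable name. The helper name `extract_js_value` is hypothetical, and the pattern assumes the value is a single token with no spaces, commas, or closing braces, as in the sample above.

```python
import re

# Sample page source, copied from the question.
page_source = """
<script type="text/javascript">
var data= {
id: 123456,
startTimeMs: 1480816800000,
VARIABLE_TO_SCRAPE: scrapeMe
};
</script>
"""


def extract_js_value(text, name):
    """Return the token that follows `name:` in text, or None if absent.

    Hypothetical helper. Assumes the value is one token containing no
    whitespace, commas, or closing braces.
    """
    # No \A anchor here: the key can appear anywhere in the text.
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*:\s*([^,\s}]+)")
    match = pattern.search(text)
    return match.group(1) if match else None


value = extract_js_value(page_source, "VARIABLE_TO_SCRAPE")
print(value)
```

The same function could be given the text of each script tag found by BeautifulSoup instead of the whole page source. Values that are quoted strings or nested objects would need a more careful pattern or a proper JavaScript parser.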