Thursday, December 8, 2016

Python, use BeautifulSoup to find a variable from a section of javascript?

I've been using BeautifulSoup to build a web scraper. I know that I can do something like:

    h4Results = soup.find_all(class_='top')[0].find_all('h4')

to find all the h4 elements inside the first element with the class "top".

However, how would I scrape the value of VARIABLE_TO_SCRAPE from the JavaScript example below (assuming this is the page source)?

    <script type="text/javascript">
    var data= {
        id: 123456,
        startTimeMs: 1480816800000,
        VARIABLE_TO_SCRAPE: scrapeMe
    };
    </script>
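Once the text of this <script> element is available as a Python string, the object literal can be read with the standard re module. The sketch below is one possible approach, assuming the object is flat: no nested objects, arrays, or string values containing commas or braces. The helper name parse_flat_object and the pattern PAIR_RE are hypothetical, introduced here for illustration.

```python
import re

# Text of the <script> element from the question. Assumption: in a real
# scraper this string would come from BeautifulSoup (the .string of the tag).
script_text = """
var data= {
    id: 123456,
    startTimeMs: 1480816800000,
    VARIABLE_TO_SCRAPE: scrapeMe
};
"""

# Hypothetical pattern: a word, a colon, then everything up to the next
# comma, closing brace, or newline. Only suitable for flat object literals.
PAIR_RE = re.compile(r"(\w+)\s*:\s*([^,}\n]+)")


def parse_flat_object(text):
    """Return every key/value pair of a flat JS object literal as strings."""
    return {key: value.strip() for key, value in PAIR_RE.findall(text)}


data = parse_flat_object(script_text)
print(data["VARIABLE_TO_SCRAPE"])  # scrapeMe
```

All values come back as source text (strings), so numeric fields such as id would still need an int() conversion.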

Is my best bet to write some sort of regex and search for it that way? If so, could someone help me with that? I'm very unfamiliar with regexes. I tried something like:

    soup.body.findAll(text=re.compile(r'\AVARIABLE_TO_SCRAPE'), limit=1)

But all I get back is an empty list: []
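A likely cause of the empty result is the \A anchor: it only matches at the very start of the string, and the script's text starts with a newline and "var data=", not with VARIABLE_TO_SCRAPE. The minimal sketch below, using only the standard re module, shows the anchored pattern failing and an unanchored pattern with a capture group pulling out the value. It assumes the script text has already been extracted; the names VALUE_RE and extract_value are hypothetical.

```python
import re

# Same <script> text as in the question (assumed to be extracted already).
script_text = """
var data= {
    id: 123456,
    startTimeMs: 1480816800000,
    VARIABLE_TO_SCRAPE: scrapeMe
};
"""

# \A anchors the match at position 0. The text starts with "\nvar data=",
# so this pattern can never match it.
anchored = re.compile(r"\AVARIABLE_TO_SCRAPE")
print(anchored.search(script_text))  # None

# Without the anchor, search() scans the whole string. The capture group
# takes everything after the colon up to the next comma, brace, or whitespace.
VALUE_RE = re.compile(r"VARIABLE_TO_SCRAPE\s*:\s*([^,}\s]+)")


def extract_value(text):
    """Return the raw source text assigned to VARIABLE_TO_SCRAPE, or None."""
    match = VALUE_RE.search(text)
    return match.group(1) if match else None


print(extract_value(script_text))  # scrapeMe
```

With BeautifulSoup, an unanchored pattern can locate the tag itself, for example soup.find('script', string=re.compile('VARIABLE_TO_SCRAPE')), and that tag's .string can then be passed to extract_value. Searching from soup.body would also miss the script if it sits in <head>. Because scrapeMe is unquoted, JavaScript treats it as a reference to another variable; the regex returns only that name, not whatever value it holds at runtime.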