Wednesday, January 25, 2017

Can this Python web scraper run on a dedicated server?

I run this web scraper on my laptop. It uses Firefox (through Selenium WebDriver) to get the data, and it has to actually open Firefox because the data are generated by JavaScript. So I wonder whether a dedicated server can open Firefox and get the data too. I think dedicated servers have no display, so it will not work? The full script is much longer (152 lines); I pasted only the parts that I think will not work. I believe importing the data into PostgreSQL will not be a problem on a dedicated server.
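Whether the server really has no display can be checked from Python before Firefox is launched. This is a minimal sketch, assuming a Linux server where an X11 session sets `DISPLAY` and a Wayland session sets `WAYLAND_DISPLAY`; `has_display` is a hypothetical helper name and is not part of my script.

```python
import os


def has_display(env=None):
    """Return True if the environment advertises an X11 or Wayland display."""
    # Use the real process environment unless a mapping is passed in (handy for testing).
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    print("display available:", has_display())
```

If this prints `False` on the server, a plain `webdriver.Firefox()` call will probably have no screen to open a window on. The parts of the script that I expect to be affected are below.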

    from selenium import webdriver
    import time
    from bs4 import BeautifulSoup
    import lxml
    import re
    import psycopg2
    import sys

    driver = webdriver.Firefox()
    driver.set_window_position(-9999, -9999)
    driver.get("http://ift.tt/2jZy0Ow")

    time.sleep(20) #waits till the page loads

    html_source = driver.page_source
    soup = BeautifulSoup(html_source, 'lxml')
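
    # find the tags that hold speed information (km/h)
    list1 = []  # defined here so that the excerpt is self-contained
    for i in soup.find_all("tspan", {"id": re.compile(r"tspan_Label_\w*")}):
        # keep purely numeric labels only
        if re.match(r"^[0-9]+$", i.getText()) is not None:
            # the parent element's fill colour marks the speed labels
            if i.parent.get('fill') == '#5f5f5f':
                list1.append(i.getText())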
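The parsing step does not need a browser, so it can be tried on the server separately from the Firefox question by feeding it a saved fragment of the page. The following is only a sketch under stated assumptions: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs with no extra packages, the `SAMPLE` markup is invented for illustration and is not the real page, and `SpeedLabelParser` and `extract_speeds` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Same id pattern as in the script; search() matches anywhere in the id,
# which is how BeautifulSoup applies a compiled pattern.
LABEL_ID = re.compile(r"tspan_Label_\w*")


class SpeedLabelParser(HTMLParser):
    """Collect numeric <tspan> labels whose parent element has the given fill.

    Assumes well-nested markup in which every opened element is closed.
    """

    def __init__(self, fill="#5f5f5f"):
        super().__init__()
        self.fill = fill
        self.speeds = []
        self._stack = []       # attribute dicts of the currently open elements
        self._capture = None   # text pieces of the tspan being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._stack[-1] if self._stack else {}
        if (tag == "tspan"
                and LABEL_ID.search(attrs.get("id") or "")
                and parent.get("fill") == self.fill):
            self._capture = []
        self._stack.append(attrs)

    def handle_data(self, data):
        if self._capture is not None:
            self._capture.append(data)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()
        if tag == "tspan" and self._capture is not None:
            text = "".join(self._capture)
            if re.match(r"^[0-9]+$", text):   # digits only, as in the script
                self.speeds.append(text)
            self._capture = None


def extract_speeds(html_source):
    """Return the speed labels found in html_source as a list of strings."""
    parser = SpeedLabelParser()
    parser.feed(html_source)
    parser.close()
    return parser.speeds


# Invented sample markup; only the first label satisfies all three conditions.
SAMPLE = """
<svg>
  <text fill="#5f5f5f"><tspan id="tspan_Label_1">87</tspan></text>
  <text fill="#ff0000"><tspan id="tspan_Label_2">120</tspan></text>
  <text fill="#5f5f5f"><tspan id="tspan_Label_3">km/h</tspan></text>
  <text fill="#5f5f5f"><tspan id="other_4">55</tspan></text>
</svg>
"""

if __name__ == "__main__":
    print(extract_speeds(SAMPLE))
```

If the fragment is saved from `driver.page_source` on the laptop, the same function can be run against it on the server to confirm that only the step that opens Firefox depends on a display.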



