I run this web scraper on my notebook. It uses Firefox (via Selenium WebDriver) to get the data, and it has to actually open Firefox because the data is generated by JavaScript. Can a dedicated server open Firefox and get the data too? As far as I know, dedicated servers have no display, so I suspect it will not work. The full script is more complicated (152 lines); I have pasted only the parts that I think will not work. I believe importing the data into PostgreSQL will be no problem on a dedicated server.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import lxml
import re
import psycopg2
import sys
driver = webdriver.Firefox()
driver.set_window_position(-9999, -9999)
driver.get("http://ift.tt/2jZy0Ow")
time.sleep(20) #waits till the page loads
html_source = driver.page_source
soup = BeautifulSoup(html_source, 'lxml')
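# finds tags with speed information (km/h)
# list1 is not defined in this excerpt; it is initialised here so the snippet runs
list1 = []
for i in soup.find_all("tspan", {"id": re.compile(r"tspan_Label_\w*")}):
    # keep only purely numeric labels whose parent element has the grey fill
    if re.match(r"^[0-9]+$", i.getText()) is not None:
        if str(i.parent.get('fill')) == '#5f5f5f':
            list1.append(i.getText())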
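
One possible way to run the Firefox part on a server without a display is Firefox's headless mode, where the browser still executes the page's JavaScript but never opens a window. The following is a minimal, untested sketch, not a drop-in replacement for the script above. It assumes a Selenium and Firefox version recent enough to support headless mode and a geckodriver binary on the PATH. The names `fetch_rendered_html` and `wait_seconds` are hypothetical and do not appear in the original script.

```python
import time


def fetch_rendered_html(url, wait_seconds=20):
    """Load `url` in headless Firefox and return the rendered page source.

    Hypothetical helper: assumes Selenium, Firefox and geckodriver are installed.
    """
    # Imported lazily so this module can be loaded on machines without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Same fixed wait as in the original script, so the JavaScript can finish.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

With this in place, `html_source = fetch_rendered_html("http://ift.tt/2jZy0Ow")` would stand in for the `webdriver.Firefox()`, `driver.get(...)` and `driver.page_source` lines, and the `set_window_position` call would no longer be needed because no window is shown.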