Sunday, 23 June 2019

Scraping with Python and running the script from a PHP form + passing data

I'm currently working on scraping everything on a page with Python, running the script from PHP with the URL passed in from a form input, and then saving the scraped page to a CSV file to import into a database later.

I'm using Ubuntu 18.04 with Python 3.6 and PHP 7. The weird thing is that the Python script works if I run it from the terminal, but not if I run it from PHP in the browser. If I change the URL to "in.mail.yahoo.com" or to a Gmail link where I'm already signed in, it scrapes the whole thing and saves it to CSV. But if I enter wikipedia.org or even google.com, it doesn't work when I run it from the browser.
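One thing I could check, purely as an assumption and not a confirmed cause, is whether the script sees a different environment when the web server starts it (another user, another working directory, no write permission there). A minimal sketch that prints those details so the terminal run and the browser run can be compared; the function name `describe_environment` is made up for this example:

```python
import getpass
import os
import sys


def describe_environment():
    """Collect details that may differ between a terminal run and a web server run."""
    cwd = os.getcwd()
    try:
        user = getpass.getuser()
    except Exception:
        # Some restricted environments cannot resolve the current user name.
        user = "unknown"
    return {
        "user": user,
        "cwd": cwd,
        # A relative output file is created in cwd, so it has to be writable.
        "cwd_writable": os.access(cwd, os.W_OK),
        "python": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two runs report a different user or a working directory that is not writable, that would at least be a lead worth following.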

The Python code:

#!/usr/bin/python3

import requests
from bs4 import BeautifulSoup


def scrap():
    url = "google.com"

    # Fetch the page and parse the HTML
    r = requests.get("https://" + url)
    soup = BeautifulSoup(r.text, "html.parser")

    # Drop script and style elements so only the visible text remains
    for script in soup(["script", "style"]):
        script.extract()

    text = soup.get_text()

    # Break the text at every space, then drop the empty lines
    text = text.replace(' ', '\n ')
    text = "\n".join([ll.rstrip() for ll in text.splitlines() if ll.strip()])

    print(text)

    # Remove a leading scheme, if any, to build the output file name
    name = url.replace("https://", "").replace("http://", "")

    # Write the extracted text to the file; the file is closed automatically
    with open(name + "_scrapped.csv", "w") as local_file:
        local_file.write(text)


scrap()
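Since the file is meant to be imported into a database later, the extracted text could also be written as real CSV rows instead of raw text. This is only a sketch under my own assumptions: the two-column layout (`line_number`, `content`) and the helper name `write_lines_as_csv` are invented for illustration, and only the standard `csv` module is used.

```python
import csv
import io


def write_lines_as_csv(text, handle):
    """Write each non-empty line of text as a (line_number, content) row."""
    writer = csv.writer(handle)
    writer.writerow(["line_number", "content"])
    count = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count += 1
        # csv.writer quotes fields containing commas or quotes automatically.
        writer.writerow([count, line])
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    rows = write_lines_as_csv("Hello, world\n\n  second line  \n", buffer)
    print(rows)
    print(buffer.getvalue())
```

With a real file, the handle would be opened as `open(path, "w", newline="")`, which is what the `csv` module documentation recommends.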

And the PHP code:

Enter your URL:

I expect that when I run the PHP page from the browser, I enter a link, the Python script processes that URL and scrapes the whole page, and the result is saved to a CSV file.
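On the Python side, that flow could look roughly like the sketch below. It assumes the PHP form hands the URL over as the first command-line argument, which is my assumption about the setup. The helper names `normalize_url` and `output_filename` are hypothetical, and the fallback to google.com just mirrors the hard-coded value in the script.

```python
import sys
from urllib.parse import urlparse


def normalize_url(raw):
    """Return the URL with a scheme, defaulting to https when none is given."""
    raw = raw.strip()
    if not raw.startswith(("http://", "https://")):
        raw = "https://" + raw
    return raw


def output_filename(url):
    """Derive a file name such as 'wikipedia.org_scrapped.csv' from the URL host."""
    host = urlparse(normalize_url(url)).netloc
    return host + "_scrapped.csv"


if __name__ == "__main__":
    # Use the first argument when one is given, otherwise fall back to google.com.
    raw = sys.argv[1] if len(sys.argv) > 1 else "google.com"
    print(normalize_url(raw))
    print(output_filename(raw))
```

Using only the host for the file name avoids slashes from the URL path ending up in the name.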



