im currently working on how to scrapping everything on a page with python then execute & passing the url input from php, and then save the scrap page to csv file to later import to database.
im using ubuntu 18.04 with python 3.6 and php 7. and the weird thing is, if i run the python script from the terminal it works, but if i run it from php from browser its not working. and if i change the url to "in.mail.yahoo.com" or link to gmail that i allready sign in. its scrap the whole thing and save it to csv. but if i input a wikipedia.org or even google.com it cant work if i run it from the browser.
the python code :
!/usr/bin/python
import sys,urllib,requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
def scrap():
url = ("google.com")
r = requests.get("https://" +url)
data = r.text
soup = BeautifulSoup(data)
for script in soup (["script", "style"]):
script.extract()
text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = text.replace(' ','\n ',)
text = "\n".join([ll.rstrip() for ll in text.splitlines() if ll.strip()])
print (text)
local_file = open(url.strip("https//" + "http://") + "_scrapped.csv" , "w")
# write the var pretty_soup to file
local_file.write(text)
# local_file.write(pretty_source.decode('utf-8','ignore'))
# local_file.write(pretty_source.decode('utf-8')
#close file
local_file.close()
scrap()
and the PHP code :
isi URL LOE :i expect if i run the php from the browser, im gonna input a link and then the python is processing the url and scrapping the whole thing, then save the scrap page to csv files.
Aucun commentaire:
Enregistrer un commentaire