I'm experimenting with proxy servers. I want to create a bot that connects to my web server every few minutes and scrapes a file (namely index.html) for changes. I tried to apply things I learned in some multi-hour Python tutorials, and to make it a bit more fun I decided to use random proxies.
So I wrote this method:
import requests
from bs4 import BeautifulSoup
from random import choice

# here I get a proxy from a proxy list by parsing a table embedded in HTML with BeautifulSoup
def get_proxy():
    print("in get_proxy")
    proxyDomain = 'https://free-proxy-list.net/'
    r = requests.get(proxyDomain)
    print("making the soup now")
    soup = BeautifulSoup(r.content, 'html.parser')
    table = soup.find('table', {'id': 'proxylisttable'})
    # this part works
    # print(table.get_text)
    print("time for the list")
    ipAddresses = []
    for row in table.findAll('tr'):
        columns = row.findAll('td')
        try:
            ipAddresses.append("https://" + str(columns[0].get_text()) + ":" + str(columns[1].get_text()))
            # ipList.append(str(columns[0].get_text()) + ":" + str(columns[1].get_text()))
        except:
            pass
    # here the program returns one random IP address from the list
    return choice(ipAddresses)
    # return 'https://' + choice(ipList)
def proxy_request(request_type, url, **kwargs):
    print("in proxy_request")
    while True:
        try:
            proxy = get_proxy()
            print("today we are using {}".format(proxy))
            # so up to this line everything seems to work the way I want it to
            # now the next line should do the proxied request, and at the end of the loop it should return some HTML text...
            r = requests.request(request_type, url, proxies=proxy, timeout=5, **kwargs)
            break
        except:
            pass
    return r
def launch():
    print("in launch")
    r = proxy_request('get', 'https://mysliwje.uber.space.')
    ### but the text never arrives here - maybe the request is being carried out the wrong way
    ### does anybody have an idea how to fix this program so that it works?
    print(r.text)

launch()
As I explained in the code comments, the first part works nicely: it picks a random IP from the proxy list and even prints it to the CLI. But the next step suddenly seems to go wrong, because the tool keeps going back and scraping a new IP address, and another, and another, and another, from a list that seems to be updated every few minutes. So I ask myself what is happening: why don't I see the simple HTML of my index page?
Anybody got an idea?
Thanks!
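(Editorial note on the symptom described above: `requests` expects its `proxies` argument to be a dict mapping URL scheme to proxy URL, not a bare string. If a string is passed, `requests.request()` raises before the request is even sent, the bare `except: pass` in the retry loop swallows the error, and the loop goes back to fetch the next proxy forever. A minimal sketch of the expected shape, using a made-up proxy address:)

```python
def make_proxies(proxy_url):
    # requests wants proxies as a dict keyed by scheme, e.g.
    # {'http': 'https://host:port', 'https': 'https://host:port'};
    # passing a bare string instead makes requests.request() raise.
    return {"http": proxy_url, "https": proxy_url}

# hypothetical proxy picked from the scraped list
proxies = make_proxies("https://203.0.113.7:8080")

# the proxied request would then look like:
# r = requests.request('get', 'https://mysliwje.uber.space.', proxies=proxies, timeout=5)
```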