I've been wondering: how do I pull the text off a site and then put it in an array? At the moment I'm trying something like this: https://hastebin.com/vokoxixaqo.lua
I want to store each site's text as a separate element of the array so I can read them back one by one. Any help would be greatly appreciated, thank you!
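```python
import json
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Placeholder URLs (not from the original post); replace with your own.
urls = ["https://example.com", "https://example.org"]

sites = []  # one cleaned-up text per site
for url in urls:
    html = urlopen(url).read()
    soup = BeautifulSoup(html, features="html.parser")
    # Drop script and style elements so only the page text remains.
    for script in soup(["script", "style"]):
        script.extract()
    text = soup.get_text()
    # Strip each line, split it into chunks, and drop the empty ones.
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("'\'"))
    finaletext = '\n'.join(chunk for chunk in chunks if chunk)
    print(finaletext)
    sites.append(finaletext)

# Dump the whole list, so each site stays a separate JSON array entry.
with open('sites.txt', 'wt') as outfile:
    json.dump(sites, outfile)

# Load the list back; sites[0] is the first site's text, and so on.
with open('sites.txt', 'r') as f:
    sites = json.load(f)
print(sites)
```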
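
To try the "one array entry per site" idea without any network access, here is a minimal sketch. It is only an illustration under a few assumptions: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, and the names `TextExtractor`, `page_text`, and `pages` are hypothetical, with `pages` standing in for HTML you would normally download.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the text of one HTML page, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical pages standing in for downloaded HTML.
pages = [
    "<html><body><h1>First</h1><script>var x = 1;</script><p>Hello</p></body></html>",
    "<html><body><style>p {color: red}</style><p>Second site</p></body></html>",
]

# One list element per site.
sites = [page_text(html) for html in pages]

# Serialising the list itself (not one joined string) keeps the entries separate.
restored = json.loads(json.dumps(sites))

for text in restored:
    print(text)
```

The design point is that the list is what gets serialised: if a single string is dumped as JSON, its newlines are escaped and the file holds one line, so reading it back line by line cannot recover the separate sites.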