I am writing a program that checks whether a specific vanity URL on Steam is already taken. It needs to read the first line of a file containing a list of candidate vanity URLs, then check whether that URL is taken by scraping the page with Beautiful Soup and reading the h1 header. At the moment my code reads only the first few characters of each line rather than the whole line. How do I read the first line of the file and then move on to the next line after each check?
I have tried using fileRead.readline(currentLine) instead of fileRead.read(currentLine). I cannot find any guidance on the syntax for reading whole lines individually, one after the other.
#imports
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
#init vars
word = ""
URL = "https://steamcommunity.com/id/"
idTaken = False
totalWords = 0
totalAvailableIds = 0
currentLine = 0
#main
with open('wordList.txt','r') as fileRead:
    print(fileRead.name,fileRead.mode)
    print('')
    for counter in fileRead:
        totalWords += 1
        currentLine += 1
        currentWord = fileRead.readline(currentLine)
        URL += currentWord
        print(URL)
        uClient = uReq(URL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML,"html.parser")
        print(pageSoup.h1)
        if pageSoup.h1 == "<h1>Sorry!</h1>":
            idTaken = False
            totalAvailableIds += 1
            fileWrite = open('idsAvailable.txt','w')
            #append currentWord to fileWrite
        else:
            idTaken = True
Output looks like this:
wordList.txt r
https://steamcommunity.com/id/a
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaa
None
https://steamcommunity.com/id/aaaaba
None
https://steamcommunity.com/id/aaaabaabac
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaft
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalon
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandon
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandone
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandoneabandonme
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandoneabandonmeabandons
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandoneabandonmeabandonsabased
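The growing URLs in this output are consistent with two things in the loop. First, the argument to readline() is a maximum number of characters to return, not a line number, so readline(currentLine) returns at most 1, then 2, then 3 characters. Second, URL += currentWord keeps appending to the same string instead of starting again from the base URL. A small sketch of the readline behaviour, using an in-memory file whose three words are illustrative only (the real contents of wordList.txt are not shown here):

```python
import io

# Stand-in for wordList.txt; these three words are made up for illustration.
word_file = io.StringIO("aardvark\naardwolf\nabacus\n")

# The argument to readline() caps the number of characters returned.
first = word_file.readline(1)    # at most 1 character of the current line
rest = word_file.readline()      # the remainder of that same line
second = word_file.readline(2)   # at most 2 characters of the next line

print(repr(first), repr(rest), repr(second))

# Iterating over the file object yields whole lines, one after the other.
word_file.seek(0)
words = [line.strip() for line in word_file]
print(words)
```

Note that calling readline() inside a for loop over the same file also consumes input that the loop would otherwise have yielded, which is probably why whole words appear to be skipped.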
When it should look like this:
https://steamcommunity.com/id/aardvark
None
https://steamcommunity.com/id/aardwolf
<h1>Sorry!</h1>
... etc
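A minimal sketch of one way to read whole lines in order, not a definitive fix. Iterating over the file object already yields one full line per pass, so neither readline() nor a line counter is needed. The helper names (candidate_urls, page_says_sorry, find_available, check_file) are hypothetical, and the idea that a "Sorry!" heading means the ID is free is an assumption carried over from the code above. The page check is passed in as a callable so the line handling can be tried without network access.

```python
from urllib.request import urlopen

BASE_URL = "https://steamcommunity.com/id/"


def candidate_urls(lines, base_url=BASE_URL):
    """Yield (word, url) for each non-blank line, one whole line at a time."""
    for line in lines:
        word = line.strip()  # drop the trailing newline and stray spaces
        if word:
            # Build a fresh URL each time instead of appending to the base.
            yield word, base_url + word


def page_says_sorry(url):
    """Hypothetical check: True if the page's first <h1> reads 'Sorry!'."""
    from bs4 import BeautifulSoup  # imported here so the rest runs without bs4
    with urlopen(url) as response:
        page = BeautifulSoup(response.read(), "html.parser")
    # Compare the heading's text, not the Tag object, against a string.
    return page.h1 is not None and page.h1.get_text(strip=True) == "Sorry!"


def find_available(lines, is_available=page_says_sorry):
    """Return the words whose URL the is_available callable reports as free."""
    return [word for word, url in candidate_urls(lines) if is_available(url)]


def check_file(in_path, out_path, is_available=page_says_sorry):
    """Read words from in_path and write the available ones to out_path."""
    # Both files are opened once, outside the loop.
    with open(in_path, "r") as word_file, open(out_path, "w") as out_file:
        for word in find_available(word_file, is_available):
            out_file.write(word + "\n")
```

Under these assumptions, check_file('wordList.txt', 'idsAvailable.txt') would write each word reported as free on its own line. Opening the output file once matters because mode 'w' truncates the file every time it is opened, so reopening it inside the loop would discard earlier results.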