Tuesday, June 18, 2019

How do I increment the line number being read from a file?

I am writing a program to determine whether a specific vanity URL on Steam is already taken. It needs to read the first line from a file containing a list of potential vanity URLs, then check whether that URL is taken by scraping the page with Beautiful Soup and reading the page's header. In my code it reads the first character of each line rather than the whole first line. How do I read the first line from the file containing the list, and then move on to the next line after each check?

I have tried using fileRead.readline(currentLine) instead of fileRead.read(currentLine)... I cannot find any syntax guidance on how to read whole lines individually, one after the other.
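For reference, the argument to readline() is a size limit in characters, not a line number, and iterating over the file object already yields whole lines in order. A minimal sketch, assuming a made-up three-word list held in an io.StringIO as a stand-in for wordList.txt:

```python
import io

# Stand-in for wordList.txt so the example runs without a file on disk.
sample = io.StringIO("aardvark\naardwolf\nabacus\n")

# readline(n) reads at most n characters of the next line, not line number n.
first_chars = sample.readline(3)
rest_of_line = sample.readline()   # the remainder of that same line

# Iterating over the file object yields the remaining lines one by one.
remaining = [line.strip() for line in sample]

print(repr(first_chars), repr(rest_of_line), remaining)
```

Calling strip() on each line removes the trailing newline, which would otherwise end up inside the URL.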

#imports
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

#init vars
word = ""
URL = "https://steamcommunity.com/id/"
idTaken = False
totalWords = 0
totalAvailableIds = 0
currentLine = 0

#main
with open('wordList.txt','r') as fileRead:
    print(fileRead.name,fileRead.mode)
    print('')
    for counter in fileRead:
        totalWords += 1
        currentLine += 1
        currentWord = fileRead.readline(currentLine)
        URL += currentWord
        print(URL)
        uClient = uReq(URL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML,"html.parser")
        print(pageSoup.h1)
        if pageSoup.h1 == "<h1>Sorry!</h1>":
            idTaken = False
            totalAvailableIds += 1
            fileWrite = open('idsAvailable.txt','w')
            #append currentWord to fileWrite
        else:
            idTaken = True

The output looks like this:

wordList.txt r

https://steamcommunity.com/id/a
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaa
None
https://steamcommunity.com/id/aaaaba
None
https://steamcommunity.com/id/aaaabaabac
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaft
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalon
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandon
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandone
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandoneabandonme
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandoneabandonmeabandons
<h1>Sorry!</h1>
https://steamcommunity.com/id/aaaabaabacabaftabalonabandonabandoneabandonmeabandonsabased

It should look like this:

https://steamcommunity.com/id/aardvark
None
https://steamcommunity.com/id/aardwolf
<h1>Sorry!</h1>

... etc
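
One way to get output of that shape is to let the for loop supply each whole line and to build a fresh URL from an unchanging base on every pass. The following is a sketch under stated assumptions, not a definitive implementation: candidate_urls, find_available, and the is_available callable are hypothetical names introduced here, and the demo replaces the real web request with a fake check so it runs offline.

```python
import io

BASE_URL = "https://steamcommunity.com/id/"


def candidate_urls(lines, base=BASE_URL):
    """Yield (word, url) for each non-empty line; base is never modified."""
    for line in lines:
        word = line.strip()          # drop the trailing newline
        if word:
            yield word, base + word  # fresh URL per word, no accumulation


def find_available(lines, is_available):
    """Return the words whose URL the is_available callable reports as free."""
    return [word for word, url in candidate_urls(lines) if is_available(url)]


# Demo with an in-memory word list and a fake check instead of a web request.
words = io.StringIO("aardvark\naardwolf\n\nabacus\n")
fake_free = {BASE_URL + "aardwolf"}
available = find_available(words, lambda url: url in fake_free)
print(available)
```

With the real file, the open file object from the existing with block can be passed in place of words, and the urlopen and BeautifulSoup check can be wrapped in a function and passed as is_available. Inside that check, comparing str(pageSoup.h1) to the expected text is probably safer than comparing the tag object itself to a string. If results are written to idsAvailable.txt, opening it with mode 'a' rather than 'w' avoids truncating the file on every match.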



