Tuesday, November 17, 2020

Why does this while loop print out 7 lines rather than 1?

I am working on this assignment:

Use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link and repeat the process a number of times and report the last name you find.

Start at: http://py4e-data.dr-chuck.net/known_by_Emmy.html I need to find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve. So the Count will be 7 and Position will be 18 when entered. The answer will be Lochlann.

  1. Can someone explain to me, line by line, how the while statement works here? Why does it print 7 lines when it only prints "Retrieving: URL" if count == position? If I enter 18 for position, shouldn't it print that line only when count == 18? Also, if I wanted the final line to print just "Lochlann", how would I write that .contents line? Code:
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import ssl
    
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    
    url = input("Enter URL:")
    numbers  = int(input("Enter count:"))
    position = int(input("Enter position:"))
    
    n = 0
    count = 0
    
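    while n < numbers:  # outer loop: one pass per page fetched
        html = urlopen(url, context=ctx).read()
        soup = BeautifulSoup(html, 'html.parser')
        tags = soup('a')  # every anchor tag on the current page
        for tag in tags:
            count = count + 1
            if count == position:  # reached the link at the requested position
                url = tag.get('href', None)
                print("Retrieving:", url)
                count = 0  # restart the position counter for the next page
                break  # stop scanning this page
        n = n + 1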
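
To see how the two loops interact without touching the network, here is a minimal offline sketch of the same control flow. It is an illustration under stated assumptions, not the assignment's solution: the `pages` dictionary, the names in it, and the `follow_links` helper are all hypothetical stand-ins for the real pages and for `soup('a')`, and the sketch resets `count` at the top of each pass instead of inside the `if`, which should behave the same way whenever the page has at least `position` links.

```python
# Hypothetical data: page name -> names linked from that page, in page order.
# This stands in for the list of <a> tags that soup('a') would return.
pages = {
    "Emmy": ["Ann", "Bob", "Cara", "Dev"],
    "Cara": ["Eli", "Fay", "Gus", "Hal"],
    "Gus": ["Ian", "Jo", "Kim", "Lee"],
    "Kim": ["Max", "Ned", "Oda", "Pam"],
    "Oda": ["Quy", "Ray", "Sue", "Tom"],
}


def follow_links(start, numbers, position):
    """Mirror the question's loops and return the names reached, in order."""
    url = start
    retrieved = []
    n = 0
    while n < numbers:              # outer loop: one pass per page "fetched"
        count = 0                   # the position counter restarts on each page
        for tag in pages[url]:      # inner loop: walk the links on this page
            count = count + 1
            if count == position:   # true once per page, at the chosen link
                url = tag           # this link becomes the next page to visit
                retrieved.append(url)
                print("Retrieving:", url)
                break               # stop scanning this page
        n = n + 1
    return retrieved


names = follow_links("Emmy", numbers=4, position=3)
print(names[-1])  # the last name reached
```

In this sketch the `print` sits inside the `while` loop, so it runs once per pass: `position` only selects which link is followed on each page, while `numbers` decides how many pages are visited. By the same reasoning, a count of 7 would give 7 "Retrieving:" lines. To print only the name in the original code, one option is to keep a reference to the last matched `tag` and print `tag.contents[0]` after the loop, assuming the anchor holds nothing but the name as text.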


