I am working on this assignment:
Use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.
Start at: http://py4e-data.dr-chuck.net/known_by_Emmy.html
I need to find the link at position 18 (the first name is 1), follow that link, and repeat this process 7 times. The answer is the last name that you retrieve. So Count will be 7 and Position will be 18 when entered, and the answer will be Lochlann.
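The process the assignment describes can be sketched without network access. This is a minimal sketch, assuming a small hypothetical `PAGES` dict (the URLs and names are made up) in place of the real site, and it swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup so it has no third-party dependency. Because positions are 1-based, position 18 corresponds to index 17 in the list of links.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_a = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_a = False


def follow_links(start, count, position, fetch):
    """Follow the link at 1-based `position` `count` times; return the last name."""
    url, name = start, None
    for _ in range(count):
        parser = AnchorCollector()
        parser.feed(fetch(url))
        parser.close()
        url, name = parser.links[position - 1]  # position 1 is the first link
    return name


# Hypothetical pages standing in for the real site (URL -> HTML).
PAGES = {
    "start": '<a href="ann">Ann</a> <a href="bob">Bob</a>',
    "ann": '<a href="cy">Cy</a> <a href="dee">Dee</a>',
    "bob": '<a href="eve">Eve</a> <a href="fay">Fay</a>',
    "fay": '<a href="gus">Gus</a> <a href="hal">Hal</a>',
}

print(follow_links("start", 3, 2, lambda u: PAGES[u]))  # → Hal
```

For the real pages, `fetch` could wrap `urlopen(u, context=ctx).read().decode()`, with count 7 and position 18.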
- Can someone explain to me, line by line and in detail, how the while loop works here? Why does it print 7 lines when it only prints "Retrieving: URL" when count == position? If I enter 18 for position, shouldn't it print the line only when count == 18? Also, if I wanted the final line to print just "Lochlann", how would I write that .contents line? Code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input("Enter URL:")
numbers = int(input("Enter count:"))
position = int(input("Enter position:"))
n = 0
count = 0
# One pass of the while loop per page visited
while n < numbers:
    html = urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    # Walk this page's anchor tags until reaching the one at `position`
    for tag in tags:
        count = count + 1
        if count == position:
            url = tag.get('href', None)
            print("Retrieving:", url)
            count = 0  # restart the count for the next page
            break      # stop scanning this page
    n = n + 1
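A minimal sketch of the same loop, assuming a hypothetical in-memory `PAGES` dict in place of `urlopen` (the page names and link text are made up), illustrates why one line prints per page. `count == position` becomes true once on each page, the `break` then ends the `for` loop, and the `while` loop moves on to the next page. So `Retrieving:` prints once per pass, `numbers` times in total, not once overall. The sketch also keeps the link text with `tag.contents[0]`, the first child of the `<a>` tag, so the last name can be printed on its own. It resets `count` at the start of each page rather than after a match, which behaves the same whenever every page has at least `position` links.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for urlopen(url, context=ctx).read(): URL -> tiny page.
PAGES = {
    "start": '<a href="p_ann">Ann</a> <a href="p_bob">Bob</a>',
    "p_bob": '<a href="p_cy">Cy</a> <a href="p_dee">Dee</a>',
    "p_dee": '<a href="p_eve">Eve</a> <a href="p_fay">Fay</a>',
}


def last_name(url, numbers, position):
    name = None
    n = 0
    while n < numbers:              # one pass per page
        soup = BeautifulSoup(PAGES[url], "html.parser")
        count = 0                   # counting restarts on every page
        for tag in soup("a"):
            count = count + 1
            if count == position:   # true once per page, not once overall
                url = tag.get("href", None)
                name = str(tag.contents[0])  # the text between <a> and </a>
                print("Retrieving:", url)
                break
        n = n + 1
    return name


final = last_name("start", 3, 2)  # prints three "Retrieving:" lines, one per page
print(final)  # → Fay
```

In the original program, adding `name = tag.contents[0]` inside the `if` block and `print(name)` after the `while` loop should print just the final name. `tag.get_text()` is an alternative when the link text might contain nested tags.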