Thursday, October 24, 2019

Issue downloading multiple PDFs with Python

After running the following code, I am unable to open the downloaded PDFs. The code runs without errors, but the downloaded PDF files are damaged. My computer's error message is "Unable to open file. It may be damaged or in a format Preview doesn't recognize."

Why are they damaged and how do I solve this?

'''

import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://github.com/sonhuytran/MIT8.01SC.2010F/tree/master/References/University%20Physics%20with%20Modern%20Physics%2C%2013th%20Edition%20Solutions%20Manual"

# If there is no such folder, the script will create one automatically
folder_location = r'/Users/rahelmizrahi/Desktop/ Physics_Solutions'
if not os.path.exists(folder_location):
    os.mkdir(folder_location)

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.select("a[href$='.pdf']"):
    filename = os.path.join(folder_location, link['href'].split('/')[-1])
    with open(filename, 'wb') as f:
        f.write(requests.get(urljoin(url, link['href'])).content)

'''
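
One likely cause, assuming the links found on the page are GitHub's usual `/blob/` file links: those URLs return the HTML viewer page rather than the PDF itself, so the script saves HTML under a `.pdf` name. The sketch below shows one way to check and work around that. The helper names `to_raw_url`, `looks_like_pdf` and `download_pdf` are made up for illustration, and swapping `/blob/` for `/raw/` is an assumption about how the links are laid out, not something confirmed from the page.

```python
from urllib.parse import urljoin


def to_raw_url(page_url, href):
    """Resolve href against page_url and swap the /blob/ viewer path for /raw/.

    Assumption: href is a GitHub file link of the form
    /<user>/<repo>/blob/<branch>/<path>.
    """
    absolute = urljoin(page_url, href)
    # Replace only the first "/blob/"; a later one could be a folder name.
    return absolute.replace("/blob/", "/raw/", 1)


def looks_like_pdf(data):
    """A PDF file normally begins with the bytes %PDF-; an HTML page does not."""
    return data.startswith(b"%PDF-")


def download_pdf(page_url, href, filename):
    """Download one linked file and refuse to save it if it is not a PDF."""
    # Imported here so the two helpers above work without the dependency.
    import requests

    response = requests.get(to_raw_url(page_url, href))
    response.raise_for_status()  # fail loudly on 404 and similar errors
    if not looks_like_pdf(response.content):
        raise ValueError("Response for %s does not look like a PDF" % href)
    with open(filename, "wb") as f:
        f.write(response.content)
```

A quick way to confirm the diagnosis is to open one of the damaged files in a text editor: if it starts with `<!DOCTYPE html>` instead of `%PDF-`, the script saved a web page. In the loop above, `download_pdf(url, link['href'], filename)` would then take the place of the `with open(...)` block.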



