Wednesday, June 2, 2021

How to upload a large file with a Python module

Let me describe the current problem.

  1. A user can log into https://<FQDN>/UI with a web browser, then click through the UI and upload a file, for example a 2.5 GB file. All of those operations (login, ..., upload) are actually API calls.

  2. Alternatively, a user can make the same REST API calls directly from a Java or Python application, so I am writing a Python snippet to do this. Unfortunately, after a successful call to "https://<FQDN>/api/login", the next call, "https://<FQDN>/api/v1.0/updateFiles/upload", has a problem: it can only upload small files. For a large file like 2.5 GB, it fails.

  3. When I go back to the web browser and upload a big file with the debug window open (F12), I see that the browser makes multiple requests, and each request carries headers like the following:

....
content-length : 5000464
sec-ch-ua : " Not A;Brand";v="99", "Chromium";v="90", "Microsoft Edge";v="90"
sec-ch-ua-mobile : ?0
user-agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36 Edg/90.0.818.46
content-type : multipart/form-data; boundary=----WebKitFormBoundaryPuIjpBG8n874gC29
content-range : bytes 0-4999999/2684354560
accept : application/json
x-requested-with : XMLHttpRequest
content-disposition : attachment; filename="b86aee78-56db-4398-8cd0-69b156740a2a"
....

It looks like the browser splits the whole file into chunks, sets "Content-Range" accordingly, and makes one API call per chunk.
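If I am reading the capture correctly, the range arithmetic is straightforward; here is a minimal sketch (the 5,000,000-byte chunk size and the 2,684,354,560-byte total are taken from the headers above):

# Reproduce the Content-Range values seen in the browser capture above.
# The chunk size and the total size come straight from the captured headers.
file_size = 2684354560
chunk_size = 5000000

for start in range(0, file_size, chunk_size):
    end = min(start + chunk_size, file_size) - 1   # the end offset is inclusive
    print("content-range : bytes {}-{}/{}".format(start, end, file_size))
    # first line printed: content-range : bytes 0-4999999/2684354560

Note that the captured Content-Length of 5,000,464 is slightly larger than the 5,000,000-byte chunk, because the multipart framing (boundary lines and part headers) adds some overhead on top of the raw bytes.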

  4. I want to simulate the behavior of the web browser, so I wrote the Python snippet below:
#_*_ coding:utf-8 _*_
import requests
import time
import random
import string
from requests_toolbelt.multipart import encoder

# Log in first; assuming the login API expects a JSON body with a
# "password" field (the original snippet was missing the dict key).
datas = {'userName': '<Username>', 'password': '<Password>'}
header = {'accept': 'application/json', 'Content-Type': 'application/json'}

# Send the credentials as JSON to match the Content-Type header.
r = requests.post('https://<FQDN>/api/login', headers=header, json=datas)
#print(r.content)
print(r.status_code)
if r.status_code == 200:
    print("Login is successful")
responseBody = r.json()
sessionId = responseBody['sessionId']
print(sessionId)

url = "https://<FQDN>/api/v1.0/updateFiles/upload"

fields = {
    # This is sent as a form field, not an HTTP header; the original also
    # had a typo ("Dispostion").
    'Content-Disposition': 'attachment; filename="3rdParty1.tgz"',
    # A .tar.gz is gzip, not zip.
    'file': ('1.tar.gz', open('C:/Users/Jie2/Downloads/1.tar.gz', 'rb'), 'application/gzip')
}

boundary = '----WebKitFormBoundary' + ''.join(random.sample(string.ascii_letters+string.digits,16))

multipart_encoder = encoder.MultipartEncoder(fields=fields,boundary=boundary)


headers = {
    'accept': 'application/json',
    'sessionId': sessionId,
    'Referer': 'https://<FQDN>/UI/',
    'Content-Type': 'multipart/form-data; boundary={}'.format(boundary),
}

response = requests.post(url=url, headers=headers,
                         data=multipart_encoder,
                         cookies=r.cookies)
       
print(response)

But the above code can only upload small files, on the order of kilobytes; for a large file like 2.5 GB, it fails. The code follows https://toolbelt.readthedocs.io/en/latest/uploading-data.html as a reference.
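For what it's worth, MultipartEncoder already streams the file from disk instead of loading it into memory, so the 2.5 GB failure is probably not a client-side memory problem; my guess is a server-side size or timeout limit on a single request. A small variant with a progress callback, using requests_toolbelt's MultipartEncoderMonitor (url, sessionId, and r come from the snippet above), would look like:

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder, MultipartEncoderMonitor

def progress(monitor):
    # Called repeatedly as the body is streamed; bytes_read is the running total.
    print("sent {} / {} bytes".format(monitor.bytes_read, monitor.len))

with open('C:/Users/Jie2/Downloads/1.tar.gz', 'rb') as fh:
    m = MultipartEncoder(fields={'file': ('1.tar.gz', fh, 'application/gzip')})
    monitor = MultipartEncoderMonitor(m, progress)
    response = requests.post(
        url,                               # upload endpoint from the snippet above
        data=monitor,
        headers={'accept': 'application/json',
                 'sessionId': sessionId,
                 'Content-Type': monitor.content_type},
        cookies=r.cookies,
    )
print(response.status_code)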

  5. I also tried giving up requests_toolbelt and using only the requests module, manually reading the file section by section in a loop and setting "Content-Range" myself, as in the snippet below. Unfortunately, it doesn't work either.
from hashlib import md5
import os
import time

import requests

# url and sessionId come from the login snippet above.
chunk_size = 5000000                      # 5,000,000-byte chunks, as the browser uses
file_name = '<FileName>'
fLen = os.path.getsize(file_name)         # total file size in bytes
now2 = str(int(time.time() * 1000))       # timestamp header; the exact format expected is a guess

def multiple_upload(content_range, files, md5_hex):
    headers = {
        "Host": "<FQDN>",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
        "Accept": "application/json",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://<FQDN>/ui",
        "X-Requested-With": "XMLHttpRequest",
        "Connection": "keep-alive",
        # Content-Length is computed by requests itself; hard-coding it (or
        # combining it with Transfer-Encoding: chunked) corrupts the request.
        "sessionId": sessionId,
        "timestamp": now2,
        "Content-MD5": md5_hex,
        "Content-Range": content_range,
    }
    return requests.post(url, headers=headers, files=files)

with open(file_name, 'rb') as f:
    offset = 0
    while offset < fLen:
        fs = f.read(chunk_size)            # the final read is simply shorter
        # Content-Range uses an inclusive end offset.
        crf = "bytes {}-{}/{}".format(offset, offset + len(fs) - 1, fLen)
        fx = {
            'name': (None, file_name),
            'file': (file_name, fs)}
        m = md5()
        m.update(fs)
        resp = multiple_upload(crf, fx, m.hexdigest())
        offset += len(fs)

Does anybody know of a Python module that can implement this requirement: automatically read a large file section by section and automatically set headers like "Content-Range" so the upload works? Or have I simply not found the correct way to use requests_toolbelt?
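Until someone points me at such a module, this is roughly the helper I have in mind: a generator that reads the file section by section, plus a loop that sets Content-Range per request. The chunk size, the form field names, and the assumption that the server accepts browser-style chunks are all placeholders:

import os
import requests

def iter_chunks(path, chunk_size=5000000):
    # Lazily yield (start_offset, chunk_bytes) pairs without loading the
    # whole file into memory.
    with open(path, 'rb') as fh:
        offset = 0
        while True:
            block = fh.read(chunk_size)
            if not block:
                break
            yield offset, block
            offset += len(block)

def upload_in_chunks(url, path, base_headers):
    # base_headers would carry sessionId etc.; the field names below
    # ('name', 'file') mirror the browser's form but are guesses.
    total = os.path.getsize(path)
    resp = None
    for offset, block in iter_chunks(path):
        headers = dict(base_headers)
        headers['Content-Range'] = 'bytes {}-{}/{}'.format(
            offset, offset + len(block) - 1, total)
        files = {'name': (None, os.path.basename(path)),
                 'file': (os.path.basename(path), block)}
        resp = requests.post(url, headers=headers, files=files)
        resp.raise_for_status()
    return resp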

Please advise.

Thanks & regards, Jie



