Tuesday, 1 December 2020

Downloading OECD data with Python into a dataframe

I am trying to load MEI data from the OECD into a dataframe. Here is the link to the dataset:

https://stats.oecd.org/viewhtml.aspx?datasetcode=MEI_BTS_COS&lang=en

I generated an API query URL with their export tool:

https://stats.oecd.org/SDMX-JSON/data/MEI_BTS_COS/CS+CSES+CSESFT+CSCI+CSCICP02+CSIN+CSINFT.AUS+AUT+BEL+CHL+COL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EA19+NMEC+BRA+CHN+CRI+IND+IDN+RUS+ZAF.BLSA.Q+M/all?startTime=2019-Q2&endTime=2020-Q4

I am trying to download the entire dataset and have tried a few options:

Method 1:

import json

import pandas as pd
import pandasdmx  # unused below; importing it triggers the requests_cache warning shown in the output
import requests

url = "https://stats.oecd.org/SDMX-JSON/data/MEI_BTS_COS/CS+CSES+CSESFT+CSCI+CSCICP02+CSIN+CSINFT.AUS+AUT+BEL+CHL+COL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+IRL+ISR+ITA+JPN+KOR+LVA+LTU+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EA19+NMEC+BRA+CHN+CRI+IND+IDN+RUS+ZAF.BLSA.Q+M/all?startTime=2019-Q2&endTime=2020-Q4"

# A single GET request is enough; no payload or custom headers are needed
response = requests.get(url)

c = response.json()
# 'dataSets' is a list, so it cannot be indexed with ['attributes'];
# reading the list itself yields the 'action' and 'series' columns shown below
cs = c['dataSets']
df1 = pd.read_json(json.dumps(cs), orient='records')
df1.reset_index(drop=True, inplace=True)

print(df1)

Output:

C:\Python\Python38\lib\site-packages\pandasdmx\remote.py:10: RuntimeWarning: optional dependency requests_cache is not installed; cache options to Session() have no effect                                   
  warn(                                                                                                                                                                                                       
        action                                             series                                                                                                                                             
0  Information  {'0:0:0:0': {'attributes': [0, 0, 0, None], 'o...  
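
A note on the output above: it suggests the response is a nested SDMX-JSON message rather than a flat table. Each key under 'series' (such as '0:0:0:0') appears to be a set of positions into the dimension lists stored under 'structure'. The sketch below is one possible way to flatten such a message into rows, assuming the usual SDMX-JSON layout (structure -> dimensions -> series / observation). The function name sdmx_json_to_rows, the dimension ids and the tiny sample message are made up for illustration and are not real OECD data.

```python
def sdmx_json_to_rows(message):
    """Flatten an SDMX-JSON data message into a list of flat dicts.

    Assumes the first data set holds the series and that the order of
    structure['dimensions']['series'] matches the positions in each series key.
    """
    series_dims = message["structure"]["dimensions"]["series"]
    obs_dims = message["structure"]["dimensions"]["observation"]
    rows = []
    for series_key, series in message["dataSets"][0]["series"].items():
        # "0:0:0:0" -> one index per series dimension
        positions = [int(p) for p in series_key.split(":")]
        labels = {
            dim["id"]: dim["values"][pos]["id"]
            for dim, pos in zip(series_dims, positions)
        }
        for obs_key, obs in series["observations"].items():
            row = dict(labels)
            obs_positions = [int(p) for p in obs_key.split(":")]
            for dim, pos in zip(obs_dims, obs_positions):
                row[dim["id"]] = dim["values"][pos]["id"]
            # The first element of an observation array is the observed value
            row["value"] = obs[0]
            rows.append(row)
    return rows


# Hypothetical miniature message with placeholder values (not real OECD data)
sample = {
    "dataSets": [{
        "action": "Information",
        "series": {
            "0:0:0:0": {
                "attributes": [0, 0, 0, None],
                "observations": {"0": [1.0, None], "1": [2.0, None]},
            }
        },
    }],
    "structure": {
        "dimensions": {
            "series": [
                {"id": "SUBJECT", "values": [{"id": "CSESFT"}]},
                {"id": "LOCATION", "values": [{"id": "AUS"}]},
                {"id": "MEASURE", "values": [{"id": "BLSA"}]},
                {"id": "FREQUENCY", "values": [{"id": "M"}]},
            ],
            "observation": [
                {"id": "TIME_PERIOD", "values": [{"id": "2019-04"}, {"id": "2019-05"}]}
            ],
        }
    },
}

rows = sdmx_json_to_rows(sample)
for row in rows:
    print(row)
```

With pandas installed, pd.DataFrame(sdmx_json_to_rows(response.json())) should then give one row per observation, provided the real response follows the layout assumed here.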

Method 2:

def get_from_oecd(sdmx_query):
    return pd.read_csv(
        f"https://stats.oecd.org/SDMX-JSON/data/{sdmx_query}?contentType=csv"
    )

print(get_from_oecd("MEI_BTS_COS/CSESFT.AUS.M/OECD").head())  

Output:

  File "oecd2.py", line 49, in <module>                                                                                                                                                                       
    print(get_from_oecd("MEI_BTS_COS/CSESFT.AUS.M/OECD").head())                                                                                                                                              
  File "oecd2.py", line 45, in get_from_oecd                                                                                                                                                                  
    return pd.read_csv(                                                                                                                                                                                       
  File "C:\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f                                                                                                                     
    return _read(filepath_or_buffer, kwds)                                                                                                                                                                    
  File "C:\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 430, in _read                                                                                                                        
    fp_or_buf, _, compression, should_close = get_filepath_or_buffer(                                                                                                                                         
  File "C:\Python\Python38\lib\site-packages\pandas\io\common.py", line 172, in get_filepath_or_buffer                                                                                                        
    req = urlopen(filepath_or_buffer)                                                                                                                                                                         
  File "C:\Python\Python38\lib\site-packages\pandas\io\common.py", line 141, in urlopen                                                                                                                       
    return urllib.request.urlopen(*args, **kwargs)                                                                                                                                                            
  File "C:\Python\Python38\lib\urllib\request.py", line 222, in urlopen                                                                                                                                       
    return opener.open(url, data, timeout)                                                                                                                                                                    
  File "C:\Python\Python38\lib\urllib\request.py", line 531, in open                                                                                                                                          
    response = meth(req, response)                                                                                                                                                                            
  File "C:\Python\Python38\lib\urllib\request.py", line 640, in http_response                                                                                                                                 
    response = self.parent.error(                                                                                                                                                                             
  File "C:\Python\Python38\lib\urllib\request.py", line 569, in error                                                                                                                                         
    return self._call_chain(*args)                                                                                                                                                                            
  File "C:\Python\Python38\lib\urllib\request.py", line 502, in _call_chain                                                                                                                                   
    result = func(*args)                                                                                                                                                                                      
  File "C:\Python\Python38\lib\urllib\request.py", line 649, in http_error_default                                                                                                                            
    raise HTTPError(req.full_url, code, msg, hdrs, fp)                                                                                                                                                        
urllib.error.HTTPError: HTTP Error 400: Semantic Error   
                                                         

I still can't figure out what I am doing wrong, although I suspect it has something to do with the query parameters. Is there a way to get the data into a dataframe?
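
One difference worth checking: the generated URL has four dot-separated parts in its key (the subject codes, the country codes, BLSA, and Q+M) and ends in /all, whereas the Method 2 query "CSESFT.AUS.M/OECD" has only three parts and ends in /OECD. The sketch below builds a CSV URL with all four parts filled in. It assumes the key order is subject, location, measure, frequency, as the generated URL suggests; the helper name build_oecd_csv_url is hypothetical, and whether this change clears the HTTP 400 has not been verified here.

```python
from urllib.parse import urlencode

BASE_URL = "https://stats.oecd.org/SDMX-JSON/data"


def build_oecd_csv_url(dataset, dimensions, provider="all", **params):
    """Build a query URL from one list of codes per dimension.

    Codes within a dimension are joined with "+", dimensions with ".".
    The contentType=csv parameter is taken from Method 2.
    """
    key = ".".join("+".join(codes) for codes in dimensions)
    query = urlencode({**params, "contentType": "csv"})
    return f"{BASE_URL}/{dataset}/{key}/{provider}?{query}"


# Assumed dimension order: subject, location, measure, frequency
url = build_oecd_csv_url(
    "MEI_BTS_COS",
    [["CSESFT"], ["AUS"], ["BLSA"], ["M"]],
    startTime="2019-Q2",
    endTime="2020-Q4",
)
print(url)
```

If the server accepts the request, pd.read_csv(url) would read the result straight into a dataframe, as in Method 2.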



