I am trying to scrape the SEC's website to grab company headquarter data. The function is below and it just takes way too long to execute. Is there a better way of running it to make it faster and more efficient? Thanks so much!
import requests
import numpy as np
from lxml import html

def get_hq(Companies):
    """Fill the 'Headquarters' column by scraping each company's CIK page on sec.report."""
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2810.1 Safari/537.36'}
    base_url = 'https://sec.report/CIK/{}'
    for i, row in Companies.iterrows():
        cik = row['CIK']
        response_object = requests.get(base_url.format(cik), headers=headers)
        raw_html = html.fromstring(response_object.text)
        try:
            # Grab the cell next to the "Business Address" label.
            hq = raw_html.xpath('//tr[./td[contains(text(),"Business Address")]]/td[2]/text()')
            Companies.loc[i, 'Headquarters'] = hq[0]
        except IndexError:
            Companies.loc[i, 'Headquarters'] = np.nan
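
Below is a minimal sketch of one possible speed-up, not a definitive answer: reuse a single connection via requests.Session and fetch the CIK pages concurrently with a thread pool. It assumes Companies is a pandas DataFrame with a 'CIK' column, and the max_workers value is an arbitrary choice.

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import requests
from lxml import html

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2810.1 Safari/537.36'}
BASE_URL = 'https://sec.report/CIK/{}'

session = requests.Session()      # one connection pool, reused across all requests
session.headers.update(HEADERS)

def fetch_hq(cik):
    """Fetch a single CIK page and pull out the Business Address cell."""
    try:
        response = session.get(BASE_URL.format(cik), timeout=10)
        tree = html.fromstring(response.text)
        cells = tree.xpath('//tr[./td[contains(text(),"Business Address")]]/td[2]/text()')
        return cells[0] if cells else np.nan
    except requests.RequestException:
        return np.nan

def get_hq(Companies, max_workers=8):
    """Map fetch_hq over every CIK concurrently and store the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_hq, Companies['CIK']))
    Companies['Headquarters'] = results
    return Companies

Note that sec.report may rate-limit aggressive scraping, so a modest worker count (or a small delay between requests) is probably safer than a very large pool.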