Tuesday, May 28, 2019

Python: How to scrape an updating website and store data for future analysis

I have developed a script to scrape the website below: https://www.insidefutures.com/markets/data.php?page=quote&sym=NG&x=19&y=5

The data updates every 10 minutes, and I would like to find a relationship between prices and traded volumes. To do that, I need to download the data every 10 minutes and store it for future analysis.

Every time the website updates, I would like my code to run and save the scraped data to a database, every 10 minutes, so it is available for future analysis. How can I achieve this?

  import requests
  import pandas as pd
  from bs4 import BeautifulSoup

  res = requests.get('https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0')
  soup = BeautifulSoup(res.text, 'lxml')

  # The second <tr> of the page holds the column headers
  column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]

  # Every row after the two header rows contains quote data
  data_rows = soup.findAll('tr')[2:]

  Contracts = []
  Lasts = []
  Changes = []
  Opens = []
  Highs = []
  Lows = []
  Volumes = []
  Previous_Settles = []
  Date_Times = []

  for row in data_rows:
      cells = row.findAll('td')
      if len(cells) < 9:
          continue  # skip rows that are not full data rows
      Contracts.append(cells[0].text)
      Lasts.append(cells[1].text)
      Changes.append(cells[2].text)
      Opens.append(cells[3].text)
      Highs.append(cells[4].text)
      Lows.append(cells[5].text)
      Volumes.append(cells[6].text)
      Previous_Settles.append(cells[7].text)
      Date_Times.append(cells[8].text)

  df = pd.DataFrame({'Contract': Contracts, 'Last': Lasts, 'Change': Changes,
                     'Open': Opens, 'High': Highs, 'Low': Lows, 'Volume': Volumes,
                     'Previous_Settled': Previous_Settles, 'Date_Time': Date_Times})
  print(df)
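
A minimal sketch of one possible approach to the scheduling and storage part: wrap the scrape in a function, append each snapshot to a SQLite table with DataFrame.to_sql, and sleep 10 minutes between runs. The ng_quotes.db file, the quotes table, the scraped_at column, and the scrape_quotes() helper are illustrative names chosen here, not part of the original code.

  import time
  import sqlite3
  import requests
  import pandas as pd
  from bs4 import BeautifulSoup

  URL = ('https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng'
         '&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0')
  COLUMNS = ['Contract', 'Last', 'Change', 'Open', 'High', 'Low',
             'Volume', 'Previous_Settled', 'Date_Time']

  def scrape_quotes():
      """Scrape the quote table (same logic as above) and return it as a DataFrame."""
      res = requests.get(URL)
      soup = BeautifulSoup(res.text, 'lxml')
      records = []
      for row in soup.findAll('tr')[2:]:
          cells = [td.text for td in row.findAll('td')]
          if len(cells) >= 9:          # keep only full data rows
              records.append(cells[:9])
      return pd.DataFrame(records, columns=COLUMNS)

  # Illustrative database file name; the table is created on the first insert
  conn = sqlite3.connect('ng_quotes.db')

  while True:
      snapshot = scrape_quotes()
      # Stamp each snapshot with the collection time so runs can be told apart
      snapshot['scraped_at'] = pd.Timestamp.now().isoformat()
      # Append the new rows to the 'quotes' table
      snapshot.to_sql('quotes', conn, if_exists='append', index=False)
      # Wait 10 minutes before the next scrape
      time.sleep(600)

Alternatively, the while/sleep loop could be dropped and the script scheduled every 10 minutes with cron or the Windows Task Scheduler; the stored snapshots can later be read back with pd.read_sql for the price/volume analysis.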



