I'm attempting to scrape a website, grab a JavaScript-generated table, and convert it to a format I can ingest with Logstash.
So far I have come up with this:
import requests
from bs4 import BeautifulSoup

url = 'http://arizonascaleracers.liverc.com/results/?p=view_race_result&id=2227665'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

for table in soup.find_all('table'):
    # Keep only tables without a cellpadding attribute (the same filter as before)
    if not table.has_attr('cellpadding'):
        for row in table.find_all('tr'):
            # Skip rows with no <td> cells, such as a header row of <th> cells
            if row.find('td') is not None:
                print(row)
Here is what it generates for each row, for example:
<tr>
<td>7</td>
<td>
<span class="car_num">7</span>
<span class="driver_name">CRAIG NELSON</span>
<br/><span class="text-nowrap"><small><small><a class="driver_laps" data-driver-id="166965" href="#"><span class="fa fa-eye"></span> View Laps</a></small></small></span>
</td>
<td>7</td>
<td>21/5:08.304</td>
<td>1 Lap</td>
<td>13.713</td>
<td><div class="hidden">14.824</div>14.824</td>
<td><div class="hidden">13.957</div>13.957</td>
<td><div class="hidden">14.122</div>14.122</td>
<td><div class="hidden">14.371</div>14.371</td>
<td><div class="hidden">41.978</div>41.978</td>
<td>1.03</td>
<td>93.08%</td>
</tr>
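Reading each `<td>` individually, rather than taking the whole row's text, avoids two problems visible in this sample: every time cell carries a `<div class="hidden">` copy of its value (so the text would come out as `14.82414.824`), and the driver cell also holds the car number and a "View Laps" link. A minimal sketch, assuming the markup looks exactly like the row above; `row_to_fields` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The sample row from above, wrapped in <table> so it parses as a standalone snippet
SAMPLE_ROW = """
<table><tr>
<td>7</td>
<td>
<span class="car_num">7</span>
<span class="driver_name">CRAIG NELSON</span>
<br/><span class="text-nowrap"><small><small><a class="driver_laps" data-driver-id="166965" href="#"><span class="fa fa-eye"></span> View Laps</a></small></small></span>
</td>
<td>7</td>
<td>21/5:08.304</td>
<td>1 Lap</td>
<td>13.713</td>
<td><div class="hidden">14.824</div>14.824</td>
<td><div class="hidden">13.957</div>13.957</td>
<td><div class="hidden">14.122</div>14.122</td>
<td><div class="hidden">14.371</div>14.371</td>
<td><div class="hidden">41.978</div>41.978</td>
<td>1.03</td>
<td>93.08%</td>
</tr></table>
"""


def row_to_fields(row):
    """Return the visible text of each <td> in one results row as a list of strings."""
    fields = []
    for cell in row.find_all('td'):
        # Remove the hidden duplicate so '14.824' is not read as '14.82414.824'
        for hidden in cell.find_all('div', class_='hidden'):
            hidden.decompose()
        # In the driver cell, keep just the name (skip the car number and "View Laps" link)
        name = cell.find('span', class_='driver_name')
        if name is not None:
            fields.append(name.get_text(strip=True))
        else:
            fields.append(cell.get_text(strip=True))
    return fields


soup = BeautifulSoup(SAMPLE_ROW, 'html.parser')
print(row_to_fields(soup.find('tr')))
```

In the original script, calling `row_to_fields(row)` in place of `print(row)` would give one list of 13 strings per result row.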
I have been able to extract the table rows, but I'm lost at this point trying to convert them into something like this that I can ingest with Logstash:
7 CRAIG NELSON 7 22/5:03.815 0.948 12.919 13.927 13.139 13.261 13.403 39.839 1.12 91.98%
or something similar.
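Once each row is a list of strings, producing a line in that shape is a single join. A minimal sketch; `to_line` is a hypothetical helper, and the values are copied from the desired line above. Because the driver name itself contains a space, a space-separated line is harder to split back into columns reliably on the Logstash side; a tab keeps each field unambiguous, and Logstash's `csv` filter (with its `separator` option) is one way to split on it.

```python
def to_line(fields, sep='\t'):
    """Join one row's fields into a single line of text.

    A tab is the default because the driver name contains a space;
    with sep=' ' the line matches the format shown in the question.
    """
    return sep.join(field.strip() for field in fields)


fields = ['7', 'CRAIG NELSON', '7', '22/5:03.815', '0.948', '12.919', '13.927',
          '13.139', '13.261', '13.403', '39.839', '1.12', '91.98%']

print(to_line(fields, sep=' '))  # space-separated, as in the desired output
print(to_line(fields))           # tab-separated, easier to split unambiguously
```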
I feel like this shouldn't be hard, but I've been working on getting this going for days now and I'm just frustrated with it. I'd also like to dump this data to a CSV file. Any ideas?
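For the CSV file, the standard-library `csv` module handles quoting for you (for example if a driver name ever contains a comma). A minimal sketch, assuming each row has already been reduced to a list of strings; `write_csv` and `race_results.csv` are hypothetical names, and the row shown is the sample row from above. Since Logstash also has a `csv` filter, the same file may be able to serve both purposes.

```python
import csv


def write_csv(rows, path):
    """Write a list of field lists to a CSV file, one table row per line."""
    # newline='' is what the csv module expects, so line endings are not doubled on Windows
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerows(rows)


rows = [
    ['7', 'CRAIG NELSON', '7', '21/5:08.304', '1 Lap', '13.713', '14.824',
     '13.957', '14.122', '14.371', '41.978', '1.03', '93.08%'],
]
write_csv(rows, 'race_results.csv')  # hypothetical output filename
```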
Eventually I will need to start one page one level up and crawl the subpages to get each subpage's worth of data, but I figured I'd start with something easy... and it hasn't been as easy as I thought.
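For the later crawling step, one approach is to collect the result links from the parent page and fetch each one in turn. This is a sketch under assumptions: the parent page's URL is not given here, and the filter assumes the subpage links contain `view_race_result` in their href, as the URL above does. `find_result_links` and `crawl` are hypothetical helpers; `urljoin` turns relative hrefs into full URLs, and the short pause between requests is a courtesy to the server.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def find_result_links(html, base_url, marker='view_race_result'):
    """Return absolute URLs of links whose href contains `marker`, in page order, without duplicates."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.find_all('a', href=True):
        if marker in a['href']:
            url = urljoin(base_url, a['href'])  # resolve relative hrefs such as '?p=...'
            if url not in links:
                links.append(url)
    return links


def crawl(index_url, delay=1.0):
    """Fetch the index page, then yield (url, html) for each linked result page."""
    index = requests.get(index_url, timeout=30)
    index.raise_for_status()
    for url in find_result_links(index.text, index_url):
        page = requests.get(url, timeout=30)
        page.raise_for_status()
        yield url, page.text
        time.sleep(delay)  # pause between requests to go easy on the server
```

Each `html` string yielded by `crawl` could then go through the same table and row loop as the single-page script.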
Thanks for the help!