Wednesday, 12 September 2018

Table Scraping with same class attributes

I am trying to scrape the prayer times from the website www.hujjat.org.

Here is the HTML for the area I am interested in (as you may notice, the class attribute is the same for all four prayers):

<table width="100%">
    <tbody>
        <tr>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Fajr</div>
                <div class="NamaazTime">04:42</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Sunrise</div>
                <div class="NamaazTime">06:32</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Zohr</div>
                <div class="NamaazTime">13:02</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Maghrib</div>
                <div class="NamaazTime">19:33</div>
            </td>
        </tr>
    </tbody>
</table>

So far I have written the following code:

# import libraries
import json
import urllib2
from bs4 import BeautifulSoup
# specify the url
quote_page = 'http://www.hujjat.org/'
# query the website and return the html to the variable 'page'
page = urllib2.urlopen(quote_page)
# parse the html using Beautiful Soup and store in variable 'soup'
soup = BeautifulSoup(page, 'html.parser')

table = soup.find("div",class_="NamaazTimeName", text="Fajr").find_previous("table")
for row in table.find_all("tr"):
    a = row.find_all("td")
    # print(row.find_all("td"))

print(a)

And my result is:

[<td class="NamaazTimes">\n<div class="NamaazTimeName">Fajr</div>\n<div class="NamaazTime">04:42</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Sunrise</div>\n<div class="NamaazTime">06:32</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Zohr</div>\n<div class="NamaazTime">13:02</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Maghrib</div>\n<div class="NamaazTime">19:33</div>\n</td>]

What I want from my code is just the time for each prayer, e.g. for the "Fajr" prayer the output should be "04:42". I then want to save this "04:42" in a text file.

I would then need to repeat this for each of the four prayers.

Can someone help me please?

Thanks.
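
One possible approach, as a minimal sketch rather than a definitive answer: since every cell shares the class "NamaazTimes", loop over the cells and read the two inner divs of each one, pairing the name with its time. The function names extract_prayer_times and save_prayer_times and the output file name are made up for illustration, and the sample markup is copied from the question instead of being downloaded, so the page structure is assumed to match what is shown above.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in practice this would be the
# HTML downloaded from the site (assumed to have the same structure).
SAMPLE_HTML = """
<table width="100%">
    <tbody>
        <tr>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Fajr</div>
                <div class="NamaazTime">04:42</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Sunrise</div>
                <div class="NamaazTime">06:32</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Zohr</div>
                <div class="NamaazTime">13:02</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Maghrib</div>
                <div class="NamaazTime">19:33</div>
            </td>
        </tr>
    </tbody>
</table>
"""


def extract_prayer_times(html):
    """Return a dict mapping each prayer name to its time string.

    Hypothetical helper: it relies on the class names shown in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    times = {}
    # Every cell has the same class, so take them all and look inside each.
    for cell in soup.find_all("td", class_="NamaazTimes"):
        name = cell.find("div", class_="NamaazTimeName")
        time = cell.find("div", class_="NamaazTime")
        # Skip cells that do not contain both divs.
        if name is not None and time is not None:
            times[name.get_text(strip=True)] = time.get_text(strip=True)
    return times


def save_prayer_times(times, path):
    """Write one 'Name: time' line per prayer to a text file."""
    with open(path, "w") as handle:
        for name, time in times.items():
            handle.write("{}: {}\n".format(name, time))


times = extract_prayer_times(SAMPLE_HTML)
print(times["Fajr"])  # 04:42
```

Calling save_prayer_times(times, "prayer_times.txt") would then write all four entries in one go, so nothing has to be repeated four times by hand. To save only a single value, writing times["Fajr"] to the file works the same way.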
