Saturday, 24 March 2018

BeautifulSoup4 not able to scrape data from this table

Apologies if this is a basic question: I'm new to web scraping and have little knowledge of HTML.

I'm trying to scrape data from this website, specifically from this part (table) of the page. This is a screenshot of the data I want to get:

末"四"位数: 9775,2275,4775,7275
末"五"位数: 03881,23881,43881,63881,83881,16913,66913
末"六"位数: 313110,563110,813110,063110
末"七"位数: 4210962,9210962,9785582
末"八"位数: 63262036
末"九"位数: 080876872

Sorry that it's in Chinese and looks terrible; I can't embed the picture. The table is roughly in the middle of the page (about 40% of the way down from the top). The row I want has the id 'tr_zqh'.

Here is my source code:

import bs4 as bs
import urllib.request

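def scrapezqh(url):
    # Download the raw HTML and parse it with Python's built-in parser
    source = urllib.request.urlopen(url).read()
    page = bs.BeautifulSoup(source, 'html.parser')
    # Return the parsed page so the caller can print or search it
    return page

url = 'http://data.eastmoney.com/xg/xg/detail/300741.html?tr_zqh=1'
print(scrapezqh(url))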

It scrapes most of the table except the part that I'm interested in. Here is what it returns:

<td class="tdcolor">网下有效申购股数(万股)
            </td>
<td class="tdwidth" id="td_wxyxsggs"> 
            </td>
</tr>
<tr id="tr_zqh">
<td class="tdtitle" id="td_zqhrowspan">中签号
            </td>
<td class="tdcolor">中签号公布日期
            </td>
<td class="ltxt" colspan="3"> 2018-02-22 (周四)
            </td>

I'd like to get the content of the row with id="tr_zqh" (the 6th line of the output above). However, for some reason its data is not scraped. I don't think it is a dynamic table. I've tried both the lxml and html.parser parsers, as well as pandas.read_html, and they all returned the same result. I'd like some help understanding why the data is missing and how I can fix it. Many thanks!
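
One way to narrow this down is to check whether the wanted values are present in the raw HTML at all, before any parser touches it. The sketch below is only a diagnostic, not a fix: fetch_html and check_raw_html are hypothetical helper names, the UTF-8 decoding is an assumption about the page, and the sample string is a shortened fragment shaped like the output quoted above.

```python
import urllib.request


def fetch_html(url, encoding="utf-8"):
    """Download a page and decode it to text.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising, which is enough for searching ASCII digits.
    """
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def check_raw_html(html, needles):
    """Report, for each needle, whether it occurs in the raw HTML text."""
    return {needle: needle in html for needle in needles}


# Offline demonstration on a fragment shaped like the scraped output:
# the row id is present, but one of the expected numbers is not.
sample = (
    '<tr id="tr_zqh">'
    '<td class="tdtitle" id="td_zqhrowspan">中签号</td>'
    '<td class="ltxt" colspan="3"> 2018-02-22 (周四)</td>'
    '</tr>'
)
result = check_raw_html(sample, ["tr_zqh", "63262036"])
print(result)  # {'tr_zqh': True, '63262036': False}
```

Against the real page, the same check would be check_raw_html(fetch_html(url), ["tr_zqh", "63262036"]). If the number is missing from the raw text, switching parsers cannot recover it, and a plausible (but unconfirmed) explanation is that the page fills those cells in after loading.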



