Saturday, November 2, 2019

Can't seem to scrape tbody from this website

I'm trying to scrape data from this website: https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/


import requests
from bs4 import BeautifulSoup

page = requests.get('https://web.archive.org/web/20130725021041/http://www.usatoday.com/sports/nfl/injuries/')
soup = BeautifulSoup(page.text, 'html.parser')
soup.find_all('tbody')

soup.find_all('tbody') returns []. I'm not entirely sure why.
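
One way to narrow this down might be to check where, if anywhere, the tag appears in the raw response text. The sketch below uses only the standard library; the names TagLocator and locate_tag are hypothetical helpers introduced for illustration, and it makes no claim about which case actually applies to the archived page.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Record where a tag name shows up: real markup, a comment, or a script."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.in_markup = False
        self.in_comment = False
        self.in_script = False
        self._inside_script = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        if tag == self.tag:
            self.in_markup = True
        if tag == "script":
            self._inside_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside_script = False

    def handle_comment(self, data):
        # Markup wrapped in <!-- ... --> is not part of the parsed tree.
        if "<" + self.tag in data.lower():
            self.in_comment = True

    def handle_data(self, data):
        # Markup embedded in a script is plain text to an HTML parser.
        if self._inside_script and "<" + self.tag in data.lower():
            self.in_script = True


def locate_tag(html, tag="tbody"):
    """Return 'markup', 'comment', 'script', or 'absent' for the given tag."""
    locator = TagLocator(tag)
    locator.feed(html)
    locator.close()
    if locator.in_markup:
        return "markup"
    if locator.in_comment:
        return "comment"
    if locator.in_script:
        return "script"
    return "absent"
```

Calling locate_tag(page.text) on the response would then hint at whether the table is served as ordinary markup, hidden inside a comment or script, or missing from the static HTML entirely (for example, because it is filled in by JavaScript after the page loads).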

This is the tbody part I'm trying to scrape out:

<tbody>
<tr class="page"><td>7/23/2013</td><td>Anthony Spencer</td><td>Cowboys</td><td>DE</td><td>Knee</td><td>Knee</td><td>Out</td><td>Is questionable for 9/8 against the NY Giants</td></tr>
<tr class="page"><td>7/22/2013</td><td>Tyrone Crawford</td><td>Cowboys</td><td>DE</td><td>Achilles-tendon</td><td>Achilles</td><td>Out</td><td>Is expected to be placed on injured reserve</td></tr>
<tr class="page"><td>7/16/2013</td><td>Ryan Broyles</td><td>Lions</td><td>WR</td><td>Knee</td><td>Knee</td><td>Questionable</td><td>Is questionable for 9/8 against Minnesota</td></tr>
<tr class="page"><td>7/2/2013</td><td>Jahvid Best</td><td>Lions</td><td>RB</td><td>Concussion</td><td>Concussion</td><td>Out</td><td>Is out indefinitely</td></tr>
<tr class="page"><td>7/2/2013</td><td>Jerel Worthy</td><td>Packers</td><td>DE</td><td>Knee</td><td>Knee</td><td>Out</td><td>Is out indefinitely</td></tr>
<tr class="page"><td>7/2/2013</td><td>JC Tretter</td><td>Packers</td><td>TO</td><td>Ankle</td><td>Ankle</td><td>Out</td><td>Is out indefinitely</td></tr>
<tr class="page"><td></td></tr>
</tbody>

Could someone help me out and explain why find_all on tbody returns an empty list? Even when I search for tr with class page, it returns an empty list.
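
For reference, here is a minimal sketch of the extraction I would expect to work if the markup shown above were actually present in the response. SAMPLE is one row copied from that snippet plus the empty trailing row, wrapped in a table element; extract_rows is a hypothetical helper name. It only shows that the selector matches this markup when the parser can see it.

```python
from bs4 import BeautifulSoup

# One data row from the snippet in the question, plus the empty trailing row.
SAMPLE = """
<table><tbody><tr class="page"><td>
    7/23/2013
</td><td>
    Anthony Spencer
</td><td>
    Cowboys
</td><td>
    DE
</td><td>
    Knee
</td><td>
    Knee
</td><td>
    Out
</td><td>
    Is questionable for 9/8 against the NY Giants
</td></tr><tr class="page"><td>
</td></tr></tbody></table>
"""


def extract_rows(html):
    """Return the cell text of every non-empty <tr class="page"> row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", class_="page"):
        # strip=True removes the padding whitespace around each cell value.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if any(cells):  # skip rows whose cells are all empty
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE)
```

If the same function returns an empty list for page.text, the rows are probably not in the downloaded HTML in parseable form, which would point to the response rather than the selector.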



