Tuesday, October 29, 2019

Scraping info from a web page's links

I'm working with this website: https://www.ntsb.gov/investigations/AccidentReports/Pages/railroad.aspx. From each brief report (like "Railroad Accident Brief: Dallas, Garland & Northeastern Railroad" and "Railroad Accident Brief: Derailment of Metro-North Railroad Commuter Train Rye, New York Employee Fatality") I want to extract the Executive Summary and build a dataset using Scrapy. I'm new to Scrapy, so I may be on the wrong path, but I tried this:

scrapy shell
fetch('https://www.ntsb.gov/investigations/AccidentReports/Pages/railroad.aspx')
response.css(".ms-vb reporttitle::text").extract()

I thought it would bring back all the info from the subpages, but the result is empty. How can I fix it?
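
A note on the attempt above, with a small sketch. In CSS, `.ms-vb reporttitle` matches a `<reporttitle>` element nested inside something with class `ms-vb`; if `reporttitle` is actually a class on the same element, the selector would be `.ms-vb.reporttitle`. Also, a selector run on the listing page can only return what is in that page's HTML. Subpage content needs one more request per link (in Scrapy, typically by yielding `response.follow(href, callback=...)`). The sketch below only illustrates the link-collecting step, using the standard library so it runs without Scrapy. The markup, the `/hypothetical/...` paths, and the `ReportLinkParser` name are assumptions made for illustration; the real page's HTML is not shown here and may differ, or may be filled in by JavaScript after load.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.ntsb.gov/investigations/AccidentReports/Pages/railroad.aspx"

# Hypothetical listing markup (an assumption, not the real page's HTML).
LISTING_HTML = """
<table>
  <tr><td class="ms-vb reporttitle"><a href="/hypothetical/brief-1.aspx">Brief 1</a></td></tr>
  <tr><td class="ms-vb reporttitle"><a href="brief-2.aspx">Brief 2</a></td></tr>
  <tr><td class="ms-vb"><a href="/hypothetical/other.aspx">Not a report</a></td></tr>
</table>
"""


class ReportLinkParser(HTMLParser):
    """Collect (title, absolute URL) for links inside <td> cells with class 'reporttitle'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_cell = False   # True while inside a matching <td>
        self._href = None       # href of the <a> currently being read
        self._text = []         # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            classes = (attrs.get("class") or "").split()
            self._in_cell = "reporttitle" in classes
        elif tag == "a" and self._in_cell and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            # Resolve relative hrefs against the listing page URL.
            self.links.append((title, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "td":
            self._in_cell = False


parser = ReportLinkParser(BASE_URL)
parser.feed(LISTING_HTML)
for title, url in parser.links:
    # Each of these URLs would need its own request to reach the Executive Summary.
    print(title, url)
```

Each collected URL is a separate page, so the Executive Summary would be extracted in a second parsing step that runs once per report link.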



