Sunday, January 3, 2016

Issue scraping with Python/requests

I am trying (half for educational purposes, half to monitor a fare for myself) to scrape price data for a specific flight on the United Airlines website.

I've done it successfully with Selenium, but it's a pretty clunky implementation. In the process I noticed that, after an initial redirect, there is an AJAX call with a nice JSON response containing everything I want. I tried to hit that endpoint directly by passing the POST parameters I saw in the network tab of dev tools, but it didn't work. I then noticed that a 'cart-id' field looked dynamic while the others looked static, so I lifted it from the pre-redirect form-submission page and inserted it into the POST. This time I got a "don't have permission" error rather than a bum response.
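
The "lift the dynamic field, keep the static ones" step can be sketched with only the standard library. This is a minimal sketch, assuming the cart ID is exposed as a `data-cartid` attribute on an `<a class="no-rtad">` element, as the XPath in the question implies; the sample markup, the `ABC-123` value, and the helper names `CartIdParser`, `extract_cart_id`, and `build_payload` are hypothetical.

```python
from html.parser import HTMLParser


class CartIdParser(HTMLParser):
    """Collect data-cartid values from <a class="no-rtad"> elements.

    Mirrors the XPath //a[@class="no-rtad"]/@data-cartid from the question,
    using html.parser instead of lxml.
    """

    def __init__(self):
        super().__init__()
        self.cart_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Exact class match, like @class="no-rtad" in the XPath.
        if tag == "a" and attrs.get("class") == "no-rtad" and "data-cartid" in attrs:
            self.cart_ids.append(attrs["data-cartid"])


def extract_cart_id(page_html):
    """Return the first data-cartid found, or None if there is none."""
    parser = CartIdParser()
    parser.feed(page_html)
    parser.close()
    return parser.cart_ids[0] if parser.cart_ids else None


def build_payload(static_fields, cart_id):
    """Copy the static fields captured in dev tools and inject a fresh CartId."""
    payload = dict(static_fields)
    payload["CartId"] = cart_id
    return payload


# Hypothetical markup standing in for the real form-submission page.
sample_html = '<html><body><a class="no-rtad" data-cartid="ABC-123" href="#">Search</a></body></html>'
cart_id = extract_cart_id(sample_html)
payload = build_payload({"Origin": "sfo", "Destination": "tpe", "CartId": None}, cart_id)
print(payload["CartId"])  # ABC-123
```

Re-extracting the ID on every run matters because it appears to change between page loads, while the remaining fields can be reused as captured.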

I am not sure what data I am still missing from the POST at this point. I am also requesting the form-submission page first, through a persistent session object, so that its cookies get set, thinking that would help, but no dice. What am I missing? You can see the actual response I am looking for by opening the first URL in the code below in your browser and watching the network tab: the first XHR, named 'rev', shows the posted form data I am trying to mimic along with the JSON I am after.
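
One way to narrow down "what am I missing?" is to copy the headers (and, separately, the top-level payload fields) of the 'rev' request from the network tab into a dict and diff them against what the script sends. The helper below is a hypothetical sketch; the header values are illustrative placeholders, not values captured from United's site, and header names can be compared case-insensitively because HTTP treats them that way.

```python
def _normalize(fields, ignore_case):
    """Lower-case keys when comparing HTTP headers, whose names are case-insensitive."""
    if ignore_case:
        return {key.lower(): value for key, value in fields.items()}
    return dict(fields)


def diff_request(captured, sent, ignore_case=False):
    """Compare fields captured from the browser with the ones a script sends.

    Returns (missing, different):
      missing   -- sorted keys present in `captured` but absent from `sent`
      different -- {key: (captured_value, sent_value)} for keys whose values differ
    """
    captured = _normalize(captured, ignore_case)
    sent = _normalize(sent, ignore_case)
    missing = sorted(key for key in captured if key not in sent)
    different = {
        key: (captured[key], sent[key])
        for key in captured
        if key in sent and captured[key] != sent[key]
    }
    return missing, different


# Hypothetical values: replace them with the real ones from the 'rev' request.
browser_headers = {
    "Content-Type": "application/json; charset=UTF-8",
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "application/json, text/javascript, */*; q=0.01",
}
script_headers = {"Accept": "*/*"}

missing, different = diff_request(browser_headers, script_headers, ignore_case=True)
print(missing)    # ['content-type', 'x-requested-with']
print(different)  # {'accept': ('application/json, text/javascript, */*; q=0.01', '*/*')}
```

The diff only yields candidates; which of them the server actually checks still has to be found by trial.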

import requests
from lxml import html

with requests.Session() as s:
    # Load the form-submission page first so the session stores its cookies
    formsubmitpage = s.get('http://ift.tt/1O2Y1l2')
    doc = html.fromstring(formsubmitpage.text)
    # The dynamic cart ID sits in a data attribute on the search link
    cartid = doc.xpath('//a[@class="no-rtad"]/@data-cartid')[0]
    print(cartid)
    params = {"Revise":False,"UnaccompaniedMinorDisclamer":False,"ConfirmationID":None,"searchTypeMain":"roundTrip","Origin":"sfo","Destination":"tpe","DepartDate":"Jan 20, 2016","ReturnDate":"Jan 26, 2016","awardTravel":False,"MaxTrips":None,"numberOfTravelers":1,"numOfAdults":1,"numOfSeniors":0,"numOfChildren04":0,"numOfChildren03":0,"numOfChildren02":0,"numOfChildren01":0,"numOfInfants":0,"numOfLapInfants":0,"travelerCount":1,"revisedTravelerKeys":None,"revisedTravelers":None,"OriginalReservation":None,"RiskFreePolicy":None,"IsUnAccompaniedMinor":False,"MilitaryTravelType":None,"MilitaryOrGovernmentPersonnelStateCode":None,"tripLength":6,"IsParallelFareWheelCallEnabled":False,"flexMonth":None,"flexMonth2":None,"SortType":None,"cboMiles":None,"cboMiles2":None,"Trips":[{"DestinationAll":False,"returnARC":None,"connections":None,"nonStopOnly":True,"nonStop":True,"oneStop":False,"twoPlusStop":False,"ChangeType":0,"DepartDate":"Jan 20, 2016","ReturnDate":None,"PetIsTraveling":False,"PreferredTime":"","PreferredTimeReturn":None,"Destination":"TPE","Index":1,"Origin":"SFO","Selected":False,"FormatedDepartDate":"Wed, Jan 20, 2016","OriginCorrection":None,"DestinationCorrection":None,"OriginAll":False,"Flights":None},{"DestinationAll":False,"returnARC":None,"connections":None,"nonStopOnly":True,"nonStop":True,"oneStop":False,"twoPlusStop":False,"ChangeType":0,"DepartDate":"Jan 26, 2016","ReturnDate":None,"PetIsTraveling":False,"PreferredTime":"","PreferredTimeReturn":None,"Destination":"SFO","Index":2,"Origin":"TPE","Selected":False,"FormatedDepartDate":"Tue, Jan 26, 2016","OriginCorrection":None,"DestinationCorrection":None,"OriginAll":False,"Flights":None}],"nonStopOnly":1,"CalendarOnly":False,"InitialShop":True,"IsSearchInjection":False,"CartId":cartid,"CellIdSelected":None,"BBXSession":None,"SolutionSetId":None,"SimpleSearch":True,"RequeryForUpsell":False,"RequeryForPOSChange":False,"YBMAlternateService":False,"ShowClassOfServiceListPreference":False,"SelectableUpgradesOriginal":None,"RegionalPremierUpgradeBalance":0,"GlobalPremierUpgradeBalance":0,"RegionalPremierUpgrades":None,"GlobalPremierUpgrades":None,"FormattedAccountBalance":None,"GovType":None,"TripTypes":0,"flexible":False,"flexibleAward":False,"FlexibleDaysAfter":0,"FlexibleDaysBefore":0,"hiddenPreferredConn":None,"hiddenUnpreferredConn":None,"carrierPref":0,"chkFltOpt":0,"portOx":0,"travelwPet":0,"NumberOfPets":0,"cabinType":0,"cabinSelection":"ECONOMY","awardCabinType":0,"FareTypes":0,"FareWheelOnly":False,"EditSearch":False,"buyUpgrade":0,"offerCode":None,"TVAOfferCodeLastName":None,"ClassofService":None,"UpgradeType":None,"BillingAddressCountryCode":None,"BillingAddressCountryDescription":None,"IsPassPlusFlex":False,"IsPassPlusSecure":False,"IsOffer":False,"IsMeetingWorks":False,"IsValidPromotion":False,"CalendarDateChange":None,"CoolAwardSpecials":False,"LastResultId":None,"IncludeLmx":False,"NGRP":False,"calendarStops":0,"isReshopPath":False}
    # params= would append the JSON to the URL's query string; json= sends it as the
    # request body with Content-Type: application/json, presumably what the XHR does
    redirect_endpoint = s.post('http://ift.tt/1R4wzZT', json=params)
    print(redirect_endpoint.text)  # with params=json.dumps(params), this printed a permission error
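
Passing the payload as `params=json.dumps(params)` does not do what a JSON XHR does. In requests, `params=` always goes into the URL's query string, so the JSON text is appended to the URL and the POST body stays empty, whereas a JSON XHR normally carries the JSON in the body under a `Content-Type: application/json` header. The sketch below shows the difference with `urllib` so it runs offline; `ENDPOINT` and the payload are hypothetical placeholders, and the `X-Requested-With` header is an assumption about what the endpoint might check, not something confirmed for this site.

```python
import json
from urllib.parse import quote, urlsplit
from urllib.request import Request

# Hypothetical placeholders; nothing is sent over the network here.
ENDPOINT = "https://example.com/rev"
payload = {"Origin": "SFO", "Destination": "TPE", "CartId": "ABC-123"}

# Roughly what requests does with params=json.dumps(payload):
# the JSON text is appended to the URL and the POST body stays empty.
query_style = Request(ENDPOINT + "?" + quote(json.dumps(payload)), method="POST")

# What a JSON XHR typically sends: the JSON is the body, labelled by Content-Type.
body_style = Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json; charset=UTF-8",
        # Assumption: some XHR endpoints check this header; unconfirmed for this site.
        "X-Requested-With": "XMLHttpRequest",
    },
    method="POST",
)

print(query_style.data)                              # None
print(bool(urlsplit(query_style.full_url).query))    # True
print(json.loads(body_style.data) == payload)        # True
print(body_style.get_header("Content-type"))         # application/json; charset=UTF-8
```

In requests itself the body-style version is `s.post(url, json=params)`, which serializes the dict and sets `Content-Type: application/json`; extra headers can be added with `headers={...}`. Fixing the encoding may not be enough on its own, since the server can also check cookies or other headers.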
