As practice, I was trying to scrape data from the Airbnb website.
What I wanted to scrape was room IDs (Airbnb calls rooms "listings").
I read posts and watched many videos on how to scrape data with BeautifulSoup
and used this library to parse the listing IDs.
My strategy was to scrape all the listing IDs within a certain city (let's say New York as an example). I searched for homes in New York from the search box on the homepage to get all the available listings in NY. Then I right-clicked to view the page source for one listing. I planned to get its ID and loop over all the listings on the same page, then loop over the pages to get every ID located in NY.
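To make the plan concrete, here is a minimal sketch of the two loops (listings within a page, then pages), run on made-up HTML rather than the live site. It assumes each listing links to a URL of the form /rooms/<id>, which I have not confirmed against the real page source; the names ListingIdParser, listing_ids, and all_listing_ids are only for illustration, and it uses the standard-library parser so it runs without extra installs.

```python
import re
from html.parser import HTMLParser

# Assumption: listing links look like /rooms/<digits>; verify against the real page source.
ROOM_HREF = re.compile(r"^/rooms/(\d+)")


class ListingIdParser(HTMLParser):
    """Collects listing IDs from <a href="/rooms/<id>"> links."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = ROOM_HREF.match(href)
        # Keep each ID once, in the order it first appears.
        if match and match.group(1) not in self.ids:
            self.ids.append(match.group(1))


def listing_ids(page_html):
    """Inner loop: every listing ID found on one page of results."""
    parser = ListingIdParser()
    parser.feed(page_html)
    return parser.ids


def all_listing_ids(pages):
    """Outer loop: walk the pages (HTML strings) and merge their IDs."""
    seen = []
    for page_html in pages:
        for listing_id in listing_ids(page_html):
            if listing_id not in seen:
                seen.append(listing_id)
    return seen


# Hypothetical sample pages standing in for downloaded search results.
page_1 = '<div><a href="/rooms/111?guests=1">A</a><a href="/rooms/222">B</a></div>'
page_2 = '<div><a href="/rooms/222">B</a><a href="/rooms/333">C</a><a href="/help">Help</a></div>'
print(all_listing_ids([page_1, page_2]))  # prints ['111', '222', '333']
```

How to request each results page (the paging parameter in the URL) is not shown here because I don't know it; the sketch only covers what happens once the HTML for each page is in hand.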
But I got stuck right at the beginning. With my code below, printing listing should give me everything under the div with class _1mpo9ida. Likewise, printing its length should give 18, since each div of this class holds one listing's information (including the listing ID) and 18 listings show up on each page. What I got instead was an empty list, [], and 0. I'm not sure where I made a mistake, or whether Airbnb intentionally blocks people from scraping this information. Anyway, here's my code. Please note that it is incomplete: I was going to build on it once len(listing) returned 18. Thanks!
# Python 3 version of the script; the search URL is unchanged.
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.airbnb.com/s/Los-Angeles--CA--United-States/homes?refinement_paths%5B%5D=%2Fhomes&place_id=ChIJE9on3F3HwoAR9AhGJW_fL-I&adults=1&children=0&guests=1&query=Los%20Angeles%2C%20CA%2C%20United%20States&click_referer=t%3ASEE_ALL%7Csid%3Afeeacae7-fd05-4699-b9a0-4dac6237d486%7Cst%3ASELECT_TAB_HOMES_GROUPING&superhost=false&title_type=SELECT_TAB_OTHER_HOMES&allow_override%5B%5D=&s_tag=GOt7jkOs")
soup = BeautifulSoup(html, "html.parser")
# Each listing card is expected to be a div with this class.
listing = soup.find_all("div", {"class": "_1mpo9ida"})
print(listing)        # expected 18 divs, got []
print(len(listing))   # expected 18, got 0
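One way to narrow down the empty result, as a rough sketch: check whether the class name appears anywhere in the raw HTML the server returned. If it does not, the selector is not the problem, and the listings are either filled in later by JavaScript or the server sent a different page. The function name diagnose and the sample HTML are hypothetical, and the three messages are guesses, not a definitive diagnosis.

```python
def diagnose(raw_html, class_name):
    """Return a rough guess at why a class-based search came back empty."""
    if class_name in raw_html:
        # The markup is there, so the lookup itself needs another look.
        return "class present in raw HTML: check the selector"
    if "<script" in raw_html.lower():
        # No such markup, but scripts exist that could build it in the browser.
        return "class absent, scripts present: content is probably rendered by JavaScript"
    return "class absent: the server may have returned a different or blocked page"


# Hypothetical response body: an empty container plus a script tag.
sample = '<html><body><div id="site-content"></div><script src="/app.js"></script></body></html>'
print(diagnose(sample, "_1mpo9ida"))
```

With the real page, raw_html would be the decoded response body, for example the bytes returned by urlopen(...).read() decoded as UTF-8.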
My final question: how can I scrape all the room (listing) IDs within a certain city? I would appreciate any help. Thanks.