Tuesday, November 24, 2015

Python Selenium web scraping multiple pages (increment page number in URL)

I have little to no programming experience and started learning Python two weeks ago. Getting it running under Windows was a pain (e.g. environment variables, something I didn't really know about until two weeks ago).

I am using Selenium to try to scrape information from the web. Basically, the URLs of the pages (which are mostly JavaScript) change incrementally, e.g.:

http://ift.tt/1kRlYUY
http://ift.tt/1ShlX78
http://ift.tt/1kRlYV2
http://ift.tt/1ShlX7a
http://ift.tt/1kRlYV4
http://ift.tt/1ShlX7c
http://ift.tt/1kRlYV6
http://ift.tt/1ShlX7e

I want to programmatically and systematically scrape specific content (found by XPath) from each page and its subpages. However, I don't know how a loop (e.g. for i to xxx) works in this case, because every URL has to be loaded with "get" by the browser web driver.

There are methods for scraping content from a URL that is fixed. But in my case the URL changes, so I assume it has to be done differently.

Please enlighten me. With thanks, Iverson
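
For reference, one possible shape for such a loop is sketched below. It is only a sketch under assumptions: the shortened links above hide the real URL pattern, so the template "http://example.com/list?page={page}", the XPath "//h2[@class='title']", and the helper names build_page_urls, scrape_pages and run_with_firefox are all hypothetical placeholders. The idea is to build every URL from the page number first, then call driver.get() once per URL inside an ordinary for loop.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH


def build_page_urls(template, first_page, last_page):
    """Return one URL per page number, first_page..last_page inclusive."""
    return [template.format(page=n) for n in range(first_page, last_page + 1)]


def scrape_pages(driver, urls, xpath, delay=0.0):
    """Visit each URL with driver.get() and collect the text of matching elements."""
    results = {}
    for url in urls:
        driver.get(url)  # the browser loads this page before we query it
        elements = driver.find_elements(XPATH, xpath)
        results[url] = [element.text for element in elements]
        if delay:
            time.sleep(delay)  # optional pause between page loads
    return results


def run_with_firefox():
    """Example wiring with a real browser; needs selenium and a Firefox driver."""
    from selenium import webdriver

    # Hypothetical URL pattern and XPath: replace both with the real ones.
    urls = build_page_urls("http://example.com/list?page={page}", 1, 8)
    driver = webdriver.Firefox()
    try:
        return scrape_pages(driver, urls, "//h2[@class='title']", delay=1.0)
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling run_with_firefox() would open a browser and return a dictionary mapping each URL to the list of texts found on that page. Content that JavaScript renders late may still be missing right after driver.get() returns, in which case an explicit wait would probably be needed. Subpages could be handled the same way, by collecting their links on each page and passing them to scrape_pages again.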



