Sunday, 19 February 2017

Using wget: how to efficiently download PDFs that match a specific URL pattern

I want to download PDFs from certain locations on a website. A main page links to the sub-pages, alongside thousands of other links. All of the PDF links I want to download are on the sub-pages.

The website is very large, with multiple levels of links and thousands of links at each level.

I want to optimise the download with wget so that:

  1. Only two levels are considered: the main page and the sub-pages.
  2. Only links of a specific type are followed from the main page.
  3. Folders are named after the link name on the main page.
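
A rough sketch of how the first two requirements might map onto a single wget call is shown here. It assumes that sub-page URLs contain a /subpage/ path segment, which is a hypothetical stand-in for the real pattern, and that the main page lives at https://foo.com/mainpage. ACCEPT_RE and crawl_two_levels are names made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the /subpage/ segment is a hypothetical stand-in for the real
# sub-page URL pattern.

# POSIX extended regex: a sub-page URL, or any URL ending in .pdf.
ACCEPT_RE='(/subpage/[^/?#]+/?$|\.pdf$)'

crawl_two_levels() {
  # -r -l 2: recurse from the main page, at most two links deep
  # --accept-regex: skip every discovered URL that does not match ACCEPT_RE
  # --wait=1: pause one second between requests
  wget -r -l 2 --accept-regex "$ACCEPT_RE" --wait=1 "https://foo.com/mainpage"
}
```

Calling crawl_two_levels starts the crawl. wget still has to fetch the main page and each sub-page in order to read their links, and this single command does not name folders after the main-page links (requirement 3).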

The URL patterns for the main page and sub-pages are given below.

Main Page ->

  • Page 1 (PDF Link 1 + PDF Link 2 + lots of other links)
  • Page 2 (PDF Link 1 + PDF Link 2 + lots of other links)
  • ... and so on

URL Patterns

  • Main Page (https://foo.com/mainpage)
  • Sub Page (https:// http://ift.tt/2lxY7xg)
  • PDF (https:// http://ift.tt/2mbaHzl, https:// http://ift.tt/2lxYlEI)
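
One way the per-folder requirement could be approached is a two-stage script: read the main page once, pick out the sub-page links, then run one shallow wget crawl per sub-page into its own folder. The sketch below is only that, a sketch. The /subpage/ path segment, the BASE variable, and all function names are hypothetical, and the folder name is taken from the last URL path segment rather than from the link text, which is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: BASE, the /subpage/ segment, and the function names are
# hypothetical; substitute the site's real URL pattern.

BASE="https://foo.com"

# Print the href values found in HTML on stdin that look like sub-page links,
# in page order and without duplicates.
extract_subpage_links() {
  grep -oE 'href="[^"]*"' \
    | sed -E 's/^href="//; s/"$//' \
    | grep -E '/subpage/[^/?#]+/?$' \
    | awk '!seen[$0]++'
}

# Derive a folder name from the last path segment of a sub-page URL.
folder_name_for() {
  local url="${1%/}"
  printf '%s\n' "${url##*/}"
}

# Download only the PDFs linked from one sub-page into its own folder.
fetch_subpage_pdfs() {
  local url="$1" dir
  dir="$(folder_name_for "$url")"
  # -r -l 1: follow links one level below the sub-page
  # -A pdf:  keep only files whose names end in pdf
  # -nd:     do not recreate the site's directory tree
  # -P:      save under the given folder
  wget -r -l 1 -A pdf -nd -P "$dir" "$url"
}

# Stage 1: read the main page once. Stage 2: one shallow crawl per sub-page.
mirror_pdfs() {
  wget -qO- "$BASE/mainpage" | extract_subpage_links | while read -r link; do
    case "$link" in
      /*) link="$BASE$link" ;;   # turn a site-relative link into a full URL
    esac
    fetch_subpage_pdfs "$link"
  done
}
```

Running mirror_pdfs would start the download. Because the main page is fetched only once and each sub-page crawl stops after one level, the thousands of unrelated links are never followed.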

Thanks



