Monday, May 25, 2015

HtmlUnit Web Scraping

I am building a price comparison site and collecting e-commerce data. Most of it works, but I have a lot of trouble with JavaScript-heavy sites. Here is my code for handling them:

    // NOTE: the method name and parameter are placeholders; the original signature was cut off.
    public static void printPage(final String url)
            throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        final WebClient webClient = new WebClient(BrowserVersion.CHROME);
        // Currently commented out:
        // webClient.getOptions().setJavaScriptEnabled(true);
        // webClient.getOptions().setThrowExceptionOnScriptError(true);
        // webClient.getOptions().setCssEnabled(true);
        // webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        final HtmlPage page = webClient.getPage(url);
        // final String pageAsText = page.asText();
        final String pageAsXml = page.asXml();
        // Print the URL, then the page as XML, then the page as plain text.
        System.out.println("url=" + url);
        System.out.println("xml=" + pageAsXml);
        System.out.println("text=" + page.asText());
    }

The target page is:

http://ift.tt/1eqnABA

I also have these imports:

    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.util.ArrayList;
    import java.util.List;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.DomAttr;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlDivision;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
The price in the scraped output does not match the price shown on the website. I suspect this is because the price is generated dynamically by JavaScript after the page loads.
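
One workaround I am considering, in case the price is only filled in once the page's scripts have finished: re-read the page in a loop until the price shows up, instead of reading it once right after `getPage`. Below is a rough sketch of that idea using only the JDK. `PricePoller`, `pollUntilPresent`, and the fake lookup in `main` are names I made up for illustration; none of them are part of HtmlUnit. In the real scraper the lookup would have to query the `HtmlPage` (possibly after calling `webClient.waitForBackgroundJavaScript(...)`), and I have not verified that this helps on the target site.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper, not part of HtmlUnit: retries a lookup until it yields a value.
final class PricePoller {

    /**
     * Calls lookup repeatedly until it returns a non-empty value
     * or the timeout elapses, sleeping intervalMillis between attempts.
     */
    static Optional<String> pollUntilPresent(Supplier<Optional<String>> lookup,
                                             long timeoutMillis,
                                             long intervalMillis) throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<String> value = lookup.get();
            if (value.isPresent()) {
                return value;
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty();
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a real page lookup: the value only appears on the third attempt.
        AtomicInteger attempts = new AtomicInteger();
        Supplier<Optional<String>> fakeLookup = () ->
                attempts.incrementAndGet() < 3 ? Optional.<String>empty() : Optional.of("19.99");

        Optional<String> price = pollUntilPresent(fakeLookup, 2_000, 50);
        System.out.println("price=" + price.orElse("not found")
                + " after " + attempts.get() + " attempts");
    }
}
```

The lookup is passed in as a `Supplier` so the retry logic stays independent of how the price is actually extracted from the page.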

Does anyone have any ideas?



