Saturday, January 7, 2017

Nokogiri: scrape all child divs from a selected div

I was playing around with Nokogiri in my free time, and I'm afraid I got really stuck. I have been trying to solve this problem since this morning (almost 8 hours now :( ) and it looks like I haven't made any progress. On the website, I want to scrape all the threads on the page. So far I have figured out that the parent of all the threads is

<div id="threads" class="extended-small">

Each thread consists of three elements:

  1. a link to the image
  2. div#title, which contains the number of replies (R) and images (I)
  3. div#teaser, which contains the name of the thread

My question is: how can I select the children of the div with id='threads' and push each child, with its three elements, into the array? As you can see in this code, I don't really know what I am doing, and I would very, very much appreciate any help.
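Pushing "each child with 3 elements" comes down to how `Array#push` and `Array#<<` treat their arguments: `push(a, b, c)` appends three separate elements, while `<< [a, b, c]` appends one array. A plain-Ruby sketch with made-up values standing in for the scraped fields:

```ruby
# Hypothetical values standing in for the three pieces scraped from each thread
scraped = [
  ['/img/1.jpg', 'R: 3 / I: 1', 'First thread'],
  ['/img/2.jpg', 'R: 0 / I: 0', 'Second thread']
]

# push with several arguments appends each one separately: one flat list
flat = []
scraped.each { |pic, title, teaser| flat.push(pic, title, teaser) }

# << with an Array appends that array as a single element: one entry per thread
nested = []
scraped.each { |pic, title, teaser| nested << [pic, title, teaser] }

puts flat.size   # prints 6
puts nested.size # prints 2
```

The nested form is the one that maps naturally onto one CSV row per thread.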

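require 'httparty'
require 'nokogiri'
require 'csv'
require 'pry'

page = HTTParty.get('http://ift.tt/2iPKF42')

# Parse the HTML body (a String) rather than the response object
parse_page = Nokogiri::HTML(page.body)

threads_array = []

# `search` returns a NodeSet and ignores a block passed to it, so the
# original block never ran; iterate over the result with `each` instead.
# Each direct child <div> of div#threads is one thread.
parse_page.search('//*[@id="threads"]/div').each do |thread|
  # Selectors follow the description above (image link, div#title,
  # div#teaser); they are assumptions, so adjust them to the real markup.
  link   = thread.at_css('a')
  title  = thread.at_css('div#title')
  teaser = thread.at_css('div#teaser')

  post_id     = thread['id']
  post_pic    = link && link['href']
  post_title  = title && title.text.strip
  post_teaser = teaser && teaser.text.strip

  # Push ONE array per thread so that each thread becomes one CSV row
  threads_array << [post_id, post_pic, post_title, post_teaser]
end

CSV.open('sample.csv', 'w') do |csv|
  threads_array.each { |row| csv << row }
end

# Optional: open a Pry session to inspect threads_array
Pry.start(binding)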
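Once there is one array per thread, writing the CSV is a matter of appending each array as a row, optionally after a header row. A sketch that uses `CSV.generate` to build the text in memory so the result is easy to inspect; the rows and the column names are hypothetical:

```ruby
require 'csv'

# Hypothetical rows in the shape produced by collecting one array per thread
rows = [
  ['t1', '/img/1.jpg', 'R: 3 / I: 1', 'First thread'],
  ['t2', '/img/2.jpg', 'R: 0 / I: 0', 'Second thread']
]

# CSV.generate builds the CSV text in memory; CSV.open('sample.csv', 'w')
# accepts the same `csv << row` calls but writes to a file instead.
csv_text = CSV.generate do |csv|
  csv << %w[id pic title teaser] # header row (column names are an assumption)
  rows.each { |row| csv << row }
end

puts csv_text
```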

