I was playing around with Nokogiri in my free time, and I am afraid I got really stuck. I have been trying to solve this problem since this morning (almost 8 hours now :( ) and it looks like I haven't made any progress at all. I want to scrape all the threads on a page of the website. So far I have worked out that the parent of all the threads is
<div id="threads" class="extended-small">
Each thread consists of 3 elements:
- a link to the image
- div#title, which contains the number of replies (R) and images (I)
- div#teaser, which contains the name of the thread
My question is: how can I select the children of the element with id='threads' and push each child, with its 3 elements, to the array? As you can see in this code, I don't really know what I am doing, and I would very, very much appreciate any help.
require 'httparty'
require 'nokogiri'
require 'json'
require 'pry'
require 'csv'

page = HTTParty.get('http://ift.tt/2iPKF42')
parse_page = Nokogiri::HTML(page)

threads_array = []
threads = parse_page.search('.//*[@id="threads"]/div') do |a|
  post_id = a.text
  post_pic = a.text
  post_title = a.text
  post_teaser = a.text
  threads_array.push(post_id, post_pic, post_title, post_teaser)
end

CSV.open('sample.csv', 'w') do |csv|
  csv << threads_array
end

Pry.start(binding)