Sunday, October 30, 2016

Save body text to a CSV file | Python 3

I am trying to build a database of articles for text-mining purposes. I extract the body of each article via web scraping and then save it to a CSV file. However, I haven't managed to save all of the body texts. The code I came up with saves only the text of the last URL (article), whereas if I print what I am scraping (and what I am supposed to save), I get the body of every article.

I have included only some of the URLs from the list (which contains many more) to give you an idea:

import requests
from bs4 import BeautifulSoup
import csv

r=["http://ift.tt/2e2rbee",
"http://ift.tt/2f2NI78    attack.html",
"http://ift.tt/2dSBZrd",
"http://ift.tt/2bcB32V",
"http://ift.tt/2fkgZOA",
]

for url in r:
    t = requests.get(url)
    t.encoding = "ISO-8859-1"
    soup = BeautifulSoup(t.content, 'lxml')
    text = soup.find_all(("p", {"class": "story-body-text story-content"}))
    print(text)

with open('newdb30.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(text)
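
A likely explanation, offered as a sketch rather than a definitive fix: `text` is reassigned on every pass through the loop, while the file is opened and written only once, after the loop has finished, so only the last article's paragraphs reach the CSV. One way around that is to open the file once and write one row per URL inside the loop. In the sketch below, `save_bodies` and `fetch_body` are hypothetical helper names (not part of the code above), the output uses the default comma delimiter and UTF-8 as an assumption, and the `p` class filter is assumed to match the pages being scraped.

```python
import csv


def fetch_body(url):
    """Download one article and return its body paragraphs as a single string.

    Hypothetical helper; assumes requests, bs4 and lxml are installed.
    """
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    soup = BeautifulSoup(response.content, "lxml")
    # Tag name and attribute filter are passed as two separate arguments.
    paragraphs = soup.find_all("p", {"class": "story-body-text story-content"})
    return " ".join(p.get_text(strip=True) for p in paragraphs)


def save_bodies(urls, fetch, path):
    """Write one CSV row per URL: the URL followed by its body text."""
    # The file is opened once, before the loop, so rows accumulate
    # instead of only the last article being written.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        for url in urls:
            writer.writerow([url, fetch(url)])
```

With the list from the question, this would be called as `save_bodies(r, fetch_body, 'newdb30.csv')`. Passing the fetch function as an argument also makes it possible to try the CSV logic with a stand-in function before hitting the real pages.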



