Saturday, 6 July 2019

Output content is not shown on the terminal

I am trying to scrape Medium posts and their contents. The code runs and opens the browser at the specified URL, but the terminal shows nothing. It should print the post titles, descriptions, author names, and the other printed values.

All the class names are correct. I thought the cause might be the never-ending dynamic content, so I limited the number of items printed, but there is still no output.

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.common.exceptions import TimeoutException

option = webdriver.ChromeOptions()


browser = webdriver.Chrome(
    executable_path=r"C:/Users/Jai Sipani/Downloads/chrome_driver/chromedriver.exe",
    chrome_options=option)

browser.get("https://medium.com/topic/startups")


# Wait up to 60 seconds for the page to load
timeout = 60
try:
    WebDriverWait(browser, timeout).until(
        EC.visibility_of_element_located(
            (By.XPATH, "//img[@class='n dx dy dz ea ed y']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()

# find_elements_by_xpath returns a list of Selenium WebElement objects.

titles_heading = browser.find_elements_by_class_name(
    "ar aj da bc db bd em gb gc at aw eo dg dh av")

titles_heading = titles_heading[:10]
titles = [x.text for x in titles_heading]
print('titles:')
print(titles, '\n')


titles_desc = browser.find_element_by_class_name(
    "bh bi bc b bd be bf bg at aw dj dg dh av ef ep")

titles_desc = titles_desc[:10]
desc = [i.text for i in titles_desc]
print('desc:')
print(desc, '\n')


authors = browser.find_element_by_class_name(
    "bc b bd be bf bg at aw dj dg dh av ar aj")

authors = authors[:10]
author = [x.text for x in authors]
print('author: ')
print(author, '\n')

timeline = browser.find_element_by_class_name("fg ae fh")
timeline = timeline[:10]
time = [x.text for x in timeline]
print('time: ')
print(time, '\n')



for title, desc, author, time in zip(titles, titles_desc, authors, timeline):
    print("Title : title_Desc : authors : timeline")
    print(title + ": " + desc + ": "+ author + ": " + time, '\n')
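A note on the final loop, offered as a sketch rather than a confirmed fix: it iterates over the element lists (`titles_desc`, `authors`, `timeline`) rather than the extracted text, so the string concatenation would presumably fail even once the lookups return results. The example below uses placeholder strings in place of scraped values. The helper name `format_rows` and its `limit` parameter are hypothetical.

```python
# Placeholder strings standing in for the .text values scraped from the page.
titles = ["Post A", "Post B"]
descs = ["About A", "About B"]
authors = ["Author A", "Author B"]
times = ["Jul 5", "Jul 6"]


def format_rows(titles, descs, authors, times, limit=10):
    """Combine parallel lists of strings into one line per post."""
    rows = []
    # zip stops at the shortest list, so an empty list yields no rows at all.
    for title, desc, author, when in zip(
            titles[:limit], descs[:limit], authors[:limit], times[:limit]):
        rows.append(f"{title}: {desc}: {author}: {when}")
    return rows


for row in format_rows(titles, descs, authors, times):
    print(row)
```

Because `zip` stops at the shortest input, a single empty list is enough to make the loop print nothing, which matches the symptom described here.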

I expected the output to be a printed list of posts and their content, but nothing was printed. The script otherwise runs without errors, and the session closes after 60 seconds.
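One possible explanation, stated as an assumption rather than a confirmed diagnosis: as far as I know, Selenium's class-name locator expects a single class name, so a space-separated string such as "fg ae fh" may match nothing. A common workaround is to turn the class attribute into a CSS selector. The sketch below shows only that string conversion, and the helper name `class_attr_to_css` is hypothetical.

```python
def class_attr_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a CSS selector.

    "fg ae fh" becomes ".fg.ae.fh", which matches elements that carry
    all three classes.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    return tag + "".join("." + name for name in names)


selector = class_attr_to_css("fg ae fh")
print(selector)
```

With Selenium, the resulting string could then be passed to `browser.find_elements(By.CSS_SELECTOR, selector)`. Note the plural `find_elements`: the singular form returns one element, which cannot be sliced with `[:10]`. Class names like these look auto-generated, so they may also change between page loads.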



