mercredi 19 juin 2019

How to fix Newspaper3k 403 Client Error for certain URL's?

I am trying to get a list of articles using a combo of the googlesearch and newspaper3k python packages. When using article.parse, I end up getting an error: newspaper.article.ArticleException: Article download() failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697 on URL https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697

I have tried running as admin when executing script and the link works when opening straight in a browser.

Here is my code:

import googlesearch
from newspaper import Article

query = "trump"
urlList = []

for j in googlesearch.search_news(query, tld="com", num=500, stop=200, pause=.01):
    urlList.append(j)

print(urlList)

articleList = []

for i in urlList:
    article = Article(i)
    article.download()
    article.html
    article.parse()
    articleList.append(article.text)
    print(article.text)

Here is my full error output:

Traceback (most recent call last):
  File "C:/Users/andre/PycharmProjects/StockBot/WebCrawlerTest.py", line 31, in <module>
    article.parse()
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\newspaper\article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\newspaper\article.py", line 532, in throw_if_not_downloaded_verbose
    (self.download_exception_msg, self.url))
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697 on URL https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697

I expected it to just output the text of the article. Any help you can give would be great. Thanks!




2 commentaires:

  1. Ce commentaire a été supprimé par l'auteur.

    RépondreSupprimer
  2. https://stackoverflow.com/questions/56678732/how-to-fix-newspaper3k-403-client-error-for-certain-urls/56778947#56778947

    RépondreSupprimer