Thursday, 28 September 2017

How to extract the category/section field in news articles

Is there a standard section field to look for while scraping news articles?

Explanation: I would like to extract the title of a (news) article and its associated category or section.

Example:

Article:
http://ift.tt/2xzYLAi

Title of the article:
Environmentalists: UK's Antarctic islands need protection

Section or Category: 
Science and Environment 

There are various categories, such as politics, lifestyle, tech and sports. I checked the BBC and The Guardian, and they use different fields to specify these sections.

I expect this to differ between news websites. However, are these different fields already known, so that I can look for them while scraping?
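There may be no single universal field, but many news pages expose machine-readable metadata that is worth checking before writing site-specific rules. The Open Graph protocol defines an `article:section` meta property, and schema.org's `Article` type has an `articleSection` property, often embedded as a JSON-LD `<script>`. For the title, `og:title` and schema.org `headline` play the same role. Whether a given site actually fills these in is not guaranteed. Below is a minimal, standard-library-only sketch, assuming the page's HTML has already been downloaded. The names `ArticleMetaParser` and `extract_title_and_section` are hypothetical, and the sample HTML is made up for illustration (it is not the BBC's real markup).

```python
import json
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collects <meta> tags and JSON-LD blocks that often carry title/section."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # lower-cased property/name -> content (first one wins)
        self.json_ld = []     # parsed JSON-LD objects
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content is not None:
                self.meta.setdefault(key.lower(), content)
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive here as raw text while inside a JSON-LD block.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole page


def _iter_ld(obj):
    """Yield every dict in a JSON-LD value, including items under "@graph"."""
    if isinstance(obj, list):
        for item in obj:
            yield from _iter_ld(item)
    elif isinstance(obj, dict):
        yield obj
        if "@graph" in obj:
            yield from _iter_ld(obj["@graph"])


def extract_title_and_section(html):
    """Return (title, section); either may be None if the page lacks the metadata."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    # Prefer Open Graph meta tags, then fall back to schema.org JSON-LD.
    title = parser.meta.get("og:title")
    section = parser.meta.get("article:section")
    for obj in (o for block in parser.json_ld for o in _iter_ld(block)):
        title = title or obj.get("headline")
        sec = obj.get("articleSection")
        if section is None and sec:
            section = sec[0] if isinstance(sec, list) else sec
    return title, section


# Made-up sample page, for illustration only.
sample = """
<html><head>
<meta property="og:title" content="Environmentalists: UK's Antarctic islands need protection">
<meta property="article:section" content="Science and Environment">
</head><body></body></html>
"""
print(extract_title_and_section(sample))
```

When neither source is present, the function returns `None` for that field, which is the point where per-site rules (CSS selectors, breadcrumbs, URL patterns) would still be needed.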

Ideally, is there already a library that provides such category extraction (in Python)? I am going to write one myself, but if one already exists I do not want to reinvent the wheel.



