I have a script (see below) that is supposed to search for articles containing specific keywords on the subject of coffee. However, when I run it, it returns the most recent articles rather than articles about coffee.
I would also like to include the article date as a search criterion, because I eventually want to determine whether the number of articles written on the subject is increasing.
I have modified the script several times, but it does not seem to honor the search criteria (i.e. intext:coffee), and I do not know why. When I run the same search in Google, I get the following article as the first hit (note: I have not been able to get the date criterion working yet either):
https://www.cnn.com/travel/article/worlds-most-expensive-cup-of-coffee/index.html
url_root = 'https://www.google.com/search';
search_str = '?q=html&sitesearch=cnn.com&intext:coffee';
url_str = strcat(url_root, search_str);
html_str = '(?<=url\?q\=)[^&]+(?=\.html)';
s = urlread(url_str);
m = strcat(regexp(s, html_str, 'match'), '.html');
t = urlread(m{1})
I expect the result to be the text of the first CNN coffee article, namely something like:
"Coffee aficionados are willing to shell out big bucks for the perfect cup of coffee. But Californians may be taking it to the next level. A single cup of coffee just sold for $75 at Klatch Coffee's new San Francisco location, which hosted a tasting over the weekend..."
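One possible cause, offered as a sketch rather than a confirmed fix: in the search string above, the query itself is q=html, and intext:coffee is appended as a separate parameter with no value, so Google would be searching cnn.com for "html" rather than for coffee. Assuming the operators have to sit inside the q parameter, the URL could be built as follows (the variable name query and the use of site: in place of sitesearch are assumptions, not part of the original script):

```matlab
% Sketch only: assumes the search operators must live inside the q parameter.
url_root = 'https://www.google.com/search';
query    = 'intext:coffee site:cnn.com';          % assumed replacement for q=html
url_str  = [url_root, '?q=', urlencode(query)];   % urlencode escapes ':' and spaces
disp(url_str)
```

A date restriction would presumably go into the same query string (for example via Google's after: and before: operators), but that is untested here.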