Let me explain what I mean.
I have numerous bots which basically load a given URL regularly, check whether it has changed by more than a certain percentage, and notify me if it has.
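To illustrate, a minimal sketch of such a check could look like the following. It is not the actual bot code: the function names and the 5% threshold are made up for this example, the fetching step is left out, and `difflib` is just one possible way to measure how much a page has changed.

```python
import difflib


def changed_fraction(old: str, new: str) -> float:
    """Return how much two page snapshots differ: 0.0 (identical) to 1.0."""
    # ratio() is a similarity score in [0, 1]; autojunk is disabled so that
    # characters repeated many times in long HTML are not ignored.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return 1.0 - matcher.ratio()


def should_notify(old: str, new: str, threshold: float = 0.05) -> bool:
    """Hypothetical rule: notify when more than `threshold` of the page changed."""
    return changed_fraction(old, new) > threshold
```

Here `old` would be the HTML saved on the previous run and `new` the HTML just downloaded. A challenge page served in place of the real content differs almost entirely from the saved copy, so a check like this fires even though the real page has not changed.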
This mechanism is repeatedly broken due to all kinds of "nag screens" that often show up instead of the actual webpage, such as Cloudflare's "anti-DDoS" nonsense.
Naturally, this confuses my code (it sends me "false positive" notifications) unless I account for such pages. But accounting for them is not straightforward.
Websites never report the correct HTTP status codes anymore, so relying on those is simply not possible. (I've lost count of how many "page not found" webpages were reported as "200 OK", etc.)
It strikes me that this sounds like something which a lot of people would have encountered and "fixed" by now.
Is there a project which provides some kind of data file with a list of phrases/regexps to look for, to determine whether a given HTML string corresponds to such a known "nag screen"? One that is actually maintained and reliable, of course.
If so, this would save me a lot of headaches.
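To make the question concrete, this is roughly how I imagine such a list being used. It is only a sketch: the patterns below are a few hand-written guesses for illustration, not taken from any existing project, and `NAG_PATTERNS` and `looks_like_nag_screen` are hypothetical names.

```python
import re

# Illustrative patterns only (assumptions, not a maintained list):
# two phrases commonly seen on browser-check interstitials, and a
# title-based guess for "page not found" pages served as 200 OK.
NAG_PATTERNS = [
    re.compile(r"checking your browser before accessing", re.IGNORECASE),
    re.compile(r"just a moment\.\.\.", re.IGNORECASE),
    re.compile(r"<title>[^<]*(404|page not found)[^<]*</title>", re.IGNORECASE),
]


def looks_like_nag_screen(html: str) -> bool:
    """Return True if any known pattern matches the downloaded HTML."""
    return any(pattern.search(html) for pattern in NAG_PATTERNS)
```

A bot would call `looks_like_nag_screen(html)` right after downloading and skip the comparison (and the notification) when it returns True. The hard part is exactly what I am asking about: keeping the pattern list complete and up to date.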