Thursday, July 4, 2019

I get a different result every time I crawl some pages

I am using a Python tool called urlwatch (https://github.com/thp/urlwatch). It sends requests to the websites I have added to its watch list and diffs the previous result against the current one.

But every time I run the command, the result flips back and forth: one run reports that the element has disappeared, and the next run reports that it is back.

I've tried the html2text filter, but it didn't help. I strongly suspect that, for some reason, the element is present in the page on some runs and missing on others.

My configuration is as follows:

filter: css:tr:nth-child(1) a
kind: url
url: http://admission.jnu.ac.kr/user/indexSub.action?codyMenuSeq=18098&siteId=admission_new&menuUIType=sub

When I run the command several times, I get one of the two results below, and they alternate.

--- @    Thu, 04 Jul 2019 16:24:32 +0900
+++ @    Thu, 04 Jul 2019 16:30:51 +0900
@@ -1,3 +0,0 @@
-<a href="boardList.action?command=view&page=1&boardId=2272&boardSeq=669613">
-                            2020학년도 약학대학 입학전형 선수과목 사전 확인 안내
-                            </a>

or

--- @    Thu, 04 Jul 2019 16:16:23 +0900
+++ @    Thu, 04 Jul 2019 16:24:33 +0900
@@ -0,0 +1,3 @@
+<a href="boardList.action?command=view&page=1&boardId=2272&boardSeq=669613">
+                            2020학년도 약학대학 입학전형 선수과목 사전 확인 안내
+                            </a>
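
To check the suspicion that the server itself returns different pages, the fetch can be repeated outside urlwatch and the raw responses compared. The following is only a minimal sketch using the Python standard library: the helper names `fetch` and `summarize_fetches` and the `marker` parameter are hypothetical, and the marker value `boardSeq=669613` is simply taken from the diff output above as a stand-in for "the first row's link is present".

```python
import hashlib
from collections import Counter
from urllib.request import urlopen

URL = ("http://admission.jnu.ac.kr/user/indexSub.action"
       "?codyMenuSeq=18098&siteId=admission_new&menuUIType=sub")


def fetch(url):
    """Download the raw page body as bytes (plain urllib, no urlwatch)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def summarize_fetches(url, attempts=6, fetcher=fetch, marker=b"boardSeq=669613"):
    """Fetch the page several times and count the distinct responses.

    Each response is reduced to (short SHA-256 digest, length in bytes,
    whether `marker` occurs in the body). The marker is an assumption
    borrowed from the diff output, not something urlwatch defines.
    """
    seen = Counter()
    for _ in range(attempts):
        body = fetcher(url)
        digest = hashlib.sha256(body).hexdigest()[:12]
        seen[(digest, len(body), marker in body)] += 1
    return seen


if __name__ == "__main__":
    # One line per distinct response body seen across the attempts.
    for (digest, size, has_marker), count in summarize_fetches(URL).items():
        print(f"{count}x  sha256={digest}  bytes={size}  marker_present={has_marker}")
```

If this prints more than one distinct body, and the marker is present in some but not others, the alternation comes from the server's responses rather than from the filter or the diff step. If every body contains the marker, the filter or selector is the more likely culprit.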



