Tuesday, May 11, 2021

How to get the fully rendered HTML shown in the Chrome or Firefox inspector (new to scraping)

This is my first post, so I hope I'm not breaking any rules here.

I want to write a script that collects a piece of information from many pages on the same domain. The information always starts with a fixed string, and appended to it is another string, which is what I actually want. I can find it by searching for the fixed string in the inspector tab of Chrome/Firefox and copying it out. It does not seem to be present when the page first loads but shows up eventually; I think it is inside an iframe. It would be great to download the HTML that the Chrome/Firefox inspector shows once the page is fully loaded, but I doubt that is possible :(
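To make the "fixed string plus appended value" idea concrete, here is a minimal sketch of the extraction step alone. It assumes the rendered HTML is already available as text. The marker `token=` and the rule that the value ends at whitespace, a quote, `<`, `>` or `&` are placeholders for illustration, not details of the real pages.

```python
import re

# Hypothetical marker; replace it with the real fixed string.
FIXED_PREFIX = "token="


def extract_suffixes(html, prefix=FIXED_PREFIX):
    """Return every value that directly follows `prefix` in `html`."""
    # re.escape lets the prefix contain regex metacharacters safely.
    # Assumption: the value ends at whitespace, a quote, '<', '>' or '&'.
    pattern = re.compile(re.escape(prefix) + r"([^\s\"'<>&]+)")
    return pattern.findall(html)


# Made-up sample standing in for the rendered page source.
sample = '<iframe src="https://example.com/frame?token=abc123&x=1"></iframe>'
print(extract_suffixes(sample))
```

The delimiter set is the part most likely to need adjusting once the real markup is known.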

However, my goal is to get this information from a couple of hundred pages, and I don't know how to do it automatically because I am new to scraping. I have a txt file with all the URLs and have tried some curl commands from the shell and some copied PhantomJS code. I can program in C, shell, or Python with ease, though.
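The batch part of the task can be sketched independently of how each page is rendered. In this rough outline, `fetch` and `extract` are hypothetical callables supplied by the caller: `fetch` would return the fully rendered HTML for a URL, and `extract` would pull the wanted value out of it. The demo uses an in-memory dictionary instead of a real browser, and the URLs and `id=` marker are made up.

```python
from typing import Callable, Dict, Iterable, List, Optional


def read_urls(lines: Iterable[str]) -> List[str]:
    """Return the non-empty lines, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    extract: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Apply fetch then extract to every URL; record None on failure."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            results[url] = extract(fetch(url))
        except Exception as exc:  # keep going if one page fails
            print(f"failed: {url}: {exc}")
            results[url] = None
    return results


# Stand-in for a real renderer: maps made-up URLs to made-up page text.
fake_pages = {
    "https://example.com/a": "id=AAA",
    "https://example.com/b": "id=BBB",
}

# In real use the lines would come from open("urls.txt").
urls = read_urls(["https://example.com/a\n", "\n", "  https://example.com/b  \n"])
results = scrape_all(urls, fake_pages.__getitem__, lambda html: html.split("id=", 1)[1])
print(results)
```

Keeping `fetch` pluggable means the same loop works whether the pages end up being rendered by a headless browser or fetched some other way.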

By the way, I need to send cookies, but I think that shouldn't be a problem as long as the tool I use has a way to add them (as PhantomJS did). Sorry for my English, I'm not a native speaker!

Could you tell me how I could approach this problem? Thanks.



