Wednesday, 25 August 2021

Scraping HTML list data from a dynamic website

Hello guys!

Sorry for the dumb question, this is my last resort. I swear I tried countless other Stack Overflow questions, different frameworks, etc., but none of them seemed to help.

I have the following problem: a website displays a list of data (there is a ton of div, li, span, etc. tags in front of it; it's a big HTML document).

I'm writing a tool that fetches the data from one specific list nested inside a ton of other div tags, downloads it, and outputs an Excel file.

The website I'm trying to access is dynamic. You open the website, it loads for a moment, and then the list appears (probably rendered by some JavaScript). When I try to download the page via a WebRequest in C#, the HTML I get is almost empty: a ton of whitespace, lots of non-HTML content, and some garbage data as well.

Now: I'm pretty used to C#, HtmlAgilityPack, and countless other libraries, but not so much to web-related stuff. I tried CefSharp, Chromium, and so on, but unfortunately couldn't get any of them to work properly.

I want to have HTML in my program that looks exactly like the HTML you see when you open the dev console in Chrome while visiting the website mentioned above. The HTML parser works flawlessly on that HTML.

This is how I imagine the code could look, simplified.

Extreme C# pseudocode:

// WebBrowserEngine and its methods are hypothetical; this is the API I wish existed
WebBrowserEngine web = new WebBrowserEngine();
web.LoadURLuntilFinished(url); // with all the JS executed and stuff
string html = web.getHTML();
web.close();

My goal is for the string html in the pseudocode to look exactly like the one in the Chrome dev tools tab. Maybe there is a solution posted somewhere else, but I swear I couldn't find it, and I've been looking for days.

Any help is greatly appreciated.



