Wednesday, 23 October 2019

Scrape JSON grid data using Python3

First time posting after much research. I'm very new to Python 3 and web scraping in general, but have had some successes.

I am scraping an ATS (BrassRing). I can log in without an issue using Selenium and reach the webpage I'm interested in scraping. While searching for the table, I discovered that the data I want is stored in a jsonGrid.

Everything I can find about Selenium and scraping does not cover how to scrape the contents of a JSON grid.

There are 8 columns in this grid/table, each with its data beneath it (some cells are empty, and that's OK).

As far as I can tell, the columns are labeled as follows in the JSON itself, though the website displays them slightly differently:

Action Type, Action Date, Action By, Details, Name, emailfrom, emailto, folderid
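Once the grid rows are available as a list of Python dicts, keeping only these eight columns could look roughly like the sketch below. The key names are assumptions taken from the HTML snippet in this post ("ActionType" rather than "Action Type", and "hm_Folderid" standing in for folderid), and `rows_to_csv` is a hypothetical helper name, not part of any library.

```python
import csv
import io

# Assumed key names, based on the JSON in the posted snippet.
# "hm_Folderid" is a guess at which key backs the "folderid" column.
COLUMNS = [
    "ActionType", "ActionDate", "ActionBy", "Details",
    "Name", "emailfrom", "emailto", "hm_Folderid",
]


def rows_to_csv(rows, columns=COLUMNS):
    """Return CSV text containing only the chosen columns of each row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop the many hm_* keys we do not list
        restval="",             # empty cells stay empty instead of raising
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

`csv.DictWriter` is used here so that missing keys and extra keys are both tolerated, which matches a grid where some cells are empty.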

Here is the first part of the website code that shows the headers and some of the column values.

It would be great if you could provide some information on how I can just scrape the JSON grid/JSON data from the site. I really appreciate any assistance!

<input type="hidden" name="Grid$jsonData183" id="Grid_jsonData183" class="jsonGridData" value="[{&quot;ActionType&quot;: &quot;Communication - Email&quot;,&quot;ActionDate&quot;: &quot;18-Oct-2019 14:14:25&quot;,&quot;ActionBy&quot;: &quot;Manager, Automation ()&quot;,&quot;Details&quot;: &quot;Status: Sent as to&quot;,&quot;Name&quot;: &quot;&lt;a href=\&quot;#\&quot;/ class=\&quot;ViewCommunication\&quot;&gt;Not Selected&lt;/a&gt;&quot;,&quot;emailfrom&quot;: &quot;Manager, Automation ()&quot;,&quot;emailto&quot;: &quot;Smith, John(john.smith@notreal.com)&quot;,&quot;hm_category&quot;: &quot;5&quot;,&quot;hm_Folderid&quot;: &quot;6537489&quot;,&quot;hm_ReqId&quot;: &quot;-1&quot;,&quot;hm_content&quot;: &quot;1&quot;,&quot;hm_md_communication_type&quot;: &quot;Communication - Email&quot;,&quot;hm_md_communication_correspondenceid&quot;: &quot;1&quot;,&quot;hm_md_communication_correspondenceresumeid&quot;: &quot;46878397&quot;,&quot;hm_pushportal&quot;: &quot;0&quot;,&quot;hm_unpostportal&quot;: &quot;0&quot;,&quot;hm_postportall&quot;: &quot;0&quot;,&quot;hm_PortalExpired&quot;: &quot;0&quot;,&quot;hm_md_communication_agencycodetypeid&quot;: &quot;0&quot;,&quot;hm_md_communication_agencycodeid&quot;: &quot;0&quot;,&quot;hm_md_communication_userid&quot;: &quot;41&quot;,&quot;hm_md_RecipientType&quot;: &quot;4&quot;,&quot;hm_EmailLogId&quot;: &quot;0&quot;,&quot;hm_md_ReceiverUserID&quot;: &quot;0&quot;,&quot;hm_md_fid&quot;: &quot;6537489&quot;,&quot;hm_md_rid&quot;: &quot;6454343&quot;,&quot;hm_md_rftid&quot;: &quot;17&quot;,&quot;hm_md_rsts&quot;: &quot;0&quot;,&quot;hm_md_myfolder&quot;: &quot;0&quot;,&quot;foldername&quot;: &quot;&lt;a href=&#39;#&#39; class=&#39;ViewFolder&#39;&gt;1738995BR:Customer Service Associate II&lt;/a&gt;&quot;,&quot;hm_md_afl&quot;: &quot;0&quot;,&quot;hm_md_rfl&quot;: &quot;1&quot;,&quot;hm_md_rlg&quot;: &quot;en&quot;, &quot;rowmetadata&quot;: &quot;&lt;div&gt;&lt;div name=\&quot;category\&quot; 
value=\&quot;5\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;folderid\&quot; value=\&quot;6537489\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;reqid\&quot; value=\&quot;-1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;content\&quot; value=\&quot;1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_type\&quot; value=\&quot;Communication+-+Email\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_correspondenceid\&quot; value=\&quot;1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_correspondenceresumeid\&quot; value=\&quot;46878397\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;pushportal\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;unpostportal\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;postportall\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;portalexpired\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_agencycodetypeid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_agencycodeid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_userid\&quot; value=\&quot;41\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_recipienttype\&quot; value=\&quot;4\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;emaillogid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_receiveruserid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_fid\&quot; value=\&quot;6537489\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rid\&quot; value=\&quot;6454343\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rftid\&quot; value=\&quot;17\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rsts\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_myfolder\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_afl\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rfl\&quot; value=\&quot;1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rlg\&quot; 
value=\&quot;en\&quot;&gt;&lt;/div&gt;&lt;/div&gt;&quot;},{&quot;ActionType&quot;: &quot;Communication - Email&quot;,&quot;ActionDate&quot;: &quot;18-Oct-2019 13:24:13&quot;,&quot;ActionBy&quot;: &quot;Manager, Automation ()&quot;,&quot;Details&quot;: &quot;Status: Sent as to&quot;,&quot;Name&quot;: &quot;&lt;a href=\&quot;#\&quot;/ class=\&quot;ViewCommunication\&quot;&gt;Not Selected&lt;/a&gt;&quot;,&quot;emailfrom&quot;: &quot;Manager, Automation ()&quot;,&quot;emailto&quot;: &quot;Smith, John(john.smith@notreal.com)&quot;,&quot;hm_category&quot;: &quot;5&quot;,&quot;hm_Folderid&quot;: &quot;6513663&quot;,&quot;hm_ReqId&quot;: &quot;-1&quot;,&quot;hm_content&quot;: &quot;1&quot;,&quot;hm_md_communication_type&quot;: &quot;Communication - Email&quot;,&quot;hm_md_communication_correspondenceid&quot;:
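One possible approach, sketched with the standard library only: the grid appears to be a JSON array stored HTML-escaped in the `value` attribute of a hidden `<input class="jsonGridData">`, so it may be enough to read that attribute from the page source and pass it to `json.loads`. This assumes `driver` is the already logged-in Selenium session and that `driver.page_source` contains the input; `JsonGridParser`, `extract_grid_rows`, and `strip_tags` are hypothetical names chosen for illustration.

```python
import json
from html.parser import HTMLParser


class JsonGridParser(HTMLParser):
    """Collect the value attribute of every <input> with class jsonGridData."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "input" and "jsonGridData" in classes:
            # HTMLParser has already decoded &quot;, &lt;, etc. in the attribute.
            self.values.append(attr_map.get("value") or "")


class _TextOnly(HTMLParser):
    """Keep only text nodes, dropping any markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(cell):
    """Reduce a cell such as an <a>...</a> link to its visible text."""
    parser = _TextOnly()
    parser.feed(cell)
    parser.close()
    return "".join(parser.parts).strip()


def extract_grid_rows(page_source):
    """Return all grid rows found in the page as a list of dicts."""
    parser = JsonGridParser()
    parser.feed(page_source)
    parser.close()
    rows = []
    for value in parser.values:
        # strict=False tolerates raw newlines inside JSON strings,
        # in case the line breaks in the snippet are really in the page.
        rows.extend(json.loads(value, strict=False))
    return rows


# Assumed usage with the existing Selenium session:
#   rows = extract_grid_rows(driver.page_source)
#   for row in rows:
#       print(row["ActionDate"], strip_tags(row["Name"]))
```

Reading the attribute directly from a Selenium element should give the same decoded string, in which case only the `json.loads` step is needed.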



