web: Python - HTML Web Scraping

Saturday, March 7, 2020

Python - HTML Web Scraping - BeautifulSoup

I am trying to extract the bold/italic strings from the HTML code below:

<div class="cmp-Review-author">
<span class="cmp-ReviewAuthor" itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<meta itemprop="name" content="***TIER I ASSOCIATE PICKER/ PPQA ASSOCIATE***">
<a class="cmp-ReviewAuthor-link" rel="nofollow" href="/cmp/Amazon.com/reviews?fjobtitle=Order+Picker">TIER I ASSOCIATE PICKER/ PPQA ASSOCIATE</a> 
<!-- -->(***Former Employee***)<!-- --> - 
<a class="cmp-ReviewAuthor-link" rel="nofollow" href="/cmp/Amazon.com/reviews?fcountry=US&amp;floc=Edgerton%2C+KS">***Edgerton, KS***</a>
 - <!-- -->***March 5, 2020***</span></div>

and save them in separate columns in a dataframe.
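One possible way to get the four values straight from the markup, rather than from the flattened text, is to walk the text nodes inside the `cmp-ReviewAuthor` span and skip the HTML comments. The sketch below assumes the `beautifulsoup4` package is installed, that the `***` markers in the snippet are only emphasis added for the question and not part of the real page, and that every review span yields exactly four text pieces in the same order as this sample. `parse_author_span` is a hypothetical helper name, not part of any library.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Sample markup from the question, with the *** emphasis markers removed
# (assumption: they are not present in the real page).
HTML = """
<div class="cmp-Review-author">
<span class="cmp-ReviewAuthor" itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<meta itemprop="name" content="TIER I ASSOCIATE PICKER/ PPQA ASSOCIATE">
<a class="cmp-ReviewAuthor-link" rel="nofollow" href="/cmp/Amazon.com/reviews?fjobtitle=Order+Picker">TIER I ASSOCIATE PICKER/ PPQA ASSOCIATE</a>
<!-- -->(Former Employee)<!-- --> -
<a class="cmp-ReviewAuthor-link" rel="nofollow" href="/cmp/Amazon.com/reviews?fcountry=US&amp;floc=Edgerton%2C+KS">Edgerton, KS</a>
 - <!-- -->March 5, 2020</span></div>
"""


def parse_author_span(span):
    """Hypothetical helper: split one review-author span into four fields."""
    pieces = []
    for node in span.descendants:
        # Keep real text nodes only; Comment is a NavigableString subclass.
        if isinstance(node, NavigableString) and not isinstance(node, Comment):
            # Drop surrounding whitespace and the " - " separators.
            text = node.strip(" -\n\t")
            if text:
                pieces.append(text)

    # Assumption: position, status, location, date, always in that order.
    if len(pieces) != 4:
        return None
    position, status, location, date = pieces
    return {
        "position": position,
        "employee_status": status.strip("()"),
        "location": location,
        "date": date,
    }


soup = BeautifulSoup(HTML, "html.parser")
rows = []
for span in soup.select("span.cmp-ReviewAuthor"):
    row = parse_author_span(span)
    if row is not None:
        rows.append(row)
```

Each entry in `rows` is a plain dict, so `pandas.DataFrame(rows)` should give one column per field. Spans that do not produce exactly four pieces are skipped here and would need to be inspected separately.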

I can get the full string, e.g. "TIER I ASSOCIATE PICKER/ PPQA ASSOCIATE (Former Employee) - Edgerton, KS - March 5, 2020", but I am not sure how to split it into position, employee_status, location, and date, since the structure differs from line to line (see the examples):

1- Customer Service Associate (Former Employee) - Missouri City, TX - December 19, 2019
2- Picker, Processor, Gatekeeper, Ambassador, Problem Solver, Tier 3 Trainer (Former Employee) - Hebron, KY - March 6, 2020
3- Kurier (Current Employee) - Manheim - March 6, 2020
4- Picker/Packer (Former Employee) - North Randall, OH - March 5, 2020
5- Tdr (Current Employee) - 5300 holibird Avenue - March 5, 2020

Any idea?
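If only the flattened string is available, one option is to anchor on the two parts whose shape stays fixed: the parenthesised employee status and the trailing date. Everything before the status is then the position, and everything between the status and the date is the location, however many commas or words they contain. This is a minimal sketch, assuming the status is always "Former Employee" or "Current Employee" and the date always looks like "March 5, 2020", as in the five examples; `parse_author_line` is a hypothetical helper name.

```python
import re

# Assumed line shape: <position> (<status>) - <location> - <Month D, YYYY>
AUTHOR_LINE = re.compile(
    r"^(?P<position>.+?)\s*"
    r"\((?P<employee_status>(?:Former|Current) Employee)\)\s*-\s*"
    r"(?P<location>.+?)\s*-\s*"
    r"(?P<date>[A-Za-z]+ \d{1,2}, \d{4})$"
)


def parse_author_line(line):
    """Hypothetical helper: return the four fields as a dict, or None."""
    match = AUTHOR_LINE.match(line.strip())
    return match.groupdict() if match else None


lines = [
    "Customer Service Associate (Former Employee) - Missouri City, TX - December 19, 2019",
    "Picker, Processor, Gatekeeper, Ambassador, Problem Solver, Tier 3 Trainer "
    "(Former Employee) - Hebron, KY - March 6, 2020",
    "Kurier (Current Employee) - Manheim - March 6, 2020",
    "Picker/Packer (Former Employee) - North Randall, OH - March 5, 2020",
    "Tdr (Current Employee) - 5300 holibird Avenue - March 5, 2020",
]

# Lines that do not fit the assumed shape come back as None and are dropped.
rows = [row for row in map(parse_author_line, lines) if row is not None]
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` to get the four columns. Lines with a different status wording or date format would not match and would need the pattern adjusted.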
