Friday, November 5, 2021

How do I web scrape a list of all the companies and their industries using Python?

I am new to web scraping. I tried the following code to extract table information from Wikipedia, but it only works when the page at the URL contains a table. Can anyone help me with this? I need to extract the company details for each state.

from io import StringIO  # wraps the HTML string before it is passed to pandas.read_html

import pandas as pd  # library for data analysis
import requests  # library to handle requests
from bs4 import BeautifulSoup  # library to parse HTML documents

# get the response in the form of html
wikiurl = "https://en.wikipedia.org/wiki/List_of_companies_of_the_United_States_by_state"
response = requests.get(wikiurl)
print(response.status_code)
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(response.text, 'html.parser')
# find_all returns a list of every <table class="wikitable"> element (empty if the page has none)
tables = soup.find_all("table", {"class": "wikitable"})
# read_html returns a list of DataFrames, one per table; it raises ValueError if no table is found
dfs = pd.read_html(StringIO(str(tables)))
# take the first table as a dataframe
df = dfs[0]
print(df)
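
The code above only works for pages that contain a table. For a page that presents the companies as plain lists under per-state headings, one possible approach is to walk the headings and list items in document order. The sketch below is only an illustration: the sample HTML, the assumption that each state is an `h2` heading followed by `li` entries, and the helper name `companies_by_state` are all hypothetical and are not taken from the actual Wikipedia page, so the tag names would need to be checked against the real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the structure of the real page is an assumption.
SAMPLE_HTML = """
<h2>Alabama</h2>
<ul>
  <li><a href="/wiki/Example_A">Example A</a></li>
  <li><a href="/wiki/Example_B">Example B</a></li>
</ul>
<h2>Alaska</h2>
<ul>
  <li><a href="/wiki/Example_C">Example C</a></li>
</ul>
"""


def companies_by_state(html):
    """Return one dict per list item, labelled with the nearest preceding h2 text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    state = None
    # find_all with a list of tag names yields matches in document order
    for tag in soup.find_all(["h2", "li"]):
        if tag.name == "h2":
            state = tag.get_text(strip=True)
        elif state is not None:
            link = tag.find("a")
            rows.append({
                "state": state,
                "company": tag.get_text(strip=True),
                "url": link.get("href") if link else None,
            })
    return rows


rows = companies_by_state(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to get the same kind of dataframe as before. On a real page, `response.text` would replace `SAMPLE_HTML`, and unrelated headings or navigation lists would probably need to be filtered out.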


