Friday, May 26, 2017

Remove extra tables from web-scraping results in Python

My code scrapes every table on the page, which produces several extra tables I don't need. I want to keep only one of them and discard the rest (see the desired result below).

My Code

import os
import requests
from bs4 import BeautifulSoup

cwd = os.getcwd()
print(cwd)                                   # show the current working directory
os.chdir(r'c:\Users\STaiwo\Desktop\My R code')

# The URL was wrapped across two lines in the original post; it is kept as posted.
page = requests.get("http://ift.tt/2r5Ukd4miles/airlines/partner/180/china-eastern.html",
                    verify=False)
print(page.content)                          # raw HTML content of the site

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())                       # pretty-printed HTML

for table in soup.findAll('tbody'):
    print('Table')
    list_of_rows = []
    for row in table.findAll('tr')[1:]:      # skip the header row
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace(' ', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    print(list_of_rows)

The result I'm currently getting:

Table
[['First Class', 'F, U', '150%'], ['P', '125%'], ['Business Class', 'J, C, D, I', '125%'], ['Premium Economy Class', 'W', '110%'], ['Economy Class', 'Y, B', '100%'], ['E, H, M', '75%'], ['L, N, R, S, V, K', '50%'], ['T', '30%'], ['Not eligible for accrual', 'Z, Q, G', '0%']]
Table
[]
Table
[]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 125%', '8,103'], ['8,103']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 125%', 'Elite bonus: 75%', '12,965'], ['8,103', '4,862']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 50%', '3,241'], ['3,241']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 50%', 'Elite bonus: N/A', '3,241'], ['3,241', '0']]

The result I want:

Table
[['First Class', 'F, U', '150%'], ['P', '125%'], ['Business Class', 'J, C, D, I', '125%'], ['Premium Economy Class', 'W', '110%'], ['Economy Class', 'Y, B', '100%'], ['E, H, M', '75%'], ['L, N, R, S, V, K', '50%'], ['T', '30%'], ['Not eligible for accrual', 'Z, Q, G', '0%']]
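
If the table you want is always the first tbody on the page (it is the first one printed in the output above), one option is to stop looping over every tbody and parse only that first match. This is a minimal sketch under that assumption, not a definitive fix; if the layout ever changes, a more robust selector (for example, the parent table's class attribute, once you know it) would be needed:

# Assumption: the accrual table you want is the first <tbody> on the page.
first_table = soup.find('tbody')

list_of_rows = []
for row in first_table.findAll('tr')[1:]:    # skip the header row
    list_of_cells = [cell.text.replace(' ', '') for cell in row.findAll('td')]
    list_of_rows.append(list_of_cells)

print('Table')
print(list_of_rows)

Alternatively, you could keep your existing loop and skip any tbody whose row list comes back empty or whose rows start with 'Distance in miles', but selecting the single table you need is usually simpler.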



