Friday, May 26, 2017

Remove extra tables from web-scraping results in Python

My code scrapes every table on the page, which produces several extra tables I don't need. I want to keep only one of them and discard the rest (see the desired result below).

My Code

import os
import requests
from bs4 import BeautifulSoup

cwd = os.getcwd()
print(cwd)                                   # show the current working directory
os.chdir(r'c:\Users\STaiwo\Desktop\My R code')

# The URL was wrapped across two lines in the original post; it is kept as posted.
page = requests.get("http://ift.tt/2r5Ukd4miles/airlines/partner/180/china-eastern.html",
                    verify=False)
print(page.content)                          # raw HTML content of the site

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())                       # pretty-printed HTML

for table in soup.findAll('tbody'):
    print('Table')
    list_of_rows = []
    for row in table.findAll('tr')[1:]:      # skip the header row
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace(' ', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    print(list_of_rows)

The result I'm currently getting:

Table
[['First Class', 'F, U', '150%'], ['P', '125%'], ['Business Class', 'J, C, D, I', '125%'], ['Premium Economy Class', 'W', '110%'], ['Economy Class', 'Y, B', '100%'], ['E, H, M', '75%'], ['L, N, R, S, V, K', '50%'], ['T', '30%'], ['Not eligible for accrual', 'Z, Q, G', '0%']]
Table
[]
Table
[]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 125%', '8,103'], ['8,103']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 125%', 'Elite bonus: 75%', '12,965'], ['8,103', '4,862']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 50%', '3,241'], ['3,241']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 50%', 'Elite bonus: N/A', '3,241'], ['3,241', '0']]

The result I want:

Table
[['First Class', 'F, U', '150%'], ['P', '125%'], ['Business Class', 'J, C, D, I', '125%'], ['Premium Economy Class', 'W', '110%'], ['Economy Class', 'Y, B', '100%'], ['E, H, M', '75%'], ['L, N, R, S, V, K', '50%'], ['T', '30%'], ['Not eligible for accrual', 'Z, Q, G', '0%']]
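
If the table you want is always the first tbody on the page (it is the first one printed in the output above), one option is to stop looping over every tbody and parse only that first match. This is a minimal sketch under that assumption, not a definitive fix; if the layout ever changes, a more robust selector (for example, the parent table's class attribute, once you know it) would be needed:

# Assumption: the accrual table you want is the first <tbody> on the page.
first_table = soup.find('tbody')

list_of_rows = []
for row in first_table.findAll('tr')[1:]:    # skip the header row
    list_of_cells = [cell.text.replace(' ', '') for cell in row.findAll('td')]
    list_of_rows.append(list_of_cells)

print('Table')
print(list_of_rows)

Alternatively, you could keep your existing loop and skip any tbody whose row list comes back empty or whose rows start with 'Distance in miles', but selecting the single table you need is usually simpler.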



