My code produces extra tables that I'd like to remove. I want to keep only the first table (shown under "The Result I want" below) and drop all the others.
My Code
import csv
from bs4 import BeautifulSoup
import requests
import pandas as pd
import telnetlib as tn
import os
#import sys
cwd = os.getcwd()
print(cwd)  # show the current working directory
os.chdir(r'c:\Users\STaiwo\Desktop\My R code')  # raw string keeps the backslashes literal
page = requests.get("http://ift.tt/2r5Ukd4
miles/airlines/partner/180/china-eastern.html", verify = False)
print(page.content) ### Collects HTML content of site
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify()) ## Cleans up the content of the site
for table in soup.findAll('tbody'):
    print('Table')
    list_of_rows = []
    for row in table.findAll('tr')[1:]:  # skip the header row
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace(' ', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)
    print(list_of_rows)
The Result I'm currently getting:
Table
[['First Class', 'F, U', '150%'], ['P', '125%'], ['Business Class', 'J, C, D, I', '125%'], ['Premium Economy Class', 'W', '110%'], ['Economy Class', 'Y, B', '100%'], ['E, H, M', '75%'], ['L, N, R, S, V, K', '50%'], ['T', '30%'], ['Not eligible for accrual', 'Z, Q, G', '0%']]
Table
[]
Table
[]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 125%', '8,103'], ['8,103']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 125%', 'Elite bonus: 75%', '12,965'], ['8,103', '4,862']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 50%', '3,241'], ['3,241']]
Table
[['Distance in miles: 6,482', 'Total'], ['Booking sub-class: 50%', 'Elite bonus: N/A', '3,241'], ['3,241', '0']]

The Result I want:
Table
[['First Class', 'F, U', '150%'], ['P', '125%'], ['Business Class', 'J, C, D, I', '125%'], ['Premium Economy Class', 'W', '110%'], ['Economy Class', 'Y, B', '100%'], ['E, H, M', '75%'], ['L, N, R, S, V, K', '50%'], ['T', '30%'], ['Not eligible for accrual', 'Z, Q, G', '0%']]
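One possible direction, as a minimal sketch rather than a confirmed fix: since the wanted table is the first one printed, it may be enough to ask BeautifulSoup for only the first 'tbody' with find() instead of looping over every match from findAll(). The inline sample markup and the helper name first_table_rows below are assumptions made for illustration; the real page may be structured differently, and the sketch trims whitespace with strip() rather than the replace() call used above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<table><tbody>
<tr><th>Class</th><th>Booking code</th><th>Miles</th></tr>
<tr><td>First Class</td><td>F, U</td><td>150%</td></tr>
<tr><td>P</td><td>125%</td></tr>
</tbody></table>
<table><tbody>
<tr><td>header</td></tr>
<tr><td>Distance in miles: 6,482</td><td>Total</td></tr>
</tbody></table>
"""


def first_table_rows(html):
    """Return the data rows of the first <tbody> only (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('tbody')  # find() returns only the first match, or None
    if table is None:
        return []
    rows = []
    for row in table.find_all('tr')[1:]:  # skip the header row, as in the loop above
        rows.append([cell.text.strip() for cell in row.find_all('td')])
    return rows


print(first_table_rows(SAMPLE_HTML))
```

If the wanted table turns out not to be the first 'tbody' on the page, indexing the result of find_all('tbody') or matching on an attribute of the enclosing table would be alternatives to try.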