Thursday, October 22, 2015

Parsing a list of names and putting the first and last names in separate lists (Python)

from bs4 import BeautifulSoup #imports beautifulSoup package
import urllib2




url2 = 'http://ift.tt/1OK85nM'
page2 = urllib2.urlopen(url2) 
soup2 = BeautifulSoup(page2.read(), "lxml")

row2 = soup2.findAll('p')
row2 = row2[18:-4] 

names2 = []
firstName = []
lastName = []
for x in row2:
    currentString2 = x.findAll('strong')
    if len(currentString2) > 0:
        currentString2 = currentString2[0].text
        tokens = currentString2.split(' ')
        firstName.append(tokens[0])
        for token in tokens[1:]:
           check = tokens[1]            
           if check[1] != '.' and check[1] != '\\':
               lastName.append(tokens[1])

Hey guys, I'm trying to parse this list of names: I collect the first names into one list and the last names into another. I'm also checking whether a name has an initial after the first name, like "John B. Smith", and basically skipping over the "B." so that only the last name gets appended. Any help please? I think what I have now just doesn't append the last name when the person's name has a middle initial. :/
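
One possible direction, as a rough sketch rather than a definitive fix: the inner loop above always inspects tokens[1], so for "John B. Smith" it sees "B." on every pass and never reaches "Smith". A small helper could instead drop tokens that look like initials and take the last remaining token as the surname. The helper name split_name and the sample names other than "John B. Smith" are hypothetical, and the backslash check from the original is left out because it is unclear what input it guards against.

```python
def split_name(full_name):
    """Return (first, last) for a full name, skipping middle initials like 'B.'.

    Assumption: an initial is a two-character token ending in a period.
    """
    tokens = full_name.split()
    if not tokens:
        return None, None
    first = tokens[0]
    # Keep only the tokens after the first name that are not initials.
    rest = [t for t in tokens[1:] if not (len(t) == 2 and t.endswith('.'))]
    # The last remaining token is treated as the surname; None if there is none.
    last = rest[-1] if rest else None
    return first, last


# Hypothetical sample data standing in for the <strong> text scraped from the page.
names = ["John B. Smith", "Jane Doe", "Cher"]

first_names = []
last_names = []
for name in names:
    first, last = split_name(name)
    first_names.append(first)
    last_names.append(last)  # may be None, which keeps both lists the same length
```

Appending None for a missing surname is a design choice so that first_names[i] and last_names[i] always refer to the same person; skip the append instead if only real surnames are wanted.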



