I'd like to scrape the result URLs from a Google search with Python.
Here's my code:
import requests
from bs4 import BeautifulSoup
def search(keyword):
    html = requests.get('https://www.google.co.kr/search?q={}&num=100&sourceid=chrome&ie=UTF-8'.format(keyword)).text
    soup = BeautifulSoup(html, 'html.parser')
    result = []
    for i in soup.find_all('h3', {'class': 'r'}):
        result.append(i.find('a', href=True)['href'][7:])
    return result
search('computer')
This returns a list of results. The first entry is the Wikipedia link, and the list begins like this:
'https://en.wikipedia.org/wiki/Computer&sa=U&ved=0ahUKEwixyfu7q5HdAhWR3lQKHUfoDcsQFggTMAA&usg=AOvVaw2nvT-2sO4iJenW_fkyCS3i', '?q=computer&num=100&ie=UTF-8&prmd=ivnsbp&tbm=isch&tbo=u&source=univ&sa=X&ved=0ahUKEwixyfu7q5HdAhWR3lQKHUfoDcsQsAQIHg'
I want clean URLs, such as 'https://en.wikipedia.org/wiki/Computer', for this and every other search result.
How can I modify my code?
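One possible direction, as a minimal sketch: the `[7:]` slice suggests each raw `href` has the form `/url?q=<target>&sa=...`, so the target could be read from the `q` query parameter with the standard library's `urllib.parse` instead of slicing. The helper name `clean_google_href` is hypothetical, and the href format is an assumption inferred from the output above, not something Google documents.

```python
from urllib.parse import urlparse, parse_qs

def clean_google_href(href):
    """Return the target URL from a '/url?q=...' redirect href, else None.

    Assumes the raw href looks like '/url?q=<target>&sa=U&ved=...'.
    """
    parts = urlparse(href)
    # Skip internal links such as '/search?q=...' (e.g. the image results row).
    if parts.path != '/url':
        return None
    # parse_qs maps each parameter name to a list of its values.
    target = parse_qs(parts.query).get('q')
    return target[0] if target else None
```

Inside `search`, the raw `i.find('a', href=True)['href']` value (without the `[7:]` slice) would be passed to this helper, and `None` results skipped before appending.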