stackoverfollowers!
There is a task that I am trying to solve now. Here it is:
" Write a function samewords(u1, u2, enc, k) that:
- Takes 2 URLs and enc = 'utf8':
u1 = 'http://ift.tt/2fiYYz5'
u2 = 'http://ift.tt/2ePQugb'
- On the web pages u1 and u2, finds the words of length k that occur on both pages
- Counts how many times those words occur on each page
- Returns a list of groups of 3 values: word (found in point 2), occur1 (how many times the word occurs on page u1), occur2 (how many times the word occurs on page u2)
- The returned list should be sorted in decreasing order of the total number of occurrences on both pages "
So the returned list should look like this if k=10 (the length of the words being searched for):
[(u'fondamenti', 4, 4), (u'istruzioni', 4, 3), (u'operazioni', 2, 3), (u'stylesheet', 2, 2), (u'permettono', 2, 1), (u'googlecode', 1, 1), (u'inlinemath', 1, 1), (u'javascript', 1, 1), (u'parentnode', 1, 1), (u'tantissime', 1, 1)]
So far I am using this code to strip the non-alphabetic characters and to read the pages:

    import urllib.request as ul

    def mywords(s):  # replace punctuation characters with spaces
        for c in '''!?/-,():;--'.\\_[]"{}''':
            s = s.replace(c, ' ')
        return s.split()  # return a list of all words from the page

    def myurl(u, enc):  # open the URL and return the words of the page
        p = ul.urlopen(u)
        t = p.read().decode(enc)  # read() returns bytes; decode with enc before using str methods
        p.close()
        return mywords(t.lower())
After that I ran into difficulties with points 3-5 and got stuck (mainly because when something doesn't work I usually check the code online with pythontutor.com, but in this case I can't, because it doesn't support the urllib library).
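For points 3-5, one possible approach is to keep the counting and sorting separate from the downloading, so that part can be tried on plain word lists without urllib (for example on pythontutor.com). The sketch below is only an illustration under some assumptions: the helper name common_words and the sample word lists are made up, and ties in the total count are broken alphabetically, which matches the expected output above but is not stated in the task.

```python
from collections import Counter

def common_words(words1, words2, k):
    # Hypothetical helper (not part of the task statement):
    # takes two word lists and returns (word, occur1, occur2) tuples.
    c1 = Counter(w for w in words1 if len(w) == k)  # counts of k-letter words on page 1
    c2 = Counter(w for w in words2 if len(w) == k)  # counts of k-letter words on page 2
    shared = c1.keys() & c2.keys()                  # words that occur on both pages
    result = [(w, c1[w], c2[w]) for w in shared]
    # Decreasing total count; alphabetical order for ties (an assumption).
    result.sort(key=lambda t: (-(t[1] + t[2]), t[0]))
    return result

# Made-up word lists standing in for the two pages.
page1 = "alpha beta gamma alpha delta".split()
page2 = "gamma alpha gamma omega delta".split()
print(common_words(page1, page2, 5))
# [('alpha', 2, 1), ('gamma', 1, 2), ('delta', 1, 1)]
```

With something like this in place, samewords(u1, u2, enc, k) could presumably just pass the two lists returned by myurl(u1, enc) and myurl(u2, enc) to the helper.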
Thank you!!!