Saturday, May 23, 2015

Extract data from a webpage

I have about 10,000 downloaded HTML files. Each of them contains a section of HTML like this:

<tr>
   <td width="10%" valign="top"><p>City:</p></td>
   <td colspan="2"><p>
        London
   </p></td>
</tr>
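Below is a minimal sketch of locating the city in a fragment like the one above. It uses Python's standard-library `html.parser` instead of sed. It assumes the label cell's visible text is exactly `City:` and that the city is in the very next `<td>`. The names `CityExtractor` and `extract_cities` are hypothetical. The parser works on decoded text and converts entities such as `&eacute;` by default, so accented names like Jérica come through intact as long as the file itself was decoded correctly.

```python
from html.parser import HTMLParser


class CityExtractor(HTMLParser):
    """Collects the text of the <td> that immediately follows a <td> whose text is 'City:'."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &eacute; becomes é
        self.in_td = False        # currently inside a <td> element
        self.cell_text = []       # text fragments seen in the current cell
        self.expect_city = False  # the previous cell was the "City:" label
        self.cities = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cell_text = []

    def handle_data(self, data):
        if self.in_td:
            self.cell_text.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self.in_td:
            return  # ignore </p> and other tags inside the cell
        self.in_td = False
        text = " ".join("".join(self.cell_text).split())  # collapse line breaks and spaces
        if self.expect_city:
            self.cities.append(text)
            self.expect_city = False
        elif text == "City:":
            self.expect_city = True


def extract_cities(html_text):
    """Return every city found in an already-decoded HTML string."""
    parser = CityExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.cities
```

A real HTML parser is used here instead of a regular expression, so attribute order, line breaks and the inner `<p>` do not affect the match.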

What I need is a way to get the city from every file. I'm on Linux, so I was thinking of using a shell script with sed, but sed doesn't work well with these files because of encoding issues: some city names contain accented characters (for example Jérica), and sed wouldn't find them. What's the proper way to do this?
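If the sed failures come from the bytes on disk not matching the expected encoding, one hedged approach is to decode each file explicitly before searching it. The sketch below first tries any `charset=` declared near the top of the file, and then falls back to other encodings. The helpers `decode_html` and `read_all` are hypothetical. The fallback order (UTF-8, then Windows-1252, then Latin-1) is an assumption about where these pages came from.

```python
import re
from pathlib import Path

# Matches declarations such as <meta charset="utf-8"> or
# <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">.
CHARSET_RE = re.compile(rb"""charset=["']?([\w-]+)""", re.IGNORECASE)


def decode_html(raw: bytes) -> str:
    """Decode raw HTML bytes to str, trying the declared charset first (assumed fallback order)."""
    candidates = []
    match = CHARSET_RE.search(raw[:2048])  # the declaration normally sits in <head>
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates += ["utf-8", "cp1252"]  # assumption: typical encodings for such pages
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or bytes not valid in that encoding
    # Latin-1 maps every byte to a character, so this last resort never raises.
    return raw.decode("latin-1")


def read_all(directory: str, pattern: str = "*.html"):
    """Yield (path, decoded text) for every file matching pattern under directory, recursively."""
    for path in sorted(Path(directory).rglob(pattern)):
        yield path, decode_html(path.read_bytes())
```

Each decoded string can then be passed to whatever extraction step you use, e.g. `for path, text in read_all("pages"): ...`, where `pages` stands in for your download directory.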
