Friday, January 1, 2016

HTML Agility Pack in C# that scrapes URLs [duplicate]

Hi, I am trying to write a method that scrapes URLs from a page using the HTML Agility Pack NuGet package. I am new to this package, and I am getting a null reference exception at the foreach loop below. Can someone show me what I am doing wrong and how to properly write code with this package that scrapes a web page for its URL links? Thanks.

Edit: I am not asking what a null reference exception is. I am saying that I don't understand what in my code is causing me to receive this exception.

code:

public static List<string> ParseLinks(string html)
{
    HashSet<string> list = new HashSet<string>();
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
    foreach (var n in nodes)
    {
        string href = n.Attributes["href"].Value;
        list.Add(href);
    }
    return list.ToList();
}
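
A likely cause, offered as a sketch rather than a confirmed diagnosis: in the HTML Agility Pack, SelectNodes returns null (not an empty collection) when the XPath expression matches nothing, so the foreach throws whenever the input contains no a elements with an href attribute. The version below guards against that case. The class name LinkParser and the early return for empty input are assumptions added for illustration; they are not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical wrapper class; the original post shows only the method.
public static class LinkParser
{
    public static List<string> ParseLinks(string html)
    {
        var links = new HashSet<string>();

        // Assumption: treat null or empty input as "no links" instead of throwing.
        if (string.IsNullOrEmpty(html))
            return links.ToList();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath expression.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null)
            return links.ToList();

        foreach (var node in nodes)
        {
            // GetAttributeValue falls back to the supplied default if the attribute is missing.
            string href = node.GetAttributeValue("href", string.Empty);
            if (href.Length > 0)
                links.Add(href);
        }

        return links.ToList();
    }
}
```

With this guard, calling LinkParser.ParseLinks("<p>no links here</p>") returns an empty list instead of throwing. If the exception still occurs, it is worth checking whether the html argument itself is what you expect before it reaches the method.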



