Wednesday, 13 April 2016

Scrape webpage data with proxy

The following code downloads the source of a site entered by the user. I want to do the same, but through a proxy that the user also enters.

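            // urlAddress is assumed to hold the site URL entered earlier
            // (its declaration is not part of this snippet).
            Console.WriteLine("Enter path");
            string fileName = Console.ReadLine();
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();

            if (response.StatusCode == HttpStatusCode.OK)
            {
                Console.WriteLine("Page OK");
                Stream receiveStream = response.GetResponseStream();
                StreamReader readStream = null;

                // Use the charset declared by the server when there is one;
                // an empty charset would make Encoding.GetEncoding throw.
                if (string.IsNullOrEmpty(response.CharacterSet))
                {
                    readStream = new StreamReader(receiveStream);
                }
                else
                {
                    readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
                }

                string data = readStream.ReadToEnd();

                // Close the reader first, then the response that owns the stream.
                readStream.Close();
                response.Close();
                Console.WriteLine(data);

                // Save the page source to the path the user entered.
                System.IO.File.WriteAllText(fileName, data);
            }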

I have tried the following code, but I get the error System.UriFormatException:

            Console.WriteLine("proxy ip:");
            string proxyip = Console.ReadLine();
            Console.WriteLine("port");
            string proxyport = Console.ReadLine();
            string proxyaddress = (proxyip + ":" + proxyport);
            HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");
            WebProxy myproxy = new WebProxy(proxyaddress, false);
            requestproxy.Proxy = myproxy;
            HttpWebResponse responseproxy = (HttpWebResponse)requestproxy.GetResponse();
            Console.WriteLine("file path:");
            string fileName = Console.ReadLine();

            if (responseproxy.StatusCode == HttpStatusCode.OK)
            {
                Console.WriteLine("Page OK");
                Stream receiveStream = responseproxy.GetResponseStream();
                StreamReader readStream = null;

                if (responseproxy.CharacterSet == null)
                {
                    readStream = new StreamReader(receiveStream);
                }
                else
                {
                    readStream = new StreamReader(receiveStream, Encoding.GetEncoding(responseproxy.CharacterSet));
                }

                string data = readStream.ReadToEnd();

                responseproxy.Close();
                readStream.Close();
                Console.WriteLine(data);
                System.IO.File.WriteAllText(@fileName, data);
            }

What is wrong with the above code?
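One likely cause, assuming the snippet is exactly what runs: `WebRequest.Create("url")` receives the literal string "url", which is not an absolute URI, and `WebRequest.Create` throws `System.UriFormatException` for such input. The same exception can also come from `new WebProxy(proxyaddress, false)` if the typed IP or port is malformed. Below is a minimal sketch of validating both inputs before the request is created. The class `ProxyInput` and its two methods are hypothetical names introduced for illustration, and the sketch assumes a plain HTTP proxy without authentication.

```csharp
using System;
using System.Net;

// Hypothetical helper class, not part of the original code.
static class ProxyInput
{
    // Builds an absolute proxy URI from the host and port typed by the user.
    // Returns false instead of throwing when the input is not usable.
    public static bool TryBuildProxyUri(string host, string port, out Uri proxyUri)
    {
        proxyUri = null;
        int portNumber;
        if (string.IsNullOrWhiteSpace(host) ||
            !int.TryParse(port, out portNumber) ||
            portNumber < 1 || portNumber > 65535)
        {
            return false;
        }

        // Assumption: the proxy speaks plain HTTP, so the scheme is added explicitly.
        string candidate = "http://" + host.Trim() + ":" + portNumber;
        return Uri.TryCreate(candidate, UriKind.Absolute, out proxyUri);
    }

    // Creates a request for targetUrl that is routed through proxyUri.
    // No network traffic happens until GetResponse() is called on the result.
    public static HttpWebRequest CreateProxiedRequest(string targetUrl, Uri proxyUri)
    {
        Uri target;
        bool isHttpUrl =
            Uri.TryCreate(targetUrl, UriKind.Absolute, out target) &&
            (target.Scheme == Uri.UriSchemeHttp || target.Scheme == Uri.UriSchemeHttps);
        if (!isHttpUrl)
        {
            // A literal such as "url" ends up here instead of raising UriFormatException.
            throw new ArgumentException("Not an absolute http(s) URL: " + targetUrl, "targetUrl");
        }

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
        request.Proxy = new WebProxy(proxyUri, false); // false: do not bypass the proxy for local addresses
        return request;
    }
}
```

In the question's code this would mean passing the real site address (for example the `urlAddress` variable from the first snippet) rather than `"url"`, calling `TryBuildProxyUri(proxyip, proxyport, out proxyUri)` right after reading the two values, and asking the user again when it returns false. The rest of the response handling can stay as it is.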



