Tuesday, May 1, 2018

PHP crawler: links in the returned HTML document jump to the wrong URL

I use PHP to crawl the web and return the fetched HTML documents.

My test site: https://pet9.000webhostapp.com/google.php?q=wiki

Why are the URLs inside the returned page changed to "original domain name?q=clicked domain name&sa=xxxx"? (I know the "&sa=xxxxxxx" part is a Google Search suffix.)

Because the URL has the form "domain name?q=xxxx", I want index.php to capture the request and cut off the extra head and tail, but it fails to capture it. My PHP code:

    <?php

    // Read q from the query string, falling back to POST data.
    $q = $_GET['q'] ?? $_POST['q'] ?? '';

    // Run a regular-expression check on q: if it has obvious URL features
    // (an http:// or https:// scheme, a www. prefix, or a .com/.org/.edu host),
    // fetch that page directly.
    if (preg_match('~^(https?://|www\.)|\.(com|org|edu)\b~i', $q)) {
        // file_get_contents() needs a scheme, so add one when it is missing.
        $url = preg_match('~^https?://~i', $q) ? $q : "http://$q";
        $fh = file_get_contents($url);
    } else {
        // Otherwise treat q as a search term and fetch the Google search page.
        $fh = file_get_contents('http://www.google.com/search?q=' . urlencode($q));
    }

    // Return (print) the fetched HTML document.
    echo $fh;

    ?>
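
One possible explanation, offered as an assumption rather than something the post confirms: Google's plain-HTML result pages usually link to each result through a relative href such as `/url?q=<target>&sa=...`. Once the page is echoed from another host, the browser resolves those links against that host. Below is a minimal sketch of stripping that wrapper before the page is printed. The function names `extractTarget` and `rewriteGoogleLinks` are hypothetical, and the `/url?q=` link shape is assumed, not taken from any documented API.

```php
<?php

/**
 * Return the target URL wrapped in a Google-style redirect link,
 * or null when the href does not start with "/url?".
 * (Hypothetical helper; the link shape is an assumption.)
 */
function extractTarget(string $href): ?string
{
    // Attribute values in HTML carry &amp; instead of &, so decode first.
    $href = html_entity_decode($href, ENT_QUOTES);

    if (strncmp($href, '/url?', 5) !== 0) {
        return null;
    }

    // parse_str() splits the query string and URL-decodes each value.
    parse_str(substr($href, 5), $params);
    $target = $params['q'] ?? null;

    return is_string($target) && $target !== '' ? $target : null;
}

/**
 * Rewrite every href="/url?..." in $html to point straight at its target.
 */
function rewriteGoogleLinks(string $html): string
{
    return preg_replace_callback(
        '~href="(/url\?[^"]*)"~i',
        function (array $m): string {
            $target = extractTarget($m[1]);
            // Leave the link untouched when no target can be extracted.
            return $target === null
                ? $m[0]
                : 'href="' . htmlspecialchars($target, ENT_QUOTES) . '"';
        },
        $html
    );
}
```

With these helpers, the final line of the script could become `echo rewriteGoogleLinks($fh);`, so the links are cleaned up before the browser sees them and index.php never has to catch the `?q=...&sa=...` request.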



