I use PHP to crawl the web and return HTML documents.
My test website: https://pet9.000webhostapp.com/google.php?q=wiki
Why are the links inside the returned page changed to "original domain name?q=clicked domain name&sa=xxxx"? (I know the "&sa=xxxxxxx" part is a Google Search suffix.)
Because the URL has the form "domain name?q=xxxx", I want index.php to capture it and strip the extra prefix and suffix, but it fails to capture anything. My PHP code:
<?php
// Read q from the query string, falling back to the POST body.
$q = $_GET['q'] ?? $_POST['q'] ?? '';
// Run a regular expression on q: if it has obvious URL features
// (http:// or https://, a www. prefix, or a .com/.org/.edu suffix), fetch that page.
if (preg_match('#^https?://|^www\.|\.(com|org|edu)\b#i', $q)) {
    // Do not urlencode() a full URL; only add a scheme when it is missing.
    $url = preg_match('#^https?://#i', $q) ? $q : "http://$q";
    $fh = file_get_contents($url);
} else {
    // Otherwise treat q as a search term and fetch the Google search page.
    $fh = file_get_contents('http://www.google.com/search?q=' . urlencode($q));
}
// Return (print) the fetched HTML document.
echo $fh;
?>
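
The "strip the extra prefix and suffix" step described above could be sketched roughly as follows. This is only an illustration, not a confirmed fix: it assumes the links in the fetched page look like "/url?q=<target>&sa=...", and the helper name extract_target_url is hypothetical.

```php
<?php
// Hypothetical helper (not part of the original code): recover the target
// URL from a link assumed to have the form "/url?q=<target>&sa=...".
function extract_target_url(string $href): ?string
{
    // Everything after the first "?" is treated as the query string.
    $pos = strpos($href, '?');
    if ($pos === false) {
        return null;
    }
    // parse_str() splits and URL-decodes the parameters, so the "&sa=..."
    // suffix ends up in its own key and drops away from q.
    parse_str(substr($href, $pos + 1), $params);
    $target = $params['q'] ?? null;
    // Accept only absolute http(s) targets; a plain search term returns null.
    if (is_string($target) && preg_match('#^https?://#i', $target)) {
        return $target;
    }
    return null;
}

// Example link, assumed to resemble one found in the fetched results page.
echo extract_target_url('/url?q=https://en.wikipedia.org/wiki/Wiki&sa=U&ved=abc'), "\n";
?>
```

Using parse_str() instead of a hand-written regex means the "&sa=..." part never has to be cut off manually. Whether index.php sees such requests at all depends on how the host routes paths like "/url", which this sketch does not address.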