I'm trying to figure out a clean way to stop web crawlers from accessing my site except for the www subdomain on port 80.
Here's what my Apache config looks like:
<VirtualHost 255.255.255.255:80>
Include /etc/httpd/conf/apps/website.common
ServerName www.website.com
ServerAlias www.website.com
Alias /robots.txt /var/www/apps/website/current/public/okrobots.txt
</VirtualHost>
<VirtualHost 255.255.255.255:80>
Include /etc/httpd/conf/apps/website.common
ServerName star.website.com
ServerAlias *.website.com
Alias /robots.txt /var/www/apps/website/current/public/robots.txt
</VirtualHost>
<VirtualHost 255.255.255.255:443>
Include /etc/httpd/conf/apps/website.common
SSLEngine on
SSLCertificateFile /etc/httpd/conf/apps/ssl/website.crt
SSLCertificateKeyFile /etc/httpd/conf/apps/ssl/website.key
SSLCACertificateFile /etc/httpd/conf/apps/ssl/website_ca_bundle.crt
SSLProtocol -ALL +SSLv3 +TLSv1
SSLCipherSuite ALL:!ADH:!LOW:!SSLv2:!EXP:+HIGH:+MEDIUM
RequestHeader set X_FORWARDED_PROTO 'https'
</VirtualHost>
okrobots.txt only allows access to the homepage and contains:
User-Agent: *
Allow: /$
Disallow: /
and robots.txt contains:
User-Agent: *
Disallow: /
Would I be able to put Alias /robots.txt /var/www/apps/website/current/public/robots.txt in the /etc/httpd/conf/apps/website.common file?
I think this would block all robots across the site, because every VirtualHost block pulls in the common file via the Include directive. In the first VirtualHost block I would then override that Alias so that /robots.txt points to okrobots.txt, letting crawlers reach the www site on port 80.
So my main question is: can a directive inside a VirtualHost block effectively override one pulled in from the common Include file?
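To make this concrete, here is a sketch of the change I have in mind (all paths and hostnames are the same ones used above; the blocking Alias simply moves into the common file):

# /etc/httpd/conf/apps/website.common -- shared by every VirtualHost
Alias /robots.txt /var/www/apps/website/current/public/robots.txt

# First VirtualHost -- intended to override the common Alias on www:80
<VirtualHost 255.255.255.255:80>
Include /etc/httpd/conf/apps/website.common
ServerName www.website.com
ServerAlias www.website.com
# intended override: serve the permissive okrobots.txt on the www site only
Alias /robots.txt /var/www/apps/website/current/public/okrobots.txt
</VirtualHost>

The other VirtualHost blocks would keep only the Include and so fall back to the blocking robots.txt.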