lundi 19 octobre 2020

Online website crawlers can't crawl "/en" translated pages

For instance my page is https://www.example.com/fr/page It's translated version in English is https://www.example.com/en/page

But online website crawlers can only find /fr page.

I checked my robots.txt and .htaccess files but nothing in there is blocking such pages in my opinion.

robots.txt :

Allow: /

# Do not index website's backend
# Disallow: /admin/

# Do not index some specific files
Disallow: /composer.json
Disallow: /composer.lock
Disallow: /CONTRIBUTING.md
Disallow: /CONTRIBUTOR_LICENSE_AGREEMENT.html
Disallow: /COPYING.txt
Disallow: /Gruntfile.js
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /nginx.conf.sample
Disallow: /package.json
Disallow: /php.ini.sample
Disallow: /RELEASE_NOTES.txt

# Do not index session ID
# Disallow: /*?SID=
# Disallow: /*?
# Disallow: /*.php$

# Do not index CVS, SVN directory and dump files
Disallow: /*.CVS
Disallow: /*.Zip$
Disallow: /*.Svn$
Disallow: /*.Idea$
Disallow: /*.Sql$
Disallow: /*.Tgz$

Sitemap : https://www.example.com/sitemap.xml

.htaccess :

    <IfModule mod_negotiation.c>
        Options -MultiViews
    </IfModule>
    RewriteEngine On
    ##
    ## You may need to uncomment the following line for some hosting environments,
    ## if you have installed to a subdirectory, enter the name here also.
    ##
    RewriteBase /*companyname*/
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    
    ##
    ## Force www.  
    ##
    # RewriteEngine On
    # RewriteCond %{HTTP_HOST} !^www\.
    # RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]
    ##
    ## Uncomment following lines to force HTTPS.
    ##
    # RewriteCond %{HTTPS} off
    # RewriteRule (.*) https://%{SERVER_NAME}/$1 [R,L]
    ##
    ## Black listed folders
    ##
    RewriteRule ^bootstrap/.* index.php [L,NC]
    RewriteRule ^config/.* index.php [L,NC]
    RewriteRule ^vendor/.* index.php [L,NC]
    RewriteRule ^storage/cms/.* index.php [L,NC]
    RewriteRule ^storage/logs/.* index.php [L,NC]
    RewriteRule ^storage/framework/.* index.php [L,NC]
    RewriteRule ^storage/temp/protected/.* index.php [L,NC]
    RewriteRule ^storage/app/uploads/protected/.* index.php [L,NC]
    ##
    ## White listed folders
    ##
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteCond %{REQUEST_FILENAME} !/.well-known/*
    RewriteCond %{REQUEST_FILENAME} !/storage/app/uploads/.*
    RewriteCond %{REQUEST_FILENAME} !/storage/app/media/.*
    RewriteCond %{REQUEST_FILENAME} !/storage/temp/public/.*
    RewriteCond %{REQUEST_FILENAME} !/themes/.*/(assets|resources)/.*
    RewriteCond %{REQUEST_FILENAME} !/plugins/.*/(assets|resources)/.*
    RewriteCond %{REQUEST_FILENAME} !/modules/.*/(assets|resources)/.*
    RewriteRule !^index.php index.php [L,NC]
    ##
    ## Block all PHP files, except index
    ##
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteCond %{REQUEST_FILENAME} \.php$
    RewriteRule !^index.php index.php [L,NC]
    ##
    ## Standard routes
    ##
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^ index.php [L]
</IfModule>

The website is made through OctoberCMS Build 447.

If i try to crawl the website with commands like wget --spider and print the result on a file, I can get every URLs, including /en ones. (Thats what I use to formate and generate my own sitemap.xml without any plugin). Maybe something is wrong with my translating plugin "Rainlab Translate" ?

Thank you in advance.

Aucun commentaire:

Enregistrer un commentaire