Post by genegrin » Mon Oct 22, 2018 12:14 am

Could you look at this, please? I’ve heard very different things about robots.txt, and I’m not sure I completed it correctly. Thanks in advance.
-------------------------------------------
Disallow: /login
Disallow: /shopping-cart
Disallow: /contact-us
Disallow: /account
Disallow: /*?
Disallow: /vqmod/
Disallow: /cgi-bin/
Disallow: /*&limit
Disallow: /*?limit
Disallow: /*&sort
Disallow: /*?sort
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search
Disallow: /*?route=affiliate/
Disallow: /*&keyword
Disallow: /blog/*?pavreset=?
Disallow: /blog/wp-admin
Disallow: /blog/wp-includes
Disallow: /blog/trackback
Disallow: /blog/*?*
Disallow: /blog/tag

Active Member

Posts

Joined
Wed Jun 08, 2011 3:00 am

Post by rjcalifornia » Mon Oct 22, 2018 12:05 pm

It has to be something like this:

User-agent: *
Disallow: /administration/
https://support.google.com/webmasters/a ... 2596?hl=en
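To check a file like this before uploading it, Python's standard-library `urllib.robotparser` can parse the text and answer `can_fetch()` queries. This is only a sketch: the sample paths and the `Googlebot` user agent are illustrative assumptions. As far as I know, the module does plain prefix matching, so it is not a reliable way to test the `*` wildcard rules from the first post.

```python
from urllib.robotparser import RobotFileParser

# The two-line file suggested above.
SUGGESTED = """\
User-agent: *
Disallow: /administration/
"""

# A rule with no User-agent line before it, as in the first post.
NO_USER_AGENT = """\
Disallow: /login
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    suggested = build_parser(SUGGESTED)
    # Prefix match on /administration/ -> not fetchable.
    print(suggested.can_fetch("Googlebot", "/administration/index.php"))
    # No rule matches this path -> fetchable.
    print(suggested.can_fetch("Googlebot", "/index.php?route=product/category"))
    # Python's parser drops Disallow lines that are not inside a
    # User-agent group, so /login is still reported as fetchable.
    print(build_parser(NO_USER_AGENT).can_fetch("Googlebot", "/login"))
```

The last check shows why the `User-agent: *` line matters: at least in Python's parser, `Disallow` lines that don't follow a `User-agent` line are ignored.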

A2 Hosting features: Shared Turbo Boost, Managed Warp 1, Unmanaged Hyper 1, and Warp 2 Turbo


Active Member

Posts

Joined
Fri Sep 02, 2011 1:19 pm
Location - Worldwide

Post by genegrin » Mon Oct 22, 2018 8:12 pm

Thank you.

Could I also keep all these lines:
Disallow: /*?
Disallow: /*&limit
Disallow: /*?limit
Disallow: /*?sort
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search
Disallow: /*?route=affiliate/
Or just this one:
Disallow: /*?
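To compare the two options, here is a minimal sketch of wildcard matching as Google describes it. `*` matches any run of characters, a trailing `$` anchors the end of the URL, and each pattern is compared from the start of the path, query string included. The helper names (`rule_to_regex`, `is_blocked`) and the sample paths are hypothetical. The sketch only looks at `Disallow` lines and ignores `Allow` precedence.

```python
from __future__ import annotations

import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end, and the match starts at the beginning of the path
    (query string included).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """True if any non-empty Disallow pattern matches the path."""
    return any(p and rule_to_regex(p).match(path) for p in disallow_patterns)


# The longer list from the question.
LONG_RULES = [
    "/*?", "/*&limit", "/*?limit", "/*?sort",
    "/*?route=checkout/", "/*?route=account/",
    "/*?route=product/search", "/*?route=affiliate/",
]
# The single catch-all alternative.
SHORT_RULES = ["/*?"]

# Hypothetical store paths, for illustration only.
SAMPLE_PATHS = [
    "/index.php?route=product/category&path=20&limit=100",
    "/index.php?route=checkout/cart",
    "/desktops?sort=p.price&order=ASC",
    "/index.php?route=product/product&product_id=42",
    "/laptop-notebook",
]

if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(path, is_blocked(path, LONG_RULES), is_blocked(path, SHORT_RULES))
```

Every longer rule except `/*&limit` starts with `/*?`, so anything it blocks is already blocked by `/*?` alone. `/*&limit` only adds something for a URL containing `&limit` without any `?`, which is unusual. Also note that `/*?` blocks every URL with a query string. If SEO URLs are turned off, OpenCart's product and category pages are `index.php?route=...` URLs and would be blocked too.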

Post by ADD Creative » Mon Oct 22, 2018 10:45 pm

One thing to remember is that robots.txt won't necessarily stop pages from being indexed. You need to use a noindex meta tag or header for that.

From https://support.google.com/webmasters/a ... 2608?hl=en.
While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google Search results, you should password-protect the files on your server or use the noindex meta tag or response header (or remove the page entirely).
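To confirm that a page actually carries noindex (for example, after modifying an OpenCart template), a sketch like this checks both places mentioned in the quote: an `X-Robots-Tag` response header and a `<meta name="robots">` tag. It uses only the standard library's `html.parser`. The function names are hypothetical, and the header handling is simplified: a `googlebot:` style prefix is stripped rather than matched against a specific crawler. It also treats `none` as noindex, since Google documents `none` as shorthand for `noindex, nofollow`.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Directives that tell a crawler not to index the page.
NOINDEX_VALUES = {"noindex", "none"}


def _directives(value: str) -> set[str]:
    """Split 'noindex, nofollow' style values into lower-case tokens.

    Simplification: a 'googlebot:' style prefix is dropped, not checked.
    """
    return {part.split(":")[-1].strip().lower() for part in value.split(",")}


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and googlebot tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            self.directives |= _directives(attr_map.get("content", ""))


def has_noindex(headers: dict[str, str], html: str) -> bool:
    """True if an X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and _directives(value) & NOINDEX_VALUES:
            return True
    finder = RobotsMetaFinder()
    finder.feed(html)
    return bool(finder.directives & NOINDEX_VALUES)


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow"></head>'
    print(has_noindex({"Content-Type": "text/html"}, page))
```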
Last edited by ADD Creative on Tue Oct 23, 2018 6:49 pm, edited 1 time in total.

www.add-creative.co.uk


Expert Member

Posts

Joined
Sat Jan 14, 2012 1:02 am
Location - United Kingdom

Post by genegrin » Tue Oct 23, 2018 1:53 am

I don't know how or where to put the noindex meta tag in OpenCart pages. :crazy:

Post by ADD Creative » Tue Oct 23, 2018 6:48 pm

You would need to make modifications. A quick search for 'opencart noindex' turned up the following threads (there are probably more):

viewtopic.php?t=126686
viewtopic.php?t=173034#p656796

Post by Elevate » Mon Oct 29, 2018 11:50 pm

Another thing to consider:

Google really wants to crawl every page of your site to 'have a better understanding of it', so it actually suggests not blocking pages and directories in robots.txt and using the noindex / nofollow meta tags instead.

'For non-image files (that is, web pages) robots.txt should only be used to control crawling traffic, typically because you don't want your server to be overwhelmed by Google's crawler or to waste crawl budget crawling unimportant or similar pages on your site. You should not use robots.txt as a means to hide your web pages from Google Search results. This is because other pages might point to your page, and your page could get indexed that way, avoiding the robots.txt file. If you want to block your page from search results, use another method such as password protection or noindex tags or directives.'

https://support.google.com/webmasters/a ... 2608?hl=en
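Following from that, a noindex tag on a page that robots.txt blocks is never read, because a compliant crawler never fetches the page. This sketch lists such conflicts using `urllib.robotparser`. The `/login` and `/account` rules come from the first post (placed under a `User-agent: *` group), while `NOINDEX_PATHS` and `hidden_noindex` are hypothetical names. Because the parser does prefix matching only, wildcard rules like `/*?` are left out.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Prefix-only rules taken from the first post, placed under a
# User-agent group so the parser applies them.
ROBOTS = """\
User-agent: *
Disallow: /login
Disallow: /account
"""

# Hypothetical pages that are meant to carry a noindex meta tag.
NOINDEX_PATHS = ["/login", "/account/order", "/contact-us"]


def hidden_noindex(robots_txt: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the paths whose noindex tag a compliant crawler can never see.

    If robots.txt blocks a URL, the crawler never fetches the page, so any
    noindex tag on it goes unread (the URL can still be indexed from links).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    for path in hidden_noindex(ROBOTS, NOINDEX_PATHS):
        print(f"{path}: blocked, so its noindex tag will not be seen")
```

Pages flagged this way would need their `Disallow` line removed before the noindex tag can take effect.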

ELEV8TE Website Development
https://www.elev8your.com


New member

Posts

Joined
Fri Jul 06, 2018 12:40 am
Location - Denver, Colorado, USA