block indexing via HTTPS
06-15-2004, 08:42 PM
I just took a look at the listings on Google for a site that I maintain and today Google added a reference for every page on the site using https in addition to the pages already indexed via http.
Question 1: is this going to kill me / look like duplicate content?
Question 2: how can I prevent indexing via HTTPS? Can it be done using robots.txt?
Thanks in advance,
06-17-2004, 12:56 PM
Does anyone have any suggestions that would help me out here? I've been looking all over but haven't found any information on how to block search engines from indexing my site via https. I don't want to get penalized for duplicate content.
06-17-2004, 04:10 PM
I thought Google didn't index https pages.
Have you seen any https pages in the SERPs?
06-17-2004, 04:40 PM
They aren't showing up in the SERPs yet, but they're indexed. If I do a search for site:url they're all in there.
06-17-2004, 11:49 PM
I would assume that this would be something best taken care of at the web server. Why would you want it to answer to both http and https for the same page?
Also, assuming that the server just answers for both secure and non-secure requests with the same page in all cases (and it does not for the 2 domains in your sig), it would indicate that you have a link from one version to another on a page in your site.
If you want to pm me an example URL, I have a tool that can check that.
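Such a check is easy to script. A hypothetical sketch (not the actual tool mentioned above): request the same path over plain HTTP (port 80) and HTTPS (port 443) and compare the response bodies.

```python
# Hypothetical duplicate-content checker: fetch the same path over http and
# https and report whether the server answers both with an identical page.
import hashlib
import http.client

def page_fingerprint(body: bytes) -> str:
    """Hash a response body so two fetches can be compared cheaply."""
    return hashlib.md5(body).hexdigest()

def fetch(host: str, path: str, secure: bool):
    """Return (status, body fingerprint) for one request over http or https."""
    cls = http.client.HTTPSConnection if secure else http.client.HTTPConnection
    conn = cls(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, page_fingerprint(resp.read())
    finally:
        conn.close()

def same_page(host: str, path: str = "/") -> bool:
    """True if http and https serve an identical page for the same path."""
    return fetch(host, path, secure=False) == fetch(host, path, secure=True)
```

If `same_page()` returns True for a URL, that page is reachable as a duplicate over both protocols.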
If it is not a server side configuration, and there are two sites (http & https), you can just add the meta robots tag to each duplicate (https?) page.
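For reference, the meta robots tag in question is a single line in the head of each page that should stay out of the index, along these lines:

```html
<!-- in the <head> of each https page that should not be indexed -->
<meta name="robots" content="noindex,nofollow">
```

Note that this only helps if the server can emit the tag on https responses and omit it on http ones; if both protocols serve the exact same file, the tag would de-index the http version too.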
I hope at least some of that makes sense.
06-18-2004, 01:04 AM
Why are you not using robots.txt to stop Google from indexing the pages you don't want indexed? It will work.
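For what it's worth, a robots.txt rule to keep crawlers out of part of a site looks like this (`/some-directory/` is just a placeholder):

```
User-agent: *
Disallow: /some-directory/
```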
06-18-2004, 01:15 AM
[QUOTE=megri]Why are you not using robots.txt to stop Google from indexing the pages you don't want indexed? It will work.[/QUOTE]
It depends on how things are set up. In the posts above, there is not enough information to determine the problem.
However, it is quite likely that there is only a single directory, and the same page is served for both secure and non-secure requests.
If that is the case, blocking one in robots.txt would also block the other.
Actually, to clarify a bit more: he most likely is not trying to block access to a page, but to block one type of access to it.
It is simply a matter of how the request is made; the main difference is which server port the request arrives on (by standard, port 80 for http and 443 for https), but both kinds of request end up at the same place for the same page.
He wants one to keep happening and the other to stop.
I hope that this does not just make things more confusing. Basically, the difference from what you suggest (which would otherwise work) is that one is a secure site, and robots.txt cannot filter by port number.
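One common workaround, assuming Apache with mod_rewrite, is to serve a different robots file depending on the port, so crawlers hitting the https version get a file that disallows everything (the file name robots_ssl.txt is just an example):

```apache
# Hand out a separate robots file for requests arriving on the https port
RewriteEngine On
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ /robots_ssl.txt [L]
```

where robots_ssl.txt contains:

```
User-agent: *
Disallow: /
```

That way the http robots.txt stays untouched, while the https side tells crawlers to stay out entirely.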