Secured Archive Pages


BungalowBill
06-20-2004, 11:42 PM
I need to submit articles that are protected by a cookie, using .asp. If the cookie is present, the user gets the article. If the cookie is not present, a separate page is displayed asking the person to join. What will happen if I submit the article page to a search engine? Will it index the target article, or stop when it finds no cookie? :eek:

Pyrrhonist
06-21-2004, 02:31 PM
What you're going to need to do is put an exception on the page, so that the search engine doesn't need the cookie to view the article. This is similar to what www.medbroadcast.com does. You can see that many of their pages are indexed by Google, but can't be viewed until you agree to the terms of service when you visit them. (They use a PHP flag to indicate that you've agreed, but it's the same type of idea.)

BungalowBill
06-22-2004, 01:02 PM
But how do I know if it's a spider?

Pyrrhonist
06-22-2004, 01:17 PM
To determine whether it's a spider or not, examine the User-Agent header that is sent with every request to your site (ASP exposes it as the HTTP_USER_AGENT server variable). What you have to create is a conditional that says:
if (spider)
    ignore cookie requirement
else
    require cookie
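
As a rough sketch of that conditional (written in Python rather than ASP, just to show the logic), it might look like the following. The function name `choose_page` and the page labels "article" and "join" are made up for illustration; in your .asp page you would fill in the two inputs from Request.ServerVariables("HTTP_USER_AGENT") and the cookie check you already have.

```python
def choose_page(is_spider: bool, has_cookie: bool) -> str:
    """Decide which page to serve for a single request.

    Hypothetical helper: "article" and "join" are placeholder labels
    for the protected article and the sign-up page.
    """
    if is_spider:
        # Spiders skip the cookie requirement so the article can be indexed.
        return "article"
    if has_cookie:
        # Ordinary visitors still need the cookie.
        return "article"
    return "join"
```

So a spider without a cookie gets the article, while a regular visitor without a cookie is sent to the join page.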


I did a quick search for lists of user agents, and found a couple of fairly large ones here:

http://www.siteware.ch/webresources/useragents/db.html
and http://www.pgts.com.au/pgtsj/pgtsj0208c.html

I have no idea whether these are the best or not - I just wanted to provide you with an example.

As to how to implement this, there are a couple of ways you could do it. You could put a comprehensive list of user agents into a file, and then load the file and check against that. Doing it that way lets you update the list as often as required, and makes maintenance quite easy.
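
Here is a minimal sketch of the file approach, again in Python only to show the idea. It assumes a plain text file with one user-agent fragment per line (blank lines and lines starting with # are skipped); that file layout and the names `load_agents` and `is_spider` are my own assumptions, not anything standard.

```python
from pathlib import Path


def load_agents(path: str) -> list[str]:
    """Read one user-agent fragment per line, lowercased.

    Assumed file format: blank lines and lines starting with '#' are ignored.
    """
    agents = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            agents.append(line)
    return agents


def is_spider(user_agent: str, agents: list[str]) -> bool:
    """Return True if any listed fragment occurs in the user-agent string."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in agents)


# Example use (the file name is hypothetical):
#   agents = load_agents("spiders.txt")
#   is_spider(request_user_agent, agents)
```

Updating the list is then just a matter of editing the text file; the page code stays the same.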

You can also load the user-agents into an array and iterate through the array when checking.

Or the quick and easy hack is to choose 2 or 3 really important ones (Googlebot and Inktomi Slurp come to mind) and create a longer if statement (if (googlebot || inktomi.slurp) and so on).
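
The quick hack could be sketched like this (Python, for illustration only). It assumes the spiders identify themselves with the substrings "googlebot" and "slurp" somewhere in the user-agent string, so check your own logs before relying on those exact fragments.

```python
def is_spider(user_agent: str) -> bool:
    """Quick check against two hard-coded spider fragments.

    Assumption: the fragments below appear in the spiders' user-agent
    strings; the comparison is case-insensitive.
    """
    ua = user_agent.lower()
    return "googlebot" in ua or "slurp" in ua


# Sample user-agent strings, used here only as examples.
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(is_spider(googlebot))  # True
print(is_spider(browser))    # False
```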

Oh yeah, you can check whether this is working for you (and if it is working at all) by setting your script to exempt Mozilla and then trying to load the page. If you can get in without the cookie, then you know you're set, and you can get rid of the exemption for Mozilla and off you go on your way.
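
One way to sketch that test, assuming a temporary `testing` switch (a name I made up) that adds "mozilla" to the exempt fragments. The fragment list and function names are illustrative, not part of any real API.

```python
# Hypothetical list of fragments that mark a request as coming from a spider.
SPIDER_FRAGMENTS = ["googlebot", "slurp"]


def exempt_fragments(testing: bool) -> list[str]:
    """While testing, also treat Mozilla-style browsers as exempt."""
    return SPIDER_FRAGMENTS + (["mozilla"] if testing else [])


def gets_article(user_agent: str, has_cookie: bool, testing: bool = False) -> bool:
    """Return True if this request would be shown the article."""
    ua = user_agent.lower()
    exempt = any(fragment in ua for fragment in exempt_fragments(testing))
    return exempt or has_cookie


browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
print(gets_article(browser, has_cookie=False, testing=True))   # True
print(gets_article(browser, has_cookie=False, testing=False))  # False
```

If the first call lets your browser in and the second does not, the exemption logic is doing its job and the test switch can come out.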

Does that help at all?

BungalowBill
06-22-2004, 07:54 PM
Thank you, that's just the info I needed.

Claude Wilson (BungalowBill)

Pyrrhonist
06-22-2004, 08:43 PM
No problem, Claude.

Just a hint: put your site in your signature. It will end up giving you a backlink for every post that you make.

eCommando
06-23-2004, 09:09 AM
That's kinda like cloaking...