#1
In another thread it was asked recently how to hide a site from search engines.
I have been asked the same thing elsewhere lately, because of this site extracting everyone's details: http://www.123people.com/ If anyone has some pointers on how to stop all or part of a site from being crawled, that would be much appreciated. Something that has apparently made a difference was adding this line to robots.txt:
Disallow: /directory_name/
and then removing the directory via Google Webmaster Tools.
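For anyone who wants to check what that robots.txt line does before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line, the `example.com` host and the `is_crawlable` helper are assumptions added for illustration; the post only quotes the Disallow line. It only shows how a well-behaved bot would read the rule, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# Rules as they might appear in robots.txt. The "User-agent: *" line is an
# assumption; the post above only quotes the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory_name/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a bot that respects robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not a site from the thread.
    for url in (
        "http://example.com/directory_name/page.html",
        "http://example.com/index.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, "*", url))
```

Pages under `/directory_name/` are reported as not crawlable, while the rest of the site stays open.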
__________________
Poker Tournaments | Web Design & Development Coming Soon: Play Dough | Pizza Tonight | Poker Wear | Poker Talk | More soon... |
#2
Thank you.
#3
Yeah, I had to go through and do this on a few of my sites after they were quietly hacked and had malicious content uploaded. There were hidden links and pages, and nearly 5000 pages of adult/pharmacy crap got indexed by Google. Google's Webmaster Tools is very handy for that type of stuff; they were all excluded in a matter of days.
#4
The problem my mate is having with 123people is that they use Yahoo APIs, so I'm not really sure whether Yahoo has an equivalent exclude option.
#5
You can also use robots meta tags. It's a good idea to use both those and robots.txt.
Also use an .htaccess file to restrict access to directories.
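As a rough illustration of the meta tag approach, the sketch below checks which robots directives a page declares. It assumes "meta tags" here means the standard `<meta name="robots" content="...">` tag with values such as `noindex`; the `RobotsMetaFinder` class, the `robots_directives` helper and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html: str) -> list[str]:
    """Return the robots meta directives declared in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    # Hypothetical page that asks engines not to index it or follow its links.
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(robots_directives(page))
```

Like robots.txt, the tag is only a request: a crawler has to fetch the page to see it, and a badly behaved one can ignore it.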
__________________
Index > Rank > Monetise |
#6
My understanding (which may be better or worse than the suggestions above, as I'm not an absolute expert on this) is that you are close to the money with your original suggestion of using robots.txt, with a couple of caveats.

Basically the three files you could edit are robots.txt, .htaccess or .htpasswd (there are MANY other ways to restrict content, of course). I won't discuss .htpasswd as I have never really used it, and I presume you don't want everyone to need a password to access the site.

By far the easiest to work with in this case is robots.txt. This file tells robots whether or not they are allowed to crawl your site. You can set it to disallow nothing, disallow everything, or disallow specific directories. The original standard has no "Allow", only "Disallow", and anything not disallowed is presumed allowed (major crawlers such as Googlebot do also understand an Allow line).

The following allows everything:
User-agent: *
Disallow:

The following disallows everything (note the slash):
User-agent: *
Disallow: /

Various other versions are possible, blocking specific bots (aka user-agents), specific directories, or combinations of the two. The only problems are that you need to know the name of the bot doing the crawling you don't want (maybe from your access logs), and that robots.txt is not a restriction, just advice: a "bad" bot that hasn't been programmed to play nice will simply ignore it and do what it wants.

With .htaccess you can restrict access to the whole site or to directories, but it's more complicated. It's also worth noting that the restriction will apply to anyone and everyone, unless you know the IP address of the "bad bot" and it doesn't keep changing IP. Plus, you then really should do some tricky stuff to ensure the bot can still see robots.txt (blocking everything blocks robots.txt as well). Otherwise it will not find robots.txt, will presume it doesn't exist, will try to crawl other pages, fail, and keep going. That's no real problem except for a LOT of extra server traffic.

If you then want the pages removed from Google as well, just use Webmaster Tools as suggested above. I'm not sure about other engines, though in time you should disappear from them if you are using robots.txt.
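The allow-all, disallow-all and per-bot variants described above can be tried out with Python's standard `urllib.robotparser`. This is only a sketch of how a compliant bot would read each file; "BadBot", "GoodBot", the `/page.html` path and the `allowed` helper are hypothetical names, not anything taken from real access logs.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"
# Block one named bot and leave everyone else alone.
# "BadBot" is a hypothetical user-agent name.
BLOCK_ONE_BOT = "User-agent: BadBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"


def allowed(robots_txt: str, user_agent: str, path: str = "/page.html") -> bool:
    """Return True if a compliant bot with this user-agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("allow all, any bot:      ", allowed(ALLOW_ALL, "GoodBot"))
    print("disallow all, any bot:   ", allowed(DISALLOW_ALL, "GoodBot"))
    print("block one bot, BadBot:   ", allowed(BLOCK_ONE_BOT, "BadBot"))
    print("block one bot, other bot:", allowed(BLOCK_ONE_BOT, "GoodBot"))
```

None of this stops a bot that never reads robots.txt, which is the caveat raised above.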
For more info try this link: http://www.webmasterworld.com/robots_txt/3523560.htm

Regarding your specific problem with 123people, I'm guessing that if they already have your info they aren't giving it up, and removing pages from the web after the horse has bolted won't help.

Sorry all for being so wordy; hopefully this helps a little and doesn't just cause more confusion!

Matt
Marketing Web
___________________________________
Anti Aging Medicine Cattle Crushes Promotional Bags Free Christian Dating Australia |
#7
From my understanding they search live, so removing the pages should help eventually.