Yahoo! Slurp Spider: a discussion in the Tiscali / TalkTalk User Announcements forum (Website Comments, Suggestions & News)

26-06-2007, 01:06 PM   #1
acidtechno (Tiscali User Member)
Join Date: Oct 2005
Location: Kent, England
Posts: 1,646
Thanks: 0
Thanked 0 Times in 0 Posts
Yahoo! Slurp Spider

Yahoo! Slurp Spider is always searching the forums. Does this use a lot of traffic? How can it know what to search for? I assume it's doing something similar to when a user types a search term in, but what on earth would the Yahoo bot be searching for? Does it know what kind of forum this is, and search for related keywords to gather pages from the forum for its search engine?
26-06-2007, 01:26 PM   #2
aos101 (Tiscali User Admin)
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
From what I can see, it's just searching for all posts by a user, probably using the link in the user's profile. It shows up for me as something like "Searching the Forums User: aos101". I don't think the Yahoo spider is using too much traffic at the moment (we've had some search bots use over a GB in a few days, which we blocked), but it has been going crazy lately.

I guess it could be because we merged a couple of small forums into larger ones, and it's decided to re-index a lot of it. I think it might also be indexing the site multiple times, since we have the .net, .com & .co.uk domains all with the same content.
__________________
Adam
26-06-2007, 01:58 PM   #3
NewsreadeR (Tiscali User Admin)
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
Quote:
since we have the .net, .com & .co.uk domains all with the same content.
That's pretty bad, Adam. You really need to set up a 301 redirect, so that going to any of the other domains points to .net.

That means less load on the server, and without it you will be penalised by search bots for duplicate content.
__________________
SkyUser - Home of the Unofficial Sky Broadband forums

How fast is your current internet connection?

Freedom 2 Surf are without doubt the worst ISP I have ever had the pleasure of paying for an internet connection.
26-06-2007, 02:00 PM   #4
NewsreadeR (Tiscali User Admin)
http://www.google.co.uk/search?q=sit...L_enGB211GB211

This shows that you have 25,900 pages indexed by Google, and unless you have blocked showposts in your robots.txt, you should have many thousands more.
26-06-2007, 03:35 PM   #5
aos101 (Tiscali User Admin)
Yeah, we'll look at perhaps doing that sometime. It's always been done that way because the .co.uk and .com are just alias domains on the account (which is only for one domain), so they show the same content as the main domain. I'll have to check the host header on all requests and redirect clients to the .net URL of the page they requested. I'll have to look up how to do all that...
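A minimal sketch of the host-header check described above, written as Python WSGI middleware. The host names, the `http` scheme and the wrapped app are assumptions for illustration; the real site may run on a different server stack, where the same idea would live in that server's own configuration. Any request whose `Host` header is one of the alias domains gets a `301 Moved Permanently` pointing at the same path and query string on the `.net` host, and everything else passes through untouched.

```python
from urllib.parse import quote

# Assumed host names: the thread only says the .com and .co.uk are aliases of the .net site.
CANONICAL_HOST = "www.freedom2support.net"
ALIAS_HOSTS = {
    "freedom2support.com",
    "www.freedom2support.com",
    "freedom2support.co.uk",
    "www.freedom2support.co.uk",
    "freedom2support.net",  # bare .net also folded into the www form (assumption)
}


def redirect_aliases(app):
    """Wrap a WSGI app so alias-host requests get a 301 to the same page on CANONICAL_HOST."""

    def wrapper(environ, start_response):
        # The Host header may carry a port ("example.com:80"); compare on the name only.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host not in ALIAS_HOSTS:
            return app(environ, start_response)

        # Rebuild the requested path (WSGI hands it over unquoted) plus any query string.
        path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
        location = f"http://{CANONICAL_HOST}{path}"
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query

        # 301 tells browsers and search bots that the move is permanent.
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    return wrapper
```

With this in place, a request for `http://www.freedom2support.co.uk/forum/showthread.php?t=123` (a hypothetical thread URL) would be answered with a 301 to `http://www.freedom2support.net/forum/showthread.php?t=123`, so bots end up indexing one copy of each page.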
26-06-2007, 03:47 PM   #6
NewsreadeR (Tiscali User Admin)
Adam,

If you go to whoever you registered the domains with, you should be able to do it there. If you need any help, give me a shout.

Alternatively, you can use .htaccess to achieve the same thing, as far as I know.
26-06-2007, 03:52 PM   #7
NewsreadeR (Tiscali User Admin)
Quote:
Originally Posted by acidtechno
Yahoo! Slurp Spider is always searching the forums. Does this use a lot of traffic? How can it know what to search for? I assume it's doing something similar to when a user types a search term in, but what on earth would the Yahoo bot be searching for? Does it know what kind of forum this is, and search for related keywords to gather pages from the forum for its search engine?

Hi Acid,

Without boring you to death, the role of a search engine bot is to find content.

Whilst it does consume bandwidth, it is essential that only known bad bots are blocked. Bots live on content and they thrive the more there is.

When you go to a search engine and type in certain words, e.g. "Freedom to support", it looks in its cache and finds the results most relevant to your search.
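To make the "looks in its cache" step concrete, here is a toy inverted index in Python. It is only an illustration under simplifying assumptions (whole-word matching, pages ranked by how often the query words appear on them); real search engines use far more elaborate ranking. The page URLs and texts are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: number of times the word appears on that page}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(index, query):
    """Return URLs containing every query word, most occurrences first."""
    words = tokenize(query)
    if not words:
        return []
    # Start with pages containing the first word, then keep only those containing the rest.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    scores = {url: sum(index[w][url] for w in words) for url in matches}
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages standing in for forum threads.
pages = {
    "/thread-1": "Freedom to support forum broadband support",
    "/thread-2": "Yahoo Slurp spider traffic",
    "/thread-3": "support for Freedom users",
}
index = build_index(pages)
print(search(index, "Freedom to support"))  # ['/thread-1']
print(search(index, "freedom support"))     # ['/thread-1', '/thread-3']
```

Because the lookup only touches the index, answering a query never means re-reading the forum; that is why the spider has to keep revisiting pages to keep its copy current.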

A site like F2Support has thousands of posts, and these change daily as people post to the threads. All Yahoo is doing is re-indexing so that it has the latest information available.

The fact that Yahoo spends so much time on this site shows it has good content; you just have to manipulate that content to your advantage.

In essence, Yahoo or Google should have every page of this site in its cache, and then send traffic to this site when it is relevant.
26-06-2007, 03:54 PM   #8
NewsreadeR (Tiscali User Admin)
Adam,

Here's our robots.txt if you want to use it:

Code:
User-agent: LinkWalker 
Disallow: /

User-agent: Googlebot
Disallow: /betaforum

User-agent: *
Disallow: /forum/ajax.php
Disallow: /forum/attachment.php
Disallow: /forum/calendar.php
Disallow: /forum/cron.php
Disallow: /forum/editpost.php
Disallow: /forum/global.php
Disallow: /forum/image.php
Disallow: /forum/inlinemod.php
Disallow: /forum/joinrequests.php
Disallow: /forum/login.php
Disallow: /forum/member.php
Disallow: /forum/misc.php
Disallow: /forum/moderator.php
Disallow: /forum/newattachment.php
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/online.php
Disallow: /forum/poll.php
Disallow: /forum/postings.php
Disallow: /forum/printthread.php
Disallow: /forum/private.php
Disallow: /forum/profile.php
Disallow: /forum/register.php
Disallow: /forum/report.php
Disallow: /forum/reputation.php
Disallow: /forum/search.php
Disallow: /forum/sendmessage.php
Disallow: /forum/showgroups.php
Disallow: /forum/showpost.php
Disallow: /forum/subscription.php
Disallow: /forum/threadrate.php
Disallow: /forum/usercp.php
Disallow: /forum/usernote.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /skins/

Sitemap: http://www.yourdomain.co.uk/forum/sitemap_index.xml.gz
The sitemap line is optional, but it really helps get your site indexed.
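One thing worth checking before copying this file: under the usual robots.txt rules a crawler obeys only the group that names it, so a bot with its own group (Googlebot here) ignores everything under `User-agent: *`. The Python sketch below runs the standard library's `urllib.robotparser` over a trimmed copy of the file above to show the effect as that parser sees it. If the real crawlers behave the same way, the `Disallow` lines would need repeating under the `Googlebot` group for Google to skip those scripts too.

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the robots.txt above (only a few of the Disallow lines kept).
ROBOTS_TXT = """\
User-agent: LinkWalker
Disallow: /

User-agent: Googlebot
Disallow: /betaforum

User-agent: *
Disallow: /forum/search.php
Disallow: /forum/showpost.php
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Slurp has no group of its own, so the "*" group applies to it.
print(parser.can_fetch("Slurp", "/forum/showthread.php"))       # True
print(parser.can_fetch("Slurp", "/forum/showpost.php"))         # False
# Googlebot matches its own group, which only blocks /betaforum.
print(parser.can_fetch("Googlebot", "/forum/showpost.php"))     # True
# LinkWalker is shut out of the whole site.
print(parser.can_fetch("LinkWalker", "/forum/showthread.php"))  # False
```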
26-06-2007, 03:59 PM   #9
aos101 (Tiscali User Admin)
Thanks. I might use that to keep bots away from things they don't need to index. I've thought about doing a sitemap before but never got round to it. I'll maybe have a look at that when the new main site is done.
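For when the sitemap comes up, here is a minimal Python sketch of the kind of files a `Sitemap:` line in robots.txt points at, following the sitemaps.org format: a `<urlset>` listing page URLs, and a `<sitemapindex>` listing the sitemap files themselves. The URLs are placeholders, and pulling the list of thread URLs out of the forum database is left out. The protocol caps each sitemap file at 50,000 URLs, which is why larger sites split them up and point robots.txt at an index file.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def _to_xml(root):
    # ElementTree escapes characters such as "&" in URLs, as the protocol requires.
    # (xml_declaration needs Python 3.8+.)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemap(page_urls):
    """Return a <urlset> document (bytes) listing the given absolute page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in page_urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return _to_xml(urlset)


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document (bytes) pointing at individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return _to_xml(index)


def split_urls(page_urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive chunks small enough for one sitemap file each."""
    for start in range(0, len(page_urls), size):
        yield page_urls[start:start + size]


def gzip_bytes(data):
    """Compress a document for serving as a .xml.gz file."""
    return gzip.compress(data)
```

Each chunk from `split_urls` would be written out as its own gzipped sitemap file, and `build_sitemap_index` would then list those files' URLs in the index that robots.txt refers to.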
26-06-2007, 04:02 PM   #10
NewsreadeR (Tiscali User Admin)
Give me a shout if you need help, Adam.
26-06-2007, 04:22 PM   #11
aos101 (Tiscali User Admin)
Thanks Ged, will do.
27-06-2007, 09:04 AM   #12
aos101 (Tiscali User Admin)
Quote:
Most users ever online was 399, Yesterday at 10:12 PM.
Haha, the spiders have managed to break the record for the most users online, which had been the same for ages. This is getting rather silly. I'm gonna have to fix this.
27-06-2007, 09:31 AM   #13
Samizdata (Guest)
Posts: n/a
Quote:
Originally Posted by aos101
I'm gonna have to fix this.
Putting this in robots.txt might do the trick:
Code:
User-agent: Slurp
Crawl-delay: 10
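As a rough guide to what that line buys: `Crawl-delay` asks a crawler to wait that many seconds between requests, so a value of 10 caps a compliant crawler at 86,400 / 10 = 8,640 fetches per day. Slurp is generally reported to honour it, while Googlebot ignores it. The Python sketch below reads the two lines back with the standard library's `urllib.robotparser` (`crawl_delay` needs Python 3.6+). One caveat under the usual robots.txt rules: a crawler follows only the group that names it, so a separate `User-agent: Slurp` group added to a file that already has a `User-agent: *` group would also need that group's `Disallow` lines copied into it.

```python
from urllib.robotparser import RobotFileParser

SECONDS_PER_DAY = 24 * 60 * 60

parser = RobotFileParser()
parser.parse([
    "User-agent: Slurp",
    "Crawl-delay: 10",
])

delay = parser.crawl_delay("Slurp")     # seconds to wait between requests
print(delay)                            # 10
print(SECONDS_PER_DAY // delay)         # 8640: most fetches per day a compliant crawler could make
print(parser.crawl_delay("Googlebot"))  # None: no group in this file matches Googlebot
```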
 
28-06-2007, 05:56 PM   #14
aos101 (Tiscali User Admin)
The redirect is now set up, so any requests to the .co.uk or .com domains are sent to the corresponding page at www.freedom2support.net. Yahoo has just under 14,000 pages indexed on our .com domain and 17,000 on our .co.uk domain (without the www). It shouldn't need to index quite as much now that everyone is sent to the .net address.

This means anyone who logs into the forums using the .com or .co.uk domains will find themselves logged out when they visit, as they are sent to the .net address (the login cookie belongs to the domain it was set on); they just need to log in again under the .net address. There is one case where a page is redirected twice, but that shouldn't matter.

Thanks, Samizdata. I'll see if this redirect has any effect, and if not I'll stick that in the robots.txt.
16-07-2007, 04:03 PM   #15
aos101 (Tiscali User Admin)
Stuck a crawl delay on. I don't see why Yahoo's spider feels the need to go crazy when most other spiders are more controlled.


Similar Threads
Returned Hotmail emails (started by TonyJ in E-Mail; 60 replies; last post 05-05-2007 09:45 PM)
Problems With BT Yahoo Thomson SpeedTouch 330 ADSL Modem (started by jolo0924 in General Computing and Internet; 0 replies; last post 10-03-2007 03:31 PM)
Yahoo addresses and Nildram?? (started by mdmcholet in E-Mail; 23 replies; last post 08-01-2007 08:58 AM)


All times are GMT.