| Website Comments, Suggestions & News: this is a discussion on Yahoo! Slurp Spider within the Tiscali / TalkTalk User Announcements forum. |
#1 |
|
Join Date: Oct 2005
Location: Kent, England
Posts: 1,646
Thanks: 0
Thanked 0 Times in 0 Posts
|
Yahoo! Slurp Spider
Yahoo! Slurp Spider is always searching the forums. Does this use a lot of traffic? How can it know what to search for? I assume it's doing something similar to when a user types in a search term, but what on earth would the Yahoo bot be searching for? Does it know what kind of forum this is, and search for related keywords to gather pages on the forum for its search engine?
|
|
|
|
|
|
#2 |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
From what I can see it's just searching for all posts by a user, probably using the link in the user's profile. It shows up for me as something like "Searching the Forums User: aos101". I don't think the Yahoo spider is using too much traffic at the moment (we've had some search bots use over a GB in a few days, which we blocked), but it has been going crazy lately.
I guess it could be because we merged a couple of small forums into larger forums, and it's decided to re-index a lot of it. I think it might also be indexing the site multiple times, since we have the .net, .com & .co.uk domains all serving the same content.
__________________
Adam |
|
|
|
|
|
#3 | |
|
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
|
Quote:
Less resources on the server and you will be penalised by Search Bots for duplicate content.
__________________
SkyUser - Home of the Unofficial Sky Broadband forums How fast is your current internet connection? Freedom 2 Surf are without doubt the worst ISP I have ever had the pleasure of paying, for an internet connection. |
|
|
|
|
|
|
#4 |
|
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
|
http://www.google.co.uk/search?q=sit...L_enGB211GB211
This shows that you have 25,900 pages indexed by Google, and unless you have blocked showposts in your robots.txt, you should have many thousands more.
|
|
|
|
|
#5 |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
Yeah, we'll look at perhaps doing that sometime. It's always been done that way because the .co.uk and .com are just alias domains on the account (which is only for one domain), so they only show the same content as the main domain. I'll have to check the Host header on all requests and redirect clients to the .net URL of the page they requested. I'll have to look up how I can do all that...
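For anyone following along, here is a rough Python sketch of the Host-header check described above. It is only an illustration of the logic, not the code this site actually runs: the function name `redirect_target` is made up, the canonical host name is the one given later in the thread, and the `http://` scheme is an assumption.

```python
from typing import Optional

# Canonical host as named later in the thread; the scheme is an assumption.
CANONICAL_HOST = "www.freedom2support.net"


def redirect_target(host_header: str, path_and_query: str) -> Optional[str]:
    """Return the URL a request should be permanently redirected to,
    or None when it already arrived on the canonical host."""
    # Drop any ":port" suffix and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == CANONICAL_HOST:
        return None
    # Keep the requested page so visitors land on the matching .net URL.
    if not path_and_query.startswith("/"):
        path_and_query = "/" + path_and_query
    return f"http://{CANONICAL_HOST}{path_and_query}"
```

A web server or application would send the returned URL in a 301 response, which is also what tells search bots that the alias domains are not separate sites.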
__________________
Adam |
|
|
|
|
|
#6 |
|
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
|
Adam,
If you go to whoever you have registered the domains with, you should be able to do it there. If you need any help, give me a shout. Or you can use .htaccess to achieve the same, afaik.
|
|
|
|
|
#7 | |
|
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
|
Quote:
Hi Acid,
Without boring you to death, the role of a search engine bot is to find content. Whilst it does consume bandwidth, it is essential that only known bad bots are blocked. Bots live on content, and the more there is, the more they thrive. When you go to a search engine and type in certain words, e.g. "Freedom to support", it looks in its cache and finds the most relevant results for your search. A site like F2Support has thousands of posts, and these change on a daily basis as people post to the threads. All Yahoo is doing is reindexing so that it has the latest information available. The fact that Yahoo spends so much time on this site shows it has good content; you just have to manipulate that content to your advantage. In essence, Yahoo or Google should have every page of this site in its cache and then send traffic to this site when it is a fitting result.
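To make the "looks in its cache" idea a bit more concrete, here is a toy Python sketch of an inverted index, the kind of lookup structure a search engine builds from crawled pages. It is a deliberately simplified illustration, not how Yahoo or Google actually work: the function names and the sample pages are invented, and real engines add ranking, stemming and much more.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages, for illustration only.
sample_pages = {
    "/forum/showthread.php?t=1": "yahoo slurp spider traffic",
    "/forum/showthread.php?t=2": "broadband speed test",
}
sample_index = build_index(sample_pages)
```

The index only knows about pages the bot has already fetched, which is why a busy forum gets revisited so often: new posts are invisible to searchers until the bot has crawled them again.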
|
|
|
|
|
|
#8 |
|
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
|
Adam,
Here's our robots.txt if you want to use it. Code:

    User-agent: LinkWalker
    Disallow: /

    User-agent: Googlebot
    Disallow: /betaforum

    User-agent: *
    Disallow: /forum/ajax.php
    Disallow: /forum/attachment.php
    Disallow: /forum/calendar.php
    Disallow: /forum/cron.php
    Disallow: /forum/editpost.php
    Disallow: /forum/global.php
    Disallow: /forum/image.php
    Disallow: /forum/inlinemod.php
    Disallow: /forum/joinrequests.php
    Disallow: /forum/login.php
    Disallow: /forum/member.php
    Disallow: /forum/misc.php
    Disallow: /forum/moderator.php
    Disallow: /forum/newattachment.php
    Disallow: /forum/newreply.php
    Disallow: /forum/newthread.php
    Disallow: /forum/online.php
    Disallow: /forum/poll.php
    Disallow: /forum/postings.php
    Disallow: /forum/printthread.php
    Disallow: /forum/private.php
    Disallow: /forum/profile.php
    Disallow: /forum/register.php
    Disallow: /forum/report.php
    Disallow: /forum/reputation.php
    Disallow: /forum/search.php
    Disallow: /forum/sendmessage.php
    Disallow: /forum/showgroups.php
    Disallow: /forum/showpost.php
    Disallow: /forum/subscription.php
    Disallow: /forum/threadrate.php
    Disallow: /forum/usercp.php
    Disallow: /forum/usernote.php
    Disallow: /admin/
    Disallow: /images/
    Disallow: /includes/
    Disallow: /skins/

    Sitemap: http://www.yourdomain.co.uk/forum/sitemap_index.xml.gz
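If you want to sanity-check rules like these before uploading them, Python's standard `urllib.robotparser` can evaluate them offline. The sketch below uses a shortened copy of the file above; the helper name `is_allowed` and the bot name "SomeBot" are made up for the example, and the parser follows Python's own interpretation of robots.txt, which may differ in detail from any particular search engine.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules posted above, for illustration.
RULES = """\
User-agent: LinkWalker
Disallow: /

User-agent: Googlebot
Disallow: /betaforum

User-agent: *
Disallow: /forum/search.php
Disallow: /forum/showpost.php
Disallow: /admin/
"""


def is_allowed(user_agent: str, path: str, rules: str = RULES) -> bool:
    """Report whether the given bot may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed("SomeBot", "/forum/showpost.php"))    # blocked by the * group
print(is_allowed("SomeBot", "/forum/showthread.php"))  # not listed, so allowed
print(is_allowed("Googlebot", "/forum/showpost.php"))  # see note below
```

One thing this shows up: a bot that has its own `User-agent` group is matched against that group only, so under this parser Googlebot is not covered by the `*` rules. If the intention is to keep Googlebot away from showpost.php and the rest, the same `Disallow` lines would need repeating in the Googlebot group.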
|
|
|
|
|
#9 |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
Thanks - I might use that to keep bots away from things they don't need to index. I've thought about doing a sitemap before but never got round to it. I'll maybe have a look at that when the new main site is done.
__________________
Adam |
|
|
|
|
|
#10 |
|
Join Date: Feb 2006
Posts: 388
Thanks: 0
Thanked 0 Times in 0 Posts
|
Give me a shout if you need help, Adam.
|
|
|
|
|
#11 |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
Thanks Ged, will do.
__________________
Adam |
|
|
|
|
|
#12 | |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
Quote:
__________________
Adam |
|
|
|
|
|
|
#13 |
|
Guest
Posts: n/a
|
|
|
|
|
#14 |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
The redirect is now set up, so any requests to the .co.uk or .com domains are sent to the corresponding page at www.freedom2support.net. Yahoo has just under 14,000 pages indexed on our .com domain and 17,000 indexed on our .co.uk domain (without the www). They shouldn't need to index quite as much stuff now that everyone gets sent to the .net address.
This will mean anyone who logs into the forums using the .com or .co.uk domains will find themselves logged out when they visit, as they are sent to the .net address, but they just need to log in under the .net address. There is one case where a page is redirected twice, but that shouldn't matter. Thanks Samizdata. I'll see if this redirect has any effect, and if not I'll stick that in the robots.txt.
__________________
Adam |
|
|
|
|
|
#15 |
|
Join Date: Jun 2004
Location: Kent
Posts: 3,760
Thanks: 1
Thanked 6 Times in 6 Posts
|
Stuck a crawl delay on. I don't see why Yahoo's spider feels the need to go crazy when most other spiders are more controlled.
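For reference, a crawl delay is set with a `Crawl-delay` line in robots.txt. It is a non-standard directive that only some bots honour. The Python sketch below shows how such a line can be read back with the standard `urllib.robotparser`; the 10-second value, the other rule and the helper name `crawl_delay_for` are assumptions for illustration, not the settings actually used on this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking Yahoo's Slurp to wait between requests.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow: /forum/search.php
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay (in seconds) that applies to the bot,
    or None when no delay is set for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(EXAMPLE_ROBOTS_TXT, "Slurp"))
print(crawl_delay_for(EXAMPLE_ROBOTS_TXT, "Googlebot"))
```

A well-behaved bot that supports the directive would pause for roughly that many seconds between fetches, which spreads the same crawl over a longer period rather than reducing the number of pages it wants.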
__________________
Adam |
|
|
|