Playing in Googlebot's Sandbox with Slurp, Teoma, & MSNbot - Spiders Display Differing Personalities

There has been endless webmaster speculation and worry about the so-called "Google Sandbox": the indexing delay for new domain names, rumored to last at least 45 days from the date Googlebot first "discovers" the site. This recognized listing delay came to be called the "Google Sandbox effect."

Ruminations on the algorithmic elements of this sandbox delay have ranged widely since it was first noticed in the spring of 2004. Some believe it hinges on a single element of search engine optimization, such as linking campaigns. Link building has been the focus of most discussion, but others point to the size of a new site, its internal linking structure, or simply a fixed time delay as the most relevant algorithmic elements.

Rather than contribute to this speculation and further muddy the Sandbox, we'll look at a case study of a site on a new domain name, established May 11, 2005, covering its site structure, submission activity, and external and internal linking. We'll see how this plays out in search engine spider activity vs. indexing dates at the top four search engines.

Ready? We'll list dates and crawler activity day by day and see how it all plays out on this single new site over time.

* May 11, 2005 - Basic text for a large site posted on a newly purchased domain name and live by day's end. Search-friendly structure implemented with text links, making full discovery of all content possible by robots. Home page updated with 10 new text content pages added daily. Submitted site at Google's "Add URL" submission page.

* May 12 - 14 - No visits by Slurp, MSNbot, Teoma or Googlebot. (Slurp is Yahoo's spider and Teoma is from Ask Jeeves.) Posted a link on WebSite101 to the new domain, Publish101.com.

* May 15 - Googlebot arrives and eagerly crawls 245 pages on the new domain after looking for, but not finding, the robots.txt file. Oops! Gotta add that robots.txt file!

* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking, meant to keep out bad bots. How ironic that Slurp likes these.

* May 17 - Slurp finds 1409 more masking links and only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds that file missing! We finally get around to adding robots.txt by day's end, which stops Slurp from crawling the email masking links and lets MSNbot know it's safe to come in!

* May 23 - The Teoma spider shows up for the first time and crawls 93 pages. The site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid-fire page requests. Added BecomeBot to the robots.txt exclusion list to keep 'em out.

* May 24 - MSNbot hasn't shown up for a week since finding the robots.txt file missing. Slurp shows up every few hours, looks at robots.txt and leaves again without crawling anything, now that it is excluded from the email masking links. BecomeBot appears to be honoring the robots.txt exclusion but asks for that file 109 times during the day. Teoma crawls 139 more pages.

* May 25 - We realize that we need to reallocate server resources and redesign the database, and this requires changes to URLs, which means all previously crawled pages are now bad links! We implement subdomains and wonder: what now? Slurp shows up and finds thousands of new email masking links, as robots.txt was not moved to the new directory structures. Spiders get error pages on new visits. Scrambling to put out fires after wide-ranging changes to the site, we miss this for a week. Spider activity is spotty for 10 days until we fix robots.txt.

* June 4 - Teoma returns and crawls 590 pages! No others.

* June 5 - Teoma returns and crawls 1902 pages! No others.

* June 6 - Teoma returns and crawls 290 pages. No others.

* June 7 - Teoma returns and crawls 471 pages. No others.

* June 8 - 14 - Odd spider behavior: the spiders look at robots.txt only.

* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.

* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
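Here is a minimal robots.txt along the lines of the fixes described in the log: shut BecomeBot out entirely and keep every other spider away from the email masking links. The article doesn't say where those masking links lived, so the /mailmask/ path below is a hypothetical stand-in. The Python sketch checks the rules with the standard library's urllib.robotparser, parsing the text directly instead of fetching it. Keep in mind that Python's parser is a simplified reading of the robots exclusion standard, and each engine applies its own matching rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /mailmask/ stands in for wherever the
# email-masking links live (the article does not give their URLs).
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /mailmask/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without any network fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Short user-agent tokens; real crawlers send longer strings.
    for agent in ("BecomeBot", "Slurp", "msnbot", "Googlebot"):
        for path in ("/articles/page1.html", "/mailmask/contact123"):
            url = "http://publish101.com" + path
            print(agent, path, rp.can_fetch(agent, url))
```

Run as a script, it prints one line per spider and path: Slurp, msnbot and Googlebot are allowed on the article page but not on the masking link, while BecomeBot is refused both.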

So we'll take a break here at the five-week point and note the very different behavior of the top crawlers. Googlebot visits once and looks at a substantial number of pages but doesn't return for over a month. Slurp finds bad links and seems addicted to them, as it stops crawling good pages until it is told to lay off the bad liquor, er, that is, links, when robots.txt slaps Slurp to its senses. MSNbot visits looking for that robots.txt and won't crawl any pages until the robots.txt file tells it what NOT to do. Teoma just crawls like crazy, takes breaks, then comes back for more.

This behavior may mirror the differing personalities of the software engineers who designed them. Teoma is tenacious and hard-working. MSNbot is timid and needs instruction and some reassurance that it is doing the right thing; it picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back, and when?

Now let's look at indexing by each engine. As of this writing on July 7, each engine also shows differing indexing behavior. Google shows no pages indexed, although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed in a clear aging routine that doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed while crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any other search engine, yet has not indexed a single page.

Each of the engines will show the number of pages indexed if you use the query operator "site:publish101.com" (without the quotes). MSN: 187 pages, Ask: none, Yahoo: 3 pages, Google: none.

The daily activity in the three weeks since June 16, not listed above, has not varied dramatically, with Teoma crawling a bit more than the other engines, Slurp erratically up and down, and MSN slowly gathering 30 to 50 pages daily. Google is absent.
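The day-by-day counts in this case study, including robots.txt requests like MSNbot's 75 and BecomeBot's 109 in a single day, come from server logs, though the article doesn't say what tool produced them. A rough Python sketch, assuming Apache-style "combined" log lines and matching spiders by a simple substring in the user-agent string, could tally page requests and robots.txt requests per spider per day. The sample log lines are invented for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Substring tokens for the spiders in this case study (an assumption: real
# user-agent strings vary, and a substring test also matches impostors).
CRAWLERS = ("Googlebot", "Slurp", "msnbot", "Teoma", "BecomeBot")

# Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count page requests and robots.txt requests per (day, crawler)."""
    pages, robots = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not a combined-format line
        agent = m.group("agent").lower()
        crawler = next((c for c in CRAWLERS if c.lower() in agent), None)
        if crawler is None:
            continue  # ordinary visitor or an untracked bot
        # "16/May/2005:08:00:00 -0700" -> "2005-05-16" (server-local date)
        day = datetime.strptime(m.group("time")[:11], "%d/%b/%Y").date().isoformat()
        request = m.group("request").split()
        path = request[1] if len(request) > 1 else ""
        target = robots if path == "/robots.txt" else pages
        target[(day, crawler)] += 1
    return pages, robots


# Invented sample lines (documentation IP ranges), for illustration only.
SAMPLE = [
    '192.0.2.10 - - [15/May/2005:10:00:00 -0700] "GET /robots.txt HTTP/1.1" 404 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [15/May/2005:10:00:05 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [16/May/2005:08:00:00 -0700] "GET /a.html HTTP/1.0" 200 2048 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '203.0.113.5 - - [16/May/2005:09:00:00 -0700] "GET /a.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

if __name__ == "__main__":
    pages, robots = tally(SAMPLE)
    # Include days where a spider only asked for robots.txt.
    for day, crawler in sorted(set(pages) | set(robots)):
        print(day, crawler, "pages:", pages[(day, crawler)],
              "robots.txt:", robots[(day, crawler)])
```

Counting robots.txt fetches separately from pages is what makes patterns like "looking at robots.txt only" visible in a daily report.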

The linking campaign has been minimal, with posts to discussion lists, a couple of articles and some blog activity. Looking back over this time, it is apparent that a listing delay is actually quite sensible from the search engines' point of view. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine reveals a distinctly different policy at each major player.

The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.

Back to class before we leave for the day, kiddies. What did we learn today? Watch early crawler activity, implement robots.txt early, and adjust it often for bad bots. Oh yes, and the sandbox belongs to all search engines.
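The May 25 stumble comes down to a robots.txt detail: spiders only look for robots.txt at the root of each host, so a file on the main domain does nothing for pages served from a new subdomain, and a copy left in a subdirectory is simply ignored. A small Python sketch, using only urllib.parse, maps each page URL to the robots.txt that governs it and lists any that haven't been published yet. The articles.publish101.com subdomain is a hypothetical example; the article doesn't name the subdomains it created.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Crawlers only request /robots.txt at the root of a scheme + host (+ port),
    so each subdomain is covered by its own file, never by the parent domain's.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def missing_robots_files(page_urls, published):
    """Return robots.txt URLs the pages depend on that are not in `published`."""
    needed = {robots_url(url) for url in page_urls}
    return sorted(needed - set(published))


if __name__ == "__main__":
    # Hypothetical URLs: the article does not name its new subdomains.
    pages = [
        "http://publish101.com/index.html",
        "http://articles.publish101.com/2005/05/some-article.html",
    ]
    published = ["http://publish101.com/robots.txt"]
    print(missing_robots_files(pages, published))
```

Running a check like this right after a restructuring would flag the uncovered subdomain before the spiders find it.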

Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue reporting on this case study chronicling the search indexing of http://Publish101.com.
