Google applied for a patent on its ranking algorithm 15 months ago, on December 31, 2003, and that application was posted on March 31st at the US Patent Office. It got the discussion forums buzzing this weekend. Even though I had substantial work to do and was behind on a project, I couldn't resist the temptation to read the very long 14,000-word, 45-page application and see what it could mean for the volatile world of search.

So I tripped on over to the US Patent & Trademark Office (USPTO) and started reading the document. United States Patent Application: 0050071741 seems to be Google applying for a patent on their search algorithm. There is no reference to PageRank by name, but it reads like PageRank redefined with a few variations to limit link spamming and reduce stale results, along with multiple innovative elements not previously considered.

They discuss link spamming limitations extensively, which would be a welcome relief, as Linking Psychosis is rampant and I'd like to see an end to it. Much of the historical data related to pages seems a bit onerous, because it would appear to limit the perceived value of a page unless it becomes wildly popular over time. Bigger is better seems to be an enduring theme of this algorithm as described generically in the text of their application.

An odd addition to the historical ranking discussion is, amazingly, the "Advertising Traffic" for a particular document! They will rank a site based on which advertisers choose to advertise on it. If Amazon wants to advertise on your site, then Google will rank you higher!

That's good, I guess, if you have a site that attracts highly rated advertising, and don't rely on cross promotion of your separate products or those of suppliers to appear in your site advertising. Example: If I have a discussion forum on coffee, don't I want to advertise my coffee products? Why would I serve ads from highly rated advertiser Starbucks to rank higher at Google? What if I sell thousands of products and simply cross promote and upsell my own products sitewide? Odd stuff, ranking based on advertisers.

How does affiliate advertising factor into that advertising element of the algorithm? Do they know you are advertising a book from Amazon as part of an affiliate program through your direct Amazon affiliate links? Do they treat tracking links from affiliate management companies differently than the tracking URLs of ad serving monsters like DoubleClick, and confer higher ranking upon the big boys of advertising above affiliate tracking firms?

It also seems to call into question their own AdSense ads and how they factor into this algorithm! Do the AdSense ads along my blog border gain more ranking score because they come from a monster advertising company, Google, or are they downgraded because I'm not a "Premium" advertiser serving over 20 million content page views? Again, it seems that the reward for being large outweighs relevance in this formula. Or does it? How do they value Overture advertising in the formula? Adbrite? Smaller ad networks versus large advertising aggregators?

They extensively discuss historical data related to rankings over time, looking at seasonality, popularity during spikes in traffic due to news coverage of particular topics, and changes in ranking related to those items. The historical data related to ranking over time are interesting since they refer to link spamming, relevance, and topicality when they say:

"As a further measure to differentiate a document related to a topical phenomenon from a spam document, search engine may consider mentions of the document in news articles, discussion groups, etc. on the theory that spam documents will not be mentioned, for example, in the news. Any or a combination of these techniques may be used to curtail spamming attempts."

They've added another interesting element to the algorithm: determining the value of pages based on "user maintained/generated data" (patent item 113). Read that as the "bookmarks" and "favorites lists" built into your browser. Is this one of the reasons that Google recently hired Ben Goodger, the lead developer of Firefox?

Snooping into my favorites and cookies on my machine seems like a bit more than I want Google doing on MY machine. It strains the limits of privacy as well. We can stop sites from serving us cookies, but can't stop who reads them? Ouch!

Further, they reference users' browser cache files as a method of determining the value of a site. "For example, the "temp" or cache files associated with users could be monitored by search engine to identify whether there is an increase or decrease in a document being added over time. Similarly, cookies associated with a particular document might be monitored by search engine to determine whether there is an upward or downward trend in interest in the document." Apparently they can see this info, but I'd like them to stay out of my cache and cookies too!
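
The application doesn't say how an "upward or downward trend" would actually be measured, so here is only a rough sketch of one way to do it: fit a straight line through per-period counts and look at the sign of the slope. The function name `trend_slope` and the idea of weekly counts are my own assumptions for illustration, not anything taken from the filing.

```python
def trend_slope(counts):
    """Least-squares slope of counts over equally spaced periods.

    Hypothetical helper: `counts` might be how many times a document was
    seen in each week. Positive means rising interest, negative falling.
    """
    n = len(counts)
    if n < 2:
        return 0.0  # not enough history to call it a trend
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


weekly_counts = [12, 15, 19, 30]  # made-up numbers, rising interest
print("rising" if trend_slope(weekly_counts) > 0 else "flat or falling")
```

A straight-line fit is the simplest choice I could think of; a real system would presumably also need to separate seasonal swings from news spikes, which the application discusses elsewhere.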

It appears to apply further penalties to new sites by keeping them poorly ranked for even longer periods, and it adds an apparently new item not previously seen in algorithms (or at least not discussed publicly): long-term purchase of domain names, along with historical data related to IP address and hosting company! Here's the snip relating longevity of domain registration to ranking:

"[0099] Certain signals may be used to distinguish between illegitimate and legitimate domains. For example, domains can be renewed up to a period of 10 years. Valuable (legitimate) domains are often paid for several years in advance, while doorway (illegitimate) domains rarely are used for more than a year. Therefore, the date when a domain expires in the future can be used as a factor in predicting the legitimacy of a domain and, thus, the documents associated therewith."

I'll be extending the term of my domain registrations ASAP! What a boon to registrars if that element of ranking becomes as valued as linking has been! Everyone will get 10-year registrations if they want to rank well. The domain name aftermarket will also change dramatically if this becomes as important to ranking as this element makes it appear. People will buy and sell domains when disposing of them rather than simply letting them expire at the end of the registration period, as most do now.
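
To make the expiration-date idea in [0099] concrete, here is a minimal sketch of how time left on a registration could be turned into a number. The 10-year maximum comes from the application; the 0-to-1 scale, the linear mapping, and the name `registration_signal` are purely my assumptions, since the filing gives no formula.

```python
from datetime import date

MAX_RENEWAL_YEARS = 10  # the filing notes domains can be renewed up to 10 years


def registration_signal(expires: date, today: date) -> float:
    """Map time left on a domain registration to a 0..1 value.

    Hypothetical scale: 0.0 for an expired domain, 1.0 for one paid up
    ten or more years ahead, linear in between.
    """
    days_left = (expires - today).days
    if days_left <= 0:
        return 0.0
    years_left = days_left / 365.25
    return min(years_left / MAX_RENEWAL_YEARS, 1.0)


today = date(2005, 4, 4)
print(registration_signal(date(2006, 4, 4), today))  # one year left: low value
print(registration_signal(date(2015, 4, 4), today))  # ten years left: near the top
```

Note that the application calls this "a factor", so even in this toy version it would be one input among many rather than a verdict on a domain.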

It appears they will be penalizing domains "associated" with "illegitimate" domains. Hopefully they have a method of determining that it isn't a competitor linking to your domain from their "illegitimate" domain! That suggests they will be able to eliminate "Domain Scrapers" that have been known to scrape search engine results of high ranking domains and post those on "illegitimate domains", which in effect drags down the ranking of those previously highly ranked domains. How odd the search world is sometimes!

Altogether, it seems that older content will suffer overall because it hasn't changed, because nobody new is linking to it and because it will lose links over time. What if you are posting a historical document that you can't change or an authored piece that is copyrighted? Does it decrease the value of the information? Hmmmm. I guess links would continue to increase if the information remains valuable, so there is some protection in that. But older site content may be unchanged because it is popular, not because it is stale - that's an odd Catch-22.

The anchor text issue discussed in this patent application suggests that "[0118] Unique Words, Bigrams, Phrases in Anchor Text" are significant in determining rank. If links develop naturally, the anchor text will vary, because webmasters link to a document differently. Some would use the URL and embed the link in that. Others would use text requested by the webmaster, if it was a link request that successfully garnered a link. Still others might simply use Google's own Blogger "Blog This" link, which simply takes the page title. (I routinely change link text generated by "Blog This" in my blog posts to emphasize the topic discussed and eliminate business/publication names usually added ahead of the topic of the page.)
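
As a rough illustration of that "natural links vary" idea, the sketch below measures what share of a page's inbound anchor texts are distinct. This is my own simplification: the application talks about unique words, bigrams and phrases but gives no formula, and `anchor_text_variety` is a made-up name.

```python
def anchor_text_variety(anchors):
    """Share of inbound anchor texts that are distinct (0..1).

    Hypothetical measure: values near 1.0 mean every linker chose their
    own wording; values near 0.0 mean the same phrase repeated, which
    looks more like a coordinated link campaign.
    """
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    if not normalized:
        return 0.0
    return len(set(normalized)) / len(normalized)


natural = ["http://example.com/", "coffee forum", "great thread on espresso"]
campaign = ["cheap coffee beans"] * 4
print(anchor_text_variety(natural), anchor_text_variety(campaign))
```

Comparing whole phrases is the crudest possible version; breaking anchors into words or bigrams, as the section title suggests, would catch near-duplicates that this misses.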

The US Patent office has a link to images including illustrations and figures that are linked to the filing but they are absurdly large and don't fit in the viewable framed window. This is silliness. Do they mean to hide it by making it unviewable?

I'll attempt to post a smaller version of images on my blog.

The final notable item seems to me to be the clickthrough data that Google sees to sites from their own search results. They will rank sites higher that get significant clickthrough rates from the Google SERPs.

"Google may monitor the number of times that a document is selected from a set of search results and/or the amount of time one or more users spend accessing the document. Search engine may then score the document based, at least in part, on this information."

How will they know how long I spend accessing the document unless they can monitor my actions AFTER I've left the Google SERPs to visit the linked site? Wonder what's at work in that? Do they have some way of tracking our actions after we leave their site? I wonder if this has anything to do with Google's acquisition of the Urchin traffic statistics company last week.
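
For what it's worth, here is a minimal sketch of how those two signals (how often a result is selected, and how long users stay) might be folded into one score. Everything in it is assumed for illustration: the `ResultStats` fields, the 300-second cap, and multiplying the two parts together. The application only says the score is based "at least in part" on this information.

```python
from dataclasses import dataclass


@dataclass
class ResultStats:
    impressions: int            # times the document was shown in results
    clicks: int                 # times it was selected from those results
    total_dwell_seconds: float  # total time users spent on it after clicking


def behavior_score(stats: ResultStats, dwell_cap: float = 300.0) -> float:
    """Combine clickthrough rate and average dwell time into a 0..1 value.

    Hypothetical formula: clickthrough rate times average dwell time,
    with dwell time capped at `dwell_cap` seconds and scaled to 0..1.
    """
    if stats.impressions <= 0 or stats.clicks <= 0:
        return 0.0
    ctr = stats.clicks / stats.impressions
    avg_dwell = stats.total_dwell_seconds / stats.clicks
    dwell_part = min(avg_dwell, dwell_cap) / dwell_cap
    return ctr * dwell_part


page = ResultStats(impressions=1000, clicks=120, total_dwell_seconds=120 * 90.0)
print(behavior_score(page))
```

The sketch takes dwell time as a given, which is exactly the part I can't explain: it assumes the search engine somehow knows how long I stayed after the click.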

Well, it's back to work for now, but it will be interesting to see where this patent application is discussed in forums and SEO blogs over the coming week.

