Google’s Original Patent

Sunday, August 27th, 2006 by Shane Quigley

The following is Google’s original patent regarding its methods of ranking sites. I have attached my thoughts in italics below each piece of legal wording.

1. A method for scoring a document, comprising: identifying a document; obtaining one or more types of history data associated with the document; and generating a score for the document based on the one or more types of history data.

This document details the various methods Google is either employing, planning to employ or has thought of employing to a document or group of related documents in order to attach a point score to that document, for the purposes of ranking.

2. The method of claim 1, wherein the one or more types of history data includes information relating to an inception date; and wherein the generating a score includes: determining an inception date corresponding to the document, and scoring the document based, at least in part, on the inception date corresponding to the document.

Historical data is to be used to rank pages, at least in part. The inception date of the document is to be taken into account. I assume this “inception date” would be the first date the page was crawled. Either that or Google is looking at meta data contained in each file which lists its creation date and modified date, or both!

Where a content management system is involved, it would be interesting to see what information is being passed to robots. Whether a creation date relates to the first time that page was dynamically created and then each further request for it writes a new modified date. Alternatively, Google might see the creation date and modified date as the same moment that it requested this dynamic page, as they are constructed on the fly.

Either way, this doesn’t reveal which set of values would create the highest point score.

3. The method of claim 2, wherein the document includes a plurality of documents; and wherein the scoring the document includes: determining an age of each of the documents based on the inception dates corresponding to the documents, determining an average age of the documents based on the ages of the documents, and scoring the documents based, at least in part, on a difference between the ages of the documents and the average age.

Documents, which are related in some fashion, would be scored according to their age, the average age and the relationship between their ages.

This could be referring to a group of documents with similar content, a group of linked documents on separate sites or quite simply a whole site.
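To make the arithmetic concrete, here is a minimal sketch of the age comparison, assuming ages are measured in days from the inception date. The function name and the sample dates are my own; the patent does not say how the difference is turned into points.

```python
from datetime import date

def age_deviations(inception_dates, today):
    """Return each document's age in days minus the group's average age."""
    ages = [(today - d).days for d in inception_dates]
    average_age = sum(ages) / len(ages)
    return [age - average_age for age in ages]

# Hypothetical inception dates for three related documents.
dates = [date(2006, 1, 1), date(2006, 1, 11), date(2006, 1, 21)]
print(age_deviations(dates, today=date(2006, 1, 31)))  # [10.0, 0.0, -10.0]
```

A positive value means the document is older than the group average and a negative value means it is younger. Whether older or younger earns the better score is not stated.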

4. The method of claim 2, wherein the generating a score for the document includes scoring the document based, at least in part, on an elapsed time measured from the inception date corresponding to the document.

Does this mean a score would rise or fall over time, against the age of a document?

5. The method of claim 2, wherein the inception date corresponding to the document is based on at least one of a date when a search engine first discovers the document, a date when a search engine first discovers a link to the document, and a date when the document includes at least a predetermined number of pages.

Here they expand upon the “inception date” mentioned in claim 2, which would now be defined as one or a mixture of:

1. The date a document is first discovered by the search engine.
2. The date a link is first discovered to the document.
3. The date the document includes at least a certain number of pages.

This is all very interesting, but now I am unsure exactly what “document” refers to. In this context I hypothesise that they are referring to the first discovered page of a new site in their index. I come to this conclusion because a web page can’t really contain other pages; if they meant links they would have said links, surely. So, in conclusion, a site in Google’s eyes, or a group of content-related documents on the same site, will only score points up to a preset level until it reaches a desired total number of pages.

6. The method of claim 1, wherein the one or more types of history data includes information relating to a manner in which a content of the document changes over time; and wherein the generating a score includes: determining a frequency at which the content of the document changes over time, and scoring the document based, at least in part, on the frequency at which the content of the document changes over time.

This is something most optimisers have believed for a long time, that a page ranks higher the more frequently it changes.

7. The method of claim 6, wherein the frequency at which the content of the document changes is based on at least one of an average time between the changes, a number of changes in a time period, and a comparison of a rate of change in a current time period with a rate of change in a previous time period.

More interestingly, the scoring for changes to a page is more sophisticated and based upon either:

1. An average time between changes.
2. The number of changes over a set period.
3. An analysis of the rate of change over time.

This could mean a page will rank higher if it maintains a steady rate of change over time or climbs steadily. If the rate climbs too soon, it could perhaps be a seasonal change; if it peaks too soon and then falls, the page will be seen as becoming less popular and start to fall in rank.
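The three measures in claim 7 can be sketched roughly as follows. This assumes we already hold a list of dates on which a page was seen to change; the function names and the 15-day window are purely illustrative and do not come from the patent.

```python
from datetime import date, timedelta

def average_days_between_changes(change_dates):
    """Mean gap in days between consecutive changes (needs at least two dates)."""
    ordered = sorted(change_dates)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def changes_in_period(change_dates, start, end):
    """Number of changes with start <= date < end."""
    return sum(1 for d in change_dates if start <= d < end)

def rate_comparison(change_dates, period_end, period_days):
    """Changes in the current period minus changes in the previous period."""
    span = timedelta(days=period_days)
    current = changes_in_period(change_dates, period_end - span, period_end)
    previous = changes_in_period(
        change_dates, period_end - 2 * span, period_end - span
    )
    return current - previous

# Hypothetical change history for one page.
changes = [date(2006, 7, 1), date(2006, 7, 21), date(2006, 7, 26), date(2006, 7, 31)]
print(average_days_between_changes(changes))                             # 10.0
print(changes_in_period(changes, date(2006, 7, 1), date(2006, 8, 1)))    # 4
print(rate_comparison(changes, date(2006, 8, 1), period_days=15))        # 3
```

A positive result from the last function means the page is changing more often now than in the previous period. How any of these figures map to a score is left open by the claim.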

8. The method of claim 6, wherein the generating a score further includes: determining an amount by which the content of the document changes over time, and scoring the document based, at least in part, on the frequency at which and the amount by which the content of the document changes over time.

Now we are point scoring a document not only based upon changes over time but quantity of change against changes over time.

9. The method of claim 8, wherein the amount by which the content of the document changes is based on at least one of a number of new pages associated with the document within a time period, a ratio of a number of new pages associated with the document versus a total number of pages associated with the document, and a percentage of the content of the document that has changed during a time period.

Included in the evaluation of the change of this content are off page factors, such as:

1. Number of new pages associated with the document.
2. Number of new pages compared with existing pages associated to the document.

All evaluated, once more, over a time frame.

10. The method of claim 8, wherein the determining an amount by which the content of the document changes includes: weighting different portions of the content of the document differently based on a perceived importance of the portions, and determining the amount by which the content of the document changes as a function of the differently weighted portions of the content.

We all know different sections of a page score differently in search engines, but this discusses measuring how much each portion of a page changes and weighting that change by how important the portion is to the document’s rank.
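As a rough illustration of claim 10, here is a sketch that combines the change in each portion of a page into one figure. The portion names and weights are invented for the example; the patent does not say which portions matter or by how much.

```python
def weighted_change(portion_changes, portion_weights):
    """Weighted average of per-portion change fractions (each between 0 and 1)."""
    total_weight = sum(portion_weights[name] for name in portion_changes)
    weighted = sum(
        portion_changes[name] * portion_weights[name] for name in portion_changes
    )
    return weighted / total_weight

# Hypothetical weights: a changed title counts for more than a changed footer.
weights = {"title": 3.0, "body": 2.0, "footer": 1.0}
# Fraction of each portion that changed during the period.
changed = {"title": 0.0, "body": 0.5, "footer": 1.0}
print(weighted_change(changed, weights))
```

With these numbers a complete rewrite of the footer and half of the body still gives an overall change of only one third, because the heavily weighted title did not move.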

11. The method of claim 6, wherein the document includes a plurality of documents; and wherein the scoring the document includes: determining a date on which the content of each of the documents last changed, determining an average date of change based on the determined dates on which the contents of the documents last changed, and scoring the documents based, at least in part, on a difference between the dates on which the contents of the documents last changed and the average date of change.

This is an expansion on the comparison of dates of change of a group of documents, looking at the time between changes and the average date of change.

12. The method of claim 1, wherein the one or more types of history data includes information relating to a manner in which a content of the document changes over time; and wherein the generating a score includes: determining an amount by which the content of the document changes over time, and scoring the document based, at least in part, on the amount by which the content of the document changes over time.

Fairly straightforward: looking at the amount of change to a document over time.

13. The method of claim 12, wherein the amount by which the content of the document changes is based on at least one of a number of new pages associated with the document within a time period, a ratio of a number of new pages associated with the document versus a total number of pages associated with the document, and a percentage of the content of the document that has changed during a time period.

Repetition of claim 9, I think, applied to claim 12 instead of claim 8.

14. The method of claim 12, wherein the determining an amount by which the content of the document changes includes: weighting different portions of the content of the document differently based on a perceived importance of the portions, and determining the amount by which the content of the document changes as a function of the differently weighted portions of the content.

Repetition of claim 10, I think, applied to claim 12 instead of claim 8.

15. The method of claim 1, wherein the one or more types of history data includes information relating to how often the document is selected when the document is included in a set of search results; and wherein the generating a score includes: determining an extent to which the document is selected over time when the document is included in a set of search results, and scoring the document based, at least in part, on the extent to which the document is selected over time when the document is included in the set of search results.

This statement relates to what we know as click through rates, applied in most people’s frame of reference to Google Adwords Campaigns but suspected for a while now to be monitored in “natural” search results.

16. The method of claim 15, wherein the scoring the document includes assigning a higher score to the document when the document is selected more often than other documents in the set of search results over a time period.

Obviously a site that is chosen often within a set of search results, and therefore has a high click-through rate, would be awarded a high point score.
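A minimal sketch of the comparison in claim 16, assuming we know how many times each result was shown and clicked over a period. The function names and figures are hypothetical.

```python
def click_through_rates(impressions, clicks):
    """Clicks divided by impressions for every document that was shown."""
    return {
        doc: clicks.get(doc, 0) / shown
        for doc, shown in impressions.items()
        if shown > 0
    }

def selected_more_than_peers(doc, rates):
    """True if doc has a higher click-through rate than every other result."""
    others = [rate for other, rate in rates.items() if other != doc]
    return bool(others) and rates[doc] > max(others)

# Hypothetical figures for one set of search results over a month.
impressions = {"a": 100, "b": 100, "c": 50}
clicks = {"a": 30, "b": 10}
rates = click_through_rates(impressions, clicks)
print(selected_more_than_peers("a", rates))  # True
print(selected_more_than_peers("b", rates))  # False
```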

17. The method of claim 1, wherein the one or more types of history data includes information relating to search terms that increasingly appear in search queries over time; and wherein the generating a score includes: determining whether the document is associated with the search terms, and scoring the document based, at least in part, on whether the document is associated with the search terms.

I think this is saying that as a site / document starts to appear more often in searches performed, for various terms, it will have a higher probability of ranking highly for any term it is considered relevant for.

18. The method of claim 1, wherein the one or more types of history data includes information relating to queries that remain approximately constant over time but lead to results that change over time; and wherein the generating a score includes: determining whether the document is associated with queries that lead to results that change over time, and scoring the document based, at least in part, on whether the document is associated with queries that lead to results that change over time.

This, in my opinion, outlines further point scoring for a site or page that appears in a highly competitive set of search results that changes a lot over time. A page that remains within this set of results, i.e. is robust over time, will gain additional points for this.

19. The method of claim 1, wherein the one or more types of history data includes information relating to staleness of documents; and wherein the generating a score includes: determining whether the document is stale, and scoring the document based, at least in part, on whether the document is stale.

If a document is not changing over time, then it is deemed to be stale and will therefore fall in the rankings because it will lose points over time.

20. The method of claim 19, wherein the scoring the document includes: determining whether stale documents are considered favorable for a search query when the document is determined to be stale, and scoring the document based, at least in part, on whether stale documents are considered favorable for the search query when the document is determined to be stale.

This defines a modifier to claim 19 that although a document may be stale it may still be classed as “favourable” and therefore achieve points for this.

21. The method of claim 20, wherein the determining whether stale documents are considered favorable for the search query is based, at least in part, on how often stale documents were selected over recent documents over time for the search query.

“Favourable” means this stale document is still being selected from a set of search results over content that is viewed to be fresher. For instance, although a page / document may not have changed for 6 months, it could still occupy a high position because searchers are still reading that document, e.g. a health and safety notice would remain fairly constant over time but still be highly relevant and therefore warrant a high position within a set of results.
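The test in claim 21 could look something like the sketch below, which assumes we can count clicks on stale and on recent results for a given query. The 50% threshold is my own assumption; the patent gives no figure.

```python
def stale_favoured(stale_clicks, fresh_clicks, threshold=0.5):
    """True if stale results took more than `threshold` of the clicks for a query."""
    total = stale_clicks + fresh_clicks
    if total == 0:
        return False  # no click history, so nothing to go on
    return stale_clicks / total > threshold

# Hypothetical click counts for two queries.
print(stale_favoured(stale_clicks=80, fresh_clicks=20))  # True
print(stale_favoured(stale_clicks=10, fresh_clicks=90))  # False
```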

22. The method of claim 1, wherein the one or more types of history data includes information relating to behavior of links over time; and wherein the generating a score includes: determining behavior of links associated with the document, and scoring the document based, at least in part, on the behavior of links associated with the document.

The tracking of links to a document over time based on behaviour.

23. The method of claim 22, wherein the behavior of links relate to at least one of appearance and disappearance of one or more links pointing to the document.

This behaviour being the appearance and disappearance of links to a document.

24. The method of claim 23, wherein the appearance of one or more links relates to at least one of a date that a new link to the document appears, a rate at which the one or more links appear over time, and a number of the one or more links that appear during a time period, and the disappearance of one or more links relates to at least one of a date that an existing link to the document disappears, a rate at which the one or more links disappear over time, and a number of the one or more links that disappear during a time period.

Behaviour will be tracked based upon the date, rate and quantity of link appearances and disappearances.

25. The method of claim 22, wherein the determining behavior of links associated with the document includes monitoring at least one of time-varying behavior of links associated with the document, how many links associated with the document appear or disappear during a time period, and whether there is a trend toward appearance of new links associated with the document versus disappearance of existing links associated with the document.

Here we are looking for trends or patterns on the appearance and disappearance of links and comparing the two.
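One simple way to picture this comparison is to count the links that appear and disappear in each period and look at the net figure. This is only a sketch with made-up counts; the patent does not define how a trend is measured.

```python
def net_link_changes(appeared, disappeared):
    """Net links gained per period; both lists are counts, oldest period first."""
    return [gained - lost for gained, lost in zip(appeared, disappeared)]

def trending_towards_new_links(appeared, disappeared):
    """True if more links appeared than disappeared across all periods."""
    return sum(net_link_changes(appeared, disappeared)) > 0

# Hypothetical monthly counts for one document.
appeared = [5, 8, 12]
disappeared = [1, 2, 10]
print(net_link_changes(appeared, disappeared))            # [4, 6, 2]
print(trending_towards_new_links(appeared, disappeared))  # True
```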

26. The method of claim 1, wherein the one or more types of history data includes information relating to freshness of links; and wherein the generating a score includes: determining freshness of links associated with the document, assigning weights to the links based on the determined freshness, and scoring the document based, at least in part, on the weights assigned to the links associated with the document.

Links now have scoring weights attached to them according to freshness but, again, it does not specify whether fresh links earn a good or a bad score!

27. The method of claim 26, wherein the freshness of a link associated with the document is based on at least one of a date of appearance of the link, a date of a change to the link, a date of appearance of anchor text associated with the link, a date of a change to anchor text associated with the link, a date of appearance of a linking document containing the link, and a date of a change to a linking document containing the link.

Links are also monitored based on when they appeared, when they were changed, when anchor text appeared with the link, the date that anchor text changed, the date the document containing the link appeared and the date that document changed.

So we are starting to see that data relating to pages between links is influencing the scores attached to both pages.

28. The method of claim 26, wherein the weight assigned to a link is based on at least one of how much a document containing the link is trusted, how authoritative a document containing the link is, and a freshness of a document containing the link.

One of the first paragraphs written in English!!

29. The method of claim 26, wherein the scoring the document includes: determining an age of each link pointing to the document, determining an age distribution associated with the links based on the ages of the links, and scoring the document based, at least in part, on the age distribution associated with the links.

Scores are further defined by the age distribution of the various links that are pointing to the document.
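An age distribution is easy enough to picture as a histogram. The sketch below groups link ages into 30-day buckets, which is my own choice of bucket size, and returns the share of links in each bucket.

```python
from collections import Counter

def link_age_distribution(link_ages_days, bucket_days=30):
    """Share of links falling in each age bucket, keyed by bucket index."""
    counts = Counter(age // bucket_days for age in link_ages_days)
    total = len(link_ages_days)
    return {bucket: counts[bucket] / total for bucket in sorted(counts)}

# Hypothetical ages, in days, of four links pointing at one document.
ages = [5, 10, 40, 400]
print(link_age_distribution(ages))  # {0: 0.5, 1: 0.25, 13: 0.25}
```

Here half of the links are less than 30 days old. What shape of distribution Google would reward is not stated in the claim.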

30. The method of claim 1, wherein the one or more types of history data includes information relating to a manner in which anchor text changes over time; and wherein the generating a score includes: identifying a change in anchor text associated with a link to the document, and scoring the document based, at least in part, on the change in anchor text associated with a link to the document.

Scores are given to changing anchor text over time.

31. The method of claim 1, wherein the one or more types of history data includes information relating to differences in documents and anchor text associated with links to the documents; and wherein the generating a score includes: determining whether a content of the document changes such that the content differs from anchor text associated with one or more links to the document, and scoring the document based, at least in part, on whether the content of the document changes such that the content differs from the anchor text associated with one or more links to the document.

This discusses how anchor text of links to a document is related to the content of that document and whether over time as a document changes, does the anchor text change accordingly.

32. The method of claim 1, wherein the one or more types of history data includes information relating to freshness of anchor text; and wherein the generating a score includes: determining freshness of anchor text associated with one or more links to the document, and scoring the document based, at least in part, on the freshness of anchor text associated with one or more links to the document.

Attaching scores to the freshness of anchor text related to documents. I am struggling to see how this is relevant, who changes their anchor text on a regular basis??

33. The method of claim 32, wherein the freshness of anchor text associated with a link to the document is based on at least one of a date of appearance of the anchor text, a date of a change to the anchor text, a date of appearance of a link associated with the anchor text, a date of a change to a link associated with the anchor text, a date of appearance of the document, and a date of a change to the document.

This is just pulling together all the different freshness levels of the anchor text, the links and the document, and scoring according to how these factors relate to one another.

34. The method of claim 1, wherein the one or more types of history data includes information relating to traffic associated with documents; and wherein the generating a score includes: determining characteristics of traffic associated with the document, and scoring the document based, at least in part, on the characteristics of traffic associated with the document.

Traffic is being brought into the scoring of a document and the characteristics of this traffic. At this point I assume most of this data is taken from Alexa, a close partner of Google’s owned by Amazon.

35. The method of claim 34, wherein the determining characteristics of traffic associated with the document includes analyzing a traffic pattern associated with the document to identify changes in the traffic pattern over time.

Defining that the characteristics of traffic relating to a document will be analysed over time.

36. The method of claim 1, wherein the one or more types of history data includes information relating to user behavior associated with documents; and wherein the generating a score includes: determining user behavior associated with the document, and scoring the document based, at least in part, on the user behavior associated with the document.

The scores allocated for a document’s relevance are also related to how users behave when they visit the document. An example of this would be if a person chose a site from a page of Google results and then immediately hit the back button, indicating this content was not what they were looking for. Google would track this over time, and a document would fall in the rankings if this repeatedly happened.

37. The method of claim 36, wherein the user behavior relates to at least one of a number of times that the document is selected within a set of search results and an amount of time that one or more users spend accessing the document.

Click through rate is being combined here with total time spent on a page found through Google and a weighting attached.

38. The method of claim 1, wherein the one or more types of history data includes domain-related information corresponding to domains associated with documents; and wherein the generating a score includes: analyzing domain-related information corresponding to a domain associated with the document over time, and scoring the document based, at least in part, on a result of the analyzing.

Domains are now being used to score an associated site or page.

39. The method of claim 38, wherein the scoring the document includes: determining whether the domain associated with the document is legitimate, and scoring the document based, at least in part, on whether the domain associated with the document is legitimate.

Difficult to know what Google sees as legitimate.

40. The method of claim 38, wherein the domain-related information is related to at least one of an expiration date of the domain, a domain name server record associated with the domain, and a name server associated with the domain.

Domains will be scored based on:

1. Expiry Date
2. The DNS record of the domain
3. The Name Server associated with the domain.

Presumably the later the expiry date the better, as it indicates a more serious intent to provide reliable content, as will using a respected host and name server.

41. The method of claim 1, wherein the one or more types of history data includes information relating to a prior ranking history of documents; and wherein the generating a score includes: determining a prior ranking history of the document, and scoring the document based, at least in part, on the prior ranking history of the document.

Previous ranking of a page / site can influence the rank for future searches. This must mean for other terms otherwise this doesn’t really make sense.

42. The method of claim 41, wherein the scoring the document includes: determining a quantity or rate that the document moves in rankings over a time period, and scoring the document based, at least in part, on the quantity or rate that the document moves in the rankings.

How rapidly a document moves up the rankings over time can create a multiplier effect by the sounds of it.

43. The method of claim 41, wherein the prior ranking history is based on at least one of a number of queries for which the document is selected as a search result over time, a rate at which the document is selected as a search result over time, seasonality, burstiness, and changes in scores over time for a URL-query pair.

Combining and appraising sets of results for a document’s propensity to be chosen in a set of results, along with the rate and style of its movement through the results.

44. The method of claim 41, wherein the determining a prior ranking history of the document includes monitoring a rank of the document over time for spikes in the rank.

Monitoring spikes in a document’s rank, presumably for seasonal fluctuations: a document may rank higher at certain times of the year or day. That is purely speculation.
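For what it is worth, a spike in a rank history could be flagged with something as simple as the sketch below. It marks any observation more than two standard deviations from the mean; both the method and the threshold are my assumptions and are not taken from the patent.

```python
from statistics import mean, pstdev

def rank_spikes(ranks, threshold=2.0):
    """Indexes of observations more than `threshold` std devs from the mean rank."""
    average = mean(ranks)
    spread = pstdev(ranks)
    if spread == 0:
        return []  # rank never moved, so there are no spikes
    return [
        index
        for index, rank in enumerate(ranks)
        if abs(rank - average) / spread > threshold
    ]

# Hypothetical weekly positions for one page: it briefly jumps to 2nd place.
history = [20, 21, 19, 20, 20, 2, 20, 21]
print(rank_spikes(history))  # [5]
```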

45. The method of claim 1, wherein the one or more types of history data includes information relating to user maintained or generated data; and wherein the generating a score includes: determining whether user maintained or generated data indicates that the document is of interest to a user, and scoring the document based, at least in part, on whether the user maintained or generated data indicates that the document is of interest to a user.

Monitoring the interest users pay to documents and weighting them accordingly?

46. The method of claim 45, wherein the user maintained or generated data relates to at least one of favorites lists, bookmarks, temp files, and cache files associated with one or a plurality of users.

Google will be monitoring user data in the following formats:

1. Favourites lists
2. Bookmarks
3. Temp files
4. Cached files

The only way to do this is by monitoring people’s locally stored files relating to their Internet browser activity. I would guess this is achieved either through the Google Toolbar or by installing what is commonly described as spyware.

47. The method of claim 45, wherein the scoring the document includes: analyzing the user maintained or generated data over time to identify at least one of trends to add or remove the document, a rate at which the document is added to or removed from the user maintained or generated data, and whether the document is added to, deleted from, or accessed through the user maintained or generated data, and scoring the document based, at least in part, on a result of the analyzing.

This is covered in my previous notes.

48. The method of claim 1, wherein the one or more types of history data includes information relating to growth profiles of anchor text; and wherein the generating a score includes: determining a growth profile of anchor text associated with one or more links to the document, and scoring the document based, at least in part, on the growth profile of anchor text associated with one or more links to the document.

Monitoring the growth profiles of anchor text, presumably to see if they are all the same in order to identify people trying to manipulate the index. This is something I have always disagreed with, because there are numerous reasons why the majority of anchor text would be identical.

49. The method of claim 1, wherein the one or more types of history data includes information relating to linkage of independent peers; and wherein the generating a score includes: determining a growth in a number of independent peers that include the document, and scoring the document based, at least in part, on the number of independent peers.

Simply tracking the growth of independent links pointing to a site or page.

50. The method of claim 1, wherein the one or more types of history data includes information relating to document topics; and wherein the generating a score includes: performing topic extraction relating to the document, monitoring a topic of the document for changes over time, and scoring the document based, at least in part, on changes to the topic of the document.

Scoring a document based upon its topic and the growth of that topic over time. So is relevant information growing on the site and how fast is it growing?

51. The method of claim 1, further comprising: obtaining a search query, where the identified document is identified as relevant to the search query; and generating a relevancy score for the document based on how relevant the document is to the search query; and wherein the generating a score for the document is based, at least in part, on the one or more types of history data and the relevancy score.

Starting to talk about content being relevant to the search made.

52. A system for scoring a document, comprising: means for identifying a document; means for obtaining a plurality of types of history data associated with the document; and means for generating a score for the document based, at least in part, on the plurality of types of history data.

Documents are assigned a score based on the historical elements we have been discussing.

53. A system for scoring a document, comprising: a history component configured to obtain one or more types of history data associated with a document; and a ranking component configured to: generate a score for the document based, at least in part, on the one or more types of history data.

54. A method for ranking a linked document, comprising: determining an age of linkage data associated with the linked document; and ranking the linked document based on a decaying function of the age of the linkage data.

55. The method of claim 54, wherein the linkage data includes at least one link.

56. The method of claim 54, wherein the linkage data includes anchor text.

57. The method of claim 54, wherein the linkage data includes a rank based, at least in part, on links and anchor text provided by one or more linking documents and related to the linked document.

58. The method of claim 57, further comprising: determining longevity of the linkage data; deriving an indication of content update for a linking document providing the linkage data; and adjusting the ranking of the linked document based on the longevity of the linkage data and the indication of content update for the linking document.

59. The method of claim 58, wherein the adjusting the ranking includes penalizing the ranking if the longevity indicates a short life for the linkage data and boosting the ranking if the longevity indicates a long life for the linkage data.

This is the first mention of good or bad results based on any of the factors discussed. It also appears to confirm what most SEOs call the Google Sandbox Effect, where a new site, no matter how well optimised or how many links have been built, experiences poor results for a good few months, sometimes more than six, at the start of its existence.
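Claim 54 talks about a “decaying function” of the age of linkage data and claim 59 about penalising or boosting on longevity. Neither is spelled out, so the sketch below simply assumes an exponential decay with a 180-day half-life and fixed cut-offs of 30 and 365 days; every number in it is hypothetical.

```python
def decayed_link_weight(age_days, half_life_days=180.0):
    """Exponential decay: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)

def longevity_adjustment(lifetime_days, short_life=30, long_life=365, step=0.1):
    """Penalise links that lived briefly, boost links that have lasted."""
    if lifetime_days < short_life:
        return -step
    if lifetime_days >= long_life:
        return step
    return 0.0

print(decayed_link_weight(180))   # 0.5
print(longevity_adjustment(10))   # -0.1
print(longevity_adjustment(400))  # 0.1
```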

60. The method of claim 59, wherein the adjusting the ranking further includes penalizing the ranking if at least a portion of content from the linking document is considered stale over a period of time and boosting the ranking if the portion of content from the linking document is considered updated over the period of time.

Rather than just penalising pages for staleness, they are also rewarding pages for freshness. This creates a huge gap between sites and shows even further why it is so hard to make ground on long standing competitors in Google.

61. The method of claim 54, further comprising: determining an indication of link churn for a linking document providing the linkage data; and based on the link churn, adjusting the ranking of the linked document.

62. The method of claim 61, wherein the indication of link churn is computed as a function of an extent to which one or more links provided by the linking document change over time.

Defining link churn as the rate of change of links pointing to a document over time. This must mean the appearance and disappearance of links as opposed to the anchor text, otherwise that would be stated.

63. The method of claim 62, wherein adjusting the ranking includes penalizing the ranking if the link churn is above a threshold.

If links appear and disappear on a frequent basis then you will be penalised.
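As a sketch of how churn might be measured, the code below compares successive snapshots of the links on a linking page and averages the share that changed. The measure and the 0.25 threshold are my own assumptions; the patent only says “above a threshold”.

```python
def link_churn(snapshots):
    """Average share of links that changed between consecutive snapshots.

    `snapshots` is a list of sets of link targets, oldest first.
    """
    changes = []
    for before, after in zip(snapshots, snapshots[1:]):
        union = before | after
        if union:
            # Links present in one snapshot but not the other.
            changes.append(len(before ^ after) / len(union))
    return sum(changes) / len(changes) if changes else 0.0

def churn_penalised(snapshots, threshold=0.25):
    """True if churn is above the (hypothetical) threshold."""
    return link_churn(snapshots) > threshold

# Hypothetical snapshots of one linking page taken on three crawls.
crawls = [{"a", "b"}, {"a", "c"}, {"a", "c"}]
print(churn_penalised(crawls))  # True
```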

These are Epiphany’s thoughts and are not proven, only born out of our experience in the marketplace.


Posted in: SEO, Search Engine Marketing, Search Engine Optimisation
