Introducing SEOmoz's Updated Page Authority and Domain Authority

2011-11-23 22:02

Posted by Matt Peters

Here at Moz, we take metrics and analytics seriously and work hard to ensure that our metrics are first rate. Among our most important link metrics are Page Authority and Domain Authority. Accordingly, we have been working to improve these so that they more accurately reflect a given page or domain's ability to rank in search results. This blog entry provides an overview of these metrics and introduces our new Authority models with a deep technical dive.

What are Page and Domain Authority?

Page and Domain Authority are machine learning ranking models that predict the likelihood of a single page or domain to rank in search results, regardless of page content. Their input is the 41 link metrics available in our Linkscape URL Metrics API call and their output is a score on a scale from 1 to 100. They are keyword agnostic because they do not use any information about the page content.

Why are Page and Domain Authority being updated?

Since these models predict search engine position, it is important to update them periodically to capture changes in the search engines' ranking algorithms. In addition, this update includes some changes to the underlying models resulting in increased accuracy. Our favorite measure of accuracy is the mean Spearman Correlation over a collection of SERPs. The next chart compares the correlations on several previous indices and the next index release (Index 47).
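To make the accuracy measure concrete, here is a minimal sketch of a mean Spearman correlation over a collection of SERPs. It is an illustration, not the code used to produce the chart. The function names, the sample SERPs, and the sign convention (positions are negated so that a positive value means higher scores sit nearer the top) are all assumptions made for this example.

```python
import math

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def mean_spearman(serps):
    """serps: list of SERPs, each a list of (position, score) pairs."""
    correlations = []
    for serp in serps:
        # Negate positions so position 1 (best) gets the highest rank.
        negated_positions = [-position for position, _ in serp]
        scores = [score for _, score in serp]
        correlations.append(spearman(negated_positions, scores))
    return sum(correlations) / len(correlations)

# Hypothetical data: the first SERP is ordered perfectly by score,
# the second has a low-scoring page in the #1 position.
serp_a = [(1, 80), (2, 70), (3, 60), (4, 50)]
serp_b = [(1, 40), (2, 70), (3, 60), (4, 50)]
print(mean_spearman([serp_a, serp_b]))
```

Averaging per-SERP correlations, rather than pooling every result into one list, keeps each keyword's ordering separate, which matches the "over a collection of SERPs" wording above.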

The new model outperforms the old model on the same data using the top 30 search results, and performs better still if more results are used (top 50). Note that these are out-of-sample predictions.

When will the models change? Will this affect my scores?

The models will be updated when we roll out the next Linkscape index update, sometime during the week of November 28. Your scores will likely change a little, and may potentially change by as many as 20 points or more. I'll present some data later in this post that shows most PRO and Free Trial members with campaigns will see a slight increase in their Page Authority.

What does this mean if I use Page Authority and Domain Authority data?

First, the metrics will be better at predicting search position, and Page Authority will remain the single highest correlated metric with search position that we have seen (including mozRank and the other 100+ metrics we examined in our Search Engine Ranking Factors study). However, since we don't yet have a good web spam scoring system, sites that manipulate search engines will slip by us (and look like outliers), so a human review is still wise.

Before presenting some details of the models, I'd like to illustrate what we mean by a "machine learning ranking model." The table below shows the top 26 results for the keyword "pumpkin recipes" with a few of our Linkscape metrics (Google-US search engine; this is from an older data set and older index, but serves as a good illustration).

Pumpkin Recipes SERP result

As you can see, there is quite a spread among the different metrics illustrated, with some of the pages having a few links and others 1,000+ links. The Linking Root Domains are also spread from only 46 Linking Root Domains to 200,000+. The Page Authority model takes these link metrics as input (plus 36 other link metrics not shown) and predicts the SERP ordering. Because it takes into account only link metrics (and explicitly ignores any page or keyword content), whereas search engines consider many ranking factors, the model cannot be 100% accurate. Indeed, in this SERP, the top result benefits from an exact domain match to the keyword, which helps explain its #1 position despite its relatively low link metrics. Because Page Authority takes only link metrics as input, it is a single aggregate score that explains how likely a page is to rank in search based only on links. Domain Authority is the analogous score for domain-wide ranking. The models are trained on a large collection of Google-US SERP results.
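As a toy illustration of what "link metrics in, predicted ordering out" means, the sketch below scores pages with a hand-written weighted sum of two log-scaled link metrics and sorts by that score. This is not the Page Authority model: the weights, the choice of two metrics, the log scaling, and the example URLs and counts are all invented for illustration, whereas the real model is learned from SERP data and uses all 41 metrics.

```python
import math

# Hypothetical weights; a trained model would learn these from SERP data.
WEIGHTS = {"links": 0.4, "linking_root_domains": 0.6}

def link_score(metrics):
    """Keyword-agnostic score: uses link counts only, never page content."""
    return sum(
        weight * math.log10(1 + metrics[name])
        for name, weight in WEIGHTS.items()
    )

def predicted_order(pages):
    """Return URLs sorted from highest to lowest link score."""
    return sorted(pages, key=lambda url: link_score(pages[url]), reverse=True)

# Made-up pages and counts, chosen only to span a wide range.
pages = {
    "a.example/pumpkin": {"links": 120, "linking_root_domains": 46},
    "b.example/recipes": {"links": 250000, "linking_root_domains": 200000},
    "c.example/pie": {"links": 1500, "linking_root_domains": 300},
}

for url in predicted_order(pages):
    print(url, round(link_score(pages[url]), 2))
```

A link-only scorer like this would place a page with few links near the bottom even if it ranks #1 because of an exact-match domain, which is the kind of miss described above.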

Despite restricting to only link metrics, the new Page and Domain Authority models do a good job of predicting SERP ordering and improve substantially over the existing models. This increased accuracy is due in part to the new model's ability to better separate pages with moderate Page Authority values into higher and lower scores.

This chart shows the distribution of the Page Authority values for the new and old models over a data set generated from 10,000+ SERPs that includes 200,000+ unique pages (similar to the one used in our Search Engine Ranking Factors). As you can see, the new model has "fatter tails" and moves some of the pages with moderate scores to higher and lower values resulting in better discriminating power. The average Page Authority for both sets is about the same, but the new model has a higher standard deviation, consistent with a larger spread. In addition to the smaller SERP data set, this larger spread is also present in our entire 40+ billion page index (plotted with the logarithm of page/domain count to see the details in the tails):

One interesting comparison is the change in Page Authority for the domains, subdomains and sub-folders that PRO and Free Trial members are tracking in our campaign-based tools.

The top left panel in the chart shows that the new model shifts the distribution of Page Authority for the active domains, subdomains and sub-folders to the right. The distribution of the change in Page Authority is included in the top right panel, and shows that most of the campaigns have a small increase in their scores (average increase is 3.7), with some sites increasing by 20 points or more. A scatter plot of the individual campaign changes is illustrated in the bottom panel, and shows that 82% of the active domains, subdomains and sub-folders will see an increase in their Page Authority (these are the dots above the gray line). It should be noted that these comparisons are based solely on changes in the model, and any additional links that these campaigns have acquired since the last index update will act to increase the scores (and conversely, any links that have been dropped will act to decrease scores).
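The summary statistics quoted for the campaign comparison (average change, share of campaigns that increase, largest increase) are simple to compute from paired old and new scores. The sketch below shows the arithmetic on a handful of made-up scores; the function name and the numbers are hypothetical and are not taken from the campaign data.

```python
from statistics import mean

def summarize_changes(old_scores, new_scores):
    """Summarize per-campaign score changes between two model versions."""
    deltas = [new - old for old, new in zip(old_scores, new_scores)]
    return {
        "mean_change": mean(deltas),
        # Fraction of campaigns whose score went up (dots above the line).
        "share_increased": sum(d > 0 for d in deltas) / len(deltas),
        "max_increase": max(deltas),
    }

# Made-up Page Authority values for five campaigns.
old = [30, 42, 55, 61, 48]
new = [35, 45, 54, 70, 52]
print(summarize_changes(old, new))
```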

The remainder of this post provides more detail about these metrics. To sum up this first part, the models underlying the Page and Domain Authority metrics will be updated with the next Linkscape index update. This will improve their ability to predict search position, due in part to the new model's better ability to separate pages based on their link profiles. Page Authority will remain the single highest correlated metric with search position that we have seen.

The rest of the post provides a deeper look at these models, and a lot of what follows is quite technical. Fortunately, none of this information is needed to actually use these Authority scores (just as understanding the details of Google's search algorithm is not necessary to use it). However, if you are curious about some of the details then read on.

The previous discussion has centered around distributions of Page Authority across a set of pages. To gain a better understanding of the models' characteristics, we need to explore their behavior on the inputs. However, the inputs form a 41-dimensional space and it's impossible (for me at least!) to visualize anything in 41 dimensions. As an alternative, we can attempt to reduce the dimensionality to something more manageable. The intuition here is that pages that have a lot of links probably have a lot of external links, followed links, a high mozRank, etc. Domains that have a lot of linking root domains probably have a lot of linking IPs, linking subdomains, a high domain mozRank, etc. One approach we could take is simply to select a subset of metrics (like the table in the "pumpkin recipes" SERP above) and examine those. However, this throws away the information from the other metrics and will inherently be noisier than something that uses all of them. Principal Component Analysis (PCA) is an alternate approach that uses all of the data. Before diving into the PCA decomposition of the data, I'll take a step back and explain what PCA is with an example.

Principal Component Analysis is a technique that reduces dimensionality by projecting the data onto Principal Components (PCs) that explain most of the variability in the original data. This figure illustrates PCA on a small two-dimensional data set:
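For readers who prefer code, here is a minimal sketch of PCA on a small two-dimensional data set, using only the Python standard library. It relies on the closed-form eigen-decomposition of a 2x2 covariance matrix, so it does not generalize to 41 dimensions as written. The data points and function names are made up for illustration.

```python
import math

def pca_2d(points):
    """Return (mean, [(variance, unit_vector), ...]), largest variance first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for lam1; fall back to an axis if x and y are uncorrelated.
    if b != 0:
        v1 = (b, lam1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # The second component is perpendicular to the first.
    return (mean_x, mean_y), [(lam1, v1), (lam2, v2)]

def project(points, mean, direction):
    """Reduce each 2-D point to its coordinate along one component."""
    return [
        (x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
        for x, y in points
    ]

# Made-up, strongly correlated data lying close to the line y = x.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]
mean, components = pca_2d(points)
(var1, pc1), (var2, pc2) = components
print("first component direction:", pc1)
print("share of variance explained:", var1 / (var1 + var2))
print("1-D coordinates:", project(points, mean, pc1))
```

Because the two coordinates move together, the first component points roughly along the diagonal and captures nearly all of the variance, so each point can be described by a single number with little loss.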

