A theory about how Penguin and the disavow tool work.

There is so much confusion about how the disavow tool works.  Does it only start working when there is either a reconsideration request filed or an update of the Penguin algorithm, or does it really start working right away?  In this article I want to put forward a theory I have that could explain some of the unusual things that people have seen when using this tool.  A lot of my discussion centers around Cyrus Shepard’s interesting experiment where he disavowed all of his site’s links.

Warning: This post contains a lot of theory.  I debated whether to publish it because it is confusing.  Still, my hope is that it will generate some good discussion and help us understand more about the Penguin algorithm.

How the disavow tool is supposed to work.

If you’re reading this article you probably have a fairly good idea of how the disavow tool is supposed to work.  The tool allows you to upload a text file containing either urls or entire domains that you would like Google to ignore when calculating PageRank for your site.  According to the official documentation for the disavow tool, once you have uploaded the file, the disavow is applied the next time a link listed in that file is crawled.  When Google disavows a link, essentially what they do is add an invisible nofollow tag to that link.  If you have disavowed on the domain level, then whenever Google crawls that domain and finds links pointing to your site, they add an invisible nofollow tag to those links.
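
For readers who have not seen one, here is a minimal sketch of what a disavow file looks like and how its entries could be matched against a linking page.  The file syntax (# comment lines, domain: lines and bare urls) is Google’s.  The sample entries, the function names and the matching logic are my own illustration and are not how Google implements it.

```python
from urllib.parse import urlsplit

# Hypothetical disavow file contents. Lines starting with "#" are comments,
# "domain:" lines cover a whole domain, and any other line is a single url.
DISAVOW_TEXT = """\
# Asked the owner to remove these links, no reply
domain:spammy-directory.example
http://blog.example/paid-links.html
"""


def parse_disavow(text):
    """Split a disavow file into a set of domains and a set of exact urls."""
    domains, urls = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("domain:"):
            domains.add(line[len("domain:"):].strip().lower())
        else:
            urls.add(line)
    return domains, urls


def treated_as_nofollow(linking_url, domains, urls):
    """Illustrative only: is a link found on linking_url covered by the file?"""
    host = (urlsplit(linking_url).hostname or "").lower()
    # Assumption: a domain: entry also covers subdomains such as www.
    on_domain = any(host == d or host.endswith("." + d) for d in domains)
    return on_domain or linking_url in urls


domains, urls = parse_disavow(DISAVOW_TEXT)
print(treated_as_nofollow("http://spammy-directory.example/page1", domains, urls))
print(treated_as_nofollow("http://blog.example/other.html", domains, urls))
```

The first call is covered by the domain: line and the second is not, because only one specific page on blog.example was listed.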

Does the disavow tool only work when there is a major change like a reconsideration request or a Penguin update?

There are many people who have extensive experience with disavowing links who believe that the disavow only starts working when a major change happens such as the filing of a reconsideration request or a Penguin algorithm update.  Tim Grice and Cyrus Shepard had this discussion on Twitter:

 

But, this theory contradicts what Google says.  In the disavow tool documentation, they say:

…this information will be incorporated into our index as we recrawl the web and reprocess the pages that we see

To me that sounds like it happens fairly immediately.

Also, Google employee John Mueller has mentioned several times things that imply that the disavow tool starts working right away.  Here is John in a Webmaster Central Hangout on March 15, 2013.  Start watching at 31:09:

 

Site owner: You said that the disavow file is not being crawled until someone submits a reconsideration request.  Is that correct?

John: No.  As soon as you submit it, we use that when we recrawl the pages.

Here’s another.  Start watching at 8:27:

 

Site owner: Does the disavow tool only work after it is switched on manually by somebody?

John: That’s not the case. It essentially runs automatically and granularly as we reprocess those urls.  It’s not something where someone has to manually click a button.

This question was asked about Cyrus’ experiment.  Start watching at 13:44:

 

Site owner: Does Google really use data from the disavow tool only by major updates?

John: No. Essentially this is something that’s always used on an ongoing basis.  So, if you add links or domains to your disavow file and you upload that then the next time we crawl those urls we’ll essentially treat those links kind of like a nofollow link.  So, it’s not something that is only run periodically.  It’s essentially a part of our normal websearch systems that run all the time.

 

But why am I seeing things like this:

Google says that the disavow starts working immediately and not at the time of a major update.  If this is the case, then why am I seeing things like this?  Here is a site that disavowed a number of links on May 14.  Nothing happened until May 22, which is the date of the Penguin 2.0 update:

 

Here is another site that did a fairly large disavow on May 3, and once again you can see that nothing really changed in site impressions until Penguin updated on May 22:

 

 

It really looks like the disavow tool accomplished nothing until Penguin hit.

My theory:

To understand my theory of why the disavow tool appears to start working only when there is a major change like a Penguin update, we first need to understand Penguin.  I wrote that and started laughing, because really, no one understands Penguin.  But, let’s start by explaining some things that we think we know about Penguin.

Trying to understand Penguin:

When Google introduced the Penguin algorithm update on April 24, 2012, a lot of sites that had created large numbers of self-made backlinks suddenly started to lose their rankings.  Each time the algorithm updates or refreshes there are more reports of sites dropping out of their high ranking positions.

Some people believe that all that Penguin does is devalue the links that are perceived as unnatural.  Others believe that it actually penalizes a site that has too many unnatural links.  Although we don’t have a direct answer from Google on this, here are some comments from John Mueller that are relevant:

 

 

Site owner: Does Penguin just dampen/reduce the power/effect of unnatural links, rendering them useless, thus causing one’s rankings to drop?  Or, does it also add an additional penalty drop to this existing reduced rank?

John: Essentially the two are kind of similar in that when our Penguin algorithm recognizes webspam that it has to react to we try not to show that site in the search results so frequently.  So, essentially, why you are not showing up so high in search results, if that’s because we are demoting the site in general it doesn’t really matter that much to the webmaster.  It’s more the case that you just see that the site is kind of dropping in ranking.

 

 

Site owner: Is Penguin a filter that is placed on a site that makes it difficult for that site to rank well and then can that filter be lifted when Penguin refreshes, provided that the site owner has done the work to clean the backlink profile?

(In other words – Is Penguin a switch that is turned off or on to say whether or not a site is penalized?)

John: [Penguin] isn’t something that is either turned on or turned off.  It’s really something that is more of a granular algorithmic change.  Sometimes it may be that a site is lightly affected by algorithms like this and sometimes very strongly.  If you work on it you can slowly move that bar forwards with the Penguin algorithm.  That’s similar to all of our algorithms in that they’re not something that is either on or off but that they’re trying to be very granular based on what it finds.

And one more:

 

Site owner: When a site is suffering under the Penguin algorithm, are new good links to that site treated with suspicion (or perhaps pass less value) until the bad links are cleaned up?  Is this why you said it’s like having an anchor that is pulling you down?

John: We don’t specifically treat [links] in a bad way, but essentially, but if our algorithms determine that your web [...cuts out...] problematic then we’re looking at your whole website and kind of treating it as problematic.  It’s not tied to specific links like that.  It’s more something that is being done on a website basis there.  So, what I said there about having an anchor that is pulling you down is essentially that you’re trying to move forward with your website but you still have the handbrake on.

 

Are you still with me?  I am rambling a little with this post and the reason is that part of why I am writing it is to get my ideas on “paper” to help me try to understand Penguin a little better.  Don’t worry…we’re getting to my theory on what happened with Cyrus’ site after he used the disavow tool.  I’m just taking the long road in getting there!

Here is what I have concluded after listening to these three hangout answers:

Penguin looks at the overall quality of a site’s backlinks and can cause the overall ranking of that site to be affected depending on how unnatural the backlink profile is perceived to be.

How does Penguin determine the overall quality of a site’s backlinks?

Obviously no one outside of Google knows the exact answer to this.  It is probably a very complicated process.  There are some people who believe it is something as simple as looking at the percentage of links with exact match keyword anchor text.  Some believe that if you have xx% brand anchored links and xx% url anchored links that you will avoid Penguin.  Personally, I think it is much more complicated.  I have a theory that Google places links in one of three categories:

  1. Links that are almost certainly natural
  2. Links that are almost certainly unnatural
  3. Links that are somewhere in the middle

I also think that the vast majority of most sites’ links fall into category #3 – somewhere in the middle.  As Penguin evolves, Google will likely get better at classifying these links but for now, in most cases the algorithm simply doesn’t know.

Why is this important to our discussion?  How does this relate to what happened with Cyrus’ site?  Remember, Cyrus disavowed all of his links and absolutely nothing happened.  But, when Penguin hit, BOOM.  It appeared that that’s when the disavow tool kicked in.

It’s possible that you can’t disavow a natural link!

Here’s where we get to the exciting part of my theory.  I think it’s possible that Google doesn’t allow you to disavow a link that they believe is natural.  When Google first announced the disavow tool, here is a quote that makes me think that it is possible that some links can’t be disavowed:

Q: If I disavow links, what exactly does that do? Does Google definitely ignore them?
A: This tool allows you to indicate to Google which links you would like to disavow, and Google will typically ignore those links. Much like with rel=”canonical”, this is a strong suggestion rather than a directive—Google reserves the right to trust our own judgment for corner cases, for example—but we will typically use that indication from you when we assess links.

If I add a rel canonical tag to my site and I mistype my domain name in my coding, Google may recognize that I have made a mistake and just decide not to pay any attention to that tag.  Similarly, there may be cases where I decide to disavow a link (or domain) and Google ignores that decision.  It’s a “strong suggestion rather than a directive”.  I believe that when we try to disavow a link that Google has not flagged as unnatural, Google ignores that suggestion.

Going back to Cyrus’ experiment where he disavowed all of the links to his site, if Google viewed these links as natural then it’s possible that they ignored his suggestion to disavow those links.  This is why nothing happened immediately after he filed the disavow file.

Perhaps if we have told Google we want to disavow a link, it is just one more piece in the puzzle in helping Google determine whether a link is unnatural?

Here’s where this article starts to get confusing.  I’ll be impressed if you can follow along with me.  :)

We don’t know how Google decides that a link is unnatural (and therefore, amenable to being disavowed.) The algorithm may look at things like this:

  • whether or not the site on which that link is hosted is deindexed or being penalized for link selling.
  • whether there are a large number of other links on that site that are suspicious.
  • whether that site has a large number of links that are pointing to sites that are in really competitive niches like casino, porn or payday loans sites.
  • whether the link contains exact match keyword anchor text.
  • whether the url of the page linking out contains certain keywords such as “links”, “seo”, “bookmark”, etc.
  • whether the links from this site are ever actually clicked on by real users.

There are probably MANY factors that are weighted when deciding how to classify a link.  It’s probably a super complicated process.  But I wonder if one of the factors is:

  • whether or not the site owner has elected to disavow this link.

If Google is on the fence about whether or not a link is natural and the site owner then decides to disavow it, this may be the final straw that pushes the link over into the “almost certainly unnatural” category.
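
To make the “final straw” idea concrete, here is a toy sketch of my theory.  Every signal name, weight and threshold in it is invented for illustration; nobody outside Google knows the real factors or how they are weighted.  It only shows how a disavow could act as one more signal that tips a borderline link from the middle category into the unnatural one.

```python
# All names and numbers here are hypothetical and exist only to illustrate
# the theory. They are not Google's real signals, weights or thresholds.
HYPOTHETICAL_WEIGHTS = {
    "host_penalized_for_link_selling": 30,
    "host_has_many_suspicious_links": 20,
    "host_links_to_spammy_niches": 15,
    "exact_match_anchor_text": 15,
    "spammy_keyword_in_url": 10,
    "never_clicked_by_real_users": 10,
    "owner_disavowed": 25,  # the extra factor proposed in this article
}
UNNATURAL_AT = 60   # score at or above this: almost certainly unnatural
NATURAL_BELOW = 20  # score below this: almost certainly natural


def classify(signals):
    """Place a link in one of the three categories from its set of signals."""
    score = sum(HYPOTHETICAL_WEIGHTS[name] for name in signals)
    if score >= UNNATURAL_AT:
        return "unnatural"
    if score < NATURAL_BELOW:
        return "natural"
    return "middle"


on_the_fence = {
    "exact_match_anchor_text",
    "host_has_many_suspicious_links",
    "never_clicked_by_real_users",
}
print(classify(on_the_fence))                        # before the disavow
print(classify(on_the_fence | {"owner_disavowed"}))  # after the disavow
```

With these made-up numbers, the link scores 45 on its own and sits in the middle.  Adding the disavow takes it to 70, which puts it in the unnatural group.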

This decision making process likely happens during a Penguin update:

It would be a very time-consuming process for Google to granularly assess each link on the web and decide whether or not it is natural.  If my theory is right, then this decision is made at the time of a Penguin update.  When a Penguin update happens, I believe that Google re-evaluates how much of a site’s backlink profile is unnatural.  In Cyrus’ case, the first time that Penguin ran, the majority of his links were probably in the “almost certainly natural” or “somewhere in the middle” categories.  But when he told Google that he wanted to disavow all of his links, that decision may have pushed many links into the “almost certainly unnatural” category, which means they could now be disavowed.  Thus, when Penguin refreshed, Cyrus had a pile of links that were formerly untouchable by the disavow tool because they were seen as natural, but could now be disavowed.
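
The timing side of the theory can be sketched in a few lines.  This is only a model of my guess, not of anything Google has confirmed: it assumes a disavow becomes visible at the first Penguin update on or after the upload date, and it ignores recrawl delays.  The function name is made up.  The dates are the ones mentioned in this article, with the two example uploads assumed to be from 2013.

```python
from datetime import date

# Penguin dates mentioned in this article.
PENGUIN_UPDATES = [date(2012, 4, 24), date(2013, 5, 22), date(2013, 10, 4)]


def disavow_visible_on(uploaded, penguin_updates):
    """Under this theory: the first Penguin update on or after the upload."""
    later = [d for d in sorted(penguin_updates) if d >= uploaded]
    return later[0] if later else None  # None: still waiting for an update


# The two example sites above uploaded on May 3 and May 14.
print(disavow_visible_on(date(2013, 5, 3), PENGUIN_UPDATES))
print(disavow_visible_on(date(2013, 5, 14), PENGUIN_UPDATES))
```

Both uploads map to May 22, 2013, which matches what the two impression graphs seemed to show.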

 

 

Will Cyrus recover?

Cyrus has now removed his disavow file.  This means that as links get crawled again, they start counting towards his site’s PageRank again.  However, he has not seen any improvement in his rankings at all since removing his disavow file.  *If* he was not affected by Penguin, then, as links in his disavow file get recrawled, he really should see a gradual improvement in his site’s rankings.  The Penguin algorithm must be thinking that his links are untrustworthy.

Now, here’s the part where my brain gets fuzzy.   Penguin should not be taking into account links that are disavowed.  In the eyes of Penguin, Google should be seeing that Cyrus has zero links.  What this means is that as he gets new links, he should see an improvement in rankings.  If there was absolutely no improvement then there are only two possible reasons for this:

1. Cyrus did not obtain enough new links to make a difference.

2. Penguin is distrusting the site despite the fact that links were disavowed and is making it so that it is very difficult to rank.

Looking at the new links Cyrus has gained, as reported by Ahrefs, I can’t see that #1 is the explanation.

I have no explanation for what is happening here.  In fact, this troubling fact has kept me from publishing this post for a while.  I believe that Cyrus’ site dropped at the same time as Penguin 2.0 because his links had now become available for being disavowed.  But, I can’t understand why the Penguin algorithm would be affecting his site.  If his bad links are disavowed, they should not be counting towards Penguin.  In fact, here is what John Mueller says (13:48):

 John: In regards to algorithms that are looking at those links, obviously cleaning up those links is a good thing because we don’t have to take a look at them, but if they’re in the disavow file and we’ve recrawled them then obviously they won’t be used for that algorithm.

Site owner: Do those two things equate though?  Removing the link or adding it to the disavow file…or is there any difference?

John: That’s pretty much the same with regards to an algorithm….Essentially if you can’t have a link removed, then putting it in the disavow file is pretty much equivalent.

If Cyrus has disavowed all of his links, he should not currently be affected by the Penguin algorithm.  Only the disavowing of links should have affected him, which means that the new links he has obtained in the months since Penguin refreshed really should bring some improvement.  If he hasn’t seen any improvement then it means that he is somehow still under the effects of Penguin.  When Penguin runs again, the directive to disavow the links will be gone and, most likely, the links will go back into the “almost certainly natural” category.  I predict that the next time Penguin runs Cyrus will see an improvement.  He will probably be doing even better than before he applied the disavow file because he has since gained even more natural links.

tl;dr

If you didn’t read this article I don’t blame you.  It’s confusing and I even debated whether to publish it.  Here are the main points:

  • Cyrus disavowed every single link to his site.
  • Nothing happened until May 22, 2013 when Penguin 2.0 updated and he saw a dramatic drop in rankings.
  • It was postulated that the disavow tool did not take effect until Penguin refreshed, but Google says that is not true.
  • It is possible that the disavow tool will not allow you to disavow a natural link.
  • It is possible that Google uses several criteria to determine whether a link is natural.  If a link is in a debatable area, then our telling Google we want to disavow it could push it into the realm of “almost certainly unnatural”.
  • When Penguin refreshed, the algorithm now saw that most of Cyrus’ links were unnatural.
  • As they were disavowed they should not have counted towards Penguin.  However, the drop may have been because the links were now allowed to be disavowed because Google was no longer completely certain that they were natural links.
  • I think that the drop in rankings was completely due to the disavow file.  But, I can’t explain why new links have not had any effect on his rankings.  This makes me think that for some reason Penguin is causing Google to distrust his links.
  • I predict that now that Cyrus has removed his disavow file, the next time Penguin refreshes he will see a complete recovery.

 

Conclusions.

This has been a long, rambling post.  If you made it to the end then kudos to you!  Please note that this is just a theory.  I am stupidly obsessed with trying to understand Penguin.  There are many days where it completely consumes my thoughts.  Yet, I still feel that I have only scratched the surface in understanding it.  I wrote this post to try to reconcile the differences between what Google was saying in regards to the disavow tool and what people in the SEO world were seeing in real life.

There are still a number of questions left unanswered after I have written this article.  I have explained why it appears that the disavow tool kicks in after Penguin refreshes, but not why some site owners are only seeing a change once a reconsideration request is filed.  (I have not experienced this happening myself so I can’t comment here.)  I am also still really confused about how Penguin can affect a site where every link is disavowed if disavowing means that those links are not factored into the algorithm.

I am still really cautious in my use of the disavow tool.  I use it for getting manual unnatural links penalties removed.  At this point though I still do not condone using the tool as a method to recover from Penguin unless you have a site with which you are willing to experiment.

I’d love to hear your thoughts below.

 

Added October 18, 2013: Penguin refreshed on October 4, 2013 and today Cyrus is reporting that he has not seen any improvement at all.  There are a few possible reasons for this.  It’s possible that he just needs more time for Google to recrawl his previously disavowed links and start to attribute value to them again.  It’s also possible that there were other issues with the site that caused Penguin to hit it.  I have not seen this, but some people have said that Cyrus’ site previously had some pharma links pointed at it.  That would certainly complicate the issue.

 


  • Amarjit Kapur

    Very strange. If it is not useful to use the disavow tool to recover from Penguin then what can one use? It’s all getting more confusing by the minute.

    A number of our sites have been affected by Penguin. TBH I have been waiting for the new refresh to see the positive changes take place but so far it hasn’t happened. We have checked GWT again and it seems all those disavowed links are still showing.

    • http://www.hiswebmarketing.com/ Marie Haynes

      Hi Amarjit. When you disavow a link you will still be able to see it in your WMT backlinks. Unfortunately the disavow is completely invisible.

      This article definitely is confusing and shouldn’t be relied upon to understand Penguin. I was mostly trying to reconcile why the disavow tool only seemed to kick in with a Penguin refresh for Cyrus. But, I feel that something is not right because we should be seeing more sites making some kind of recovery with Penguin refreshes.

      • Amarjit Kapur

        Yes. Right now, all we are doing is disavowing wherever needed and waiting for a Penguin refresh or algorithm change.

      • Ron

        Hi Marie,

        I’m sooo confused! I disavowed all of my bad links back in June and was excited to see this recent Penguin refresh. Unfortunately, instead of seeing an improvement in rank, I dropped again. I just don’t understand it… I should have seen an improvement, right? I’m not seeing the disavow helping anyone, are you?

        • http://www.hiswebmarketing.com/ Marie Haynes

          I hear your frustration Ron. I have seen a good number of sites that did a lot of disavowing but saw no improvement when Penguin refreshed. There are a few possible reasons for this (and to fully address this topic I should probably write a new article). It’s possible that you did not address enough of your unnatural links. It’s also possible that many of your disavowed links have not been recrawled by Google. It can take up to a year for some links to get recrawled. If this is the case then you could possibly see some improvement with the next refresh.

          It’s also possible that Google wants to see a certain percentage of links removed as well. I’m not sure about this, but the few sites where I am seeing recovery are ones that had gone through a thorough removal campaign along with a disavow. But, these sites also had a really good base of natural links along with the bad ones.

          Until now I have not taken on clients who primarily have Penguin issues because we have not seen any recoveries, but I am now putting together the data I have and I think that in *some* cases I can help. However, there is still so much that we don’t know!

  • http://blog.bloxxter.cz/ Pavel Ungr

    Even if it was difficult to read it all and follow your line of thought, it was an amazing journey. Thank you for a great article – it gives me a lot to think about.

  • Tasha Harrison

    Hi Marie

    Your post is excellent and I think you really capture the conflicting nature of these updates and how Google’s communication isn’t necessarily reflecting what we’re seeing.

    I have a client who was badly hit by the latest Penguin (only a month after I started working with them! very unlucky). I ran a detox report on Link Research Tools on my client and on their competitor and the competitor had a worse link profile but wasn’t affected at all by Penguin.

    Any ideas why this would be? They had very similar good links – both from industry magazines and some big news sites. In fact my client has more positive links and more links overall.

    • http://www.hiswebmarketing.com/ Marie Haynes

      Hi Tasha. That’s a tough question to answer. Because we don’t know exactly what Penguin is looking for it’s hard to say whether one site is clean and another is not. I’ve seen some sites where the link profile looks horrible but when you weed out the nofollows the rest of the profile is good. It could also be that the competitor’s site has more truly good links.

      While I think that Link Detox can be a decent screening tool for some sites I have found that it often classifies links that I would have called toxic as healthy ones.

    • Alan Ng

      Link tools won’t be a good indicator of the toxicity of a profile. They fall down even more for non-English sites. You’ll need to manually check the links and this is where experience counts. We had a client where we ran a detox report and it came back as 8% toxic; let’s just say it was a lot higher.

      You need to take a closer look at anchor text, side bars and also possibly if they are redirecting any domains into their site.

    • Tasha Harrison

      Thanks guys, I really appreciate your feedback.

  • Jeff

    First off – you did an amazing job at documenting this theory of yours. Kudos. Since this post was more of a “rant”… it’s only fair that my comment be a bit of the same.

    I should preface this by saying that I have not really been keeping up with Cyrus’s little experiment except for reading this post.

    Think of links to his site as roots to a tree. The more roots = the bigger/sturdier the tree. In his experiment, he removed all of his roots. But, if you remove all of the roots of a tree, the tree still stands just as tall as it did with the roots – just not as sturdy.

    Penguin is like an earthquake that shook the ground and knocked his tree to the ground. Bad links would be similar to bad roots. With enough bad links (or no links), the “Penguin Earthquake” could take out the whole tree. If there are just a few bad links poisoning certain parts of the tree, then the tree could just lose a few limbs (or keywords) during the earthquake.

    I think that the Trust of a site is not as easily lost as people think. After all, it takes forever to gain trust. Google, in the past few years, has come out with all sorts of spam fighting techniques and ways to penalize sites that don’t deserve it. But not much research has been done about sites that remove “good links.” After all, who would want to remove good links coming to their site? (besides Cyrus).

    Here’s another experiment I’d like to see. What would happen to site http://www.site-a.com if sites http://www.site-b.com through http://www.site-z.com (25 other sites in this example) used the disavow tool to disavow domain:www.site-a.com? Is a disavow equivalent to a “bad link”? Who knows…