Call Us +44(0) 3302 231 322

Malicious Intent + Duplication = One Bad Headache.

Question – Why do Pandas & Penguins like football?

Answer – Penalties.

Unlike the colours of the animals they’re named after, the reported accuracy of these Google algorithm updates in reaching their desired targets has not been black and white. In some cases you can argue justice has been done; in others, a number of innocent casualties have been created. Closely related to these two updates is the topic of duplication, which I’m going to attempt to unravel here. We know the basics, but a number of scenarios raise questions, which I’ve discussed further down, so…

What is duplication?
What it says on the tin: identical or practically identical content. You can test your content for this with an online tool that scores how similar two pieces of text are as a percentage. Duplication can be either intentional or unintentional.
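To give a feel for what a percentage score like this means, here is a minimal sketch in Python. It is not how any particular online tool works; it simply uses the standard library's `difflib.SequenceMatcher` to compare two texts word by word. The function name `similarity_percent` and the sample product descriptions are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a: str, text_b: str) -> float:
    """Return a rough 0-100 similarity score between two pieces of text."""
    # Lower-case and split on whitespace so formatting differences are ignored.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # ratio() returns a float between 0 and 1; scale it to a percentage.
    return round(SequenceMatcher(None, words_a, words_b).ratio() * 100, 1)


# Hypothetical descriptions for two colour variants of the same product.
blue = "Classic cotton t-shirt in blue with a crew neck"
pink = "Classic cotton t-shirt in pink with a crew neck"
print(similarity_percent(blue, pink))
```

Two pages that differ by a single word score very high on a scale like this, which is why near-identical product pages tend to get flagged.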

Why did duplicated content start getting penalized?
Matt Cutts’ never-ending battle with web spam. Corrupt content duplication pollutes cyber-space and gains websites an unfair advantage via black hat methods that manipulate the search engines into giving high rankings to sites that don’t provide optimum benefit to the user. Of course there’s the argument that SEO is in itself a manipulative practice… can of worms, trust me! For today though, we’ll just focus on malicious manipulation with a blatant disregard for the rules.

One of Google’s biggest public battles is combating this intentionally manipulative practice in SEO. With such fine lines, complexities and detection difficulties, coupled with the occasional moving of the goalposts by the search engines and an evolving black hat industry, the task is monumental at best. You can read some of my thoughts on this subject in relation to link building on a guest post I wrote here.

How did duplication used to benefit SEO?
Imagine you have a web page. You optimize it brilliantly: you include some great content, fabulous keywords, on-page links, well optimized images and perfect header tags. This page provides great value to your website and associates wonderfully with your industry. The page gets indexed and recognized as a great page by the search engines, consequently helping you to rank. If one page has this effect… why don’t you double it with two? Or three even? That’s one of the simple (albeit outdated) ideas around duplication…

Pandas and Penguins
We’re equally baffled by the names and, in some senses, the updates! However, in addition to other aspects of SEO, these updates have targeted duplication, or “thin content” as it’s also known. The problem came, as it always does, for those sites that were innocently putting duplicate content on their site.

What innocents got hit?
eCommerce. Imagine an online clothing store with a really cool range of t-shirts (insert shameless plug for client here 😉 ). You click to view the t-shirts and they sell them in blue, pink, green, orange and yellow – they’re exactly the same t-shirt but in a different colour. To show the website visitor what they look like in each colour, you create a different page for each one. They’re identical pages and the only difference is the colour – whoops, that’s duplicate content. You see the problem! This was predominantly a feature of the Panda update.

The Penguin update hit low quality links, private blog networks, over-optimisation and excessive use of anchor text. There were actually protests in India over this, which you can read about here; you can also read more detailed information about Penguin in general here. This more recent update has again caused ripples throughout digital. Build My Rank famously got hit, and there was a de-indexing demon culling its way through spammy links, either de-valuing them or going one step further and penalizing the sites. They got fair warning in the form of an email to their webmaster tools account. You can also fill out a form to report this practice, or to appeal if you believe you’ve been unfairly hit – there are links to the form and more in-depth information on this blog by Search Engine Land.

A few scenarios:
Steering away from recent events, throughout my work I’ve come across some issues and questions around duplication, and around arousing suspicion with the search engines in general, that I think are interesting, puzzling and do occur in our industry. Please add your thoughts and examples of your own, and correct me if I’m wrong!!

1. eCommerce: you use manufacturers’ descriptions when you sell their products on your site.
If you sell shoes which you source from the manufacturer, you might not feel the need to change the descriptions when you sell the shoes on your own site. This can cause a duplication issue. However, I believe the consensus is that duplication across different domains is less of a worry than duplication within the same domain.
2. Google reviews: you’ve collected these genuinely by email and you want to put them on Google Reviews yourself.
A company I know was researching rich snippets; they had collected genuine reviews by email over a period of time. They wanted to transfer these to Google Reviews to get a star rating under their organic listing and encourage click-throughs. However, sitting on Google and adding in all the reviews themselves could look like a manipulative action to the search engines. This is something to be considered, and in the end this client used Schema code. This allows people to post reviews on the website, which Google draws data from to give the star rating. The existing reviews have been left off for now, but any new ones are put directly on to the site using this method. This could still be easy to fake, and I’m sure initiatives will be put in place if this becomes an issue; presently, however, this is a good way to go.
3. You’re pulling in data from another site to display on another IP address with an iframe or something similar.
If you rest part of your site on another IP address with the intention of pulling in data from your site to use the infrastructure and technology of another site, effectively you have a blank page that pulls in data. In terms of duplication, when this content shows it’s actually being pulled in from somewhere else, hence I believe it’s not duplicated.
4. You’ve got a white label site.
So, you have part of your website hosted on another IP address; I recently had this conundrum for a site that needed software from one of its partner businesses pronto. A page of their site therefore rested on another site, effectively having the same page appear on two different IP addresses. This can be an issue with duplication and, depending on whether you use your own domain name on the secondary site or have a sub-name on the host domain, it can also be an issue with SEO.

By directing your SEO efforts to a page which has the host’s domain name but is branded to you, you are actually benefiting their domain. If you have your own domain name on someone else’s site, however, your SEO efforts will still benefit your site.

Additionally, if you have a standardised page on someone else’s website, for example where the site owner creates further pages that are copies of the same template, you may have duplication issues here too unless you make yours your own with unique content.
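Going back to scenario 2, the “Schema code” approach can be sketched roughly as follows. This is an illustration only, not the markup that client actually used: it builds a schema.org `AggregateRating` as JSON-LD from a list of star ratings collected on the site. The function name `aggregate_rating_jsonld`, the product name and the ratings are all hypothetical, and whether a search engine shows stars for any given markup is entirely up to the search engine.

```python
import json


def aggregate_rating_jsonld(item_name: str, ratings: list[int]) -> str:
    """Build a JSON-LD string summarising on-site review ratings."""
    if not ratings:
        raise ValueError("at least one rating is needed")
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": item_name,
        "aggregateRating": {
            "@type": "AggregateRating",
            # Average star rating, rounded to one decimal place.
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            # Number of reviews the average is based on.
            "reviewCount": len(ratings),
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical product and ratings, for illustration only.
print(aggregate_rating_jsonld("Classic cotton t-shirt", [5, 4, 5]))
```

The resulting JSON would normally sit inside a `<script type="application/ld+json">` element on the page the reviews appear on.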

Let’s end constructively. What can you do about duplication?
Firstly, don’t panic (all who know me know I am TERRIBLE at taking this advice). It’s a good tip though. Analyse the situation as a whole and do not assume whatever’s in the news is the reason your site may have slipped down the rankings. Get advice, complete a site audit and reach out to the authorities – we’re in the fortunate position of being in an industry full of generosity and people are always willing to help :).

If you know you have duplicate content on your site, such as the t-shirt example above, but each page is still needed, you can use a canonical tag. Decide which page is the main page and put the tag in the <head> section of the other pages, referencing the main page. Only the main page should then be counted, and the search engines get a nice heads up that you’re aware of the duplication and that it isn’t intentional.
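As a rough sketch of what that tag looks like, the snippet below builds one in Python for each colour variant of the t-shirt. The URLs and the function name `canonical_tag` are made up for illustration; only the `<link rel="canonical" href="...">` form itself is the standard markup.

```python
from html import escape


def canonical_tag(main_url: str) -> str:
    """Return the canonical link tag pointing at the chosen main page."""
    # Escape the URL so characters such as & are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(main_url, quote=True)}">'


# Hypothetical URLs: one main page and its colour variants.
MAIN_PAGE = "https://www.example.com/t-shirts/classic"
VARIANTS = [
    "https://www.example.com/t-shirts/classic-blue",
    "https://www.example.com/t-shirts/classic-pink",
]

for variant in VARIANTS:
    # The same tag goes in the <head> section of every variant page.
    print(variant, "->", canonical_tag(MAIN_PAGE))
```

Every variant carries the same tag, so they all point back to the one page you want counted.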

If you have two copies of the same page, for example because you have transferred to a new website or your site responds at two different addresses, and you don’t need one of them, you can put in a 301 redirect. This will send any traffic to the main URL you specify, along with the vast majority of the link equity.
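In practice a 301 is usually set up in the web server or hosting configuration, but the logic behind it is simple enough to sketch in Python. The example below assumes the two addresses are the www and non-www versions of a hypothetical `example.com`; the constant `PREFERRED_HOST` and the function `redirect_for` are illustrative names, not part of any real tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www version is the one we want to keep.
PREFERRED_HOST = "www.example.com"


def redirect_for(url: str):
    """Return (301, target_url) if the URL needs redirecting, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() == PREFERRED_HOST:
        # Already on the main address, so no redirect is needed.
        return None
    # Keep the scheme, path and query string; only swap the host.
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(redirect_for("https://example.com/shoes?size=9"))
print(redirect_for("https://www.example.com/shoes"))
```

The important part is the status code: 301 means “moved permanently”, which is what tells the search engines to treat the target as the one true URL.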

Finally, from a content perspective Copyscape is a great tool which you can pass your content through to check for any duplication.

And with that, I’d like to finish. Once you open it up, duplication is a broad and complex topic, so please leave your comments and contribute to the discussion!

  1. Good post on a relevant & complicated subject. I’d have agreed with you on point 1 until about 6 weeks ago, but I’m seeing more & more evidence of duplicate content issues across different domains dramatically affecting traffic.

    While not a penalty as such, Google is getting fussier about the content it is showing, so if you take a catalogue that is shared by others, my advice would be to publish it first or you’re going to be very busy changing a lot of product descriptions!

    • Thanks for the comment – really interesting points! So you’ve noticed duplicate content across different domains being targeted/penalised? Do you think it’s another effect of the recent updates? Thanks for the catalogue tip also! It’s quite frustrating in a way though as that’s a completely innocent accident by most! :)

  2. It doesn’t strike me as being a “penalty” as such; this implies that Google has targeted a site and delivered a punishment for bad practice.

    What appears to be happening is products that are replicated on other sites are simply not being indexed where they are found to be duplicates. Rather than being a penalty, I believe Google is simply saying “We have found this content elsewhere so we don’t need to display it again.”

    Bad news if you have a large site with many products from (for example) a catalogue group where many sites carry the same product, as it will mean either getting your site indexed first or making wholesale changes to every product.

    PS. Loving the joke at the top 😉

    • I see – if that’s the case then there might be a big push to see who can get the content displayed first if Google only displays things once? That sets things up for a lot of competition perhaps?!

      And thanks about the joke! I made it up myself and wasn’t sure how it would go down so that’s a relief! :)

  3. That’s it, if you’re the first to get indexed then you’re seen as the “originator” of that content, gives you a much better chance of high visibility. With Panda updated all the time I expect this to be an ever changing issue, you’ve covered it really well here and I have duly tweeted it! :-)

    • I really appreciate your comments and kind words, thank you! Yes social life was decidedly lacking this weekend :) Haha worth it for a good post though!

  4. Great post – you cover some very important points here.
    On Canonical links – If I have an eCommerce site with the same product in two categories, and the URLs reflect the category structure – is this a scenario to use Canonical links?

    I thought the joke was good too!
