Malicious Intent + Duplication = One Bad Headache.

Question – Why do Pandas & Penguins like football?

Answer – Penalties.

Unlike the colours of the animals they’re named after, the reported accuracy of these Google algorithm updates in reaching their desired targets has not been black and white. In some cases you can argue justice has been done and in other cases a number of innocent casualties have been created. Heavily related to the two Google updates mentioned above is the topic of duplication which I’m going to attempt to unravel here. We know the basics but there are a number of scenarios that raise questions which I’ve discussed further down, so…

What is duplication?
What it says on the tin: identical or practically identical content. You can test your content for this with an online tool that scores similarity as a percentage. Duplication can be either intentional or unintentional.
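
To give a feel for what a "percentage scale" means, here is a minimal sketch in Python. It is not how any particular online tool works; it simply compares two pieces of text word by word using the standard library's difflib, and the function name and sample descriptions are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a: str, text_b: str) -> float:
    """Return how alike two pieces of text are, as a percentage from 0 to 100."""
    # Lower-case and split on whitespace so formatting differences are ignored.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # ratio() is a float between 0 and 1; we compare word lists, not characters.
    return round(SequenceMatcher(None, words_a, words_b).ratio() * 100, 1)


# Hypothetical product descriptions that differ only in the colour.
blue = "Classic cotton t-shirt in blue with a crew neck and short sleeves"
pink = "Classic cotton t-shirt in pink with a crew neck and short sleeves"

print(similarity_percent(blue, pink))
```

Two descriptions that differ by a single word score above 90%, which is the kind of "practically identical" content the rest of this post is about.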

Why did duplicated content start getting penalized?
Matt Cutts’ never-ending battle with web spam. Corrupt content duplication pollutes cyberspace and gains websites an unfair advantage via black hat methods, manipulating the search engines into giving high rankings to sites that don’t provide optimum benefit to the user. Of course there’s the argument that SEO is in itself a manipulative practice… can of worms, trust me! For today though, we’ll just focus on malicious manipulation with a blatant disregard for the rules.

One of Google’s biggest public battles is combating this intentional manipulative practice in SEO. With such fine lines, complexities and detection difficulties, coupled with the occasional moving of the goalposts by the search engines and an evolving black hat industry, the task is monumental at best. You can read some of my thoughts on this subject in relation to link building on a guest post I wrote here.

How did duplication used to benefit SEO?
Imagine you have a web page. You optimize it brilliantly: you include some great content, fabulous keywords, on-page links, well optimized images and perfect header tags. This page provides great value to your website and associates wonderfully with your industry. The page gets indexed and recognized as a great page by the search engines, consequently helping you to rank. If one page has this effect… why not double it with two? Or three even? That’s one of the simple (albeit outdated) ideas around duplication…

Pandas and Penguins
We’re equally baffled by the names and, in some senses, the updates! However, in addition to other aspects of SEO, these updates have targeted duplication, or “thin content” as it’s also known. The problem came, as it always does, for those sites that were innocently putting duplicate content on their pages.

What innocents got hit?
eCommerce. Imagine an online clothing store with a really cool range of t-shirts (insert shameless plug for client here 😉 ). You click to view the t-shirts and they sell them in blue, pink, green, orange and yellow – exactly the same t-shirt but in a different colour. To show the website visitor what they look like in each colour, you create a different page for each one. The pages are identical apart from the colour – whoops, that’s duplicate content. You see the problem! This was predominantly a feature of the Panda update.

The Penguin update hit low quality links, private blog networks, over-optimisation and excessive use of anchor text. There were actually protests in India over this, which you can read about here; you can also read more detailed information about Penguin in general here. This more recent update has again caused ripples throughout digital. Build My Rank famously got hit and there was a de-indexing demon culling its way through spammy links, either de-valuing them or going one step further and penalizing the sites. Those sites got fair warning in the form of an email to their webmaster tools account. You can also fill out a form to report this practice, or if you believe you’ve been unfairly hit; there are links to the form and more in-depth information on this blog by Search Engine Land.

A few scenarios:
Steering away from recent events, throughout my work I’ve come across some issues and questions around duplication, and around arousing suspicion with the search engines in general, that I think are interesting, puzzling and do occur in our industry. Please add your thoughts and examples of your own, and correct me if I’m wrong!!

1. eCommerce: you use manufacturers’ descriptions when you sell their products on your site.
If you sell shoes which you source from the manufacturer, you might not feel the need to change the descriptions when you sell the shoes on your own site. This can cause a duplication issue. However, I believe the consensus is that this is less of a worry than having the duplicated content within the same domain.
2. Google reviews: you’ve collected these genuinely by email and you want to put them on Google Reviews yourself.
A company I know was researching rich snippets; they had collected genuine reviews by email over a period of time. They wanted to transfer these to Google Reviews to get a star rating under their organic listing and encourage click-throughs. However, sitting on Google and adding in all the reviews themselves could look like a manipulative action to the search engines. This is something to be considered, and in the end this client used Schema code. This allows people to post reviews on the website, which Google draws data from to give the star rating. The existing reviews have been left off for now, but any new ones are put directly on to the site using this method. This could still be easy to fake, and I’m sure there will be initiatives put in place if this becomes an issue; presently, however, this is a good way to go.
3. You’re pulling in data from another site to display on another IP address with an iframe or something similar.
If you rest part of your site on another IP address with the intention of pulling in data from your site to use the infrastructure and technology of another site, effectively you have a blank page that pulls in data. In terms of duplication, when this content shows it’s actually being pulled in from somewhere else, hence I believe it’s not duplicated.
4. You’ve got a white label site.
So, you have part of your website hosted on another IP address. I recently had this conundrum for a site that needed software from one of its partner businesses pronto. A page of their site therefore rested on another site, effectively having the same page appear on two different IP addresses. This can be an issue with duplication and, depending on whether you use your own domain name on the secondary site or have a sub-name on the host domain, it can also be an issue with SEO.

By directing your SEO efforts to a page which has the host’s domain name but is branded to you, you are actually benefiting their domain. If you have your own domain name on someone else’s site, however, your SEO efforts will still benefit your site.

Additionally, if you have a standardised page on someone else’s website, for example where the site owner creates more pages that are copies of it, you may have duplication issues here too unless you make your page your own with unique content.
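
Going back to scenario 2 for a moment, the "Schema code" idea can be sketched roughly as follows. This is only an illustration: I'm assuming the JSON-LD flavour of schema.org markup with its Product and AggregateRating types, and the function name, product name and ratings are invented. The client's actual markup may well have looked different.

```python
import json


def aggregate_rating_jsonld(product_name: str, ratings: list[int]) -> str:
    """Build a JSON-LD script element summarising on-site review ratings."""
    if not ratings:
        raise ValueError("at least one rating is needed to compute an average")
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product_name,
        "aggregateRating": {
            "@type": "AggregateRating",
            # Average of the star ratings, rounded to one decimal place.
            "ratingValue": round(sum(ratings) / len(ratings), 1),
            "reviewCount": len(ratings),
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical ratings left by visitors on the site itself.
print(aggregate_rating_jsonld("Classic T-Shirt", [5, 4, 5]))
```

The point is that the rating is calculated from reviews visitors have left on the site, rather than being typed in by the site owner.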

Let’s end constructively. What can you do about duplication?
Firstly, don’t panic (all who know me know I am TERRIBLE at taking this advice). It’s a good tip though. Analyse the situation as a whole and do not assume whatever’s in the news is the reason your site may have slipped down the rankings. Get advice, complete a site audit and reach out to the authorities – we’re in the fortunate position of being in an industry full of generosity and people are always willing to help :).

If you know you have duplicate content on your site, such as the t-shirt example above, but each page is still needed, you can use a canonical tag. Decide which page is the main page and put the tag in the <head> section of the other pages, referencing the main page. Only the main page should then be counted, and the search engines get a nice heads up that you’re aware of the duplication and that it isn’t intentional.
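
As a rough sketch of what that looks like for the t-shirt example, the snippet below builds the canonical tag that each colour page would carry. The URLs and the function name are hypothetical; only the link element with rel="canonical" is the standard part.

```python
from html import escape


def canonical_tag(main_url: str) -> str:
    """Build the link element that points search engines at the main page."""
    # Escape the URL so characters like & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(main_url, quote=True)}">'


# Hypothetical URLs: one main t-shirt page and its colour variants.
MAIN_PAGE = "https://www.example.com/t-shirts/classic"
VARIANT_PAGES = [
    "https://www.example.com/t-shirts/classic-blue",
    "https://www.example.com/t-shirts/classic-pink",
    "https://www.example.com/t-shirts/classic-green",
]

for page in VARIANT_PAGES:
    # Every variant carries the same tag, referencing the main page.
    print(page, "->", canonical_tag(MAIN_PAGE))
```

Each colour page keeps its own URL for visitors, but all of them name the same main page as the one to count.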

If you have two copies of the same page, for example because you have transferred to a new website or your site responds at two different addresses, and you don’t need one of them, you can put in a 301 redirect. This will transfer all of the traffic, and the vast majority of the link equity, to the main URL you specify.
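
In practice a 301 is usually set up in your web server or CMS settings, but the logic is simple enough to sketch. The example below uses Python's built-in http.server purely for illustration; the host name is a placeholder, and I'm assuming the duplicate is a second host name serving the same pages.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical main host; any other host name is treated as the duplicate.
MAIN_HOST = "www.example.com"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the request hit the main host."""
    hostname = host.split(":")[0].lower()  # drop any port, ignore case
    if hostname == MAIN_HOST:
        return None
    return f"https://{MAIN_HOST}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is None:
            # Already on the main host: serve the page as normal.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"Main page\n")
        else:
            # 301 tells browsers and search engines the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
```

To try it locally you could pass RedirectHandler to http.server.HTTPServer and call serve_forever(). The path and query string are carried over, so a visitor to the duplicate address lands on the matching page of the main one.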

Finally, from a content perspective, Copyscape is a great tool which you can pass your content through to check for any duplication.

And with that, I’d like to finish. Once you open it up, duplication is a broad and complex topic, so please leave your comments and contribute to the discussion!