WordPress has the potential to mess up your website’s SEO by generating archive pages (home archives, monthly archives, daily calendar archives, categories, tags and search results) with duplicate content.
There are also other ways WordPress can generate duplicate content, but in recent versions of WordPress those have been fixed (as long as your theme/plugins don’t break the WordPress core fix).
For Stallion Theme users everything below is dealt with by Stallion, other than overusing content on multiple categories/tags, which is a user issue (see later).
WordPress SEO Canonical URLs on Paged Comments
Before dealing with duplicate content on archive pages let’s check your WordPress theme and plugins don’t break the current WordPress core canonical URL fix to duplicate content on posts and pages with lots of comments and paged comments activated.
WordPress core adds canonical URLs to posts and pages with paged comments. Paged comments occur when, under
Settings >> Discussion
the setting
Break comments into pages with XX top level comments per page and the Last/First page displayed by default
is ticked and there are more than XX top level comments on a post/page.

This setting breaks posts with many comments into multiple pages. An example can be seen at:
Stallion WordPress Theme Feature Requests
Currently there are just shy of 100 comments broken over 4 paged comments pages, which results in 5 URLs to similar content (partially duplicate content, because the main content, the post, is always the same and only the comments shown are unique).
http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests
http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests/comment-page-1#comments
http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests/comment-page-2#comments
http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests/comment-page-3#comments
http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests/comment-page-4#comments
Load any of those 5 URLs and view source and you’ll find an identical canonical URL to the main post in the head (near the top of the code).
<link rel='canonical' href='http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests' />
You will find this code on all 4 paged comments pages. The canonical URL tells search engines like Google that they should spider all pages, BUT all link benefit and SERPs should be redirected to the main post http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests (the preferred canonical URL).
This was a good SEO move by the WordPress development team. Having multiple pages with almost identical content isn’t going to generate extra search engine traffic, but would waste link benefit and might trip duplicate content filters. It’s unlikely to trip the filters, since Google is very good at combining similar pages into one indexed URL, but better SEO safe than SEO sorry.
If you have paged comments on your site, view the source of page 2, for example, and check for canonical URL code pointing to the preferred canonical URL (the main post). If it’s missing, either you are using an old version of WordPress (I think canonical URL support was added in WordPress 2.9) or the WordPress theme or a plugin you are using is removing the WordPress core canonical URLs.
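If you would rather not read through view source by hand, the check can be scripted. The following is a minimal Python sketch using only the standard library; the `CanonicalFinder` class, the `find_canonical` function and the sample markup are illustrative names and data I have made up, not part of WordPress or Stallion.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens, so split before comparing.
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonicals.append(attr_map.get("href", ""))


def find_canonical(html_source):
    """Return the first canonical URL found in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_source)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical page source for a paged comments URL, trimmed to the head.
sample = """<html><head>
<title>Feature Requests</title>
<link rel='canonical' href='http://www.stallion-theme.com/stallion-wordpress-theme-feature-requests' />
</head><body></body></html>"""

print(find_canonical(sample))
```

Paste the source of a comment-page-2 URL into `sample` and the function should return the main post URL. A result of `None` suggests the theme or a plugin has removed the canonical tag.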
WordPress Duplicate Content on Archives
To the main WordPress duplicate content issue, WordPress archives.
Because WordPress reuses content on archive parts (categories, tags etc…) of a site there’s the potential for duplicate content issues.
There are many WordPress themes, including the default WordPress themes TwentyEleven and TwentyTen, that reuse the full content of a post on archive parts of a site. Basically if you view an archive page like your categories and tags and you see multiple full posts, you have the potential for a duplicate content issue.
Using the full content means every post is duplicated in full on one or more parts of a site. If you have a site with monthly archives, categories, tags and the default home page archives (ten posts on the home page), all your posts will be reused in full 4 times, assuming you don’t add your posts to multiple categories and tags (worse if you do) and don’t use the calendar widget!
There’s also the issue that an archive page with ten full posts is a massive page to load, especially if you add rich content (images for example) to many of your posts.
Fortunately this duplicate content SEO problem can be easily reduced (practically removed) by using a WordPress theme (like the Stallion WordPress SEO Theme) that rather than using the full content of a post uses a short excerpt on archived parts of the site. For an example take a look at this category archive Stallion Theme Settings, you can see 10 archived posts, but each post is a short excerpt of the post significantly reducing the possibility of duplicate content issues. Search Google for “Stallion Theme Settings” and you’ll find that category page is number one in Google for that SERP (it’s not a money SERP, but it shows Google indexes and ranks these fine).
If your posts tend to be small (not a lot of content) you still run the risk of duplicate content issues. Imagine your excerpts are set to 155 characters and every post is 155 or fewer characters: your posts will be repeated in full on all archives. There’s not much you can do about this beyond creating bigger posts. I would suggest minimizing the number of archive types: add each post to only one category OR tag and don’t use any other type of archives (no dated archives and no calendar widget).
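The short post problem is easy to demonstrate. This is a simplified Python sketch of character-based excerpts, assuming the 155 character limit from the example above; `make_excerpt` and `excerpt_is_whole_post` are hypothetical helpers, and WordPress itself trims excerpts by word count by default rather than by characters.

```python
def normalise(post_text):
    """Collapse runs of whitespace so lengths are compared fairly."""
    return " ".join(post_text.split())


def make_excerpt(post_text, limit=155):
    """Cut a post down to at most `limit` characters (a simplified excerpt)."""
    text = normalise(post_text)
    if len(text) <= limit:
        return text  # nothing to cut: the excerpt IS the post
    return text[:limit].rstrip() + "..."


def excerpt_is_whole_post(post_text, limit=155):
    """True when the archive excerpt would repeat the entire post."""
    return make_excerpt(post_text, limit) == normalise(post_text)


short_post = "Quick update: the sale ends Friday."
long_post = "word " * 100  # roughly 500 characters of filler

print(excerpt_is_whole_post(short_post))  # True
print(excerpt_is_whole_post(long_post))   # False
```

Any post for which this returns True is repeated in full on every archive it appears in, however short the excerpt setting is.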
Code Fix to WordPress Duplicate Content
There’s a very easy (easy when you know how) code fix for this duplicate content issue at theme level. Each theme is built differently, so the easiest way to fix a theme that uses full content on archive pages is to use the Post Teaser plugin, of which I’ve made an SEO version at WordPress SEO Plugins. The Post Teaser WordPress SEO Plugin generates an excerpt instead of the full content on archive posts. You can set the excerpt to any size, and with my SEO version the anchor text of the continue reading link is SEO’d.
So you want to fix this issue at theme level.
Search through the php files of the theme for this code:
the_content();
and replace it with
the_excerpt();
You’ll need to do this for all code related to archive posts ONLY, NOT on the template files for Posts and Pages, which are usually single.php and page.php (don’t change those two files). For most themes you’ll be looking to change the files index.php, archive.php, category.php, tag.php and search.php, but for some newer themes the code can be located in files like content.php, content-image.php, content-*.php.
It really is that simple.
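To find which template files need the change, the search can be automated. The following is a rough Python sketch, not a Stallion or WordPress tool: `templates_using_full_content` and `load_theme_sources` are hypothetical helper names, and the example file contents are invented. It only looks at files in the top level of the theme folder.

```python
import re
from pathlib import Path

# Templates that render a single post or page; these should keep the_content().
SINGLE_TEMPLATES = {"single.php", "page.php"}

# Matches calls such as the_content(); or the_content( 'Read more' );
# The \b stops it matching get_the_content().
FULL_CONTENT_CALL = re.compile(r"\bthe_content\s*\(")


def templates_using_full_content(sources):
    """Given {filename: php_source}, list non-single files calling the_content()."""
    return sorted(
        name
        for name, source in sources.items()
        if name not in SINGLE_TEMPLATES and FULL_CONTENT_CALL.search(source)
    )


def load_theme_sources(theme_dir):
    """Read every top-level .php file of a theme folder into a dict."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in Path(theme_dir).glob("*.php")
    }


# Invented example sources standing in for a real theme folder.
example_theme = {
    "index.php": "<?php the_content(); ?>",
    "single.php": "<?php the_content(); ?>",
    "search.php": "<?php the_excerpt(); ?>",
    "content.php": "<?php the_content( 'Continue reading' ); ?>",
}

print(templates_using_full_content(example_theme))
# ['content.php', 'index.php']
```

For a real theme you would call `templates_using_full_content(load_theme_sources("/path/to/theme"))`. Treat the result as a list of files to review, not files to edit blindly: some themes share a content-*.php file between single posts and archives.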
Reducing WordPress Duplicate Content Further
It’s quite easy with WordPress to generate duplicate content even with the above fixes. Here are a few tips for avoiding the obvious pitfalls.
Monthly Archives Widget : If you use the default home page archives (ten archived posts on the home page) and use monthly archives, they are pretty much identical. I NEVER use a monthly archive: not only do you run the risk of duplicate content (copying the home page), they add ZERO SEO benefit, because monthly archives never rank for anything. Don’t use monthly archives, but if you do, edit the widget so it only shows on the home page and other dated archive pages (this is built into the Stallion theme for example), so you aren’t wasting as much PR/link benefit as you would by loading it sitewide.
Calendar Archives Widget : The Calendar archives are even worse. For starters the Calendar widget is broken (IMO when the title attribute/hover tooltip of a link includes the entire post it’s broken!). The Calendar archive breaks your posts into days, and on most sites you aren’t going to publish multiple posts every day, so the daily archives are basically duplicates of the posts if your theme uses the full post content on archives. Like the monthly archive there’s no SEO value in having daily archives, so don’t use them. The Calendar widget is so bad both user and SEO wise I’ve removed it from the Stallion theme.
Too Many Categories/Tags : I see a tendency for those in the make money online niche to overuse Categories and Tags. You will find sites where posts are added to multiple categories and loads of barely relevant tags. An example might be a post added to
Categories > Make Money Online, Earn Money, etc…
Tags > Money, Wealth, Earnings, Online, Earn etc…
You might think this is a good SEO idea because the post is linked from more pages (easier to find) and you feel like you have a page targeting those single keyword SERPs, but it’s a waste of link benefit getting all those tags and categories indexed for no traffic gain. Do you honestly believe your site is going to gain one keyword SERPs like Money, Wealth, Earnings, Online, Earn just by creating a tag or category archive page? You might be able to gain long-tail SERPs like “Make Money Online Easily”, but those one keyword SERPs above are hard, and if you want a SERP like “Money” or even “Make Money Online” that’s almost certainly going to need to be targeted on the site’s home page, where most links point. Basically you target the hardest SERPs on the page with the most backlinks/link benefit (usually the home page).
Add to that, if you add every post to a handful of categories and 20+ tags, your tag archive pages in particular are going to be practically identical. Think about it: if you have two tags “Earn” and “Earnings” you are going to add the exact same posts to both tags, so they will be identical. In other words you are generating duplicate content by overtagging.
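How identical two tag archives are can be put into a number. Below is a small Python sketch that scores every pair of tags by the share of posts they have in common (Jaccard similarity, my choice of measure rather than anything Google has published). The `tag_overlap` function and the post IDs are made up for illustration.

```python
from itertools import combinations


def tag_overlap(posts_by_tag):
    """Return {(tag_a, tag_b): shared posts / all posts in either tag}."""
    overlaps = {}
    for tag_a, tag_b in combinations(sorted(posts_by_tag), 2):
        posts_a = set(posts_by_tag[tag_a])
        posts_b = set(posts_by_tag[tag_b])
        union = posts_a | posts_b
        overlaps[(tag_a, tag_b)] = len(posts_a & posts_b) / len(union) if union else 0.0
    return overlaps


# Hypothetical post IDs assigned to each tag.
posts_by_tag = {
    "Earn": [101, 102, 103, 104],
    "Earnings": [101, 102, 103, 104],
    "Wealth": [101, 105],
}

for pair, score in tag_overlap(posts_by_tag).items():
    print(pair, round(score, 2))
# ('Earn', 'Earnings') 1.0
# ('Earn', 'Wealth') 0.2
# ('Earnings', 'Wealth') 0.2
```

A score of 1.0 means the two tag archives list exactly the same posts, so one of the tags is a candidate for deletion.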
I ask myself one SEO question when thinking about creating a category or a tag (SEO wise there’s no difference between the structure of a tag and a category page):
Will this new category/tag be capable of generating search engine traffic in its own right and/or does it serve a role for my visitors?
If you can’t answer yes to this question, don’t create the category/tag.
Example, should I create a tag or category on this site with the one keyword “WordPress”? Well, very easy one this, it’s a big NO. A tag or category is highly unlikely to rank high for the one word SERP WordPress and it adds nothing to my visitors’ experience because pretty much every page of this site is about WordPress. My only chance of ranking well for the WordPress SERP is the home page, and I know it’s such a hard SERP it’s not worth my time optimizing for it.
Another example, should I create a tag or category on this site with the two keywords “WordPress SEO”? This is a harder one, but it’s a no currently (might change in the future if I add a lot more content). A tag or category is highly unlikely to rank high for the two word SERP “WordPress SEO” (it’s a hard SERP and needs backlinks, and not many webmasters are going to naturally link to a category/tag), and there are already pages on this site like the home page and WordPress SEO (a relatively new page I plan to build into a WordPress SEO tutorial) that to some degree target the WordPress SEO SERP. I’d be better off spending my PR/link benefit on the WordPress SEO page above and creating categories that might stand a chance of gaining SERPs or are useful to my visitors.
Stallion WordPress SEO Plugin
The Stallion WordPress SEO Plugin can also help with duplicate content issues. If you have made the mistake of creating too many tags and categories (especially tags), consider using the Stallion SEO plugin to redirect their SERPs and link benefit back to the home page. The Stallion plugin can also redirect link benefit and SERPs from dated archives to the home page, so if you’ve been using the monthly archives widget and/or the calendar widget you can fix the mistake.


9 responses to WordPress SEO Canonical URLs
Template signature - duplicate content - Google Panda
The next issue is Google Panda is looking for a ‘heavy template footprint’. That of course does not mean WordPress per se. However, many websites like eHow got hit by Panda, perhaps because of the template model of applying content.
Set up a template, add content and repeat. This was SEO circa 2009. Now the rules have changed a bit and Google is not just looking for unique content between websites, but also within your own site.
This is good news actually as it gives SEOs control over another on site factor.
The bad news is most people have set up websites not too far from the eHow model. That is write content and drop it in a template structure.
You can check duplicate content between your website pages with something like duplicatecontent.net. Chances are most people will come up with 90% plus, even 95%.
I think one issue here is having sidebar widgets and code that is more autogenerated than hand coded. That is, on almost every page of your website the same sidebar widgets appear. Now if you write super long articles this is not as bad, as the ratio is decreased. However, most people do not write 8 hours a day.
Therefore, my question is, as SEO is a moving target and needs to be aware of the changes taking place, is there any way to make WordPress seem more human?
That is, less template structure? I was thinking of maybe having rotating sidebar widgets. On one page view ‘popular posts’ might appear, while on another page the widget shows ‘recent posts’. That is, the ability to make the pages within a website more unique.
This is the new SEO challenge. What can be done for these new 2011 rules and how can Stallion continue to improve perhaps in this regard. Any suggestions or plugin recommendations are highly welcomed. I think this would be one more thing to get an edge over the next website out there who uses a 2009 Modus operandi. So would love the help in this regard as I am trying to get a number of websites recovered from Panda.
Unique pages in a non-template format might help a lot of people avoid a site wide Panda ranking demotion. I have read a lot on Panda and I think template websites were hit, that is, sites where a template is set up and content is dropped in.
Along the same lines, to get you thinking about the future of Stallion, what features can be added to take this to the next step. That is make content more human, even if it is not. I do not know if Search engines can tell if a related posts is autogenerated or someone hand-made links in the website.
I think the old alinks model is less powerful than the add links by hand.
As Always a big thank you and sincere appreciation for your efforts helping others.
P.S. Tags have been eliminated on my websites and well constructed categories remain. I am trying to make my websites less cluttered and reduce the chance of duplicate content or low quality tag pages with one or two posts sitting in them.
Google Panda Update and Duplicate Content
You inspired an article at Google Panda Update and Duplicate Content.
From a Stallion theme perspective quite a bit of the template content is unique that in most WordPress themes isn’t.
Comment headings like “Leave a reply to” include the title of the post, the heading for the related posts plugins I’ve added support for include the title of the posts. On archives the read more link is the title of the article.
If you use the Stallion 2011 Header Image area (added to Stallion 6.1) every post can have a unique header image.
It would be difficult and not user friendly to change the header beyond the above on a page by page basis, similar for the footer. If for example I could code a unique navigation menu for every article (which is probably not possible with WordPress) it would make navigation for visitors confusing, so I wouldn’t touch the top navigation menus.
For the sidebars I have been thinking about different widgets for different page types. This wouldn’t help with potential too much duplicate template content because posts would still use the same widgets. Doesn’t sound practical to be able to choose a custom sidebar setup on a post by post basis and could lead to confusion for visitors navigating a site.
I would tend to avoid random widgets, going to make navigation confusing.
eHow content isn’t very good. I looked at their article “How to Cancel a Credit Card Payment” and the first thing they suggest is:
Well duh! I thought the way to cancel a credit card payment was randomly phone banks until you happen to hit the right one
The whole site is filled with low quality articles like that, failing on these Google high quality sites factors:
The big question is exactly how the Google Panda update made it possible for Google to penalize low quality content rather than duplicate content (the eHow content isn’t duplicate).
You are also reading too much into the duplicate content checker site you found. Of course pages on this site and all Stallion sites using the same layout are going to share the same HTML markup. There’s nothing wrong with that, most sites are built that way, headers the same, sidebars the same, footers the same and the basic HTML of the main content is the same. If Google did add duplicate HTML as part of the Panda Update it would have banned most sites!
What’s important is the non-markup content is unique and that’s text and images basically, some HTML markup like H1 headers add extra value to the content, but it’s the content not the HTML per se that’s ranked. If I were to use that duplicate checking tool (which I wouldn’t) I’d only look at the “Smart text similarity” figure. Comparing this page to the home page for example gave a 33% similarity while “HTML fingerprint” was at 90% as you’d expect. Making a few assumptions what they are referring to since there’s no key.
Comparing the Stallion home page to WordPress SEO Themes – AdSense Templates home page gives 94% “HTML fingerprint”, 37% “Smart text similarity”, yet the pages are unique. Pinch of salt comes to mind.
I would be more concerned with not adding enough content to posts as I mentioned at Google Panda Update and Duplicate Content because the template could ‘drown out’ the main content suggesting low quality.
Based on what Google is asking for they are looking for high quality reasonable sized content. If you can’t create reasonable sized articles why would Google want to index them? Do you like finding short articles with no substance when doing online research?
Try to think like Google, what do they want: high quality content.
David
Semantic links Vs Tags duplicate content, Google Panda
I have printed and read in detail your SEO ideas on duplicate content. I appreciate these pearls of wisdom.
As stated, last week after reading your article I decided to change my navigation structure from hundreds of tags and categories to about ten well planned categories. The result was the elimination of over a thousand tags; yes, more than 1000 tag pages on my websites in aggregate.
I did this all of my own accord because I was thinking you are right. Why would a user need that many tag pages? They were created a few years ago when I was operating under the SEO idea that “semantic related links” produced by tags in WordPress would help on-site factors.
I think it did; however, I now have to weigh this against the ‘is it useful to the users’ SEO idea. I now think having one category and five tags per post is not as useful as having one well thought out category that the user might explore. Rather, as you correctly stated, it is better to focus on building content of greater quality and length than on on-site factors, as your WordPress theme seems to take care of a lot of on-site factors in proper measure.
Another thing you wrote somewhere on your website was that what used to work before Google Panda still works. So it does not hurt to have semantic text and keywords in your posts, just make sure that this does not violate the SEO principle of usefulness to the reader.
As a side note, I do tend to think super comments also help with duplicate content, as the structure of the pages created is usually a bit different from the rest of the website, and if they are well written by users they have little SEO in mind and hence rank on long tail keywords you might not expect.
The other thing about Panda which I keep getting back to is you are right. The Web is filled with misinformation about SEO. SEO misinformation replicates and spreads until many are building their websites off of it. Reading forums on SEO is good, but staying objective and separating the wisdom from the hype is hard.
I will be very curious to see if my tag zapping experiment will help with Google Panda. For sure I have to strive to improve the quality of my pages as always.
I also want to diversify my traffic sources as I believe one of the ironies of Panda is Google ranks websites that have high direct traffic or traffic not just from SERPs. I need to think of how I can diversify traffic sources. Social media is good everyone says, but you know how this is.
I am looking into more videos and ways to increase direct traffic. I am thinking for direct traffic it is helped by having a quality product to offer. This will also push one’s site up in organic search, post-panda.
Hide tags on archive pages for SEO benefit?
Dave, was wondering whether to use the hide tags on archives pages feature. I get the feeling from what you’ve written above and elsewhere that it may be useful if you want to downplay the importance of those tag pages.
I likely have too many tag pages to begin with (something like 90) and I’d rather focus link benefit on certain other pages I’ve linked to in my header menu and footer.
I was wondering if there are any other effects to be aware of besides eliminating link benefit going to these tag pages from the archives pages. For example if having these tags visible is important for spreading link benefit throughout the site. Thanks for any thoughts you might have.
Erik
WordPress SEO Categories and Tags
I don’t use WordPress Tags because in structure they are the same as Categories.
A lot of webmasters overload their sites with Tags/Categories, which means they need more link juice to power their site’s SEO. You want most of your SEO benefit to find its way to the pages that gain SERPs; those tend to be the home page and single posts (articles). Look at your logs and determine whether your Tags/Categories pull in search engine traffic (some will, most won’t).
So you want most link benefit to get to the articles. You can’t eliminate some form of navigation on a site (Categories/Tags), but limiting it makes sense because every Category/Tag will take more link benefit than your most important articles.
On many of my sites with a PR4 home page I tend to find the Categories (which I try to keep to a minimum) are PR3 (sitewide links to them) and most articles are below PR2 (NOT sitewide links) with those linked from widgets like the Popular Posts Widget PR3 (sitewide links). If I added even more Categories the amount of link benefit that would get to articles would be even worse.
If you have a lot of Categories/Tags and they aren’t needed (you need one Category or Tag for each article, not 20 Tags per article) look at ways to reduce them and not link to them so much.
David
Reducing tags - Page Rank Distribution
Thanks Dave, I know you’ve covered this before. A few of my tag pages actually rank quite highly for fairly important keywords, but as you say most do not. Collectively they bring about 3% of visits. You are right, these tag and category pages tend to be among the higher Page Rank pages on the site.
My only reluctance is whether turning off tags on archive pages will hurt this albeit small segment of highly-ranking tag pages.
I already did a tag cull awhile back, but will look to reduce them more.
Delete WordPress Tags and Categories with NO Search Engine Traffic
If some WordPress tags get search engine traffic, keep them and keep the same layout, but delete the tags that get no traffic.
This way your tags with traffic lose no internal backlinks and you waste no link benefit on tags that aren’t generating traffic, which means the important tags (and other pages on the site) gain more link benefit since less is wasted.
Same argument for categories: there’s not much point having a tag or a category that doesn’t generate traffic OR doesn’t serve to spread link benefit. Every post needs to be in one category or tag, and ideally each category/tag would be limited to only 10 posts (or whatever number you set archives to show, it doesn’t have to be 10) so they don’t go over to category page 2, page 3 etc… This maximises link benefit to all deep content, assuming you have a sitewide widget of categories/tags.
Category One has a sitewide link and has no more than 10 posts (or the number you have it set to so it doesn’t go to page 2), this means all the posts in Category One are no more than 2 clicks away from home page and since all categories are linked sitewide all content receives a fair share of link benefit.
If you use a popular posts widget those posts will also have sitewide internal links, so will be no more than one link from home.
In practice it tends not to be this perfect setup, but if you keep it in mind you won’t go far wrong. If you want specific posts to gain more internal links so they rank higher, there are custom menus where you can link to specific URLs to make sure they have more internal links.
David
Using category pages to distribute deep link benefit
Excellent answer Dave. I have begun reducing tag pages.
I think part of my problem is that I have only 8 categories, and 1200+ posts.
The smallest category contains 40 posts, while the largest, very generic category has over 400. These are all currently displaying just 10 posts per page, so you are getting some categories that go for 40 pages or more into the abyss.
Is the answer to create numerous new category pages to split these out? I could probably come up with suitable new categories using keywords, and reassign posts from these big categories to the new categories.
This would have the effect of adding a lot of category links in my sidebar widget (thus reducing link benefit to other links I’ve deemed important).
But I suppose it would distribute link benefit more deeply through the site (on your advice from this previous discussion, http://www.stallion-theme.com/stallion-wordpress-theme-layoutdesign-options?cid=10854, I don’t display tags in any site-wide widget so benefit is not flowing in that manner).
A second issue then is deciding how many posts to display per category/tag page.
It looks like there are plugins or other hacks that can be used to display a specific number of posts per page (I’d like to keep my main index page just showing the standard 10, but increase the tags/categories).
Is there an upper limit for how many posts should be displayed on a page?
I realize you might not have a specific answer, but is it reasonable to have a category page with, say, 50 or more post excerpts on a single page?
Many thanks
Erik
WordPress SEO of Categories
1,200+ posts in only 8 categories averages at 150 posts per category IF you have them spread evenly (which you won’t).
With the standard 10 posts per category page that’s 15 pages deep. Unless you have a high PR site (loads of backlinks) there’s a very good chance a significant amount of your deeper content rarely sees a search engine spider (can’t find them, some content will be over 10 clicks from home!).
I try to keep the number of posts shown in a category to under 20 and try to keep the number of page 2, page 3 category pages as low as possible. With your site for example I’d rather have 30 categories with on average 40 posts a category and the number of posts shown per page set to 20, which would result in most categories going 2 or 3 pages deep, than your current setup of 15 pages deep.
This would have a sitewide impact: posts that are currently getting very little link benefit will get significantly more and those that get most link benefit could see significantly less. So there are going to be ups and downs in current SERPs. I would recommend compiling a list of the pages that gain most traffic, adding them to a custom menu and adding that as a sitewide widget so they don’t lose their current internal backlinks. Basically if you have a post that with your current setup has a sitewide link and you change it so it only has one link from page 2/3 of a category, it might lose rankings.
I bought this site Mobile Phone Reviews last year and reorganised it.
157 posts
10 categories
15 posts shown in a category
Categories range from 8 to 15 posts per category (most are around 10-12) meaning there are no page 2 categories.
Since I own over 100 domains I don’t have the time to spend reorganising sites to the level I’d suggest for a webmaster with one site. If I had the time I’d push posts that gain no search engine traffic because they aren’t really targeting SERPs into categories with more than the ideal number of posts so they get even less link benefit. I’m sure you have lots of posts that don’t really target anything, in a perfect SEO world they either need targeting at some SERPs (a rewrite which takes time) or partially removed from the site so they don’t waste too much link benefit.
For example, if you went with around 20 categories, 20 posts per archive page, and designated 4 categories to push less important posts into (100 posts per category for example), the number of posts in your ‘important’ categories that are targeting SERPs is dramatically reduced: 400 posts in the unimportant categories means 800 in the important categories, and 800 posts into 16 categories = 50 posts per category. If set to 20 posts shown per category page, that averages 3 pages deep, which is similar to having 30 categories and evenly spreading all posts. In the real world you won’t get perfect numbers, but keeping them in mind will help spread link benefit more efficiently.
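The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the averages discussed above; `pages_deep` is a made-up helper name and it assumes posts are spread evenly across categories, which they never quite are.

```python
import math


def pages_deep(posts_in_category, posts_per_page):
    """Number of paginated archive pages a category needs."""
    return math.ceil(posts_in_category / posts_per_page)


total_posts = 1200

# Current setup: 8 categories, 10 posts shown per archive page.
print(pages_deep(total_posts // 8, 10))    # 150 posts per category -> 15 pages

# Even spread over 30 categories, 20 posts shown per page.
print(pages_deep(total_posts // 30, 20))   # 40 posts per category -> 2 pages

# 20 categories, 4 of them holding 100 less important posts each.
important_posts = total_posts - 4 * 100    # 800 posts left for 16 categories
print(pages_deep(important_posts // 16, 20))  # 50 posts per category -> 3 pages
print(pages_deep(100, 20))                 # each 'unimportant' category -> 5 pages
```

The last line shows the trade-off: the 4 categories holding less important posts run 5 pages deep, which is the intended cost of keeping the 16 important categories shallow.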
David