Forum Topic: Problem redirecting content ending in .html

Forum: .htaccess Forum: Redirecting • Posted by James Kockelbergh

Hi there, I have a real problem that I have been trying to solve on and off for the past few months, but so far I have had no luck ;).

I will try to give you as much info as possible so you can get the full picture. I have a website, a small local online paper that has been publishing content since 1998 and was originally hosted under the domain name http://www.esdiari.com.

The original CMS was a custom-built solution done by a very good friend of mine called Tom?s Rotger, who now concentrates his passion on photography. Getting back to the point: when we decided to migrate to WordPress, we kept the original website under http://www.esdiari.com and built a new website using WordPress under the new http://www.esdiari.es domain name. This was not too much trouble because we redirected the main categories from http://www.esdiari.com to the http://www.esdiari.es ones.

The big mistake I made was losing a load of domain authority in the move from the .com domain to the .es one. The .com had been an established domain for a long time. I was advised to go back to the .com domain, which we did a few months ago.

To the problem:

The original CMS had an archive of over 19.000 articles in a so-called “hemeroteca” (Spanish for archive), which we have kept. The old system generated the following URL structure: http://www.esdiari.com/ArticleId-ArticleTitle.html

We have relegated the old CMS to http://www.esdiari.com/hemeroteca/hemeroteca.php, so the new URLs are http://www.esdiari.com/hemeroteca/ArticleId-ArticleTitle.html

The new CMS is powered by WordPress from the root of the site, and the old CMS resides in a subdirectory called /hemeroteca, which explains the URL structure pointed out earlier.

The problem is that Google Webmaster Tools is reporting all of the original (let's say old) articles ending in “.html” as broken (404s). I cannot find a way to redirect any file ending in “.html” (http://www.esdiari.com/ArticleId-ArticleTitle.html) to the new http://www.esdiari.com/hemeroteca/ArticleId-ArticleTitle.html without breaking the WordPress installation, so I am having to write one 301 redirect line per article, which is producing a massive .htaccess file.

Is there a way of doing all of these files in one go with a single directive? I have tried variants of the following directives, which I found in the book, but have had no luck so far:

RedirectMatch 301 /(.*)\.html?$ http://example.com/$1/
RedirectMatch 301 /(.*)\.html?$ http://www.esdiari.com/hemeroteca/$1/

Please, does anybody have some ideas? They would be most appreciated.

Thanks beforehand,

James.

9 Replies to “Problem redirecting content ending in .html”

Posted by Jeff Starr

Hi James,

First, are you sure that .htaccess and mod_rewrite are both active on the machine?

Then it’s important to know if the .html URLs are involved with any sort of CMS redirect (such as the way Joomla or WP redirects permalinks).

Then the first thing I would try is matching the .html files using rules that are placed before any existing CMS rules. Then perform a similar test only with the rules placed after any existing rules.
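
As a rough illustration of the "before any existing CMS rules" placement, here is a sketch of how the root .htaccess might be laid out. The WordPress block shown is the stock one that WordPress writes for a root install; the actual block on this site is not shown in this thread, so treat it as an assumption. The comment at the top marks where the custom redirect rules would go for the first test.

```apache
# Custom redirects go here, above the WordPress block, so they are
# evaluated before WordPress hands the request to index.php.
RewriteEngine On
# (custom RewriteCond / RewriteRule lines for the old .html URLs)

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

For the second test, the same custom lines would move below "# END WordPress". Because the stock block sends every request that is not a real file or directory to index.php with [L], rules placed after it would likely never be reached for the missing .html URLs.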

Let me know how it goes!

Posted by James Kockelbergh

Hi Jeff,

Yes .htaccess and mod_rewrite are both active on the server.

I have no idea what you mean by the following two paragraphs. All I am looking for is a solution for redirecting:

http://www.esdiari.com/ArticleId-ArticleName.html

to

http://www.esdiari.com/hemeroteca/ArticleId-ArticleName.html

Without breaking the WordPress installation, which is what happens every time I try to target any file ending in .html.

Could you please help me? My .htaccess file is currently 8.517 lines long and Google has found another 16.000 broken URLs. It is currently full of things similar to this:

Redirect 301 /6777-estatuto-autonomia-baleares-pasa-examen-probablemente-definitivo-congreso-diputados.html http://www.esdiari.com/hemeroteca/6777-estatuto-autonomia-baleares-pasa-examen-probablemente-definitivo-congreso-diputados.html

Hope to hear from you soon.

Best regards,

James.

Posted by Jeff Starr

Try this:

RewriteCond %{REQUEST_URI} ^/(.*)-(.*)\.html [NC]
RewriteRule .* http://www.esdiari.com/hemeroteca/%1-%2.html [R=301,L]
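
One caveat with the condition above: ^/(.*)-(.*)\.html is not anchored at the end and also matches paths that already start with /hemeroteca/, so depending on how the subdirectory is configured it could redirect already-moved URLs a second time. A narrower sketch follows, assuming every old article path starts with a numeric ArticleId (as in the examples in this thread) and sits directly at the site root; neither assumption is confirmed here.

```apache
RewriteEngine On
# In a root .htaccess the pattern is matched without the leading slash.
# [0-9]+ : numeric ArticleId (assumption); [^/]+ : title containing no "/"
RewriteRule ^([0-9]+-[^/]+)\.html$ http://www.esdiari.com/hemeroteca/$1.html [R=301,L]
```

Because the pattern must start with digits and [^/]+ cannot cross a slash, a request for /hemeroteca/ArticleId-ArticleTitle.html does not match and is left alone.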

Posted by James Kockelbergh

Jeff, you are a Starr! Take a look, it seems to work fine.

This was one of the many broken links that I had:

http://www.esdiari.com/15487-celebra-aniversario-iscomar-30-cada-mes-grandes-descuentos.html

And now it redirects to:

http://www.esdiari.com/hemeroteca/15487-celebra-aniversario-iscomar-30-cada-mes-grandes-descuentos.html

Like magic…

Thanks a million, you cannot imagine what a nightmare this has been.

Best regards,

James.

Posted by Jeff Starr

Thank you James, glad it worked!

Let me know if I may help further, glad to do so.

Posted by James Kockelbergh

Hi Jeff,

Sorry to bother you again but I have hit another dead end.

I have managed to sort out this kind of broken URL:

http://www.esdiari.com/noticia_menorca.php?12314

With this:

RewriteCond %{REQUEST_URI} ^/noticia_menorca\.php
RewriteCond %{QUERY_STRING} ^id=(.*)
RewriteRule .* http://www.esdiari.com/hemeroteca/noticia_menorca.php?%1 [R=301,L]

But I am really stuck on this one, as it has two variables and I am not sure how to declare both of them without breaking the previous rule that I have declared in my .htaccess file. This is an example of the URL that I am trying to redirect:

http://www.esdiari.com/noticia_menorca.php?_dd=1&id=1731

I would like it to go to:

http://www.esdiari.com/hemeroteca/noticia_menorca.php?_dd=1&id=1731

Could you give me a hand?

Thanks beforehand.

Best regards,

James.

Posted by Jeff Starr

Hi James,

Try this:

RewriteCond %{REQUEST_URI} ^/noticia_menorca\.php
RewriteCond %{QUERY_STRING} ^_dd=(.*)&id=(.*)
RewriteRule .* http://www.esdiari.com/hemeroteca/noticia_menorca.php?_dd=%1&id=%2 [R=301,L]

For the captured values, we use %1 for the first group, %2 for the second, and so forth.
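
For reference, %N refers to groups captured by the last matched RewriteCond, while $N refers to groups captured by the RewriteRule pattern itself. If the script under /hemeroteca/ accepts exactly the same query string as the old one (an assumption; the earlier id-only rule in this thread rewrites the query string, so that case may differ), the parameters do not need to be captured at all. When the substitution contains no query string, mod_rewrite carries the original one over unchanged. A minimal sketch:

```apache
RewriteEngine On
# No "?" in the target, so ?_dd=1&id=1731 is passed along as-is,
# whatever the order or number of parameters.
RewriteRule ^noticia_menorca\.php$ http://www.esdiari.com/hemeroteca/noticia_menorca.php [R=301,L]
```

In a root .htaccess this pattern only matches the script at the site root, so requests already under /hemeroteca/ are not redirected again.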

Let me know how it goes!

Posted by James Kockelbergh

Hi Jeff,

It has worked wonders.

With your help we have gone from 16 odd thousand broken links to just 6.000, and rapidly decreasing, in the past few days. Traffic is going back up to where it should be and we are ranking again.

Thanks a million.

Posted by Jeff Starr

Awesome!!

Glad to hear it :)