Forum Topic: 500 – 404 errors after updating HTML site to WordPress

Forum: .htaccess Forum : WordPress • Posted by Peter Tremblay

Thanks for writing the HT Access Book, it is an excellent resource and I'm using it a lot.

We recently updated one of our sites from an HTML-based site using FrontPage to a WordPress site using a StudioPress theme. I recently noticed we have a lot of 404 errors in Google Webmaster Tools. About 500 - eek!

I would say 50 to 100 are from old article pages that are no longer on the site, and I'm putting 301 redirects in the .htaccess file now to fix these up.
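With 50 to 100 old pages, the redirect lines can be generated rather than typed by hand. Here is a minimal Python sketch, assuming each old URL has a known new location and that no path contains spaces. The paths in `OLD_TO_NEW` and the `www.example.com` domain are placeholders, not the real site's URLs. The output uses the `Redirect 301` directive from Apache's mod_alias.

```python
# Hypothetical mapping of old FrontPage paths to new WordPress paths.
OLD_TO_NEW = {
    "/articles/old-article.htm": "/old-article/",
    "/page.htm": "/page/",
}


def redirect_lines(mapping, site="http://www.example.com"):
    """Return one mod_alias 'Redirect 301' line per old path, sorted by old path."""
    return [
        f"Redirect 301 {old_path} {site}{new_path}"
        for old_path, new_path in sorted(mapping.items())
    ]


if __name__ == "__main__":
    # Print lines ready to paste into the .htaccess file.
    print("\n".join(redirect_lines(OLD_TO_NEW)))
```

Each printed line maps one old path to its new full URL, so the list can be reviewed before it goes into the file.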

Then there are two other kinds of 404s that are causing most of the problems, and I can't find any information on the Internet about them or how to redirect them. I wanted to know if you have seen them before and whether you have any ideas on resolving them in the .htaccess file.

Here are the two different cases that I'm seeing in the Google Webmaster Tools Crawl Error Report:

1. The word "out" in front of the site name.

Example: out/www.sample.com/page.htm

2. The article name is in front of the domain name. The article name does not have the .htm suffix that I would expect for an HTML page, so maybe it is a WordPress issue.

Example: page/www.sample.com

Thanks for your help,

Peter

4 Replies to “500 – 404 errors after updating HTML site …”

Posted by Jeff Starr

Hi Peter,

This is interesting. May I ask if those are the complete URLs? I ask because URL requests should only work if they begin with a domain or subdomain, for example:

http://example.com/ loads the homepage

http://subdomain.example.com/ loads the subdomain's homepage

http://anything/example.com/ won't request anything from example.com (the host name ends at the first forward slash, so the request goes to a host named "anything", with /example.com/ as the path)

So I'm guessing that either there is more to the URLs reported in Google Webmaster Tools, or else it is an issue with existing redirects, either through .htaccess or WordPress. The only way these requests would appear at your site is if they were included in the query string rather than the main part of the request.
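To make the host/path split concrete, here is a small Python sketch that runs the standard library's `urllib.parse.urlsplit` over the three example URLs above. The helper name `host_and_path` is only for illustration.

```python
from urllib.parse import urlsplit


def host_and_path(url):
    """Split a URL into the host that receives the request and the path sent to it."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


if __name__ == "__main__":
    for url in (
        "http://example.com/",
        "http://subdomain.example.com/",
        "http://anything/example.com/",
    ):
        print(url, "->", host_and_path(url))
```

For the third URL the host comes back as `anything` and `/example.com/` is only the path, so example.com never sees that request.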

So to continue the investigation, will you double-check Webmaster Tools for the complete URL requests? Perhaps more info is available there; if not, the access logs for your server will reveal more information to help figure this out.

Posted by Peter Tremblay

Hi Jeff,

Thanks for your quick review of and response to my question.

I followed your suggestion to double-check Webmaster Tools for the complete URL request, and this provided me with two pieces of very useful information:

1. The full link is: http://www.example.com/page/www.example.com

2. I was able to identify one of the pages that was causing the issue by clicking on the tab titled "linked from".

From there I was able to find a footer link that was incorrectly constructed as per the example above. Since it is a footer link it is on every page. This solved the second issue I reported above.
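A common way to end up with this kind of URL (an assumption here, since the exact markup of the footer link isn't shown) is a link written without the `http://` scheme, which browsers and crawlers then resolve relative to the current page. A short Python sketch with the standard library's `urljoin` shows the effect; the page URL is a placeholder.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve an href the way a browser or crawler would, relative to the page it is on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    page = "http://www.example.com/page/"  # placeholder page URL
    # href missing the scheme: treated as a relative path
    print(resolve_link(page, "www.example.com"))
    # href with the scheme: treated as an absolute URL
    print(resolve_link(page, "http://www.example.com"))
```

The first call resolves to http://www.example.com/page/www.example.com, which matches the pattern in the crawl report, while the second resolves to the homepage as intended.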

I then looked at the first issue and here is an example of the full real URL from webmaster tools:

http://www.colormecontacts.com/out/www.colormecontacts.com/splash.htm

I found a link that incorrectly points to an old HTML page, i.e. www.colormecontacts.com/splash.htm. But I could not find the entire link, with the "out" in it, on the page. I will keep investigating this one; let me know if you have any additional advice on resolving it.

I downloaded the zipped access log file from cPanel, and when I unzipped it the file showed up as a DOS application. The help file says I can open it with a text editor, but when I tried opening it with TextPad I just got hex gibberish. Any thoughts on opening the access log file?

Thanks again for all of your help.

Cheers,

Peter

Posted by Jeff Starr

Yes, http://www.example.com/page/www.example.com makes much more sense. Glad to hear it led to a solution.

For the remaining issue, any instance of "out" (with or without the other characters) should be carefully scrutinized. A number of steps are involved in the processing of URL requests, such that "out" could end up in the URL even though it's not initially displayed literally anywhere on the site.

It could also be that your site/server is being scanned by scripts and bots that tend to target weird made-up URLs like the ones you're seeing. If this is the case, find those requests in the access log and see if the "referrer" is your domain or somewhere else.
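As a rough sketch of that check, the Python below picks out requests whose path contains "/out/" and reports whether the referrer is your own domain. It assumes the server writes Apache's "combined" log format, which may not match your host's setup. The function name, the sample lines, and the `www.example.com` host are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumes Apache "combined" log format; adjust the pattern if your logs differ.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_requests(lines, needle, site_host):
    """Yield (path, referer, is_internal) for each request whose path contains needle."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or needle not in match.group("path"):
            continue
        referer = match.group("referer")
        # A referer of "-" (none sent) has no hostname, so it counts as not internal.
        is_internal = urlsplit(referer).hostname == site_host
        yield match.group("path"), referer, is_internal


# Made-up sample lines for illustration only, not taken from a real log.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 -0700] "GET /out/www.example.com/page.htm HTTP/1.1" '
    '404 512 "http://www.example.com/articles/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2013:14:01:02 -0700] "GET /out/www.example.com/page.htm HTTP/1.1" '
    '404 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2013:14:02:00 -0700] "GET /index.php HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, referer, is_internal in find_requests(SAMPLE_LOG, "/out/", "www.example.com"):
        source = "your own site" if is_internal else "elsewhere or not sent"
        print(f"{path}  referrer: {referer}  ({source})")
```

To run it against a real log, pass an open file object in place of `SAMPLE_LOG`. Hits with an internal referrer point to a bad link somewhere on the site; hits with an outside or missing referrer are more likely bots or scanners.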

And speaking of access logs, I'm not sure why that may be happening, but I can take a look if you send a zipped copy of the file via my contact form: https://perishablepress.com/contact/

Posted by Peter Tremblay

Hi Jeff,

Thanks for the offer to take a look at the log file. Yesterday when I was in cPanel I set it to archive logs in my home directory at the end of each stats run. I downloaded that file and was able to unzip it and view it in WebLog Expert. Now I just need to figure out what I'm looking for :-)

The other file that I can't view is close to 4 MB, and I'm guessing it is the error log file from the beginning of the web site. If you are interested I can still send it your way, but for now I'm going to download and review the daily logs.

Thanks again for your excellent support. It is much appreciated.

Cheers,

Peter