Checking for broken links on your website – Introduction

After a website has been running for a while, website owners often find that some of the links they have created to external sites start to break. This introduction provides information on the best tools available to deal with this and gives specific information and examples of using the W3C Link Checker. More detailed information on the other tools is supplied in later posts.

Links break because the owners of the linked sites either reorganise their content or simply break some links by accident. Checking for broken links by hand can be very time-consuming but never fear, help is at hand! This post describes a number of ways of checking them automatically, for both WordPress and non-WordPress users.

WordPress-based site

If you are developing your website using WordPress then the easiest approach is to use the Broken Link Checker plugin. It checks your links on a regular schedule and sends email notifications of any newly broken links. It also gives a direct visual indication of a broken link on the webpage itself (for example, by showing it with a strikethrough). This is how this site checks its links. Check back on Linear Thoughts in the future as there will be a more detailed article on how to use this plugin and the options that it provides.

Non-WordPress-based site

There are a number of sites available for checking links and a couple of recommended ones are listed below.

Google Webmaster Tools

Google provides Webmaster Tools for monitoring how Google sees your site. In addition to checking the links (referred to as “Crawl Errors” by Google), it shows how searches found your site and lets you set up a sitemap. You can only use this tool if you have a Google login and are the owner of the site to be checked. You can verify ownership in a number of ways: uploading an additional HTML file, adding a meta tag to the home page, linking your Google Analytics account or signing in to your domain name provider. Note that this tool gives detailed information but none of it is available immediately, as you have to wait for Google to crawl your site. These tools provide a comprehensive suite of capabilities and Linear Thoughts will run more detailed articles on how to access them in future posts.

W3C Link Checker

The W3C Link Checker page provides an immediately available and detailed way to check for problem links. Enter the address of your website in the Link Checker page and select “Check”, and you will be supplied with a report on your website. To give you an example of how this works, a deliberately broken link has been added to this post. By clicking here the Link Checker will run a report on the page you are reading now. The report will take between one and two minutes to run. Note that you should leave all of the settings on the Link Checker page unchanged. Once it has finished, go to the results section to see its output.
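The same check can also be started from a script by building the Link Checker address for a page. The following is only a minimal sketch: it assumes the checker is reached at https://validator.w3.org/checklink and takes the page address in a "uri" query parameter, and the function name link_checker_url and the example page address are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed address of the W3C Link Checker form.
CHECKER_ENDPOINT = "https://validator.w3.org/checklink"


def link_checker_url(page_url):
    """Build an address that asks the Link Checker to check page_url.

    Only the page address is passed, so every other setting on the
    Link Checker page stays at its default.
    """
    return CHECKER_ENDPOINT + "?" + urlencode({"uri": page_url})


# Hypothetical page address, used purely as an example.
print(link_checker_url("http://www.example.com/my-post/"))
```

Opening the printed address in a browser should behave like filling in the form by hand with the default settings.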

The summary table in the results section will look similar to the following. It will not be identical, as it depends on what is in the sidebars at the time:

W3C summary page

The summary will be followed by the specific issues. Some examples are shown below:

W3C 404 not found

W3C broken fragment

W3C na example

W3C permanent redirect

Based on the outputs, you may need to make some changes to your site.

Server Error (Status 500). You should not see any of these when checking this page, but they may appear when you run the Link Checker on your own site. They occur because the web server hosting the linked page failed to handle the request, often because it was too busy to respond at that time. An example might be Twitter during a period of heavy activity. These tend to be temporary problems but are worth keeping an eye on!

Link Not Found (Status 404). This is a broken link and will need to be fixed in the post or menu that is using it. Note that this tool does not identify where on the page the link is!

Broken Fragment (Status 200). The status 200 means that the linked page itself was found, but the part of the page named after the # in the link (the fragment) was not. Webpage creation tools sometimes create fragments like this to bookmark a part of the page. Generally not a problem.

Not Checked (Status 302). Some websites have rules (typically in a robots.txt file) that prohibit automated tools like this one from checking their links. Generally not a problem.

Redirected (Status 301). This can occur for a number of reasons. One is that the destination website has moved the webpage to a new location. The information provided here will allow you to see the new location, and based on that you may choose to update the link, but it is not essential. In other cases, as in the redirect example shown above, the redirect took place because a URL shortening website had been used, for example so that the link fits well on Twitter. In these cases the link should not be changed.
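Because the report does not say where on the page a problem link sits, a small script can help track it down. The following is a rough sketch using only the Python standard library: it lists the source lines on which a given link appears and checks whether a fragment has a matching target in the page. The names PageIndex, find_link_lines and has_fragment, and the sample page, are invented for this illustration, and it only matches links written exactly as reported.

```python
from html.parser import HTMLParser


class PageIndex(HTMLParser):
    """Records where each link appears and which fragment targets exist."""

    def __init__(self):
        super().__init__()
        self.links = []       # (line number, href) for every <a href="...">
        self.targets = set()  # every id="..." plus older <a name="..."> anchors

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            line, _column = self.getpos()  # line numbers start at 1
            self.links.append((line, attrs["href"]))
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])


def index_page(html):
    """Parse the page source and return the filled-in index."""
    parser = PageIndex()
    parser.feed(html)
    parser.close()
    return parser


def find_link_lines(html, url):
    """Return the source line numbers of links pointing exactly at url."""
    return [line for line, href in index_page(html).links if href == url]


def has_fragment(html, fragment):
    """Return True if the page contains a target for the given #fragment."""
    return fragment.lstrip("#") in index_page(html).targets


# Hypothetical page source, used purely as an example.
SAMPLE_PAGE = """<html><body>
<h2 id="intro">Introduction</h2>
<p><a href="http://example.com/missing">A broken link</a></p>
<p><a href="#intro">Back to top</a> and <a href="#gone">nowhere</a></p>
</body></html>"""

print(find_link_lines(SAMPLE_PAGE, "http://example.com/missing"))
print(has_fragment(SAMPLE_PAGE, "#intro"), has_fragment(SAMPLE_PAGE, "#gone"))
```

To use it on a real page, save the page source to a file and pass its contents in place of SAMPLE_PAGE.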

Finally, note that depending on the size of your site, the link check may take a good while longer to run than the example shown here. Also be aware that the W3C site does not place any time limit on a request. Some requests can take a long time to get a response, and the check will wait until it gets one. We have found that links to publicly available webcams are an example where the response can take a number of minutes or, in the worst case, never arrive at all. If you find a link behaving in this way then it is generally worth considering its removal from your website, as any user will just find that their browser times out on this link anyway.
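If you suspect a particular link of being this slow, you can test it on its own with a time limit. The sketch below uses Python's standard urllib and gives up after a set number of seconds. The function name check_link, the 10 second default and the opener parameter (there only so the function can be tried without a network connection) are choices made for this illustration and have nothing to do with how the W3C tool works.

```python
import socket
import urllib.error
import urllib.request


def check_link(url, timeout=10, opener=urllib.request.urlopen):
    """Return a short status string for url, giving up after timeout seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return "OK ({})".format(response.status)
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return "HTTP error ({})".format(error.code)
    except (socket.timeout, TimeoutError):
        # No answer arrived within the time limit.
        return "timed out"
    except urllib.error.URLError as error:
        # Timeouts while connecting can arrive wrapped in a URLError.
        if isinstance(error.reason, (socket.timeout, TimeoutError)):
            return "timed out"
        return "unreachable ({})".format(error.reason)
```

Calling check_link with the address of the suspect link and, say, timeout=5 returns "timed out" if nothing comes back within five seconds. Note that urllib follows redirects by itself, so a redirected link is reported with the status of its final destination.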

We hope this post has provided you with enough information to look at finding broken links on your own site and that you will check back for our more detailed articles on some of these tools in later posts!

Copyright 2013 Kev Scott
All Rights Reserved, Content created by Kev Scott for Linear Thoughts