This Check finds broken or invalid hyperlinks. These might be:
- Links to pages which no longer exist (404 / 410 errors)
- Other HTTP error codes
- Malformed URLs
- Non-contactable servers
- Links with too many chained redirections
- Links to pages with specified error text (useful when a server doesn't return proper error codes)
- Missing bookmarks (anchors)
- Links that take too long to download
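To make the categories above concrete, here is a minimal Python sketch of how the result of checking one link might be sorted into them. It is not DeepTrawl's implementation. The LinkResult fields, the classify function and the default limits (5 redirections, 30 seconds) are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkResult:
    # Hypothetical record of one link check; field names are illustrative only.
    status: Optional[int] = None    # HTTP status code, or None if no response
    malformed: bool = False         # URL could not be parsed
    unreachable: bool = False       # server could not be contacted
    redirects: int = 0              # number of redirections followed
    missing_anchor: bool = False    # #fragment not found in the target page
    error_text_found: bool = False  # page contained configured error text
    seconds: float = 0.0            # download time


def classify(r: LinkResult, max_redirects: int = 5, max_seconds: float = 30.0) -> list[str]:
    """Return the list of problem categories that apply to one link (assumed limits)."""
    if r.malformed:
        return ["malformed URL"]  # nothing else can be checked
    if r.unreachable:
        return ["server not contactable"]
    problems = []
    if r.redirects > max_redirects:
        problems.append("too many redirections")
    if r.status in (404, 410):
        problems.append("page no longer exists")
    elif r.status is not None and r.status >= 400:
        problems.append(f"HTTP error {r.status}")
    if r.error_text_found:
        problems.append("page contains error text")
    if r.missing_anchor:
        problems.append("missing bookmark")
    if r.seconds > max_seconds:
        problems.append("download too slow")
    return problems
```

For example, classify(LinkResult(status=410)) reports the link as a page that no longer exists, while a malformed URL short-circuits every other test.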
Types of hyperlinks checked are:
- Text links
- Images which activate a link when clicked
- Image maps
DeepTrawl can check many other types of links too. See Check Dependencies.
Invalid hyperlinks are a major problem for any website. At best they make the site look unprofessional; at worst they shake visitors' confidence so much that they leave the site. A recent study by Deep Cognition showed that 92% of Fortune 500 sites have broken links; in fact, 13% of all pages on Fortune 500 sites have at least one broken link. This makes broken links one of the worst quality issues facing site owners and visitors. It's so bad, there's even a TED talk devoted to it!
404 errors
Most of the time a 404 error occurs because the owner of the destination website has removed or moved the target page. Either remove the link from your page or point it to an alternative page.
Malformed URL errors
Most of the time, malformed URL errors are caused by a typo in your HTML. Try reviewing the link URL to make sure it is valid.
Non-contactable server errors
A server may be only temporarily non-contactable, so a single failure is not necessarily a reason to act. If the server is unavailable too often, it is recommended that you remove the link from your site.
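Solving missing bookmark (anchor) errors
These errors occur because the destination page does not contain a bookmark (anchor) matching the part of the link after the # sign. If you own the destination page, fix this by adding an element whose id attribute (or, in older HTML, an <a> tag whose name attribute) matches that text, for example <h2 id="pricing">Pricing</h2>.
A link to page.html#pricing then points to the specific part of the webpage where that tag is located. (The names pricing and page.html are only illustrative.)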
Note: Anchor checking is turned off by default.
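A missing bookmark can be detected by collecting every id attribute (and the name attribute of <a> tags) from the destination page, then testing whether the link's fragment is among them. The following is a minimal Python sketch of that idea, not DeepTrawl's own code. The AnchorCollector class and anchor_missing function are hypothetical, and the sketch assumes the destination page's HTML has already been downloaded. It uses only Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag


class AnchorCollector(HTMLParser):
    """Collects every id attribute, plus the name attribute of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for key, value in attrs:
            if value is not None and (key == "id" or (tag == "a" and key == "name")):
                self.anchors.add(value)


def anchor_missing(link: str, target_html: str) -> bool:
    """True if the link has a #fragment that the target page does not define."""
    _, fragment = urldefrag(link)
    if not fragment:
        return False  # no bookmark in the link, so nothing to check
    collector = AnchorCollector()
    collector.feed(target_html)
    collector.close()
    # Fragments may be percent-encoded in the link, e.g. #caf%C3%A9.
    return unquote(fragment) not in collector.anchors
```

With a page containing <h2 id="pricing">, a link ending in #pricing passes while one ending in #contact is reported as a missing bookmark.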
Configuring this check
Select the Settings link next to the check in the Checks tab, or choose Settings > Check settings > Check Hyperlinks from the menus.
Note: De-selecting any of the following may hide other errors.
Find malformed URLs
Check for badly structured URLs (for example, full URLs whose protocol is not widely recognized).
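As a rough illustration of this option, the Python sketch below flags URLs that cannot be parsed at all, whose protocol is not in a list of recognized schemes, or that use a network protocol but have no host. The KNOWN_SCHEMES set and the malformed_reason function are assumptions made for this example; DeepTrawl's own list of recognized protocols is not given here.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed list of "widely recognized" protocols; purely illustrative.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "file"}
NETWORK_SCHEMES = {"http", "https", "ftp"}


def malformed_reason(url: str) -> Optional[str]:
    """Return a reason string if url looks malformed, else None."""
    try:
        parts = urlsplit(url.strip())
    except ValueError as exc:  # e.g. an unclosed IPv6 literal such as "http://[::1"
        return f"unparseable: {exc}"
    scheme = parts.scheme.lower()
    if not scheme:
        return None  # relative URL: resolved against the linking page instead
    if scheme not in KNOWN_SCHEMES:
        return f"unrecognized protocol {parts.scheme!r}"
    if scheme in NETWORK_SCHEMES and not parts.netloc:
        return "missing host"  # typical typo: "http:/example.com"
    return None
```

Relative URLs such as about/us.html have no protocol, so the sketch leaves them alone; they would be resolved against the linking page before being checked.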
Find errors contacting remote servers
Show an error when a server cannot be reached over the Internet; switching this off stops DeepTrawl from reporting these problems. This should usually be left on.
An error will be shown if the link times out (the timeout setting is in the advanced settings, connection tab).
Error if linked page download takes longer than x
Shows an error if the download of a linked page takes longer than the specified time. It's a good idea to set this to a high value; remember that DeepTrawl itself could be using a lot of your local bandwidth, making downloads seem slower.
Note: Applies to linked HTML pages only, not other linked files.
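The Python sketch below shows the idea behind this setting under stated assumptions: fetch is a hypothetical callable that downloads a link and returns its content type and body, and only responses whose content type is text/html are compared against the limit, mirroring the note above. The timed_download and slow_page_error names are illustrative, not part of DeepTrawl.

```python
import time
from typing import Callable, Optional, Tuple


def timed_download(fetch: Callable[[], Tuple[str, bytes]]) -> Tuple[str, bytes, float]:
    """Call a hypothetical fetch() returning (content_type, body) and time it."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    content_type, body = fetch()
    return content_type, body, time.monotonic() - start


def slow_page_error(content_type: str, seconds: float, limit: float) -> Optional[str]:
    """Return an error message if an HTML page took longer than limit seconds."""
    # Strip parameters such as "; charset=utf-8" before comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/html":
        return None  # images and other linked files are not subject to the limit
    if seconds > limit:
        return f"download took {seconds:.1f}s (limit {limit:.1f}s)"
    return None
```

Because the elapsed time is measured on your own machine, anything else competing for bandwidth inflates it, which is why a generous limit is recommended.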
Follow HTTP redirections and report any errors that occur with any of the redirections or with the resource found at the end of the redirection(s).
All redirections are errors
Do not follow HTTP redirections when downloading links; instead, an error will be shown for every redirection encountered.
Error if too many redirections
Switch this off to suppress errors where DeepTrawl could not reach a linked resource because too many redirections were used. To change the maximum number of redirections allowed, see the advanced settings, connection tab.
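The redirection options above can be pictured as a loop over a redirect chain. In this Python sketch, a dictionary stands in for the network (a hypothetical mapping from URL to status code and Location header), and the follow function, its max_redirects default of 5 and its redirects_are_errors flag are illustrative assumptions rather than DeepTrawl's actual settings.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical stand-in for the network: URL -> (status code, Location header or None).
Responses = Dict[str, Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow(url: str, responses: Responses, max_redirects: int = 5,
           redirects_are_errors: bool = False
           ) -> Tuple[Optional[int], List[str], Optional[str]]:
    """Follow a redirect chain; return (final status, URLs visited, error or None)."""
    chain = [url]
    while True:
        # Unknown URLs are treated as 404 in this simulation.
        status, location = responses.get(url, (404, None))
        if status not in REDIRECT_CODES:
            return status, chain, None  # end of chain: caller checks this status
        if redirects_are_errors:
            return status, chain, f"redirection ({status}) from {url}"
        if location is None:
            return status, chain, f"redirect without Location from {url}"
        if len(chain) > max_redirects:  # following this hop would exceed the limit
            return None, chain, "too many redirections"
        url = urljoin(url, location)  # Location may be relative
        chain.append(url)
```

Because the hop limit is checked before each redirection is followed, it also stops redirect loops, such as a page that redirects to itself.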
Find missing anchors (bookmarks)
Switch this off to skip checking that anchors referenced by links exist in the linked pages. Anchors allow a link to point at a specific section of a webpage, but they require a matching tag in the destination page. With some URL schemes this can produce false positives, so the check is disabled by default.
Find links to pages with text:
If a link points to a page containing the specified text, treat it as invalid. Use this option to find error pages with known text which would otherwise be missed because the HTTP code returned with them indicates that they are fine.
This is useful when...
- Your server is set to redirect requests for non-existent pages (404s) to a special error page, but the server does not return the 404 HTTP error code when this occurs.
- Your network or ISP redirects requests for non-existent servers to a standard error page which doesn't use an HTTP error code.
Enter the text using basic boolean search terms, indicating whether capitalization matters.
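The exact search syntax is not described here, so the following Python sketch assumes one plausible form: alternatives separated by an uppercase OR, each made up of phrases joined by an uppercase AND, with optional double quotes around phrases. The matches_error_text function is hypothetical and simply searches the raw page body.

```python
def matches_error_text(page_text: str, query: str, case_sensitive: bool = False) -> bool:
    """Evaluate an assumed query syntax against a page body.

    Alternatives are separated by " OR "; each alternative is a set of phrases
    joined by " AND " (AND binds tighter). Operators must be uppercase.
    """
    text = page_text if case_sensitive else page_text.lower()
    for alternative in query.split(" OR "):
        # Remove surrounding spaces and optional double quotes from each phrase.
        phrases = [p.strip().strip('"') for p in alternative.split(" AND ")]
        if not case_sensitive:
            phrases = [p.lower() for p in phrases]
        if all(p and p in text for p in phrases):
            return True  # every phrase of this alternative was found
    return False
```

With this sketch, "page not found" OR "no longer available" would treat a page containing either phrase as an error page, even if the server returned a 200 status.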
Error HTTP codes
Switch individual HTTP error codes on or off to control which ones are reported. For instance, you may never want to see errors caused by 401 (Unauthorized) HTTP codes.
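A minimal Python sketch of filtering by error code, assuming the switched-off codes are held in a set: any 4xx or 5xx status not in that set is reported. The reportable and describe functions are hypothetical; the reason phrases come from Python's standard http.HTTPStatus enum.

```python
from http import HTTPStatus


def reportable(status: int, disabled_codes: set[int]) -> bool:
    """True if status is an HTTP error (4xx or 5xx) that has not been switched off."""
    return status >= 400 and status not in disabled_codes


def describe(status: int) -> str:
    """Human-readable label such as '401 Unauthorized'."""
    try:
        return f"{status} {HTTPStatus(status).phrase}"
    except ValueError:  # code not defined in http.HTTPStatus
        return f"{status} (non-standard code)"
```

For example, with disabled_codes = {401}, a link answered with 401 is no longer reported, while describe(404) still labels a broken link as "404 Not Found".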