Advanced Settings

The Advanced Settings dialog contains options that most users can leave at their default settings. The dialog can be found on the Settings drop-down menu.

The Advanced Settings dialog contains several tabs. The settings found on each tab are explained below.


Scope Tab

Screen shot item number 1

If selected, this option stops pages from being trawled more than once when they are linked to with URLs that differ only in case.

If your site returns a different page when the case of part of the URL changes, you should turn this option off.
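The effect of this option can be illustrated with a minimal sketch. It assumes that case is ignored by folding each URL to lower case before checking whether it has already been seen; the function names are hypothetical and this is not DeepTrawl's actual implementation.

```python
def normalise_case(url):
    """Fold a URL to lower case so that case-only variants compare equal."""
    return url.lower()


def unique_urls(urls, ignore_case=True):
    """Return URLs in first-seen order, skipping any already visited.

    ignore_case stands in for the option described above (an assumption).
    """
    seen = set()
    result = []
    for url in urls:
        key = normalise_case(url) if ignore_case else url
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result


links = ["http://example.com/About", "http://example.com/about"]
print(unique_urls(links))                      # one entry: treated as the same page
print(unique_urls(links, ignore_case=False))   # two entries: treated as different pages
```

With the option off, both URLs are kept, which is the behaviour needed when the server returns different pages for them.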

Screen shot item number 2

When using the Judge Scope by original link URL option, a page is judged to be in or out of scope based on the URL of the link pointing to it.

With the Judge Scope by redirected URL option, scope is judged by the URL the page is HTTP redirected to (if any). This option results in slower trawls.
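The difference between the two options can be sketched as follows. How DeepTrawl defines scope is not described here, so the sketch assumes a simple URL prefix test; `judge_scope`, `scope_prefix` and `by_redirect` are hypothetical names.

```python
def judge_scope(link_url, redirected_url, scope_prefix, by_redirect=False):
    """Decide whether a page is in scope.

    link_url       -- the URL of the link pointing to the page
    redirected_url -- the URL after any HTTP redirect, or None if there was none
    scope_prefix   -- assumed definition of scope: URLs starting with this prefix
    by_redirect    -- True to judge by the redirected URL, False for the original
    """
    if by_redirect and redirected_url:
        target = redirected_url
    else:
        target = link_url
    return target.startswith(scope_prefix)


# A link on the site that redirects to a different host.
print(judge_scope("http://a.example/x", "http://b.example/x", "http://a.example/"))
print(judge_scope("http://a.example/x", "http://b.example/x", "http://a.example/",
                  by_redirect=True))
```

A redirected URL is only known once a request has been made, which may be one reason the second option is slower.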


Above: The Scope tab of the advanced settings dialog

Screen shot item number 3

Limiting the maximum length of a page's URL means that pages with longer URLs will not be trawled. This prevents recursion in dynamically generated web sites.

Recursion occurs when a site generates a new, longer URL for each link on the page that links to it, so DeepTrawl keeps trawling identical pages with ever-growing URLs.

If your site uses very long URLs you may want to raise this limit or switch off this option.
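As an illustration of why a length limit stops this kind of recursion, the sketch below simulates a page that always links to a longer copy of itself. The function names, the `/next` suffix and the `hard_stop` safety counter are assumptions made for the example only.

```python
def should_trawl(url, max_length=None):
    """Return True if the URL may be trawled; None means the limit is switched off."""
    return max_length is None or len(url) <= max_length


def follow_growing_links(start_url, max_length, hard_stop=1000):
    """Count how many pages are trawled before the length limit stops the trawl.

    hard_stop only exists so the simulation ends when no limit is set.
    """
    url = start_url
    visited = 0
    while visited < hard_stop and should_trawl(url, max_length):
        visited += 1
        url = url + "/next"  # simulated page linking to a longer copy of itself
    return visited


print(follow_growing_links("http://example.com", max_length=40))
print(follow_growing_links("http://example.com", max_length=None))
```

With a limit, the trawl ends after a handful of pages; without one, only the artificial `hard_stop` ends it.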

Screen shot item number 4

When trawling pages on the local disk, no MIME type is given for linked files, so DeepTrawl attempts to assess whether a linked file is a web page.

If DeepTrawl is not finding some of your content, you might want to experiment with turning this option off.
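How DeepTrawl makes this assessment is not described here. One common approach, shown purely as an assumed sketch, is to guess a MIME type from the file extension using Python's standard `mimetypes` module; `looks_like_web_page` is a hypothetical name.

```python
import mimetypes

# MIME types treated as web pages in this sketch (an assumption).
WEB_PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def looks_like_web_page(path):
    """Guess from the file extension whether a local file is a web page."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type in WEB_PAGE_TYPES


print(looks_like_web_page("index.html"))
print(looks_like_web_page("photo.png"))
```

A guess based on the extension can misjudge pages saved with unusual extensions, which is one way content could be missed by a check of this kind.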



Spider Tab

Screen shot item number 1

The number of threads sets how many pages DeepTrawl searches concurrently.

A lower number of threads will probably mean DeepTrawl takes longer to complete most trawls, but will reduce the amount of system resources used.

A higher number of threads will complete many trawls more quickly, but may make the DeepTrawl interface less responsive and in extreme circumstances could even cause DeepTrawl to crash.
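The idea of a configurable number of threads can be sketched with a thread pool from Python's standard library. This is only an illustration of the concept; `trawl_all` and the `fetch` callable are hypothetical and do not describe DeepTrawl's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def trawl_all(urls, fetch, num_threads=4):
    """Process every URL with fetch, using at most num_threads at once.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Stand-in for a real download so the example runs without a network."""
    return "fetched " + url


print(trawl_all(["/a", "/b", "/c"], fake_fetch, num_threads=2))
```

Raising `num_threads` lets more downloads overlap, at the cost of more memory and processor time in use at once.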


Above: The Spider tab of the advanced settings dialog

Screen shot item number 2

DeepTrawl keeps information about downloaded images cached in memory to prevent the same image being downloaded many times. This speeds up trawls but also takes up computer memory.

The initial setting of 10,000 items is sufficient for most web sites, but if your web site is very large you may want to increase the cache size.

Screen shot item number 3

DeepTrawl also keeps information about downloaded pages cached in memory to prevent the same page being downloaded many times. This speeds up trawls but also takes up computer memory.

The initial setting of 10,000 items is sufficient for most web sites, but if your web site is very large you may want to increase the cache size.
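The trade-off between cache size and memory, for both the image and the page cache, can be sketched with a cache that holds a fixed number of items. What DeepTrawl does when its cache is full is not stated, so the sketch assumes the least recently used item is discarded; `BoundedCache` is a hypothetical name.

```python
from collections import OrderedDict


class BoundedCache:
    """Cache holding at most max_items entries (least recently used is dropped)."""

    def __init__(self, max_items=10_000):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, url):
        """Return cached info for url, or None if it is not cached."""
        if url in self._items:
            self._items.move_to_end(url)  # mark as recently used
            return self._items[url]
        return None

    def put(self, url, info):
        """Store info for url, discarding the oldest entry if the cache is full."""
        self._items[url] = info
        self._items.move_to_end(url)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)


cache = BoundedCache(max_items=2)
cache.put("/a.png", "info a")
cache.put("/b.png", "info b")
cache.put("/c.png", "info c")   # cache is full, so "/a.png" is discarded
print(cache.get("/a.png"))      # None: this item would be downloaded again
```

On a site with more items than the cache can hold, discarded items have to be downloaded again, which is why a larger cache can help very large sites.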



Errors Tab


Above: The Errors tab of the advanced settings dialog

Screen shot item number 1

When attempting to download information from the web, a server may not respond for a long period of time. Once the Timeout has elapsed, DeepTrawl will stop waiting for a response and carry on downloading other information.

Screen shot item number 2

When a server does not give any response to a request, DeepTrawl can be set to try several times in case this problem was caused by a glitch in the network between you and the server.

The initial setting of five retries can be reduced to speed up trawls.
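The way the timeout and retry settings interact can be sketched as below. The sketch assumes that "retries" means extra attempts after the first one and that a missing response is signalled by a `TimeoutError`; `fetch_with_retries` and the `fetch` callable are hypothetical.

```python
def fetch_with_retries(fetch, url, timeout, retries=5):
    """Call fetch(url, timeout), trying again when the server does not respond.

    In the worst case the total wait is roughly timeout * (1 + retries),
    which is why fewer retries can speed up a trawl.
    """
    last_error = None
    for _attempt in range(1 + retries):
        try:
            return fetch(url, timeout)
        except TimeoutError as error:
            last_error = error  # no response this time; try again
    raise last_error


attempts = []


def flaky_fetch(url, timeout):
    """Stand-in server that fails to respond twice, then answers."""
    attempts.append(url)
    if len(attempts) < 3:
        raise TimeoutError("no response")
    return "page content"


print(fetch_with_retries(flaky_fetch, "http://example.com/", timeout=1.0))
print(len(attempts))
```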



Parser Tab

The Parser is the part of DeepTrawl which interprets HTML into meaningful instructions. You may not wish for DeepTrawl to process information contained within HTML quotes or an array of other tags. These settings can be made here.


Above: The Parser tab of the advanced settings dialog
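To illustrate what ignoring the contents of certain tags means in practice, the sketch below collects links from HTML while skipping anything inside a chosen set of tags. It uses Python's standard `html.parser` module; the `LinkCollector` class and the choice of `blockquote` as the ignored tag are assumptions for the example and do not describe DeepTrawl's parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets, skipping links nested inside ignored tags.

    Assumes ignored tags are container tags that have a closing tag.
    """

    def __init__(self, ignored_tags=()):
        super().__init__()
        self.ignored_tags = set(ignored_tags)
        self._ignore_depth = 0   # how many ignored tags we are currently inside
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ignored_tags:
            self._ignore_depth += 1
            return
        if self._ignore_depth == 0 and tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.ignored_tags and self._ignore_depth > 0:
            self._ignore_depth -= 1


html = '<a href="/a">A</a><blockquote><a href="/b">B</a></blockquote>'
collector = LinkCollector(ignored_tags=["blockquote"])
collector.feed(html)
print(collector.links)   # the link inside the blockquote is not collected
```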



Interface Tab


Above: The Interface tab of the advanced settings dialog

Screen shot item number 1

By default, when a trawl is started, DeepTrawl will automatically switch to the Trawl for... tab. When the first problem is found, the Problems tab will be shown.

This option enables or disables this function.

Screen shot item number 2

By default, DeepTrawl creates a beeping sound when attention is needed (e.g. when a trawl is finished).

This option turns this sound on and off.

Screen shot item number 3

The pages and errors found by DeepTrawl are split across several pages of results for easy reading.

This option sets how many results are shown on each page.
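Dividing results into pages of a fixed size can be sketched as follows; `paginate` and `per_page` are hypothetical names used only for this illustration.

```python
def paginate(results, per_page):
    """Split a list of results into consecutive pages of at most per_page items."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [results[i:i + per_page] for i in range(0, len(results), per_page)]


problems = ["error 1", "error 2", "error 3", "error 4", "error 5"]
for number, page in enumerate(paginate(problems, per_page=2), start=1):
    print(number, page)
```

The last page holds whatever is left over, so it may contain fewer results than the others.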



Export/Email Tab


Above: The Export tab of the advanced settings dialog

Screen shot item number 1

This option specifies the maximum size of an individual exported HTML file. If this size is exceeded, the file will be truncated.

Screen shot item number 2

This option specifies the maximum size of an individual exported CSV file. If this size is exceeded, the file will be truncated.
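Truncation to a maximum size can be sketched as below. The units of the setting and the way DeepTrawl cuts the file are not stated here, so the sketch assumes a limit in bytes and UTF-8 output; `truncate_export` is a hypothetical name.

```python
def truncate_export(content, max_bytes, encoding="utf-8"):
    """Return the encoded export, cut to at most max_bytes bytes.

    A character split by the cut is dropped so the result stays valid text.
    """
    data = content.encode(encoding)
    if len(data) <= max_bytes:
        return data
    return data[:max_bytes].decode(encoding, errors="ignore").encode(encoding)


print(truncate_export("url,status\n/a,200\n/b,404\n", max_bytes=18))
```

Anything after the cut is lost, so results near the end of a large export will be missing from the file.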