Scope


When starting a trawl or setting a timer, you will have the opportunity to set the scope of the trawl. The scope defines how much of your website is trawled. For instance, if you wish to trawl only the pages in one directory, you can use the scope to do so.

The Include tab

When you first see the Scope dialog, the Include tab is selected by default. Using the Include tab you can define exactly which pages should be included in your trawl. The settings for this tab are explained below.

[Screenshot: The Scope dialog]


Screenshot item 1: Domain / any sub-domain

This scope setting provides the widest scope for a trawl. All pages from the same domain as the start page will be trawled, even if they are in a different sub-domain.

For instance if the start page:

http://MyDomain.com/index.htm

was given, all the pages below would be included in the scope of the trawl:

http://MyDomain.com/contact.htm

http://MyDomain.com/products/p1.htm

http://MySubDomain.MyDomain.com/index.htm

This scope setting was designed for sites which include multiple directories and may have multiple sub-domains. It is the default scope level as it is most commonly used and is recommended if you're not sure what scope setting to use.

If DeepTrawl lists pages which you consider to be part of a different website, consider using one of the other, more restrictive scope settings.
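As a rough illustration of the rule described above, the check might be sketched in Python as follows. This is not DeepTrawl's actual implementation: the function name `in_domain_or_subdomain` is hypothetical, and the sketch assumes that the host name of the start page is treated as the domain.

```python
from urllib.parse import urlsplit


def in_domain_or_subdomain(start_url: str, candidate_url: str) -> bool:
    """Return True if the candidate is on the start page's domain or a sub-domain of it."""
    # urlsplit() returns host names in lower case, so the comparison ignores case.
    start_host = urlsplit(start_url).hostname or ""
    host = urlsplit(candidate_url).hostname or ""
    # Match the domain itself, or any host that ends in ".<domain>".
    return host == start_host or host.endswith("." + start_host)
```

With the start page http://MyDomain.com/index.htm, this sketch accepts all three example pages listed above, including the one on MySubDomain.MyDomain.com.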

Screenshot item 2: Same Domain only

Using this scope setting, all pages from the same domain as the start page will be trawled, except those in different sub-domains.

For instance if the start page:

http://MyDomain.com/index.htm

was given, all the pages below would be included in the scope of the trawl:

http://MyDomain.com/contact.htm

http://MyDomain.com/products/p1.htm

The page below would be excluded:

http://MySubDomain.MyDomain.com/index.htm

This scope setting was designed for sites which include multiple directories within the same domain but not sub-domains.

If DeepTrawl lists pages which you consider to be part of a different website, consider using one of the other, more restrictive scope settings.
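The difference from the wider setting can be sketched in Python as an exact host name comparison. The function name `in_same_domain_only` is hypothetical, and the code only illustrates the rule as described here; it is not taken from DeepTrawl itself.

```python
from urllib.parse import urlsplit


def in_same_domain_only(start_url: str, candidate_url: str) -> bool:
    """Return True only if both URLs have exactly the same host name."""
    start_host = urlsplit(start_url).hostname
    host = urlsplit(candidate_url).hostname
    # hostname is None for a URL without a host part; treat that as out of scope.
    return start_host is not None and host == start_host
```

With the start page http://MyDomain.com/index.htm, the two MyDomain.com pages above are accepted and the MySubDomain.MyDomain.com page is rejected.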

Screenshot item 3: Same Directory / any sub-directory

Using this scope setting, all pages in the same directory as the start page, or in any of its sub-directories, will be trawled.

For instance if the start page:

http://MyDomain.com/MyArea/index.htm

was given, the pages below would be included in the scope of the trawl:

http://MyDomain.com/MyArea/contact.htm

http://MyDomain.com/MyArea/SubDir/contact.htm

The page below would be excluded from the trawl:

http://MyDomain.com/AnotherArea/index.htm

This scope setting was designed for sites which only occupy a few directories on the server. This is common for sites where the domain name is shared, as is often the case with web space provided by an ISP or a blogging site.

Note: If the URL you enter as the start page ends in a directory name rather than a file name, e.g.

http://www.MyDomain/myDir/

make sure it ends with a / (forward slash), as shown.
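The directory rule, and one plausible reason for the trailing slash in the note above, can be sketched in Python. The helper names `split_directory` and `in_directory_tree` are hypothetical, and the sketch assumes that everything after the last / in the path is treated as a file name; DeepTrawl's own handling may differ.

```python
from urllib.parse import urlsplit


def split_directory(url: str) -> tuple:
    """Return (host, directory), where directory always ends with "/"."""
    parts = urlsplit(url)
    # Everything after the last "/" is treated as a file name and dropped.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return parts.hostname or "", directory


def in_directory_tree(start_url: str, candidate_url: str) -> bool:
    """Return True if the candidate is in the start page's directory or below it."""
    start_host, start_dir = split_directory(start_url)
    candidate = urlsplit(candidate_url)
    return (candidate.hostname or "") == start_host and candidate.path.startswith(start_dir)
```

Under this assumption, a start URL ending in /myDir/ yields the directory /myDir/, whereas the same URL without the final slash yields /, because myDir is read as a file name and the scope widens to the whole host.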

Screenshot item 4: Same Directory only

Using this scope setting, all pages in the same directory as the start page will be trawled, but not those in its sub-directories.

For instance if the start page:

http://MyDomain.com/MyArea/index.htm

was given, the page below would be included in the scope of the trawl:

http://MyDomain.com/MyArea/contact.htm

The pages below would be excluded from the trawl:

http://MyDomain.com/MyArea/MyDir/index.htm

http://MyDomain.com/AnotherArea/index.htm

This scope setting was designed for sites which only occupy a single directory on the server.

Note: If the URL you enter as the start page ends in a directory name rather than a file name, e.g.

http://www.MyDomain/myDir/

make sure it ends with a / (forward slash), as shown.
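For this narrower setting, a Python sketch only needs to compare the host and the directory for equality. The names `directory_key` and `in_same_directory_only` are hypothetical, and the code assumes, as an illustration only, that the directory is the path up to and including its last /.

```python
from urllib.parse import urlsplit


def directory_key(url: str) -> tuple:
    """Return (host, directory) for a URL; the directory ends with "/"."""
    parts = urlsplit(url)
    return parts.hostname, parts.path.rsplit("/", 1)[0] + "/"


def in_same_directory_only(start_url: str, candidate_url: str) -> bool:
    """Return True if both pages sit in exactly the same directory on the same host."""
    return directory_key(start_url) == directory_key(candidate_url)
```

With the start page http://MyDomain.com/MyArea/index.htm, the contact.htm page in MyArea is accepted, while the pages in MyArea/MyDir and AnotherArea are rejected.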

Screenshot item 5: Start page only

Using this scope setting, only the start page will be trawled.

For instance if the start page:

http://MyDomain.com/index.htm

was given, only this page would be trawled.

Screenshot item 6: URL includes text

Using this scope setting, only pages whose URL includes a user-defined piece of text will be trawled.

This scope setting is especially useful when trawling sites whose HTML is dynamically generated on the server side. In this case the structure of the site is often not defined by directories but instead by keywords, set in the design of the site. One of those keywords might be entered to define which pages should be trawled.

This scope setting is also useful for websites which span several domains. The common part of the domain names can be entered to trawl all the required pages. For example, the text Microsoft could be entered to include all the pages within the microsoft.com and microsoft.co.uk domains in the scope.
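The Microsoft example can be sketched in Python as a simple substring test. The function name `url_includes_text` is hypothetical, and the sketch assumes the comparison ignores case, which the text above does not state explicitly.

```python
def url_includes_text(url: str, text: str) -> bool:
    """Return True if the given text appears anywhere in the URL."""
    # Assumption: matching is case-insensitive, so "Microsoft" matches "microsoft.com".
    return text.lower() in url.lower()
```

If the match works this way, the text can appear anywhere in the URL, not just in the domain name, so a short or common piece of text may bring in more pages than intended.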


The Exclude tab

The Exclude tab allows you to enter full or part URLs which should be excluded from the trawl. This may be used to exclude anything from a single page to an entire directory or more.

The Exclude tab is shown below, with each feature explained beneath it.

[Screenshot: The Scope exclude tab]


Screenshot item 1: The main text area allows you to add one or more full or part URLs which should be excluded from the trawl. For example, you could add:

http://www.MyDomain.com/myFile.htm

to exclude only myFile.htm, or...

http://www.MyDomain.com/myDir/

to exclude all the pages in the myDir directory.

Note: Enter each full or part URL on its own line.
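As a minimal Python sketch of how such a list might be applied, the code below splits the text area content into entries and tests a URL against them. The names `parse_exclusions` and `is_excluded` are hypothetical, and the sketch assumes that an entry matches when it appears anywhere in a page's URL; the exact matching rule DeepTrawl uses is not described here.

```python
def parse_exclusions(text: str) -> list:
    """Split the text area content into entries, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_excluded(url: str, exclusions: list) -> bool:
    """Return True if any full or part URL in the list matches the given URL."""
    # Assumption: an entry matches when it appears anywhere in the URL.
    return any(entry in url for entry in exclusions)
```

With the two example entries above, a page such as http://www.MyDomain.com/myDir/page.htm would be excluded because it contains the myDir entry, while http://www.MyDomain.com/index.htm would not.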

Screenshot item 2: The Load list button allows you to load a list of exclusions from a text file.

Screenshot item 3: The Clear list button allows you to remove all the entries in the list.

Screenshot item 4: The Save list button allows you to save the list of entries into a text file.
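If you want to prepare or inspect such a text file outside DeepTrawl, a short Python sketch might look like this. It assumes the file is plain UTF-8 text with one entry per line, mirroring the one-per-line rule of the text area; the actual file format is not documented here, and the function names are hypothetical.

```python
from pathlib import Path


def save_exclusions(path: str, exclusions: list) -> None:
    """Write the entries to a text file, one per line (assumed format)."""
    Path(path).write_text("\n".join(exclusions) + "\n", encoding="utf-8")


def load_exclusions(path: str) -> list:
    """Read entries back from a text file, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Saving a list and loading it again with these two functions returns the same entries in the same order.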