When starting a trawl or setting a timer, you will have the opportunity to set the scope of the trawl. The scope defines how much or how little of your website should be trawled. For instance, if you only wish to trawl the pages in one directory, the scope can achieve this.
When you first see the Scope dialog, the include tab is selected by default. Using the include tab you can define exactly which pages should be included in your trawl. The settings for this tab are explained below...

Above: The Scope dialog
Domain / any sub-domain
This scope setting provides the widest scope for a trawl. All pages from the same domain as the start page will be trawled, even if they are in a different sub-domain.
For instance, if the start page http://MyDomain.com/index.htm was given, all the pages below would be included in the scope of the trawl:
http://MyDomain.com/contact.htm
http://MyDomain.com/products/p1.htm
http://MySubDomain.MyDomain.com/index.htm
This scope setting was designed for sites which include multiple directories and may have multiple sub-domains. It is the default scope level because it is the most commonly used, and it is recommended if you're not sure which scope setting to use.
If DeepTrawl finds pages which you do not wish to be listed because you consider them to be part of a different website, consider using one of the other, more restrictive scope settings.
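The rule described above can be sketched in Python as a comparison of the domains of two URLs. This is only an illustration, not DeepTrawl's actual implementation: the function names are hypothetical, and the sketch assumes the domain is simply the last two labels of the host name, which holds for MyDomain.com but not for suffixes such as .co.uk.

```python
from urllib.parse import urlsplit


def registered_domain(url: str) -> str:
    """Return the last two labels of the host name, lower-cased.

    Assumption: the site uses a single-label suffix such as .com.
    Multi-label suffixes (e.g. .co.uk) would need a public suffix list.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def in_domain_any_subdomain(start_url: str, candidate_url: str) -> bool:
    """True if both URLs share the same domain, whatever the sub-domain."""
    start_domain = registered_domain(start_url)
    return bool(start_domain) and start_domain == registered_domain(candidate_url)
```

With the start page http://MyDomain.com/index.htm, this check accepts all three example URLs listed above, including the one on MySubDomain.MyDomain.com.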
Same Domain only
Using this scope setting, all pages from the same domain as the start page will be trawled, except those in different sub-domains.
For instance, if the start page http://MyDomain.com/index.htm was given, all the pages below would be included in the scope of the trawl:
http://MyDomain.com/contact.htm
http://MyDomain.com/products/p1.htm
The page below would be excluded:
http://MySubDomain.MyDomain.com/index.htm
This scope setting was designed for sites which include multiple directories within the same domain but not sub-domains.
If DeepTrawl finds pages which you do not wish to be listed because you consider them to be part of a different website, consider using one of the other, more restrictive scope settings.
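As a rough sketch, this setting amounts to requiring that the host name of a candidate page matches the host name of the start page exactly. The function name below is hypothetical, and the exact-match comparison is an assumption; how DeepTrawl itself treats variants such as a www. prefix is not stated here.

```python
from urllib.parse import urlsplit


def in_same_domain_only(start_url: str, candidate_url: str) -> bool:
    """True if both URLs have exactly the same host name.

    urlsplit() lower-cases the host name, so the comparison is
    case-insensitive for the host part only.
    """
    start_host = urlsplit(start_url).hostname
    candidate_host = urlsplit(candidate_url).hostname
    return start_host is not None and start_host == candidate_host
```

Under this assumption, http://MySubDomain.MyDomain.com/index.htm is rejected because its host name differs from that of the start page.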
Same Directory / any sub-directory
Using this scope setting, all pages from the same directory as the start page, or from any of its sub-directories, will be trawled.
For instance, if the start page http://MyDomain.com/MyArea/index.htm was given, the pages below would be included in the scope of the trawl:
http://MyDomain.com/MyArea/contact.htm
http://MyDomain.com/MyArea/SubDir/contact.htm
The page below would be excluded from the trawl:
http://MyDomain.com/AnotherArea/index.htm
This scope setting was designed for sites which only occupy a few directories on the server. This is common for sites where the domain name is shared, as is often the case for web space provided by an ISP or blogging site.
Note: If you enter a URL as the starting page which ends in only a directory name, e.g. http://www.MyDomain/myDir/, you must make sure to add / (forward slash) after the directory name.
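One way to picture this setting is as a prefix test on the directory part of the URL path. The sketch below is an assumption about how such a check could work, not DeepTrawl's own code, and the function names are hypothetical. It also suggests why the trailing slash in the note matters: without it, the last path segment looks like a file name rather than a directory.

```python
from urllib.parse import urlsplit


def directory_of(url: str) -> str:
    """Return the directory part of the URL path, always ending in '/'."""
    # Everything up to the last '/' is the directory; the rest is the file name.
    return urlsplit(url).path.rpartition("/")[0] + "/"


def in_directory_tree(start_url: str, candidate_url: str) -> bool:
    """True if the candidate is in the start page's directory or below it."""
    start, candidate = urlsplit(start_url), urlsplit(candidate_url)
    if start.hostname is None or start.hostname != candidate.hostname:
        return False
    return directory_of(candidate_url).startswith(directory_of(start_url))
```

In this sketch, directory_of("http://www.MyDomain/myDir/") gives "/myDir/", whereas directory_of("http://www.MyDomain/myDir") gives "/", so a start URL without the trailing slash would widen the scope to the whole host.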
Same Directory only
Using this scope setting, all pages from the same directory as the start page will be trawled, but not those from sub-directories.
For instance, if the start page http://MyDomain.com/MyArea/index.htm was given, the page below would be included in the scope of the trawl:
http://MyDomain.com/MyArea/contact.htm
The pages below would be excluded from the trawl:
http://MyDomain.com/MyArea/MyDir/index.htm
http://MyDomain.com/AnotherArea/index.htm
This scope setting was designed for sites which only occupy a single directory on the server.
Note: If you enter a URL as the starting page which ends in only a directory name, e.g. http://www.MyDomain/myDir/, you must make sure to add / (forward slash) after the directory name.
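A minimal sketch of this stricter rule compares the host name and the directory for exact equality, rather than testing for a prefix. As before, the names are hypothetical and the logic is an assumption made for illustration only.

```python
from urllib.parse import urlsplit


def directory_key(url: str) -> tuple:
    """Return (host name, directory path ending in '/') for a URL."""
    parts = urlsplit(url)
    return (parts.hostname, parts.path.rpartition("/")[0] + "/")


def in_same_directory_only(start_url: str, candidate_url: str) -> bool:
    """True only if the candidate sits in exactly the start page's directory."""
    start_key = directory_key(start_url)
    return start_key[0] is not None and start_key == directory_key(candidate_url)
```

Because the directories must match exactly, http://MyDomain.com/MyArea/MyDir/index.htm is rejected even though it lies below the start directory.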
Start page only
Using this scope setting, only the start page will be trawled. For instance, if the start page http://MyDomain.com/index.htm was given, only this page would be trawled.
URL includes text
Using this scope setting, only pages whose URL includes a user-defined piece of text will be trawled.
This scope setting is especially useful when trawling sites whose HTML is dynamically generated on the server side. In this case the structure of the site is often defined not by directories but by keywords set in the design of the site. One of those keywords can be entered to define which pages should be trawled.
This scope setting is also useful for websites which span several domains. The common part of the domain name can be entered to trawl all the required pages. For example, the text Microsoft could be entered to include all the pages within the microsoft.com and microsoft.co.uk domains in the scope.
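In code terms, this setting behaves like a substring test on the full URL. The sketch below assumes the comparison ignores letter case, which is what the Microsoft example suggests; that assumption and the function name are illustrative, not taken from DeepTrawl itself.

```python
def url_includes_text(candidate_url: str, text: str) -> bool:
    """True if the (non-empty) text occurs anywhere in the URL.

    Assumption: matching is case-insensitive.
    """
    return bool(text) and text.lower() in candidate_url.lower()
```

With the text Microsoft, both http://www.microsoft.com/ and http://www.microsoft.co.uk/ pass this test, while a URL on an unrelated domain does not.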
The exclude tab allows you to enter full or partial URLs which should be excluded from the trawl. This may be used to exclude anything from a single page to an entire directory or more.
The exclude tab is shown below, with each feature explained beneath...

Above: The Scope exclude tab
The main text area allows you to add one or more full or partial URLs which should be excluded from the trawl. For example, you could add:
http://www.MyDomain.com/myFile.htm
to exclude only myFile.htm, or
http://www.MyDomain.com/myDir/
to exclude all the pages in the myDir directory.
Note: All full or partial URLs must be written one per line.
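The exclusion list can be thought of as a set of text fragments, one per line, that are checked against each candidate URL. The sketch below assumes a page is excluded when any entry appears somewhere in its URL, and that matching is case-sensitive; both points, along with the function names, are assumptions for illustration rather than documented DeepTrawl behaviour.

```python
def parse_exclusions(text: str) -> list:
    """Split the text area content into entries, one per line, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_excluded(url: str, exclusions: list) -> bool:
    """True if any exclusion entry occurs in the URL (assumed substring match)."""
    return any(entry in url for entry in exclusions)


entries = parse_exclusions(
    "http://www.MyDomain.com/myFile.htm\n"
    "http://www.MyDomain.com/myDir/\n"
)
```

Here entries holds the two example lines, so every page under http://www.MyDomain.com/myDir/ is treated as excluded, as is the single page myFile.htm.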
The Load list button allows you to load a list of exclusions from a text file.
The Clear list button allows you to remove all the entries in the list.
The Save list button allows you to save the list of entries into a text file.