org.archive.crawler.scope
Class FilterScope

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.framework.CrawlScope
                          extended by org.archive.crawler.scope.ClassicScope
                              extended by org.archive.crawler.scope.FilterScope
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

Deprecated. Use DecidingScope instead.

public class FilterScope
extends ClassicScope

A core CrawlScope suitable for the most common crawl needs. Roughly, its logic is that a URI is included if:

    (( isSeed(uri) || focusFilter.accepts(uri) ) || transitiveFilter.accepts(uri) ) && ! excludeFilter.accepts(uri)

The focusFilter may be specified by either:
- adding a 'mode' attribute to the scope element. mode="broad" is equivalent to no focus; modes "path", "host", and "domain" imply a SeedExtensionFilter will be used, with the scope element providing its configuration
- adding a focus subelement

If unspecified, the focusFilter defaults to an accepts-all filter.

The transitiveFilter may be specified by supplying a transitive subelement. If unspecified, a TransclusionFilter will be used, with the scope element providing its configuration.

The excludeFilter may be specified by supplying an exclude subelement. If unspecified, an accepts-none filter will be used, meaning that no URI matches the exclude filter and so none is excluded.
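
The inclusion rule described above can be illustrated with a minimal, self-contained sketch. It is not the Heritrix implementation: the class name ScopeRuleSketch, the use of java.util.function.Predicate in place of Filter, and the representation of URIs as plain strings are all assumptions made for illustration. The real default transitive filter is a TransclusionFilter, whose behavior is not described here, so the sketch substitutes an accepts-none placeholder for it.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-in for the documented rule; not Heritrix code. */
final class ScopeRuleSketch {
    private final Set<String> seeds;
    private final Predicate<String> focusFilter;
    private final Predicate<String> transitiveFilter;
    private final Predicate<String> excludeFilter;

    /** A null filter argument means "unspecified" and selects a default. */
    ScopeRuleSketch(Set<String> seeds,
                    Predicate<String> focusFilter,
                    Predicate<String> transitiveFilter,
                    Predicate<String> excludeFilter) {
        this.seeds = seeds;
        // Documented default: an accepts-all focus filter.
        this.focusFilter = focusFilter != null ? focusFilter : uri -> true;
        // Assumption: accepts-none placeholder instead of a TransclusionFilter.
        this.transitiveFilter = transitiveFilter != null ? transitiveFilter : uri -> false;
        // Documented default: an accepts-none exclude filter.
        this.excludeFilter = excludeFilter != null ? excludeFilter : uri -> false;
    }

    boolean isSeed(String uri) {
        return seeds.contains(uri);
    }

    /** (isSeed || focus || transitive) && !exclude, as stated in the description. */
    boolean accepts(String uri) {
        boolean included = isSeed(uri)
                || focusFilter.test(uri)
                || transitiveFilter.test(uri);
        return included && !excludeFilter.test(uri);
    }
}
```

Under these assumptions, a sketch built with every filter left unspecified accepts any URI, because the focus default accepts everything and the exclude default rejects nothing. Note also that the exclude filter is applied last, so a seed that matches it is still rejected.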

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_FOCUS_FILTER
          Deprecated.  
static java.lang.String ATTR_TRANSITIVE_FILTER
          Deprecated.  
(package private)  Filter focusFilters
          Deprecated.  
(package private)  Filter transitiveFilter
          Deprecated.  
 
Fields inherited from class org.archive.crawler.scope.ClassicScope
ATTR_EXCLUDE_FILTER, ATTR_FORCE_ACCEPT_FILTER, ATTR_MAX_LINK_HOPS, ATTR_MAX_TRANS_HOPS
 
Fields inherited from class org.archive.crawler.framework.CrawlScope
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definitionMap
 
Constructor Summary
FilterScope(java.lang.String name)
          Deprecated.  
 
Method Summary
protected  boolean focusAccepts(java.lang.Object o)
          Deprecated. Check if URI is accepted by the focus of this scope.
protected  boolean transitiveAccepts(java.lang.Object o)
          Deprecated.  
 
Methods inherited from class org.archive.crawler.scope.ClassicScope
additionalFocusAccepts, exceedsMaxHops, excludeAccepts, forceAccepts, innerAccepts, kickUpdate
 
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, returnTrueIfMatches
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_FOCUS_FILTER

public static final java.lang.String ATTR_FOCUS_FILTER
Deprecated. 
See Also:
Constant Field Values

ATTR_TRANSITIVE_FILTER

public static final java.lang.String ATTR_TRANSITIVE_FILTER
Deprecated. 
See Also:
Constant Field Values

focusFilters

Filter focusFilters
Deprecated. 

transitiveFilter

Filter transitiveFilter
Deprecated. 
Constructor Detail

FilterScope

public FilterScope(java.lang.String name)
Deprecated. 
Method Detail

transitiveAccepts

protected boolean transitiveAccepts(java.lang.Object o)
Deprecated. 
Overrides:
transitiveAccepts in class ClassicScope
Parameters:
o - Object to check against the transitive filter.
Returns:
True if transitive filter accepts passed object.

focusAccepts

protected boolean focusAccepts(java.lang.Object o)
Deprecated. 
Description copied from class: ClassicScope
Check if URI is accepted by the focus of this scope. This method should be overridden in subclasses.

Overrides:
focusAccepts in class ClassicScope
Parameters:
o - Object (a URI) to check against the focus filter.
Returns:
True if focus filter accepts passed object.


Copyright © 2003-2005 Internet Archive. All Rights Reserved.