org.archive.crawler.scope
Class BroadScope
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Filter
org.archive.crawler.framework.CrawlScope
org.archive.crawler.scope.ClassicScope
org.archive.crawler.scope.BroadScope
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
public class BroadScope
- extends ClassicScope
A CrawlScope instance defines which URIs are "in"
a particular crawl.
It is essentially a Filter which determines, looking at
the totality of information available about a
CandidateURI/CrawlURI instance, whether that URI should be
scheduled for crawling.
Dynamic information inherent in the discovery of the
URI -- such as the path by which it was discovered --
may be considered.
Dynamic information which requires the consultation
of external and potentially volatile information --
such as current robots.txt requests and the history
of attempts to crawl the same URI -- should NOT be
considered. Those potentially high-latency decisions
should be made at another step.
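The distinction above can be illustrated with a minimal, self-contained sketch. It is not Heritrix code: `Candidate`, `pathFromSeed`, `maxHops`, and the hop-limit rule are all hypothetical names and rules chosen for illustration, and nothing here is claimed to be what BroadScope itself does. The point is only that the decision uses information carried by the candidate (its URI and discovery path) and never consults external, high-latency state.

```java
public class ScopeSketch {

    /** Hypothetical stand-in for a candidate URI plus its discovery path. */
    static final class Candidate {
        final String uri;
        final String pathFromSeed; // assumption: one character per hop, e.g. "LL" = two hops

        Candidate(String uri, String pathFromSeed) {
            this.uri = uri;
            this.pathFromSeed = pathFromSeed;
        }
    }

    /**
     * Decides from the candidate alone: no robots.txt lookup and no
     * crawl-history lookup happens here.
     */
    static boolean accepts(Candidate c, int maxHops) {
        boolean fetchable = c.uri.startsWith("http://") || c.uri.startsWith("https://");
        return fetchable && c.pathFromSeed.length() <= maxHops;
    }

    public static void main(String[] args) {
        Candidate near = new Candidate("http://example.org/a", "LL");
        Candidate far = new Candidate("http://example.org/b", "LLLL");
        System.out.println(accepts(near, 3));
        System.out.println(accepts(far, 3));
    }
}
```

Because the check is a pure function of the candidate, it stays cheap enough to run on every discovered URI; slower checks belong in a later processing step, as the description says.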
- Author:
- gojomo
- See Also:
- Serialized Form
Constructor Summary
BroadScope(java.lang.String name)
Constructor.
Method Summary
protected boolean focusAccepts(java.lang.Object o)
Check if URI is accepted by the focus of this scope.
protected boolean transitiveAccepts(java.lang.Object o)
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
BroadScope
public BroadScope(java.lang.String name)
- Constructor.
- Parameters:
name - Name of this crawl scope.
transitiveAccepts
protected boolean transitiveAccepts(java.lang.Object o)
- Overrides:
transitiveAccepts in class ClassicScope
- Parameters:
o - the URI to check.
- Returns:
- true if the transitive filter accepts the passed object.
focusAccepts
protected boolean focusAccepts(java.lang.Object o)
- Check if URI is accepted by the focus of this scope.
This method should be overridden in subclasses.
- Overrides:
focusAccepts in class ClassicScope
- Parameters:
o - the URI to check.
- Returns:
- true if the focus filter accepts the passed object.
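The override relationship documented for focusAccepts and transitiveAccepts follows the template-method pattern, which the sketch below illustrates with stand-in classes. `BaseScope`, `HostScope`, and the `focusAccepts(o) || transitiveAccepts(o)` combination are assumptions made for illustration; this page does not state how ClassicScope actually combines its hooks, nor what BroadScope returns from either method.

```java
import java.net.URI;

public class FocusSketch {

    /** Hypothetical base scope: in scope if either hook accepts (an assumption). */
    static abstract class BaseScope {
        final boolean accepts(Object o) {
            return focusAccepts(o) || transitiveAccepts(o);
        }

        /** Hook that subclasses override to define their focus. */
        protected abstract boolean focusAccepts(Object o);

        /** Hook with a conservative default; subclasses may widen it. */
        protected boolean transitiveAccepts(Object o) {
            return false;
        }
    }

    /** Hypothetical subclass whose focus is a single host. */
    static class HostScope extends BaseScope {
        private final String host;

        HostScope(String host) {
            this.host = host;
        }

        @Override
        protected boolean focusAccepts(Object o) {
            if (!(o instanceof String)) {
                return false;
            }
            try {
                // getHost() may return null; equalsIgnoreCase(null) is false.
                return host.equalsIgnoreCase(URI.create((String) o).getHost());
            } catch (IllegalArgumentException e) {
                return false; // malformed URI string
            }
        }
    }

    public static void main(String[] args) {
        BaseScope scope = new HostScope("example.org");
        System.out.println(scope.accepts("http://example.org/page"));
        System.out.println(scope.accepts("http://other.example.net/page"));
    }
}
```

Keeping the combination logic in the base class and only the hooks in subclasses means a new scope type changes one small method rather than the whole acceptance rule.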
Copyright © 2003-2011 Internet Archive. All Rights Reserved.