org.archive.crawler.postprocessor
Class SupplementaryLinksScoper
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.framework.Scoper
org.archive.crawler.postprocessor.SupplementaryLinksScoper
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
public class SupplementaryLinksScoper
- extends Scoper
Runs the CandidateURI links carried in the passed CrawlURI through a filter
and 'handles' any rejections.
Used for supplementary processing of links after they have been scope
processed and ruled 'in-scope' by LinkScoper. An example of
'supplementary processing' is checking that a link is intended for
this host to crawl in a multi-machine crawl setting. Configure filters to
rule on links. The default handler writes rejected URLs to disk. Subclass
to handle rejected URLs differently.
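As an illustration of the multi-machine check mentioned above, the following is a minimal, self-contained sketch of one way a link could be tested for "belongs to this crawler". It does not use the Heritrix API: the class HostAssignmentSketch, the method isAssignedToThisCrawler, and the hash-of-host assignment scheme are all assumptions made for this example, not the rule that Heritrix ships.

```java
import java.net.URI;
import java.util.Locale;

public class HostAssignmentSketch {

    // Hypothetical rule (not Heritrix API): a link belongs to the crawler
    // whose index equals hash(host) mod crawlerCount.
    static boolean isAssignedToThisCrawler(String url, int crawlerCount, int thisCrawler) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // No host to assign on, so the link is rejected.
        }
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(host.toLowerCase(Locale.ROOT).hashCode(), crawlerCount);
        return bucket == thisCrawler;
    }

    public static void main(String[] args) {
        String[] links = {
            "http://example.org/a",
            "http://example.com/b",
            "http://example.net/c"
        };
        // Pretend this process is crawler 0 in a two-crawler setup.
        for (String link : links) {
            boolean keep = isAssignedToThisCrawler(link, 2, 0);
            System.out.println(link + " -> " + (keep ? "keep" : "reject"));
        }
    }
}
```

In a real crawl the equivalent decision would be made by whatever rules are configured on this processor; the sketch only shows the shape of such a check.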
- Author:
- stack
- See Also:
- Serialized Form
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
ATTR_LINKS_DECIDE_RULES
public static final java.lang.String ATTR_LINKS_DECIDE_RULES
- See Also:
- Constant Field Values
SupplementaryLinksScoper
public SupplementaryLinksScoper(java.lang.String name)
- Parameters:
name - Name of this filter.
innerProcess
protected void innerProcess(CrawlURI curi)
- Description copied from class: Processor
- Classes subclassing this one should override this method to perform
their custom actions on the CrawlURI.
- Overrides:
innerProcess in class Processor
- Parameters:
curi - The CrawlURI being processed.
isInScope
protected boolean isInScope(CandidateURI caUri)
- Description copied from class: Scoper
- Schedule the given CandidateURI with the Frontier.
- Overrides:
isInScope in class Scoper
- Parameters:
caUri - The CandidateURI to be scheduled.
- Returns:
- true if CandidateURI was accepted by crawl scope, false otherwise.
getLinkRules
protected DecideRule getLinkRules(java.lang.Object o)
outOfScope
protected void outOfScope(CandidateURI caUri)
- Called when a CandidateURI is ruled out of scope.
- Overrides:
outOfScope in class Scoper
- Parameters:
caUri - CandidateURI that is out of scope.
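The isInScope / outOfScope pair forms an override point: links that fail the check are handed to outOfScope, and a subclass can replace what happens to them. The sketch below models that pattern with stand-in types so it runs without Heritrix on the classpath. LinksScoperModel and CollectingScoper are hypothetical names, links are plain strings rather than CandidateURI objects, and the stand-in default handler prints to standard error instead of writing to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class RejectionHandlerSketch {

    // Stand-in for the scoper (not the Heritrix class): tests each link
    // and passes every rejection to outOfScope.
    static class LinksScoperModel {
        private final Predicate<String> rule;

        LinksScoperModel(Predicate<String> rule) {
            this.rule = rule;
        }

        // Returns the links that passed; rejected links go to outOfScope.
        List<String> process(List<String> links) {
            List<String> kept = new ArrayList<>();
            for (String link : links) {
                if (isInScope(link)) {
                    kept.add(link);
                } else {
                    outOfScope(link);
                }
            }
            return kept;
        }

        protected boolean isInScope(String link) {
            return rule.test(link);
        }

        // Simplified default handler: report the rejection on stderr.
        protected void outOfScope(String link) {
            System.err.println("Rejected: " + link);
        }
    }

    // Subclass that handles rejections differently: it collects them in memory.
    static class CollectingScoper extends LinksScoperModel {
        final List<String> rejected = new ArrayList<>();

        CollectingScoper(Predicate<String> rule) {
            super(rule);
        }

        @Override
        protected void outOfScope(String link) {
            rejected.add(link);
        }
    }

    public static void main(String[] args) {
        CollectingScoper scoper =
            new CollectingScoper(link -> link.startsWith("http://example.org/"));
        List<String> kept = scoper.process(Arrays.asList(
            "http://example.org/a", "http://example.com/b"));
        System.out.println("kept=" + kept + " rejected=" + scoper.rejected);
    }
}
```

A subclass of the real class would override outOfScope(CandidateURI) in the same way, for example to queue rejected links for hand-off to another crawler rather than logging them.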
Copyright © 2003-2011 Internet Archive. All Rights Reserved.