java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.prefetch.PreconditionEnforcer
public class PreconditionEnforcer extends Processor
Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
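The gating idea described above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix API: `PreconditionGateSketch`, `UriState`, `unmetPreconditions`, and `mayProceed` are hypothetical names invented for illustration, and the two boolean flags stand in for whatever state the crawler actually keeps about DNS and robots.txt.

```java
import java.util.ArrayList;
import java.util.List;

public class PreconditionGateSketch {

    /** Hypothetical stand-in for the per-URI state a crawler tracks. */
    static final class UriState {
        final boolean dnsResolved;
        final boolean robotsPolicyKnown;

        UriState(boolean dnsResolved, boolean robotsPolicyKnown) {
            this.dnsResolved = dnsResolved;
            this.robotsPolicyKnown = robotsPolicyKnown;
        }
    }

    /** Returns the names of preconditions not yet satisfied, in check order. */
    static List<String> unmetPreconditions(UriState state) {
        List<String> unmet = new ArrayList<>();
        if (!state.dnsResolved) {
            unmet.add("dns");
        }
        if (!state.robotsPolicyKnown) {
            unmet.add("robots");
        }
        return unmet;
    }

    /** A URI may move on to later stages only when nothing is outstanding. */
    static boolean mayProceed(UriState state) {
        return unmetPreconditions(state).isEmpty();
    }

    public static void main(String[] args) {
        UriState fresh = new UriState(false, false);
        UriState ready = new UriState(true, true);
        System.out.println(unmetPreconditions(fresh)); // prints [dns, robots]
        System.out.println(mayProceed(ready));         // prints true
    }
}
```

In this sketch a URI with outstanding preconditions is simply reported as not ready; how the real processor schedules the prerequisite fetches is not covered here.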
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
|---|
| ComplexType.MBeanAttributeInfoIterator |
| Field Summary | |
|---|---|
| static java.lang.String | ATTR_CALCULATE_ROBOTS_ONLY |
| static java.lang.String | ATTR_IP_VALIDITY_DURATION: seconds to keep IP information for |
| static java.lang.String | ATTR_ROBOTS_VALIDITY_DURATION: seconds to cache robots info |
| static java.lang.Boolean | DEFAULT_CALCULATE_ROBOTS_ONLY: whether to calculate robots exclusion without applying |
| Fields inherited from class org.archive.crawler.framework.Processor |
|---|
| ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |
| Fields inherited from class org.archive.crawler.settings.ComplexType |
|---|
| definition, definitionMap |
| Constructor Summary | |
|---|---|
| PreconditionEnforcer(java.lang.String name) | |
| Method Summary | |
|---|---|
| long | getIPValidityDuration(CrawlURI curi): Gets the maximum time a DNS record is valid. |
| long | getRobotsValidityDuration(CrawlURI curi): Gets the maximum time a robots.txt is valid. |
| protected void | innerProcess(CrawlURI curi): Classes subclassing this one should override this method to perform their custom actions on the CrawlURI. |
| boolean | isIpExpired(CrawlURI curi): Returns true if the IP should be looked up. |
| boolean | isRobotsExpired(CrawlURI curi): Returns true if the robots policy has expired. |
| Methods inherited from class org.archive.crawler.framework.Processor |
|---|
| checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
| Methods inherited from class org.archive.crawler.settings.ModuleType |
|---|
| addElement, listUsedFiles |
| Methods inherited from class org.archive.crawler.settings.Type |
|---|
| addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
| Methods inherited from class javax.management.Attribute |
|---|
| getName, hashCode |
| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String ATTR_IP_VALIDITY_DURATION
public static final java.lang.String ATTR_ROBOTS_VALIDITY_DURATION
public static final java.lang.Boolean DEFAULT_CALCULATE_ROBOTS_ONLY
public static final java.lang.String ATTR_CALCULATE_ROBOTS_ONLY
| Constructor Detail |
|---|
public PreconditionEnforcer(java.lang.String name)
| Method Detail |
|---|
protected void innerProcess(CrawlURI curi)
Overrides: innerProcess in class Processor
curi - The CrawlURI being processed.
public long getIPValidityDuration(CrawlURI curi)
curi - the uri this time is valid for.
public boolean isIpExpired(CrawlURI curi)
curi - the URI to check.
public long getRobotsValidityDuration(CrawlURI curi)
curi -
public boolean isRobotsExpired(CrawlURI curi)
curi -
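The relationship between a validity duration and an expiry check can be sketched as follows. This is an illustrative assumption, not the actual logic of `isIpExpired` or `isRobotsExpired`, which may apply further rules. `ValiditySketch`, `isExpired`, and `NEVER_FETCHED` are hypothetical names. The duration is taken in seconds, following the field descriptions above.

```java
public class ValiditySketch {

    /** Marker for "never fetched"; a hypothetical convention for this sketch. */
    static final long NEVER_FETCHED = -1L;

    /**
     * Returns true when cached information should be refreshed.
     *
     * @param fetchedAtMillis when the information was fetched, in epoch
     *                        milliseconds, or NEVER_FETCHED
     * @param validitySeconds how long the information stays valid, in seconds
     * @param nowMillis       the current time, in epoch milliseconds
     */
    static boolean isExpired(long fetchedAtMillis, long validitySeconds, long nowMillis) {
        if (fetchedAtMillis == NEVER_FETCHED) {
            return true; // nothing cached yet, so a lookup is needed
        }
        long validityMillis = validitySeconds * 1000L; // seconds to milliseconds
        return fetchedAtMillis + validityMillis < nowMillis;
    }

    public static void main(String[] args) {
        long sixHours = 6 * 60 * 60; // illustrative value, not a documented default
        long fetchedAt = 1_000_000L;
        System.out.println(isExpired(NEVER_FETCHED, sixHours, fetchedAt));           // prints true
        System.out.println(isExpired(fetchedAt, sixHours, fetchedAt + 1_000L));      // prints false
        System.out.println(isExpired(fetchedAt, sixHours, fetchedAt + 21_600_001L)); // prints true
    }
}
```

Passing the current time as a parameter, rather than reading the clock inside the method, keeps the check deterministic and easy to test.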