java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.prefetch.PreconditionEnforcer
public class PreconditionEnforcer
Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
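The gating idea described above can be illustrated with a minimal sketch. It is not the Heritrix implementation: `PreconditionGateSketch`, `HostState`, `Outcome`, and the order of the checks are all hypothetical names and assumptions introduced for illustration, and a plain host string stands in for a CrawlURI.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Heritrix.
final class PreconditionGateSketch {

    // Possible results of checking a host's preconditions (assumed set).
    enum Outcome { PROCEED, NEED_DNS, NEED_ROBOTS, BLOCKED_BY_ROBOTS }

    // Minimal per-host bookkeeping (assumed shape).
    static final class HostState {
        boolean ipKnown;             // a DNS lookup has succeeded
        boolean robotsKnown;         // robots.txt has been fetched
        boolean robotsAllows = true; // the robots policy permits the fetch
    }

    private final Map<String, HostState> hosts = new HashMap<>();

    // Returns the state for a host, creating an empty one on first use.
    HostState stateFor(String host) {
        return hosts.computeIfAbsent(host, h -> new HostState());
    }

    // Decides whether a fetch may proceed or which prerequisite is missing.
    Outcome check(String host) {
        HostState s = stateFor(host);
        if (!s.ipKnown) {
            return Outcome.NEED_DNS;
        }
        if (!s.robotsKnown) {
            return Outcome.NEED_ROBOTS;
        }
        return s.robotsAllows ? Outcome.PROCEED : Outcome.BLOCKED_BY_ROBOTS;
    }
}
```

In this sketch a caller would invoke `check(host)` before fetching and, on `NEED_DNS` or `NEED_ROBOTS`, schedule the missing prerequisite first. How PreconditionEnforcer itself defers or rejects a URI is not described on this page.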
| Nested Class Summary | 
|---|
| Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType | 
|---|
| ComplexType.MBeanAttributeInfoIterator | 
| Field Summary | |
|---|---|
| static java.lang.String | ATTR_CALCULATE_ROBOTS_ONLY | 
| static java.lang.String | ATTR_IP_VALIDITY_DURATION - seconds to keep IP information for | 
| static java.lang.String | ATTR_ROBOTS_VALIDITY_DURATION - seconds to cache robots info | 
| static java.lang.Boolean | DEFAULT_CALCULATE_ROBOTS_ONLY - whether to calculate robots exclusion without applying | 
| Fields inherited from class org.archive.crawler.framework.Processor | 
|---|
| ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules | 
| Fields inherited from class org.archive.crawler.settings.ComplexType | 
|---|
| definition, definitionMap | 
| Constructor Summary | |
|---|---|
| PreconditionEnforcer(java.lang.String name) | |
| Method Summary | |
|---|---|
|  long | getIPValidityDuration(CrawlURI curi) - Get the maximum time a DNS record is valid. | 
|  long | getRobotsValidityDuration(CrawlURI curi) - Get the maximum time a robots.txt is valid. | 
| protected  void | innerProcess(CrawlURI curi) - Classes subclassing this one should override this method to perform their custom actions on the CrawlURI. | 
|  boolean | isIpExpired(CrawlURI curi) - Return true if the IP should be looked up. | 
|  boolean | isRobotsExpired(CrawlURI curi) - Return true if the robots policy has expired. | 
| Methods inherited from class org.archive.crawler.framework.Processor | 
|---|
| checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn | 
| Methods inherited from class org.archive.crawler.settings.ModuleType | 
|---|
| addElement, listUsedFiles | 
| Methods inherited from class org.archive.crawler.settings.Type | 
|---|
| addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient | 
| Methods inherited from class javax.management.Attribute | 
|---|
| getName, hashCode | 
| Methods inherited from class java.lang.Object | 
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait | 
| Field Detail | 
|---|
public static final java.lang.String ATTR_IP_VALIDITY_DURATION
public static final java.lang.String ATTR_ROBOTS_VALIDITY_DURATION
public static final java.lang.Boolean DEFAULT_CALCULATE_ROBOTS_ONLY
public static final java.lang.String ATTR_CALCULATE_ROBOTS_ONLY
| Constructor Detail | 
|---|
public PreconditionEnforcer(java.lang.String name)
| Method Detail | 
|---|
protected void innerProcess(CrawlURI curi)
Overrides: innerProcess in class Processor
curi - The CrawlURI being processed.
public long getIPValidityDuration(CrawlURI curi)
curi - the URI this time is valid for.
public boolean isIpExpired(CrawlURI curi)
curi - the URI to check.
public long getRobotsValidityDuration(CrawlURI curi)
curi - 
public boolean isRobotsExpired(CrawlURI curi)
curi - 
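The pairing of a validity duration with an expiry check can be sketched as below. This is a hedged illustration, not the class's actual logic: `ValiditySketch`, `NEVER_FETCHED`, and the rule "expired when the age exceeds the duration" are assumptions, and any special meaning PreconditionEnforcer may give to particular duration values is not covered.

```java
// Hypothetical illustration only; not part of Heritrix.
final class ValiditySketch {

    // Assumed sentinel meaning "no record has been fetched yet".
    static final long NEVER_FETCHED = -1L;

    // Returns true when cached information should be refreshed.
    // fetchedAtMillis: when the record was obtained, or NEVER_FETCHED.
    // validitySeconds: how long a record stays valid, in seconds.
    // nowMillis: the current time in milliseconds.
    static boolean isExpired(long fetchedAtMillis, long validitySeconds, long nowMillis) {
        if (fetchedAtMillis == NEVER_FETCHED) {
            return true; // nothing cached, so a lookup is needed
        }
        long ageMillis = nowMillis - fetchedAtMillis;
        return ageMillis > validitySeconds * 1000L;
    }
}
```

A caller might pass `System.currentTimeMillis()` as `nowMillis`. Taking the clock as a parameter keeps the check deterministic and easy to test.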