org.archive.crawler.postprocessor
Class LowDiskPauseProcessor

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.postprocessor.LowDiskPauseProcessor
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

public class LowDiskPauseProcessor
extends Processor

Processor module that uses 'df -k' to monitor available disk space and pauses the crawl if free space on any monitored filesystem falls below the configured pause threshold. Monitoring works only where 'df -k' is available and produces the expected output format (as on Linux).

See Also:
Serialized Form
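
The 'df -k' check described above can be illustrated with a minimal sketch. It is not the class's actual implementation: the values of VALID_DF_OUTPUT and AVAILABLE_EXTRACTOR are not shown on this page, so the two patterns below, the class name DfOutputSketch, and the method availableKbByMount are all hypothetical, and the sample output assumes the usual Linux column layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DfOutputSketch {
    // Assumed pattern: accepts output only if it starts with the Linux 'df -k' header row.
    static final Pattern HEADER = Pattern.compile(
            "(?s)^Filesystem\\s+1K-blocks\\s+Used\\s+Available\\s+Use%\\s+Mounted on\\n.*");

    // Assumed pattern: captures the "Available" figure and the "Mounted on" value of each row.
    static final Pattern AVAILABLE = Pattern.compile(
            "(?m)\\s(\\d+)\\s+\\d+%\\s+(\\S+)$");

    /** Returns available space in KB keyed by mount point, or an empty map for unexpected output. */
    static Map<String, Long> availableKbByMount(String dfOutput) {
        Map<String, Long> result = new LinkedHashMap<>();
        if (!HEADER.matcher(dfOutput).matches()) {
            return result; // format not recognized, so nothing can be monitored
        }
        Matcher m = AVAILABLE.matcher(dfOutput);
        while (m.find()) {
            result.put(m.group(2), Long.parseLong(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample output, for illustration only.
        String sample =
                "Filesystem     1K-blocks     Used Available Use% Mounted on\n"
              + "/dev/sda1       41152832 20576416  18462984  53% /\n"
              + "/dev/sdb1      103081248 98765432    512000  99% /data\n";
        System.out.println(availableKbByMount(sample));
    }
}
```

Keying the result by the "Mounted on" column mirrors the ATTR_MONITOR_MOUNTS description, which says monitored mounts should match that column.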

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_MONITOR_MOUNTS
          List of mounts to monitor; should match the "Mounted on" column of 'df' output.
static java.lang.String ATTR_PAUSE_THRESHOLD
          Space available level below which a crawl-pause should be triggered.
static java.lang.String ATTR_RECHECK_THRESHOLD
          Amount of content received between each recheck of free space.
static java.util.regex.Pattern AVAILABLE_EXTRACTOR
           
protected  int contentSinceCheck
           
static java.lang.String DEFAULT_MONITOR_MOUNTS
           
static int DEFAULT_PAUSE_THRESHOLD
           
static int DEFAULT_RECHECK_THRESHOLD
           
static java.util.regex.Pattern VALID_DF_OUTPUT
           
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
LowDiskPauseProcessor(java.lang.String name)
           
 
Method Summary
protected  void innerProcess(CrawlURI curi)
          Notes a CrawlURI's content size in its running tally.
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_MONITOR_MOUNTS

public static final java.lang.String ATTR_MONITOR_MOUNTS
List of mounts to monitor; should match the "Mounted on" column of 'df' output.

See Also:
Constant Field Values

DEFAULT_MONITOR_MOUNTS

public static final java.lang.String DEFAULT_MONITOR_MOUNTS
See Also:
Constant Field Values

ATTR_PAUSE_THRESHOLD

public static final java.lang.String ATTR_PAUSE_THRESHOLD
Space available level below which a crawl-pause should be triggered.

See Also:
Constant Field Values

DEFAULT_PAUSE_THRESHOLD

public static final int DEFAULT_PAUSE_THRESHOLD
See Also:
Constant Field Values

ATTR_RECHECK_THRESHOLD

public static final java.lang.String ATTR_RECHECK_THRESHOLD
Amount of content received between each recheck of free space.

See Also:
Constant Field Values

DEFAULT_RECHECK_THRESHOLD

public static final int DEFAULT_RECHECK_THRESHOLD
See Also:
Constant Field Values

contentSinceCheck

protected int contentSinceCheck

VALID_DF_OUTPUT

public static final java.util.regex.Pattern VALID_DF_OUTPUT

AVAILABLE_EXTRACTOR

public static final java.util.regex.Pattern AVAILABLE_EXTRACTOR

Constructor Detail

LowDiskPauseProcessor

public LowDiskPauseProcessor(java.lang.String name)
Parameters:
name - Name of this processor.

Method Detail

innerProcess

protected void innerProcess(CrawlURI curi)
Notes a CrawlURI's content size in its running tally. Once the amount of content received since the last available-space check reaches the recheck threshold, checks available space and pauses the crawl if any monitored mount is below the configured pause threshold.

Overrides:
innerProcess in class Processor
Parameters:
curi - CrawlURI to process.
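
The tally-and-recheck behavior described for innerProcess can be sketched as follows. This is a standalone illustration under stated assumptions, not the Heritrix code: the class LowDiskTallySketch, its methods, and the freeSpaceProbe supplier are hypothetical, and the units (content in bytes, both thresholds in KB) are assumed because this page does not state them. A real processor would ask the crawl controller to pause; the sketch only records a flag.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class LowDiskTallySketch {
    private final long recheckThresholdKb;   // assumed unit: KB
    private final long pauseThresholdKb;     // assumed unit: KB
    private final List<String> monitoredMounts;
    // Hypothetical source of "available KB per mount", e.g. parsed 'df -k' output.
    private final Supplier<Map<String, Long>> freeSpaceProbe;

    private long contentSinceCheckBytes = 0;
    private boolean pauseRequested = false;

    LowDiskTallySketch(long recheckThresholdKb, long pauseThresholdKb,
            List<String> monitoredMounts,
            Supplier<Map<String, Long>> freeSpaceProbe) {
        this.recheckThresholdKb = recheckThresholdKb;
        this.pauseThresholdKb = pauseThresholdKb;
        this.monitoredMounts = monitoredMounts;
        this.freeSpaceProbe = freeSpaceProbe;
    }

    /** Adds one URI's content size to the tally and rechecks free space when due. */
    void noteContent(long contentSizeBytes) {
        contentSinceCheckBytes += contentSizeBytes;
        if (contentSinceCheckBytes < recheckThresholdKb * 1024) {
            return; // not enough new content since the last check
        }
        contentSinceCheckBytes = 0;
        Map<String, Long> availableKb = freeSpaceProbe.get();
        for (String mount : monitoredMounts) {
            Long free = availableKb.get(mount);
            if (free != null && free < pauseThresholdKb) {
                pauseRequested = true; // stand-in for pausing the crawl
            }
        }
    }

    boolean isPauseRequested() {
        return pauseRequested;
    }

    public static void main(String[] args) {
        LowDiskTallySketch sketch = new LowDiskTallySketch(
                100, 500, List.of("/"), () -> Map.of("/", 400L));
        sketch.noteContent(50 * 1024); // below the recheck threshold, no probe yet
        System.out.println(sketch.isPauseRequested());
        sketch.noteContent(60 * 1024); // tally now exceeds 100 KB, probe runs
        System.out.println(sketch.isPauseRequested());
    }
}
```

Checking only after a configurable amount of content has arrived keeps the cost of the free-space probe off the per-URI path.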


Copyright © 2003-2011 Internet Archive. All Rights Reserved.