A

A_ANNOTATIONS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
shorthand string tokens indicating notable occurrences, separated by commas
A_CONTENT_STATE_KEY - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
Key to use when getting the state of a CrawlURI from the CrawlURI's AList.
A_CONTENT_TYPE - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Extracted MIME type of fetched content; should be set immediately by fetching module if possible (rather than waiting for a later analyzer)
A_DELAY_FACTOR - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Multiplier of last fetch duration to wait before fetching another item of the same class (e.g. host)
A_DISTANCE_FROM_SEED - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_DNS_FETCH_TIME - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_DNS_SERVER_IP_LABEL - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_FETCH_BEGAN_TIME - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_FETCH_COMPLETED_TIME - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_FETCH_OVERDUE - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_HTML_BASE - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_HTTP_TRANSACTION - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_LAST_CONTENT_DIGEST - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
Designates a field in the CrawlURI's AList for the content digest of an earlier visit.
A_LAST_DATESTAMP - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_LAST_ETAG - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_LOCALIZED_ERRORS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_META_ROBOTS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_MINIMUM_DELAY - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Minimum delay before fetching another item of the same class (e.g. host).
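As a rough illustration of how A_DELAY_FACTOR and A_MINIMUM_DELAY might interact, the sketch below scales the last fetch duration by the factor and then enforces the minimum. The class name, method name, and the exact clamping rule are assumptions made for this example; the index does not show how the crawler actually combines the two values.

```java
// Hypothetical sketch, not Heritrix code: combines a delay factor and a
// minimum delay into a single wait time in milliseconds.
final class PolitenessDelaySketch {

    /**
     * Wait delayFactor times the last fetch duration, but never less than
     * minimumDelayMs. The clamping rule is an assumption for illustration.
     */
    static long delayMs(long lastFetchDurationMs, double delayFactor, long minimumDelayMs) {
        long scaled = Math.round(lastFetchDurationMs * delayFactor);
        return Math.max(scaled, minimumDelayMs);
    }

    public static void main(String[] args) {
        // 400 ms fetch, factor 5.0, minimum 500 ms: the scaled value wins.
        System.out.println(delayMs(400, 5.0, 500)); // 2000
        // 50 ms fetch, factor 5.0, minimum 500 ms: the minimum wins.
        System.out.println(delayMs(50, 5.0, 500));  // 500
    }
}
```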
A_MIRROR_PATH - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Define for org.archive.crawler.writer.MirrorWriterProcessor.
A_NUMBER_OF_VERSIONS - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_NUMBER_OF_VISITS - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_PREREQUISITE_URI - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_RETRY_DELAY - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_RRECORD_SET_LABEL - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_RUNTIME_EXCEPTION - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_TIME_OF_NEXT_PROCESSING - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_WAIT_INTERVAL - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_WAIT_REEVALUATED - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
aboutToLog() - Method in class org.archive.crawler.datamodel.CrawlURI
Notify CrawlURI it is about to be logged; opportunity for self-annotation
ABSOLUTE_OFFSET_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Record absolute offset into arc file.
AbstractFrontier - Class in org.archive.crawler.frontier
Shared facilities for Frontier implementations.
AbstractFrontier(String, String) - Constructor for class org.archive.crawler.frontier.AbstractFrontier
 
AbstractLongFPSet - Class in org.archive.util
Shell of functionality for a Set of primitive long fingerprints, held in an array of possibly-empty slots.
AbstractLongFPSet() - Constructor for class org.archive.util.AbstractLongFPSet
To support serialization TODO: verify needed?
AbstractLongFPSet(int, float) - Constructor for class org.archive.util.AbstractLongFPSet
Create a new AbstractLongFPSet with a given capacity and load factor.
AbstractTracker - Class in org.archive.crawler.framework
A partial implementation of the StatisticsTracking interface.
AbstractTracker(String, String) - Constructor for class org.archive.crawler.framework.AbstractTracker
 
ACCEPT - Static variable in class org.archive.crawler.deciderules.DecideRule
 
accept(File, String) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter
 
ACCEPTABLE_ASCII_DOMAIN - Static variable in class org.archive.net.UURIFactory
Characters we'll accept in the domain label part of a URI authority: ASCII letters-digits-hyphen (LDH) plus underscore, with single intervening '.' characters.
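The description above can be approximated with a regular expression. The pattern below is an illustrative assumption, not the actual value of ACCEPTABLE_ASCII_DOMAIN (which this index does not show); the class and method names are likewise hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the real UURIFactory constant.
final class AsciiDomainSketch {

    // One or more labels made of ASCII letters, digits, hyphen, or
    // underscore, joined by single '.' characters (no empty labels).
    static final Pattern LDH_UNDERSCORE_DOMAIN =
        Pattern.compile("[A-Za-z0-9_-]+(?:\\.[A-Za-z0-9_-]+)*");

    static boolean isAcceptable(String domain) {
        return LDH_UNDERSCORE_DOMAIN.matcher(domain).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("www.archive.org")); // true
        System.out.println(isAcceptable("my_host.example")); // true
        System.out.println(isAcceptable("bad..example"));    // false
    }
}
```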
ACCEPTABLE_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
ACCEPTABLE_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Acceptable characters in forced queue names.
AcceptDecideRule - Class in org.archive.crawler.deciderules
Rule which responds ACCEPT to anything passed in.
AcceptDecideRule(String) - Constructor for class org.archive.crawler.deciderules.AcceptDecideRule
 
accepts(Object) - Method in class org.archive.crawler.filter.FilePatternFilter
 
accepts(Object) - Method in class org.archive.crawler.framework.Filter
 
accepts(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
acquireContinuePermission() - Method in class org.archive.crawler.framework.CrawlController
Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode.
ACTION - Static variable in class org.archive.crawler.admin.ui.JobConfigureUtils
 
actions - Variable in class org.archive.crawler.extractor.CustomSWFTags
 
activeHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of hosts that are currently active.
activeThreadCount() - Method in class org.archive.crawler.admin.StatisticsTracker
 
activeThreadCount() - Method in interface org.archive.crawler.framework.StatisticsTracking
Get the number of active (non-paused) threads.
AdaptiveRevisitAttributeConstants - Interface in org.archive.crawler.frontier
Defines static constants for the Adaptive Revisiting module defining data keys in the CrawlURI AList.
AdaptiveRevisitFrontier - Class in org.archive.crawler.frontier
A Frontier that will repeatedly visit all encountered URIs.
AdaptiveRevisitFrontier(String) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
AdaptiveRevisitFrontier(String, String) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
AdaptiveRevisitHostQueue - Class in org.archive.crawler.frontier
A priority based queue of CrawlURIs.
AdaptiveRevisitHostQueue(String, Environment, StoredClassCatalog, int) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Constructor
AdaptiveRevisitHostQueueTest - Class in org.archive.crawler.frontier
A JUnit test for AdaptiveRevisitHostQueue class.
AdaptiveRevisitHostQueueTest() - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitHostQueueTest
 
AdaptiveRevisitQueueList - Class in org.archive.crawler.frontier
Maintains an ordered list of AdaptiveRevisitHostQueues used by a Frontier.
AdaptiveRevisitQueueList(Environment, StoredClassCatalog) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
add(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, if not already present.
add(SinkHandlerLogRecord) - Method in interface org.archive.crawler.framework.AlertManager
 
add(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Add a CrawlURI to this host queue.
add(int, Double) - Method in class org.archive.crawler.settings.DoubleList
Add a new Double at the specified index to this list.
add(int, double) - Method in class org.archive.crawler.settings.DoubleList
Add a new double at the specified index to this list.
add(Double) - Method in class org.archive.crawler.settings.DoubleList
Add a new Double at the end of this list.
add(double) - Method in class org.archive.crawler.settings.DoubleList
Add a new double at the end of this list.
add(int, Float) - Method in class org.archive.crawler.settings.FloatList
Add a new Float at the specified index to this list.
add(int, float) - Method in class org.archive.crawler.settings.FloatList
Add a new float at the specified index to this list.
add(Float) - Method in class org.archive.crawler.settings.FloatList
Add a new Float at the end of this list.
add(float) - Method in class org.archive.crawler.settings.FloatList
Add a new float at the end of this list.
add(int, Integer) - Method in class org.archive.crawler.settings.IntegerList
Add a new Integer at the specified index to this list.
add(int, int) - Method in class org.archive.crawler.settings.IntegerList
Add a new int at the specified index to this list.
add(Integer) - Method in class org.archive.crawler.settings.IntegerList
Add a new Integer at the end of this list.
add(int) - Method in class org.archive.crawler.settings.IntegerList
Add a new int at the end of this list.
add(Object) - Method in class org.archive.crawler.settings.ListType
Appends the specified element to the end of this list.
add(int, Object) - Method in class org.archive.crawler.settings.ListType
Inserts the specified element at the specified position in this list.
add(int, Long) - Method in class org.archive.crawler.settings.LongList
Add a new Long at the specified index to this list.
add(int, long) - Method in class org.archive.crawler.settings.LongList
Add a new long at the specified index to this list.
add(Long) - Method in class org.archive.crawler.settings.LongList
Add a new Long at the end of this list.
add(long) - Method in class org.archive.crawler.settings.LongList
Add a new long at the end of this list.
add(int, String) - Method in class org.archive.crawler.settings.StringList
Add a new String at the specified index to this list.
add(String) - Method in class org.archive.crawler.settings.StringList
Add a new String at the end of this list.
add(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
add(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
add(long) - Method in class org.archive.util.AbstractLongFPSet
Add the given value to this set
add(CharSequence) - Method in interface org.archive.util.BloomFilter
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter32bit
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter32bitSplit
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter32bp2
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter32bp2Split
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Adds a character sequence to the filter.
add(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
add(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Add a fingerprint to the set.
add(Iterator) - Method in class org.archive.util.iterator.CompositeIterator
Add an iterator to the internal chain.
add(Object) - Method in class org.archive.util.SurtPrefixSet
Maintains additional invariant: if one entry is a prefix of another, keep only the prefix.
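A minimal sketch of the prefix invariant described above, using a plain TreeSet of strings. The class PrefixSetSketch and its methods are hypothetical and are not the SurtPrefixSet implementation; they only show one way to keep the shortest prefix when entries overlap.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical sketch: a sorted set of strings in which no entry is a
// prefix of another entry.
final class PrefixSetSketch {
    private final TreeSet<String> prefixes = new TreeSet<>();

    /** Adds candidate unless an existing entry already covers it. */
    boolean add(String candidate) {
        // In a prefix-free sorted set, the only entry that can be a prefix
        // of candidate is the greatest entry that is <= candidate.
        String floor = prefixes.floor(candidate);
        if (floor != null && candidate.startsWith(floor)) {
            return false; // covered by an equal or shorter prefix
        }
        // Entries that start with candidate sort contiguously from it;
        // drop them, since candidate now covers them.
        Iterator<String> it = prefixes.tailSet(candidate).iterator();
        while (it.hasNext() && it.next().startsWith(candidate)) {
            it.remove();
        }
        return prefixes.add(candidate);
    }

    int size() {
        return prefixes.size();
    }

    boolean contains(String s) {
        return prefixes.contains(s);
    }

    public static void main(String[] args) {
        PrefixSetSketch set = new PrefixSetSketch();
        set.add("http://(org,archive,www,)/a/b");
        set.add("http://(org,archive,"); // replaces the longer entry
        set.add("http://(org,archive,web,"); // ignored, already covered
        System.out.println(set.size()); // 1
    }
}
```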
addAlistPersistentMember(Object) - Static method in class org.archive.crawler.datamodel.CrawlURI
Add the key of alist items you want to persist across processings.
addAll(DoubleList) - Method in class org.archive.crawler.settings.DoubleList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Double[]) - Method in class org.archive.crawler.settings.DoubleList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(double[]) - Method in class org.archive.crawler.settings.DoubleList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(FloatList) - Method in class org.archive.crawler.settings.FloatList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Float[]) - Method in class org.archive.crawler.settings.FloatList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(float[]) - Method in class org.archive.crawler.settings.FloatList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(IntegerList) - Method in class org.archive.crawler.settings.IntegerList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Integer[]) - Method in class org.archive.crawler.settings.IntegerList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(int[]) - Method in class org.archive.crawler.settings.IntegerList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(ListType) - Method in class org.archive.crawler.settings.ListType
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
addAll(int, Collection) - Method in class org.archive.crawler.settings.ListType
 
addAll(LongList) - Method in class org.archive.crawler.settings.LongList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Long[]) - Method in class org.archive.crawler.settings.LongList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(long[]) - Method in class org.archive.crawler.settings.LongList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(StringList) - Method in class org.archive.crawler.settings.StringList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(String[]) - Method in class org.archive.crawler.settings.StringList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAnnotation(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Add an annotation: an abbreviated indication of something special about this URI that need not be present in every crawl.log line, but should be noted for future reference.
addAttribute(String) - Method in class org.archive.configuration.Configuration
 
addAttributeInfos(List<OpenMBeanAttributeInfo>) - Method in class org.archive.configuration.Configuration
Override, call super and then add Configurable attributes.
addAttributeInfos(List<OpenMBeanAttributeInfo>) - Method in class org.archive.configuration.registry.CrawlOrder.CrawlOrderConfiguration
 
addAttributeInfos(List<OpenMBeanAttributeInfo>) - Method in class org.archive.configuration.registry.CrawlOrderSubClass.CrawlOrderSubClassConfiguration
 
addBdbjeAttributes(List, List, List) - Method in class org.archive.crawler.admin.CrawlJob
 
addBdbjeOperations(List, List, List) - Method in class org.archive.crawler.admin.CrawlJob
 
addCap(byte[]) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Add a dummy 'cap' entry at the given insertion key.
addComplexType(ComplexType) - Method in class org.archive.crawler.settings.CrawlerSettings
 
addConstraint(Constraint) - Method in class org.archive.crawler.settings.MapType
 
addConstraint(Constraint) - Method in class org.archive.crawler.settings.Type
Add a constraint to this type.
addCrawlJob(String, String, String, String) - Method in class org.archive.crawler.Heritrix
This method is called when we have an order file to hand that we want to base a job on.
addCrawlJob(URL, HttpURLConnection, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJob(File, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJob(CrawlJob) - Method in class org.archive.crawler.Heritrix
 
addCrawlJobBasedOn(String, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJobBasedOn(File, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJobBasedonJar(File, String, String, String) - Method in class org.archive.crawler.Heritrix
Undo jar file and use as basis for a new job.
addCrawlOrderAttributes(ComplexType, List) - Method in class org.archive.crawler.admin.CrawlJob
 
addCrawlStatusListener(CrawlStatusListener) - Method in class org.archive.crawler.framework.CrawlController
Register for CrawlStatus events.
addCrawlURIDispositionListener(CrawlURIDispositionListener) - Method in class org.archive.crawler.framework.CrawlController
Register for CrawlURIDisposition events.
addCredentialAvatar(CredentialAvatar) - Method in class org.archive.crawler.datamodel.CrawlServer
Add an avatar.
addCredentialAvatar(CredentialAvatar) - Method in class org.archive.crawler.datamodel.CrawlURI
Add an avatar.
addCriteria(Criteria) - Method in class org.archive.crawler.settings.refinements.Refinement
Add a new criterion to this refinement.
addDecideRule(DecideRule) - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
added(CrawlURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
added(CrawlURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
addedSeed(CandidateURI) - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
addedSeed(CandidateURI) - Method in interface org.archive.crawler.scope.SeedListener
 
addElement(CrawlerSettings, Type) - Method in class org.archive.crawler.settings.ComplexType
 
addElement(CrawlerSettings, Type) - Method in class org.archive.crawler.settings.MapType
Add a new element to this map.
addElement(CrawlerSettings, Type) - Method in class org.archive.crawler.settings.ModuleType
 
addElementToDefinition(Type) - Method in class org.archive.crawler.settings.ComplexType
Add a new attribute to the definition of this ComplexType.
addElementType(Type, int) - Method in class org.archive.crawler.settings.DataContainer
Add a new element to the data container.
addElementType(Type) - Method in class org.archive.crawler.settings.DataContainer
Appends the specified element to the end of this data container.
addFilter(CrawlerSettings, Filter) - Method in class org.archive.crawler.filter.OrFilter
 
addForce(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, all the way through to underlying destination, even if already present.
addForce(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addForce(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addHeaderLink(CrawlURI, Header) - Method in class org.archive.crawler.extractor.ExtractorHTTP
 
addImpliedHttpIfNecessary(String) - Static method in class org.archive.util.ArchiveUtils
Given a string that may be a plain host or host/path (without URI scheme), add an implied http:// if necessary.
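The behavior described above could be sketched as follows. The heuristic used here (prepend when there is no colon, or when a '.' appears before the first colon) is an assumption chosen for illustration and may differ from what ArchiveUtils actually does; the class name is hypothetical.

```java
// Hypothetical sketch, not the ArchiveUtils implementation.
final class ImpliedHttpSketch {

    /**
     * Prepends "http://" when the input looks like a bare host or
     * host/path. Assumed heuristic: a scheme is present only if a ':'
     * occurs before any '.'.
     */
    static String addImpliedHttpIfNecessary(String input) {
        int colon = input.indexOf(':');
        int period = input.indexOf('.');
        if (colon == -1 || (period >= 0 && period < colon)) {
            return "http://" + input;
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(addImpliedHttpIfNecessary("archive.org/index.html"));
        System.out.println(addImpliedHttpIfNecessary("https://archive.org"));
        System.out.println(addImpliedHttpIfNecessary("archive.org:8080/x"));
    }
}
```

One limitation of this heuristic: a dotless host with a port, such as localhost:8080, is treated as already having a scheme and is returned unchanged.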
addInProcessing(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Adds a CrawlURI to the list of CrawlURIs that belong to this HQ and are being processed at the moment.
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if URI is accepted by the additional focus of this scope.
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.DomainScope
 
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.HostScope
 
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.PathScope
 
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.RefinedScope
 
additionalFocusFilter - Variable in class org.archive.crawler.scope.DomainScope
 
additionalFocusFilter - Variable in class org.archive.crawler.scope.HostScope
 
additionalFocusFilter - Variable in class org.archive.crawler.scope.PathScope
 
additionalFocusFilter - Variable in class org.archive.crawler.scope.RefinedScope
 
addJob(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
Submit a job to the handler.
addLocalizedError(String, Throwable, String) - Method in class org.archive.crawler.datamodel.CrawlURI
Make note of a non-fatal error, local to a particular Processor, which should be logged somewhere, but allows processing to continue.
addNewFp(long) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
addNewFp(long) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Add an FP (which may be an old or new FP) to the new complete list.
addNewFp(long) - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
addNow(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Immediately add uri.
addNow(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addNow(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addOperations(ArrayList<MBeanOperationInfo>) - Method in class org.archive.configuration.Configuration
 
addOrderToManifest() - Method in class org.archive.crawler.framework.CrawlController
Add order file contents to manifest.
addOutLink(Link) - Method in class org.archive.crawler.datamodel.CrawlURI
Add a discovered Link, unless it would exceed the max number to accept.
addProcessorMap(String, MapType) - Method in class org.archive.crawler.framework.ProcessorChainList
Add a new chain of processors to the chain list.
addProfile(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
Add a new profile
addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
AddRedirectFromRootServerToScope - Class in org.archive.crawler.deciderules
 
AddRedirectFromRootServerToScope(String) - Constructor for class org.archive.crawler.deciderules.AddRedirectFromRootServerToScope
 
addRefinement(Refinement) - Method in class org.archive.crawler.settings.CrawlerSettings
Add a refinement to this settings object.
addResponseContent(HttpMethod, CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
This method populates curi with response status and content type.
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter32bit
 
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter32bitSplit
 
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter32bp2
 
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter32bp2Split
 
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter64bit
 
addSeed(CandidateURI) - Method in class org.archive.crawler.framework.CrawlScope
Add a new seed to scope.
addSeed(CrawlURI) - Method in class org.archive.crawler.scope.SeedCachingScope
 
addSeedListener(SeedListener) - Method in class org.archive.crawler.framework.CrawlScope
 
addToManifest(String, char, boolean) - Method in class org.archive.crawler.framework.CrawlController
Add a file to the manifest of files used/generated by the current crawl.
addToPath(MirrorWriterProcessor.URIToFileReturn) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.DirSegment
 
addToPath(MirrorWriterProcessor.URIToFileReturn) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.EndSegment
 
addToPath(MirrorWriterProcessor.URIToFileReturn) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
Adds this segment to a file path.
addTopLevelModule(ModuleType) - Method in class org.archive.crawler.settings.CrawlerSettings
 
addURL(ArrayList, String) - Method in class org.archive.crawler.scope.DomainScopeTest
 
addVitals(ObjectName) - Static method in class org.archive.crawler.Heritrix
Add vital stats to passed in ObjectName.
addWebapp(String, String, boolean) - Method in class org.archive.crawler.SimpleHttpServer
Add a webapp.
AggressiveExtractorHTML - Class in org.archive.crawler.extractor
Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regexp, and then by the JavaScript speculative link regexp.
AggressiveExtractorHTML(String) - Constructor for class org.archive.crawler.extractor.AggressiveExtractorHTML
 
AlertManager - Interface in org.archive.crawler.framework
Manager for application alerts.
alignedOnFirstRecord - Variable in class org.archive.io.arc.ARCReader
Set to true if we are aligned on first record on creation of ARCReader.
ALL - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
ALL - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
ALL_DEFAULT_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
ALL_DEFAULT_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
ALL_NONEMPTY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
ALL_QUEUES - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
allFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
ALLOWALL - Static variable in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
ALLOWED_TYPES - Static variable in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
allQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All known queues.
AllSelfTestCases - Class in org.archive.crawler.selftest
All registered heritrix selftests.
AllSelfTestCases() - Constructor for class org.archive.crawler.selftest.AllSelfTestCases
 
alreadyIncluded - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
those UURIs which are already in-process (or processed), and thus should not be rescheduled
alreadySeen - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
AMP - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
AMP - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
AMP - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
AMP - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
AntiCalendarCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy that further penalizes URIs with calendar-suggestive strings in them, with an extra unit of cost.
AntiCalendarCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
APOSTROPH - Static variable in class org.archive.net.UURIFactory
 
append(String) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Appends one lump to the end of this string.
append(File, String) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Appends one more segment to this path.
append(String) - Method in class org.archive.util.PaddingStringBuffer
append a string directly to the buffer
append(int) - Method in class org.archive.util.PaddingStringBuffer
append an int to the buffer.
append(long) - Method in class org.archive.util.PaddingStringBuffer
append a long to the buffer.
append(StringBuffer, CharSequence, int, int) - Static method in class org.archive.util.PreJ15Utils
Version of 1.5's StringBuffer.append(CharSequence s, int start, int finish)
appendQueueReports(PrintWriter, Iterator, int, int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Append queue report to general Frontier report.
APPLET - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
APPLET - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
applySpecialHandling(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform any special handling of the CrawlURI, such as promoting its URI to seed-status, or preferencing it because it is an embed.
arc - Variable in class org.archive.io.arc.ARCReader
Descriptive string for the ARC we're going against (full path, url, etc.).
ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
ARC file extension.
ARC_GZIP_EXTRA_FIELD - Static variable in interface org.archive.io.arc.ARCConstants
The FLG.FEXTRA field that is added to ARC files.
ARC_MAGIC_NUMBER - Static variable in interface org.archive.io.arc.ARCConstants
ARC file *MAGIC NUMBER*.
ARCConstants - Interface in org.archive.io.arc
Constants used by ARC files and in ARC file processing.
ArchiveUtils - Class in org.archive.util
Miscellaneous useful methods.
ArchiveUtils() - Constructor for class org.archive.util.ArchiveUtils
 
ArchiveUtilsTest - Class in org.archive.util
JUnit test suite for ArchiveUtils
ArchiveUtilsTest(String) - Constructor for class org.archive.util.ArchiveUtilsTest
Create a new ArchiveUtilsTest object
ARCLocation - Interface in org.archive.io.arc
Data structure to hold ARC record location.
ARCReader - Class in org.archive.io.arc
Get an iterator on an arc file or get a record by absolute position.
ARCReader() - Constructor for class org.archive.io.arc.ARCReader
 
ARCReader.ARCRecordIterator - Class in org.archive.io.arc
Inner ARCRecord Iterator class.
ARCReader.ARCRecordIterator() - Constructor for class org.archive.io.arc.ARCReader.ARCRecordIterator
 
ARCReader.RecoverableIOException - Exception in org.archive.io.arc
A decorator on IOException that indicates IOEs that are not fatal.
ARCReader.RecoverableIOException(String) - Constructor for exception org.archive.io.arc.ARCReader.RecoverableIOException
 
ARCReader.RecoverableIOException(IOException) - Constructor for exception org.archive.io.arc.ARCReader.RecoverableIOException
 
ARCReaderFactory - Class in org.archive.io.arc
Factory that returns an ARCReader.
ARCReaderFactoryTest - Class in org.archive.io.arc
 
ARCReaderFactoryTest() - Constructor for class org.archive.io.arc.ARCReaderFactoryTest
 
ARCRecord - Class in org.archive.io.arc
An ARC file record.
ARCRecord(InputStream, ARCRecordMetaData) - Constructor for class org.archive.io.arc.ARCRecord
Constructor.
ARCRecord(InputStream, ARCRecordMetaData, int, boolean, boolean, boolean) - Constructor for class org.archive.io.arc.ARCRecord
Constructor.
ARCRecordMetaData - Class in org.archive.io.arc
An immutable class to hold ARC record metadata.
ARCRecordMetaData() - Constructor for class org.archive.io.arc.ARCRecordMetaData
Shut down the default constructor.
ARCRecordMetaData(String, Map) - Constructor for class org.archive.io.arc.ARCRecordMetaData
Constructor.
ARCUtils - Class in org.archive.io.arc
 
ARCUtils() - Constructor for class org.archive.io.arc.ARCUtils
 
ARCUtilsTest - Class in org.archive.io.arc
 
ARCUtilsTest() - Constructor for class org.archive.io.arc.ARCUtilsTest
 
ARCWriter - Class in org.archive.io.arc
Write ARC files.
ARCWriter(PrintStream, File, boolean, List, String) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriter(List, String, boolean, int) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriter(List, String, String, boolean, int, List) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriter(ARCWriterSettings) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriter.ARCWriterSettingsImpl - Class in org.archive.io.arc
Class to hold ARCWriter settings.
ARCWriter.ARCWriterSettingsImpl(boolean, List) - Constructor for class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
ARCWriter.ARCWriterSettingsImpl(List, int, String, boolean, String, List) - Constructor for class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
ARCWriterPool - Class in org.archive.io.arc
A pool of ARCWriters.
ARCWriterPool(ARCWriterSettings, int, int) - Constructor for class org.archive.io.arc.ARCWriterPool
Constructor
ARCWriterPoolTest - Class in org.archive.io.arc
Test ARCWriterPool
ARCWriterPoolTest() - Constructor for class org.archive.io.arc.ARCWriterPoolTest
 
ARCWriterProcessor - Class in org.archive.crawler.writer
Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.
ARCWriterProcessor(String) - Constructor for class org.archive.crawler.writer.ARCWriterProcessor
 
ARCWriterSettings - Interface in org.archive.io.arc
Settings object for ARCWriters.
ARCWriterTest - Class in org.archive.io.arc
Test ARCWriter class.
ARCWriterTest() - Constructor for class org.archive.io.arc.ARCWriterTest
 
ARRAY_ATTRIBUTE_NAME - Static variable in class org.archive.configuration.registry.TestProcessor
 
ArrayLongFPCache - Class in org.archive.util.fingerprint
Simple long fingerprint cache using a backing array; any long maps to one of 'smear' slots.
ArrayLongFPCache() - Constructor for class org.archive.util.fingerprint.ArrayLongFPCache
 
ArrayLongFPCacheTest - Class in org.archive.util.fingerprint
Unit tests for ArrayLongFPCache.
ArrayLongFPCacheTest() - Constructor for class org.archive.util.fingerprint.ArrayLongFPCacheTest
 
asCrawlUri(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
asCrawlUri(CandidateURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
asStringBuffer() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Returns the string as a StringBuffer.
atFinish() - Method in class org.archive.crawler.framework.CrawlController
Evaluate if the crawl should stop because it is finished, without actually stopping the crawl.
ATT_CACHE_PERCENT - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_CACHE_SIZE - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_ENV_HOME - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_IS_READ_ONLY - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_IS_SERIALIZABLE - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_IS_TRANSACTIONAL - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_LOCK_TIMEOUT - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_OPEN - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_SET_READ_ONLY - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_SET_SERIALIZABLE - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_SET_TRANSACTIONAL - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_TXN_TIMEOUT - Static variable in class org.archive.util.JEMBeanHelper
 
attach(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Attach this credential's avatar to the passed curi.
attach(CrawlURI, String) - Method in class org.archive.crawler.datamodel.credential.Credential
Attach this credential's avatar to the passed curi.
ATTR_ACCEPT_HEADERS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.DomainScope
 
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.HostScope
 
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.PathScope
 
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.RefinedScope
 
ATTR_ALLOW_BY_REGEXP - Static variable in class org.archive.crawler.prefetch.Preselector
indicator allowing all matching URIs
ATTR_ALSO_CHECK_VIA - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Whether the 'via' of CrawlURIs should also be checked to see if it is prefixed by the set of SURT prefixes
ATTR_AVAILABLE_MODES - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
ATTR_BALANCE_REPLENISH_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
amount to replenish budget on each activation (duty cycle)
ATTR_BDB_CACHE_PERCENT - Static variable in class org.archive.crawler.datamodel.CrawlOrder
Percentage of heap to allocate to bdb cache
ATTR_BDB_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_BLOCK_ALL - Static variable in class org.archive.crawler.prefetch.Preselector
indicator allowing all URIs (of a given host, typically) to be blocked at this step
ATTR_BLOCK_BY_REGEXP - Static variable in class org.archive.crawler.prefetch.Preselector
indicator allowing all matching URIs to be blocked at this step
ATTR_CALCULATE_ROBOTS_ONLY - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
 
ATTR_CASE_SENSITIVE - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for case sensitive option.
ATTR_CHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Factor decrease on wait when changed
ATTR_CHAR_MAP - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for character map.
ATTR_CHECK_OUTLINKS - Static variable in class org.archive.crawler.processor.CrawlMapper
whether to map CrawlURI's outlinks (if CandidateURIs)
ATTR_CHECK_URI - Static variable in class org.archive.crawler.processor.CrawlMapper
whether to map CrawlURI itself (if status nonpositive)
ATTR_CHECKPOINT_COPY_BDBJE_LOGS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
When checkpointing, copy the bdb logs.
ATTR_CHECKPOINTS_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_COMPRESS - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to use asking settings for compression value.
ATTR_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
The regular expression that we limit this evaluator to.
ATTR_CONTENT_TYPE_MAP - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for content type map.
ATTR_COST_POLICY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
cost assignment policy to use (by class name)
ATTR_COUNTER_MODE - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
ATTR_COUNTRY_CODE - Static variable in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
ATTR_CREDENTIALS - Static variable in class org.archive.crawler.datamodel.CredentialStore
Name of the contained credentials map type.
ATTR_CUSTOM_ROBOTS - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_DECIDE_RULES - Static variable in class org.archive.crawler.deciderules.DecidingFilter
 
ATTR_DECIDE_RULES - Static variable in class org.archive.crawler.deciderules.DecidingScope
 
ATTR_DECISION - Static variable in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
ATTR_DEFAULT_ENCODING - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_DEFAULT_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Fixed wait time for 'unknown' change status.
ATTR_DELAY_FACTOR - Static variable in class org.archive.crawler.frontier.AbstractFrontier
how many multiples of last fetch elapsed time to wait before recontacting same server
ATTR_DELAY_FACTOR - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
How many multiples of last fetch elapsed time to wait before recontacting same server
ATTR_DIRECTORY_FILE - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for directory file.
ATTR_DISK_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_DIVERSION_DIR - Static variable in class org.archive.crawler.processor.CrawlMapper
where to log diversions
ATTR_DOT_BEGIN - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for dot begin replacement.
ATTR_DOT_END - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for dot end replacement.
ATTR_ENABLED - Static variable in class org.archive.configuration.Configuration
This base Configuration adds the Enabled attribute, expert list and overrideable list.
ATTR_ENABLED - Static variable in class org.archive.crawler.framework.Filter
 
ATTR_ENABLED - Static variable in class org.archive.crawler.framework.Processor
Key to use asking settings for enabled value.
ATTR_ENABLED - Static variable in class org.archive.crawler.url.canonicalize.BaseRule
 
ATTR_ERROR_PENALTY_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
whether to hold queues INACTIVE until needed for throughput
ATTR_EXCLUDE_FILTER - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_EXPERT - Static variable in class org.archive.configuration.Configuration
 
ATTR_EXTRACT_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_FETCH_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_FILTERS - Static variable in class org.archive.crawler.filter.OrFilter
 
ATTR_FILTERS - Static variable in class org.archive.crawler.framework.Processor
Key to use asking settings for filters value.
ATTR_FORCE_ACCEPT_FILTER - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
queue assignment to force onto CrawlURIs; intended to be overridden
ATTR_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Queue assignment to force on CrawlURIs.
ATTR_FROM - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_GROUP_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
group max successful fetches
ATTR_GROUP_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
group max successful fetch bytes
ATTR_HOLD_QUEUES - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
whether to hold queues INACTIVE until needed for throughput
ATTR_HOST_DIRECTORY - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for host directory option.
ATTR_HOST_MAP - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for host map.
ATTR_HOST_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
host max successful fetches
ATTR_HOST_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
host max successful fetch bytes
ATTR_HOST_VALENCE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Maximum simultaneous requests in process to a host (queue)
ATTR_HTTP_HEADERS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_HTTP_PROXY_HOST - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_HTTP_PROXY_PORT - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_IGNORE_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_IGNORE_FORM_ACTION_URLS - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ATTR_IMPLEMENTATION - Static variable in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
ATTR_IMPLEMENTATION - Static variable in class org.archive.crawler.deciderules.ExternalImplDecideRule
 
ATTR_INCLUDED - Static variable in class org.archive.crawler.frontier.BdbFrontier
URI-already-included to use (by class name)
ATTR_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Default wait time after initial visit.
ATTR_IP_VALIDITY_DURATION - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
seconds to keep IP information for
ATTR_LINK_FILTERS - Static variable in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
ATTR_LIST_LOGIC - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
ATTR_LIST_LOGIC - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
 
ATTR_LOAD_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_LOCAL_NAME - Static variable in class org.archive.crawler.processor.CrawlMapper
name of local crawler (URIs mapped to here are not diverted)
ATTR_LOG_REJECT_FILTERS - Static variable in class org.archive.crawler.postprocessor.LinksScoper
 
ATTR_LOGGERS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_LOGS_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAP_SOURCE - Static variable in class org.archive.crawler.processor.CrawlMapper
where to load map from
ATTR_MASQUERADE - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.OrFilter
 
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.PathDepthFilter
 
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.SurtPrefixFilter
 
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
 
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.URIRegExpFilter
 
ATTR_MAX_BYTES_DOWNLOAD - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_BYTES_WRITTEN - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key for the maximum ARC bytes to write attribute.
ATTR_MAX_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
never wait more than this long, regardless of multiple
ATTR_MAX_DELAY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Never wait more than this long, regardless of multiple
ATTR_MAX_DOCS - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
ATTR_MAX_DOCUMENT_DOWNLOAD - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_HOST_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
maximum per-host bandwidth usage
ATTR_MAX_LENGTH_BYTES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_MAX_LINK_HOPS - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_MAX_OVERALL_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
maximum overall bandwidth usage
ATTR_MAX_PATH_DEPTH - Static variable in class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
 
ATTR_MAX_PATH_DEPTH - Static variable in class org.archive.crawler.filter.PathDepthFilter
 
ATTR_MAX_PATH_LEN - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for maximum file system path length.
ATTR_MAX_RETRIES - Static variable in class org.archive.crawler.frontier.AbstractFrontier
maximum times to emit a CrawlURI without final disposition
ATTR_MAX_RETRIES - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Maximum times to emit a CrawlURI without final disposition
ATTR_MAX_SEG_LEN - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for maximum file system path segment length.
ATTR_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
Maximum file size; longer files will be ignored.
ATTR_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to use asking settings for max size value.
ATTR_MAX_TIME_SEC - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_TOE_THREADS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_TRANS_HOPS - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_MAX_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Maximum wait between visits
ATTR_MIN_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
always wait this long after one completion before recontacting same server, regardless of multiple
ATTR_MIN_DELAY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Always wait this long after one completion before recontacting same server, regardless of multiple
ATTR_MIN_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Minimum wait between visits
ATTR_MONITOR_MOUNTS - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
List of mounts to monitor; should match "Mounted on" column of 'df' output
ATTR_NAME - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_NAME - Static variable in class org.archive.crawler.datamodel.CredentialStore
 
ATTR_NAME - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_NAME - Static variable in class org.archive.crawler.framework.CrawlScope
 
ATTR_NAME - Static variable in interface org.archive.crawler.framework.Frontier
All URI Frontiers should have the same 'name' attribute.
ATTR_NO_OVERRIDE - Static variable in class org.archive.configuration.Configuration
 
ATTR_OVERRIDE_LOGGER_ENABLED - Static variable in class org.archive.crawler.framework.Scoper
Protected so available to subclasses.
ATTR_PATH - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to use asking settings for arc path value.
ATTR_PATH - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for base directory path value.
ATTR_PAUSE_AT_FINISH - Static variable in class org.archive.crawler.frontier.AbstractFrontier
whether to pause, rather than finish, when crawl appears done
ATTR_PAUSE_AT_START - Static variable in class org.archive.crawler.frontier.AbstractFrontier
whether to pause at crawl start
ATTR_PAUSE_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Space available level below which a crawl-pause should be triggered.
ATTR_POOL_MAX_ACTIVE - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to get maximum pool size.
ATTR_POOL_MAX_WAIT - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to get maximum wait on pool object before we give up and throw IOException.
ATTR_PORT_DIRECTORY - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for port directory option.
ATTR_POST_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_PRE_FETCH_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_PREFERENCE_EMBED_HOPS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
number of hops of embeds (ERX) to bump to front of host queue
ATTR_PREFERENCE_EMBED_HOPS - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Number of hops of embeds (ERX) to bump to front of host queue
ATTR_PREFIX - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to use asking settings for prefix value.
ATTR_QUEUE_ASSIGNMENT_POLICY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
ATTR_QUEUE_IGNORE_WWW - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Should the queue assignment ignore www in hostnames, effectively stripping them away.
ATTR_QUEUE_TOTAL_BUDGET - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
total expenditure to allow a queue before 'retiring' it
ATTR_REBUILD_ON_RECONFIG - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Whether every config change should trigger a rebuilding of the prefix set.
ATTR_RECHECK_SCOPE - Static variable in class org.archive.crawler.prefetch.Preselector
whether to reapply crawl scope at this step
ATTR_RECHECK_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Amount of content received between each recheck of free space
ATTR_RECORDER_IN_BUFFER - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECORDER_OUT_BUFFER - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVER_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVER_RETAIN_FAILURES - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVERY_ENABLED - Static variable in class org.archive.crawler.frontier.AbstractFrontier
Recover log on or off attribute.
ATTR_REGEXP - Static variable in class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
 
ATTR_REGEXP - Static variable in class org.archive.crawler.deciderules.MatchesRegExpDecideRule
 
ATTR_REGEXP - Static variable in class org.archive.crawler.filter.URIRegExpFilter
 
ATTR_REGEXP_LIST - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
ATTR_REGEXP_LIST - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
 
ATTR_REGULAR_EXPRESSION - Static variable in class org.archive.crawler.processor.Test
 
ATTR_REPETITIONS - Static variable in class org.archive.crawler.deciderules.PathologicalPathDecideRule
 
ATTR_REPETITIONS - Static variable in class org.archive.crawler.filter.PathologicalPathFilter
 
ATTR_REREAD_SEEDS_ON_CONFIG - Static variable in class org.archive.crawler.framework.CrawlScope
Whether every config change should trigger a rereading of the original seeds spec/file.
ATTR_RETRY_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
for retryable problems, seconds to wait before a retry
ATTR_RETRY_DELAY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
For retryable problems, seconds to wait before a retry
ATTR_ROBOTS_VALIDITY_DURATION - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
seconds to cache robots info
ATTR_ROTATION_DIGITS - Static variable in class org.archive.crawler.processor.CrawlMapper
where to log diversions
ATTR_RULES - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RULES - Static variable in class org.archive.crawler.deciderules.DecideRuleSequence
 
ATTR_SAVE_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SCOPE - Static variable in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
 
ATTR_SCOPE_EMBEDDED_LINKS - Static variable in class org.archive.crawler.postprocessor.LinksScoper
 
ATTR_SCRATCH_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_SEEDS - Static variable in class org.archive.crawler.framework.CrawlScope
 
ATTR_SEEDS_AS_SURT_PREFIXES - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
ATTR_SEEDS_AS_SURT_PREFIXES - Static variable in class org.archive.crawler.scope.SurtPrefixScope
 
ATTR_SEND_CONNECTION_CLOSE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SEND_RANGE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SEND_REFERER - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SERVER_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
server max successful fetches
ATTR_SERVER_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
server max successful fetch bytes
ATTR_SETTINGS_DIRECTORY - Static variable in class org.archive.configuration.registry.CrawlOrder
 
ATTR_SETTINGS_DIRECTORY - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_SHA1_CONTENT - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SNOOZE_DEACTIVATE_MS - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
When a snooze target for a queue is longer than this amount, and there are already ready queues, deactivate rather than snooze the current queue -- so other more responsive sites get a chance in active rotation.
ATTR_SOTIMEOUT_MS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_STATE_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_STATS_INTERVAL - Static variable in class org.archive.crawler.framework.AbstractTracker
Attribute name for logging interval in seconds setting
ATTR_STRIP_REG_EXPR - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
A regular expression detailing elements to strip before making digest
ATTR_SUFFIX - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Key to use asking settings for suffix value.
ATTR_SUFFIX_AT_END - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for suffix at end option.
ATTR_SURTS_DUMP_FILE - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
ATTR_SURTS_DUMP_FILE - Static variable in class org.archive.crawler.scope.SurtPrefixScope
 
ATTR_SURTS_SOURCE_FILE - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
ATTR_SURTS_SOURCE_FILE - Static variable in class org.archive.crawler.filter.SurtPrefixFilter
 
ATTR_SURTS_SOURCE_FILE - Static variable in class org.archive.crawler.scope.SurtPrefixScope
 
ATTR_TIMEOUT_SECONDS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_TOO_LONG_DIRECTORY - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for too-long directory.
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.DomainScope
 
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.HostScope
 
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.PathScope
 
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.RefinedScope
 
ATTR_TREAT_FRAMES_AS_EMBED_LINKS - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ATTR_TRUST - Static variable in class org.archive.crawler.fetcher.FetchHTTP
SSL trust level setting attribute name.
ATTR_TYPE - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_UNCHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Factor increase on wait when unchanged
ATTR_UNDERSCORE_SET - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for underscore set.
ATTR_USE_DEFAULT - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
ATTR_USE_OVERDUE_TIME - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Indicates if the amount of time the URI was overdue should be added to the wait time before the new wait time is calculated.
ATTR_USE_PRESET - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
ATTR_USE_URI_UNIQ_FILTER - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Should the Frontier use a separate 'already included' data structure or rely on the queues'.
ATTR_USER_AGENT - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_USER_AGENTS - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_WRITE_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
AUDIO - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
AUDIO - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
AUDIO_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
AUDIO_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
AuthSelfTest - Class in org.archive.crawler.selftest
Test authentications, both basic/digest auth and html form logins.
AuthSelfTest() - Constructor for class org.archive.crawler.selftest.AuthSelfTest
 
auxiliaryDirectoryStack - Variable in class org.archive.io.ObjectPlusFilesInputStream
 
auxiliaryDirectoryStack - Variable in class org.archive.io.ObjectPlusFilesOutputStream
 
avail - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The number of buffer bytes available starting from RecyclingFastBufferedOutputStream.pos.
available() - Method in class org.archive.io.arc.ARCRecord
This available is not the stream's available.
available() - Method in class org.archive.io.RandomAccessInputStream
 
AVAILABLE_COST_POLICIES - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
all policies available to be chosen
AVAILABLE_EXTRACTOR - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
averageDepth - Variable in class org.archive.crawler.admin.StatisticsTracker
 
averageDepth() - Method in class org.archive.crawler.admin.StatisticsTracker
Average depth of the last URI in all eligible queues.
averageDepth() - Method in interface org.archive.crawler.framework.Frontier
 
averageDepth() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
averageDepth() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
averageDepth() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 

B

BackgroundImageExtractionSelfTestCase - Class in org.archive.crawler.selftest
Test the crawler can find background images in pages.
BackgroundImageExtractionSelfTestCase() - Constructor for class org.archive.crawler.selftest.BackgroundImageExtractionSelfTestCase
 
BACKSLASH - Static variable in class org.archive.net.UURIFactory
 
BACKSLASH_PATTERN - Static variable in class org.archive.net.UURIFactory
 
BadURIsStopPageParsingSelfTest - Class in org.archive.crawler.selftest
Selftest for figuring problems parsing URIs in a page.
BadURIsStopPageParsingSelfTest() - Constructor for class org.archive.crawler.selftest.BadURIsStopPageParsingSelfTest
 
BASE - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
base - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
BASE - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
Base32 - Class in org.archive.util
Base32 - encodes and decodes RFC3548 Base32 (see http://www.faqs.org/rfcs/rfc3548.html). Imported public-domain code of Bitzi.
Base32() - Constructor for class org.archive.util.Base32
 
BASE_DOMAIN - Static variable in class org.archive.configuration.registry.JmxRegistryTest
 
BaseRule - Class in org.archive.crawler.url.canonicalize
Base of all rules applied canonicalizing a URL that are configurable via the Heritrix settings system.
BaseRule(String, String) - Constructor for class org.archive.crawler.url.canonicalize.BaseRule
Constructor.
batchFlush() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
batchSchedule(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
BdbFrontier - Class in org.archive.crawler.frontier
A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs.
BdbFrontier(String) - Constructor for class org.archive.crawler.frontier.BdbFrontier
Constructor.
BdbFrontier(String, String) - Constructor for class org.archive.crawler.frontier.BdbFrontier
Create the BdbFrontier
BdbMultipleWorkQueues - Class in org.archive.crawler.frontier
A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs.
BdbMultipleWorkQueues(Environment, StoredClassCatalog, boolean) - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueues
Create the multi queue in the given environment.
BdbMultipleWorkQueues.BdbFrontierMarker - Class in org.archive.crawler.frontier
Marker for remembering a position within the BdbMultipleWorkQueues.
BdbMultipleWorkQueues.BdbFrontierMarker(DatabaseEntry, String) - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
Create a marker pointed at the given start location.
BdbMultipleWorkQueuesTest - Class in org.archive.crawler.frontier
Unit tests for BdbMultipleWorkQueues functionality.
BdbMultipleWorkQueuesTest() - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueuesTest
 
BdbUriUniqFilter - Class in org.archive.crawler.util
A BDB implementation of an AlreadySeen list.
BdbUriUniqFilter() - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Shutdown default constructor.
BdbUriUniqFilter(Environment) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilter(File) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilter(File, int) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilterTest - Class in org.archive.crawler.util
Test BdbUriUniqFilter.
BdbUriUniqFilterTest() - Constructor for class org.archive.crawler.util.BdbUriUniqFilterTest
 
BdbWorkQueue - Class in org.archive.crawler.frontier
One independent queue of items with the same 'classKey' (eg host).
BdbWorkQueue(String, BdbFrontier) - Constructor for class org.archive.crawler.frontier.BdbWorkQueue
Create a virtual queue inside the given BdbMultipleWorkQueues
BEGIN_TRANSFORMED_AUTHORITY - Static variable in class org.archive.util.SURT
 
beginCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Start the process of stopping the crawl.
beginFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
beginFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Begin merging pending candidates with complete list.
beginFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
BenchmarkBlooms - Class in org.archive.util
Simple benchmarking of different BloomFilter implementations.
BenchmarkBlooms() - Constructor for class org.archive.util.BenchmarkBlooms
 
BenchmarkUriUniqFilters - Class in org.archive.crawler.util
BenchmarkUriUniqFilters
BenchmarkUriUniqFilters() - Constructor for class org.archive.crawler.util.BenchmarkUriUniqFilters
 
betterPrintStack(RuntimeException) - Static method in class org.archive.util.DevUtils
 
bindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter32bit
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter32bitSplit
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter32bp2
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter32bp2Split
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter64bit
 
bloom - Variable in class org.archive.crawler.util.BloomUriUniqFilter
 
BloomFilter - Interface in org.archive.util
Common interface for different Bloom filter implementations
BloomFilter32bit - Class in org.archive.util
A Bloom filter.
BloomFilter32bit(int, int) - Constructor for class org.archive.util.BloomFilter32bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter32bitSplit - Class in org.archive.util
A Bloom filter.
BloomFilter32bitSplit(int, int) - Constructor for class org.archive.util.BloomFilter32bitSplit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter32bp2 - Class in org.archive.util
A Bloom filter.
BloomFilter32bp2(int, int) - Constructor for class org.archive.util.BloomFilter32bp2
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter32bp2Split - Class in org.archive.util
A Bloom filter.
BloomFilter32bp2Split(int, int) - Constructor for class org.archive.util.BloomFilter32bp2Split
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter64bit - Class in org.archive.util
A Bloom filter.
BloomFilter64bit(int, int) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomUriUniqFilter - Class in org.archive.crawler.util
An MG4J BloomFilter-based implementation of an AlreadySeen list.
BloomUriUniqFilter() - Constructor for class org.archive.crawler.util.BloomUriUniqFilter
Default constructor
BloomUriUniqFilter(int, int) - Constructor for class org.archive.crawler.util.BloomUriUniqFilter
Constructor.
BloomUriUniqFilterTest - Class in org.archive.crawler.util
Test BloomUriUniqFilter.
BloomUriUniqFilterTest() - Constructor for class org.archive.crawler.util.BloomUriUniqFilterTest
 
BOOLEAN - Static variable in class org.archive.crawler.settings.SettingsHandler
 
BOOLEAN_ATTRIBUTE_NAME - Static variable in class org.archive.configuration.registry.TestProcessor
 
borrowARCWriter() - Method in class org.archive.io.arc.ARCWriterPool
Check out an ARCWriter from the pool.
BroadScope - Class in org.archive.crawler.scope
A CrawlScope instance defines which URIs are "in" a particular crawl.
BroadScope(String) - Constructor for class org.archive.crawler.scope.BroadScope
Constructor.
BucketQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses the target IPs as basis for queue-assignment, distributing them over a fixed number of sub-queues.
BucketQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
buffer - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The internal buffer.
buffer - Variable in class org.archive.util.PaddingStringBuffer
 
bufStreamBuf - Variable in class org.archive.io.RecordingOutputStream
Reusable buffer for FastBufferedOutputStream
buildDisplayingHeader(int, long) - Static method in class org.archive.crawler.util.LogReader
 
buildMBeanInfo() - Method in class org.archive.crawler.admin.CrawlJob
Build up the MBean info for Heritrix main.
buildMBeanInfo() - Method in class org.archive.crawler.Heritrix
Build up the MBean info for Heritrix main.
buildSurtPrefixSet() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Construct the set of prefixes to use, from the seed list (which may include both URIs and '+'-prefixed directives).
busyThreads - Variable in class org.archive.crawler.admin.StatisticsTracker
 
byteArrayEquals(byte[], byte[]) - Static method in class org.archive.util.ArchiveUtils
Check that two byte arrays are equal.
byteArrayIntoLong(byte[]) - Static method in class org.archive.util.ArchiveUtils
 
byteArrayIntoLong(byte[], int) - Static method in class org.archive.util.ArchiveUtils
Byte array into long.
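Note on the BloomFilter entries in this section: the sketch below is not Heritrix code. It is a minimal, self-contained illustration of a Bloom filter constructed from a number of hash functions and an expected number of elements, the two constructor parameters the index lists. The class name BloomSketch, the bit-array sizing rule, and the double-hashing scheme are all assumptions made for illustration and may differ from the org.archive.util implementations.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration only; not the org.archive.util implementation. */
public class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /**
     * @param numHashes        number of hash functions (d)
     * @param expectedElements expected number of elements (n)
     */
    public BloomSketch(int numHashes, int expectedElements) {
        if (numHashes < 1 || expectedElements < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        this.numHashes = numHashes;
        // Assumed sizing rule: m = ceil(d * n / ln 2) bits.
        long m = (long) Math.ceil(numHashes * (double) expectedElements / Math.log(2));
        this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(1L, m));
        this.bits = new BitSet(numBits);
    }

    /** Record the given item in the filter. */
    public void add(CharSequence item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    /** Returns false if the item was definitely never added; true if it probably was. */
    public boolean contains(CharSequence item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int index(CharSequence item, int i) {
        String s = item.toString();
        int h1 = s.hashCode();
        int h2 = fnv1a(s) | 1; // force odd so the stride is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the string.
    private static int fnv1a(String s) {
        int h = 0x811c9dc5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}
```

A filter of this kind never reports a false negative for an item that was added, but it may report a false positive for an item that was not.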

C

cache - Variable in class org.archive.crawler.processor.CrawlMapper
 
cache - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
CachedBdbMap - Class in org.archive.util
A BDB JE backed hashmap.
CachedBdbMap(String) - Constructor for class org.archive.util.CachedBdbMap
Constructor.
CachedBdbMap(File, String, Class, Class) - Constructor for class org.archive.util.CachedBdbMap
A constructor for creating a new CachedBdbMap.
CachedBdbMap.DbEnvironmentEntry - Class in org.archive.util
Simple structure to keep needed information about a DB Environment.
CachedBdbMap.DbEnvironmentEntry() - Constructor for class org.archive.util.CachedBdbMap.DbEnvironmentEntry
 
CachedBdbMapTest - Class in org.archive.util
 
CachedBdbMapTest() - Constructor for class org.archive.util.CachedBdbMapTest
 
cacheLength() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
cacheMetadata() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
calculateInsertKey(CrawlURI) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the insertKey that places a CrawlURI in the desired spot.
calculateOriginKey(String) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the 'origin' key for a virtual queue of items with the given classKey.
calculateSnoozeTime(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Calculates how long a host queue needs to be snoozed following the crawling of a URI.
CALENDARISH - Static variable in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
CandidateURI - Class in org.archive.crawler.datamodel
A URI, discovered or passed-in, that may be scheduled.
CandidateURI() - Constructor for class org.archive.crawler.datamodel.CandidateURI
Constructor.
CandidateURI(UURI) - Constructor for class org.archive.crawler.datamodel.CandidateURI
 
CandidateURI(UURI, String, UURI, CharSequence) - Constructor for class org.archive.crawler.datamodel.CandidateURI
 
CandidateURITest - Class in org.archive.crawler.datamodel
Test CandidateURI serialization.
CandidateURITest() - Constructor for class org.archive.crawler.datamodel.CandidateURITest
 
CanonicalizationRule - Interface in org.archive.crawler.url
A rule to apply when canonicalizing a URL.
canonicalize(UURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Canonicalize passed uuri.
canonicalize(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Canonicalize passed CandidateURI.
canonicalize(UURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Canonicalize passed uuri.
canonicalize(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Canonicalize passed CandidateURI.
canonicalize(String, Object) - Method in interface org.archive.crawler.url.CanonicalizationRule
Apply this canonicalization rule.
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.FixupQueryStr
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.LowercaseRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.RegexRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripSessionIDs
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripUserinfoRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripWWWRule
 
canonicalize(UURI, CrawlOrder) - Static method in class org.archive.crawler.url.Canonicalizer
Convenience method that is passed a settings object instance, pulling from it what it needs to canonicalize.
canonicalize(UURI, Iterator) - Static method in class org.archive.crawler.url.Canonicalizer
Run the passed uuri through the list of rules.
Canonicalizer - Class in org.archive.crawler.url
URL canonicalizer.
CanonicalizerTest - Class in org.archive.crawler.url
Test canonicalization.
CanonicalizerTest() - Constructor for class org.archive.crawler.url.CanonicalizerTest
 
capacityPowerOfTwo - Variable in class org.archive.util.AbstractLongFPSet
the capacity of this set, specified as the exponent of a power of 2
catalog - Variable in class org.archive.crawler.extractor.PDFParser
 
caUri - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
cdxOutput(ARCReader, boolean) - Static method in class org.archive.io.arc.ARCReader
 
ChangeEvaluator - Class in org.archive.crawler.extractor
This processor compares the CrawlURI's current content digest with the one from a previous crawl.
ChangeEvaluator(String) - Constructor for class org.archive.crawler.extractor.ChangeEvaluator
Constructor
characters(char[], int, int) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
charAt(int) - Method in class org.archive.crawler.settings.TextField
 
charAt(int) - Method in class org.archive.io.CharSubSequence
 
charAt(int) - Method in class org.archive.net.UURI
 
charSequenceFrom(InputStream, Charset) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
CharSequenceLinkExtractor - Class in org.archive.extractor
Abstract superclass providing utility methods for LinkExtractors which would prefer to work on a CharSequence rather than a stream.
CharSequenceLinkExtractor() - Constructor for class org.archive.extractor.CharSequenceLinkExtractor
 
CharSequenceProvider - Interface in org.archive.extractor
Interface indicating an object can efficiently provide a (perhaps cached or simulated) CharSequence version of itself.
CharsetSelfTest - Class in org.archive.crawler.selftest
Simple test to ensure we can extract links from multibyte pages.
CharsetSelfTest() - Constructor for class org.archive.crawler.selftest.CharsetSelfTest
 
CharSubSequence - Class in org.archive.io
Provides a subsequence view onto a CharSequence.
CharSubSequence(CharSequence, int, int) - Constructor for class org.archive.io.CharSubSequence
 
check(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.Constraint
Run the check.
checkARCFileSize() - Method in class org.archive.io.arc.ARCWriter
Call this method just before we start to write a new record to the ARC.
checkAttribute(ModuleAttributeInfo, ComplexType, CrawlerSettings, HttpServletRequest, boolean) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Process passed attribute.
checkBytesWritten() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
checkClientTrusted(X509Certificate[], String) - Method in class org.archive.httpclient.ConfigurableX509TrustManager
 
checkClose(Iterator) - Method in class org.archive.crawler.framework.CrawlScope
Convenience method to close SeedFileIterator, if appropriate.
checkCrawlJob(CrawlJob, HttpServletResponse, String, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Check passed job is not null and not readonly.
checkDirectory(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
checkFinish() - Method in class org.archive.crawler.framework.CrawlController
Evaluate if the crawl should stop because it is finished.
checkForEmptyPlaceHolder(String) - Method in class org.archive.crawler.Heritrix
If passed str has placeholder for the empty string, return the empty string; otherwise return the original.
checkForInterrupt() - Method in class org.archive.crawler.framework.Processor
 
checkForNull(String) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
checkMidfetchAbort(CrawlURI, HttpRecorderMethod, HttpConnection) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
checkOrder(CrawlerSettings, Type[], MapType) - Method in class org.archive.crawler.settings.MapTypeTest
Helper method for checking that elements are in a certain order after manipulating them.
checkParameters(byte[], long, long) - Method in class org.archive.io.ReplayCharSequenceFactory
Test passed arguments.
checkParamsCount(String, Object[], int) - Static method in class org.archive.util.JmxUtils
 
checkpoint() - Method in class org.archive.crawler.admin.CrawlJob
 
Checkpoint - Class in org.archive.crawler.datamodel
Record of a specific checkpoint on disk.
Checkpoint() - Constructor for class org.archive.crawler.datamodel.Checkpoint
Publicly inaccessible default constructor.
Checkpoint(File) - Constructor for class org.archive.crawler.datamodel.Checkpoint
Create a Checkpoint instance based on the given preexisting checkpoint directory
checkpoint() - Method in class org.archive.crawler.framework.Checkpointer
Run a checkpoint of the crawler.
checkpoint() - Method in class org.archive.crawler.framework.CrawlController
Run checkpointing.
checkpoint(File) - Method in interface org.archive.crawler.frontier.FrontierJournal
Checkpoint.
checkpoint(File) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
checkpointBdb(File) - Method in class org.archive.crawler.framework.CrawlController
Checkpoint bdb.
checkpointBigMaps(File) - Method in class org.archive.crawler.framework.CrawlController
 
Checkpointer - Class in org.archive.crawler.framework
Runs checkpointing.
Checkpointer(CrawlController, File) - Constructor for class org.archive.crawler.framework.Checkpointer
Create a new CheckpointContext with the given store directory
Checkpointer(CrawlController, String) - Constructor for class org.archive.crawler.framework.Checkpointer
Create a new CheckpointContext with the given store directory
checkpointFailed(Exception) - Method in class org.archive.crawler.framework.Checkpointer
Note that a checkpoint failed
checkpointFailed(String) - Method in class org.archive.crawler.framework.Checkpointer
 
checkpointFailed() - Method in class org.archive.crawler.framework.Checkpointer
 
checkpointJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Cause the current job to write a checkpoint to disk.
CheckpointUtils - Class in org.archive.crawler.util
Utilities useful for checkpointing.
CheckpointUtils() - Constructor for class org.archive.crawler.util.CheckpointUtils
 
checkQuota(CrawlURI, String, String, CrawlSubstats, String) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
checkQuota(CrawlURI, long, long, String) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Check if the given quota and actual values rule out processing the given CrawlURI, and mark up the CrawlURI appropriately if so.
checkServerTrusted(X509Certificate[], String) - Method in class org.archive.httpclient.ConfigurableX509TrustManager
 
checkStream(InputStream) - Static method in class org.archive.io.GzippedInputStream
 
CHECKSUM_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for checksum field.
CHECKSUM_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Checksum field.
checkType(Class) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
checkType(Object) - Method in class org.archive.crawler.settings.DoubleList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.FloatList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.IntegerList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.ListType
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.LongList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.StringList
Check if element is of right type for this list.
checkUserAgentAndFrom() - Method in class org.archive.crawler.datamodel.CrawlOrder
Checks if the User Agent and From field are set 'correctly' in the specified Crawl Order.
checkValidAttributeName(String) - Method in class org.archive.configuration.Configuration
 
checkValue(CrawlerSettings, String, Object) - Method in class org.archive.crawler.settings.ComplexType
Check an attribute to see if it fulfills all the constraints set on the definition of this attribute.
checkValue(CrawlerSettings, String, Type, Object) - Method in class org.archive.crawler.settings.ComplexType
 
checkValue(CrawlerSettings, String, Type, Object) - Method in class org.archive.crawler.settings.MapType
 
checkWriteable(File) - Method in class org.archive.io.arc.ARCWriter
 
CIRCUMFLEX - Static variable in class org.archive.net.UURIFactory
 
CIRCUMFLEX_PATTERN - Static variable in class org.archive.net.UURIFactory
 
classCatalog - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
For BDB serialization of objects
classCatalog - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
 
CLASSEXT - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
CLASSEXT - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
CLASSIC - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ClassicScope - Class in org.archive.crawler.scope
ClassicScope: superclass with shared Scope behavior for most common scopes.
ClassicScope(String) - Constructor for class org.archive.crawler.scope.ClassicScope
 
ClassicScope() - Constructor for class org.archive.crawler.scope.ClassicScope
Default constructor.
classKey - Variable in class org.archive.crawler.frontier.WorkQueue
The classKey
ClassKeyMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURI class key -- i.e.
ClassKeyMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ClassKeyMatchesRegExpDecideRule
Usual constructor.
classnameBasedUID(Class, int) - Static method in class org.archive.util.ArchiveUtils
Generate a long UID based on the given class and version number.
cleanup() - Method in class org.archive.crawler.datamodel.ServerCache
Called when shutting down the cache so we can do clean up.
cleanup() - Static method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Cleanup this factory.
cleanup() - Method in class org.archive.crawler.framework.Checkpointer
 
cleanup() - Method in class org.archive.crawler.framework.ToePool
 
cleanup() - Method in class org.archive.crawler.settings.SettingsHandler
 
cleanup() - Method in class org.archive.util.HttpRecorder
Cleanup backing files.
cleanupCurrentRecord() - Method in class org.archive.io.arc.ARCReader
Cleanout the current record if there is one.
cleanupHttp() - Method in class org.archive.crawler.fetcher.FetchHTTP
Perform any final cleanup related to the HttpClient instance.
cleanUpOldFiles(String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
cleanUpOldFiles(File, String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
clear() - Method in class org.archive.crawler.settings.ListType
Removes all elements from this list.
clear() - Method in class org.archive.crawler.settings.SoftSettingsHash
Removes all settings objects from this hash.
clear() - Method in class org.archive.util.CachedBdbMap
Note that a call to this method CLOSEs the underlying bdbje.
clearAList() - Method in class org.archive.crawler.datamodel.CandidateURI
 
clearAt(long) - Method in class org.archive.util.AbstractLongFPSet
 
clearAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
clearCheckpointInProgressDirectory() - Method in class org.archive.crawler.framework.Checkpointer
 
clearErrors() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Reset handler.
clearHeld() - Method in class org.archive.crawler.frontier.WorkQueue
Clear isHeld to false
clearOutlinks() - Method in class org.archive.crawler.datamodel.CrawlURI
 
close() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Close down any allocated resources.
close() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Cleanup all open Berkeley Database objects.
close() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Closes all HQs and the Environment.
close() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Clean up.
close() - Method in interface org.archive.crawler.frontier.FrontierJournal
Flush and close any held objects.
close() - Method in class org.archive.crawler.frontier.RecoveryJournal
Flush and close the underlying IO objects.
close() - Method in class org.archive.crawler.scope.SeedFileIterator
 
close() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
close() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
close() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
close() - Method in class org.archive.io.arc.ARCReader
Call close when done so we can cleanup after ourselves.
close() - Method in class org.archive.io.arc.ARCRecord
Calling close on a record skips us past this record to the next record in the stream.
close() - Method in class org.archive.io.arc.ARCWriter
Close any extant ARC file.
close() - Method in class org.archive.io.arc.ARCWriterPool
 
close() - Method in class org.archive.io.ObjectPlusFilesInputStream
In addition to default, do any registered cleanup tasks.
close() - Method in class org.archive.io.RandomAccessInputStream
 
close() - Method in class org.archive.io.RandomAccessOutputStream
 
close() - Method in class org.archive.io.RecordingInputStream
 
close() - Method in class org.archive.io.RecordingOutputStream
 
close() - Method in class org.archive.io.RecyclingFastBufferedOutputStream
 
close() - Method in interface org.archive.io.ReplayCharSequence
Call this method when done so implementation has chance to clean up resources.
close() - Method in class org.archive.io.ReplayInputStream
 
close() - Method in class org.archive.io.SinkHandler
 
close() - Method in class org.archive.util.CachedBdbMap
 
close() - Method in class org.archive.util.HttpRecorder
Close all streams.
closeDiskStream() - Method in class org.archive.io.RecordingOutputStream
 
closeIdleConnections(long) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
closeLogFiles() - Method in class org.archive.crawler.framework.CrawlController
Close all log files and remove handlers from loggers.
closeQueue() - Method in class org.archive.crawler.frontier.BdbFrontier
 
closeQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
closeRecorder() - Method in class org.archive.io.RecordingInputStream
 
closeRecorder() - Method in class org.archive.io.RecordingOutputStream
 
closeRecorders() - Method in class org.archive.util.HttpRecorder
Close both input and output recorders.
coalesceHostAuthorityStrings() - Method in class org.archive.net.UURI
The two String fields cachedHost and cachedAuthorityMinusUserInfo are usually identical; if so, coalesce into a single instance.
coalesceUriStrings() - Method in class org.archive.net.UURI
The two String fields cachedString and cachedEscapedURI are usually identical; if so, coalesce into a single instance.
CODE_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Result Code field.
COLON - Static variable in class org.archive.net.UURIFactory
 
CommandLineParser - Class in org.archive.crawler
Print Heritrix command-line usage message.
CommandLineParser(String[], PrintWriter, String) - Constructor for class org.archive.crawler.CommandLineParser
Constructor.
CommandLineParser.HeritrixHelpFormatter - Class in org.archive.crawler
Override so can customize usage output.
CommandLineParser.HeritrixHelpFormatter() - Constructor for class org.archive.crawler.CommandLineParser.HeritrixHelpFormatter
 
COMMENT_LINE - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
COMMERCIAL_AT - Static variable in class org.archive.net.UURIFactory
 
COMPACT_REPORT - Static variable in class org.archive.crawler.framework.ToePool
 
compactReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
compare(Object, Object) - Method in class org.archive.crawler.util.StringIntPairComparator
 
compareTo(Object) - Method in class org.archive.crawler.frontier.WorkQueue
 
compareTo(Object) - Method in class org.archive.crawler.settings.Constraint
Compare this constraints level to another constraint.
compareTo(Object) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
compareTo(Object) - Method in class org.archive.net.UURI
 
completePause() - Method in class org.archive.crawler.framework.CrawlController
 
completeStop() - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
completeStop() - Method in class org.archive.crawler.framework.CrawlController
Called when the last toethread exits.
ComplexType - Class in org.archive.crawler.settings
Superclass of all configurable modules.
ComplexType(String, String) - Constructor for class org.archive.crawler.settings.ComplexType
Creates a new instance of ComplexType.
ComplexType.Context - Class in org.archive.crawler.settings
 
ComplexType.Context() - Constructor for class org.archive.crawler.settings.ComplexType.Context
 
ComplexType.Context(CrawlerSettings, UURI) - Constructor for class org.archive.crawler.settings.ComplexType.Context
 
ComplexType.MBeanAttributeInfoIterator - Class in org.archive.crawler.settings
Iterator over all MBeanAttributeInfo for this ComplexType
ComplexType.MBeanAttributeInfoIterator(Object) - Constructor for class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
COMPOSITE_TYPE - Static variable in class org.archive.configuration.Pointer
 
CompositeFileInputStream - Class in org.archive.io
 
CompositeFileInputStream(List) - Constructor for class org.archive.io.CompositeFileInputStream
 
CompositeFileReader - Class in org.archive.io
 
CompositeFileReader(List) - Constructor for class org.archive.io.CompositeFileReader
 
CompositeIterator - Class in org.archive.util.iterator
An iterator that's built up out of any number of other iterators.
CompositeIterator() - Constructor for class org.archive.util.iterator.CompositeIterator
Create an empty CompositeIterator.
CompositeIterator(Iterator, Iterator) - Constructor for class org.archive.util.iterator.CompositeIterator
Convenience method for concatenating together two iterators.
compressed - Variable in class org.archive.io.arc.ARCReader
Is this arc compressed?
COMPRESSED_ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Compressed arc file extension.
COMPRESSED_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Compressed file extension.
Configurable - Interface in org.archive.configuration
Implemented by code that wants to be configured.
ConfigurableX509TrustManager - Class in org.archive.httpclient
A configurable trust manager built on X509TrustManager.
ConfigurableX509TrustManager() - Constructor for class org.archive.httpclient.ConfigurableX509TrustManager
 
ConfigurableX509TrustManager(String) - Constructor for class org.archive.httpclient.ConfigurableX509TrustManager
Constructor.
ConfigurableX509TrustManagerTest - Class in org.archive.httpclient
Test configurable trust.
ConfigurableX509TrustManagerTest(String) - Constructor for class org.archive.httpclient.ConfigurableX509TrustManagerTest
 
Configuration - Class in org.archive.configuration
Configuration for a named component homed on a domain.
Configuration(String) - Constructor for class org.archive.configuration.Configuration
Constructor.
Configuration(MBeanInfo, AttributeList) - Constructor for class org.archive.configuration.Configuration
Constructor.
ConfigurationException - Exception in org.archive.configuration
 
ConfigurationException() - Constructor for exception org.archive.configuration.ConfigurationException
 
ConfigurationException(String) - Constructor for exception org.archive.configuration.ConfigurationException
 
ConfigurationException(String, Throwable) - Constructor for exception org.archive.configuration.ConfigurationException
 
ConfigurationException(Throwable) - Constructor for exception org.archive.configuration.ConfigurationException
 
ConfigurationException - Exception in org.archive.crawler.framework.exceptions
ConfigurationExceptions should be thrown when a configuration file is missing data, or contains uninterpretable data, at runtime.
ConfigurationException() - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Default constructor
ConfigurationException(String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create a ConfigurationException
ConfigurationException(String, Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
 
ConfigurationException(Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create a ConfigurationException
ConfigurationException(String, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create ConfigurationException
ConfigurationException(String, Throwable, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create ConfigurationException
ConfigurationException(Throwable, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create ConfigurationException
configure(String) - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
configure(Registry) - Method in class org.archive.crawler.Heritrix
 
ConfiguredDecideRule - Class in org.archive.crawler.deciderules
Rule which can be configured to ACCEPT or REJECT at operator's option.
ConfiguredDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ConfiguredDecideRule
 
ConfiguredDecideRuleTest - Class in org.archive.crawler.deciderules
 
ConfiguredDecideRuleTest() - Constructor for class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
configureHttp() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
configureTrustStore() - Static method in class org.archive.crawler.Heritrix
Configure our trust store.
congestionRatio - Variable in class org.archive.crawler.admin.StatisticsTracker
 
congestionRatio() - Method in class org.archive.crawler.admin.StatisticsTracker
Ratio of number of threads that would theoretically allow maximum crawl progress (if each was as productive as current threads), to current number of threads.
congestionRatio() - Method in interface org.archive.crawler.framework.Frontier
 
congestionRatio() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
congestionRatio() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
congestionRatio() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
connect() - Method in class org.archive.net.rsync.RsyncURLConnection
Do rsync copy to local file.
consecutiveConnectionErrors - Variable in class org.archive.crawler.datamodel.CrawlServer
 
considerIncluded(UURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should consider the given UURI as if already scheduled.
considerIncluded(UURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
considerIncluded(UURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
considerStrings(CrawlURI, CharSequence, CrawlController, boolean) - Static method in class org.archive.crawler.extractor.ExtractorJS
 
Constraint - Class in org.archive.crawler.settings
Superclass for constraints that can be set on attribute definitions.
Constraint(Level, String) - Constructor for class org.archive.crawler.settings.Constraint
Constructs a new Constraint.
Constraint.FailedCheck - Class in org.archive.crawler.settings
Objects of this class represent failed constraint checks.
Constraint.FailedCheck(CrawlerSettings, ComplexType, Type, Object, String) - Constructor for class org.archive.crawler.settings.Constraint.FailedCheck
Construct a new FailedCheck object.
Constraint.FailedCheck(CrawlerSettings, ComplexType, Type, Object) - Constructor for class org.archive.crawler.settings.Constraint.FailedCheck
Construct a new FailedCheck object using the constraint's default message.
constructedRegexp - Variable in class org.archive.crawler.deciderules.PathologicalPathDecideRule
 
constructRegexp() - Method in class org.archive.crawler.deciderules.PathologicalPathDecideRule
 
containerInitialization() - Static method in class org.archive.crawler.Heritrix
Run setup tasks for this 'container'.
contains(Object) - Method in class org.archive.crawler.settings.ListType
 
contains(long) - Method in class org.archive.util.AbstractLongFPSet
Does this set contain the given value?
contains(CharSequence) - Method in interface org.archive.util.BloomFilter
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter32bit
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter32bitSplit
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter32bp2
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter32bp2Split
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Checks whether the given character sequence is in this filter.
contains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
contains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Does this set contain the given fingerprint?
containsAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
containsHost(String) - Method in class org.archive.crawler.datamodel.ServerCache
 
containsKey(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
containsKey(Object) - Method in class org.archive.util.CachedBdbMap
 
containsPrefixOf(String) - Method in class org.archive.util.SurtPrefixSet
Test whether the given String is prefixed by one of this set's entries.
containsServer(String) - Method in class org.archive.crawler.datamodel.ServerCache
 
containsValue(Object) - Method in class org.archive.util.CachedBdbMap
 
CONTENT_CHANGED - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
URI content has changed between the two latest, successfully completed fetches.
CONTENT_UNCHANGED - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
URI content has not changed between the two latest, successfully completed fetches.
CONTENT_UNKNOWN - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
No knowledge of URI content.
ContentBasedWaitEvaluator - Class in org.archive.crawler.postprocessor
A WaitEvaluator that compares the CrawlURI's content type to a configurable regular expression.
ContentBasedWaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
Constructor
ContentBasedWaitEvaluator(String, String, String, Long, Long, Long, Double, Double) - Constructor for class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
Constructor
contentSinceCheck - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
ContentTypeRegExpFilter - Class in org.archive.crawler.filter
Compares the content-type of the passed CrawlURI to a regular expression.
ContentTypeRegExpFilter(String) - Constructor for class org.archive.crawler.filter.ContentTypeRegExpFilter
 
ContentTypeRegExpFilter(String, String) - Constructor for class org.archive.crawler.filter.ContentTypeRegExpFilter
 
contextDestroyed(ServletContextEvent) - Method in class org.archive.crawler.WebappLifecycle
 
contextInitialized(ServletContextEvent) - Method in class org.archive.crawler.WebappLifecycle
 
controller - Variable in class org.archive.crawler.extractor.CrawlUriSWFAction
 
controller - Variable in class org.archive.crawler.framework.AbstractTracker
A reference to the CrawlController of the crawl that we are to track statistics for.
controller - Variable in class org.archive.crawler.framework.ToePool
 
controller - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
convertAllPrefixesToDomains() - Method in class org.archive.util.SurtPrefixSet
Changes all prefixes so that they only enforce a general domain (allowing subdomains). For prefixes that don't include a ')', no change is necessary.
convertAllPrefixesToHosts() - Method in class org.archive.util.SurtPrefixSet
Changes all prefixes so that they enforce an exact host.
convertImpact(int) - Static method in class org.archive.util.JmxUtils
 
convertToFatalConfigurationException(Exception) - Method in class org.archive.crawler.framework.CrawlController
 
convertToOpenMBeanAttribute(MBeanAttributeInfo) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanAttribute(MBeanAttributeInfo, String) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanOperation(MBeanOperationInfo) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanOperation(MBeanOperationInfo, String, OpenType) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanOperationInfo(MBeanParameterInfo) - Static method in class org.archive.util.JmxUtils
 
cookieDb - Variable in class org.archive.crawler.fetcher.FetchHTTP
Database backing cookie map, if using BDB
COOKIEDB_NAME - Static variable in class org.archive.crawler.fetcher.FetchHTTP
Name of cookie BDB Database
cookies - Variable in class org.archive.crawler.fetcher.FetchHTTP.PostRestore
 
CookieUtils - Class in org.archive.crawler.admin.ui
Utility methods for accessing cookies.
CookieUtils() - Constructor for class org.archive.crawler.admin.ui.CookieUtils
 
copyAttribute(String, DataContainer) - Method in class org.archive.crawler.settings.DataContainer
 
copyAttributeInfo(String, DataContainer) - Method in class org.archive.crawler.settings.DataContainer
 
copyContentBodyTo(File) - Method in class org.archive.io.RecordingInputStream
 
copyFile(File, File) - Static method in class org.archive.util.FileUtils
Copy the src file to the destination.
copyFile(File, File, boolean) - Static method in class org.archive.util.FileUtils
Copy the src file to the destination.
copyFile(File, File, long) - Static method in class org.archive.util.FileUtils
Copy up to extent bytes of the source file to the destination
copyFile(File, File, long, boolean) - Static method in class org.archive.util.FileUtils
Copy up to extent bytes of the source file to the destination
copyFiles(File, Set, File) - Static method in class org.archive.util.FileUtils
 
copyFiles(File, File) - Static method in class org.archive.util.FileUtils
Recursively copy all files from one directory to another.
copyFiles(File, FilenameFilter, File, boolean, boolean) - Static method in class org.archive.util.FileUtils
Recursively copy all files from one directory to another.
copySettings(File) - Method in class org.archive.crawler.framework.CrawlController
Copy off the settings.
copySettings(File, String) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Creates a replica of the settings file structure in another directory (fully recursive, includes all per host settings).
CoreAttributeConstants - Interface in org.archive.crawler.datamodel
CrawlURI attribute keys used by the core crawler classes.
CostAssignmentPolicy - Class in org.archive.crawler.frontier
Calculate an integer 'cost' value for the given CrawlURI.
CostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.CostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.CostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.WagCostAssignmentPolicy
Add constant penalties for certain features of URI (and its 'via') that make it more delayable/skippable.
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 
count() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
 
count - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
count - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
count - Variable in class org.archive.util.AbstractLongFPSet
The current number of elements in the set
count() - Method in class org.archive.util.AbstractLongFPSet
Return the number of entries in this set.
count - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in interface org.archive.util.fingerprint.LongFPSet
Get the number of elements in the set.
COUNT_DOMAIN - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
COUNT_HOST - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
COUNT_OVERRIDE - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
countCrawlURIs() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Count all entries in both primaryUriDB and processingUriDB.
crawlCheckpoint(File) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlCheckpoint(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlCheckpoint(File) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawlCheckpoint(File) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called by CrawlController when checkpointing.
crawlCheckpoint(File) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlCheckpoint(File) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlCheckpoint(File) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlCheckpoint(File) - Method in class org.archive.crawler.frontier.BdbFrontier
 
crawlCheckpoint(File) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
CrawlController - Class in org.archive.crawler.framework
CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl.
CrawlController() - Constructor for class org.archive.crawler.framework.CrawlController
Default constructor
crawlDuration() - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlDuration() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began.
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURIDisregard(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a crawled URI that is to be disregarded.
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURIFailure(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a failed crawling of a URI.
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURINeedRetry(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a failed crawl of a URI that will be retried (failure due to possible transient problems).
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURISuccessful(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a successfully crawled URI
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlEnded(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlEnded(String) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawlEnded(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController has ended a crawl and is about to exit.
crawlEnded(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlEnded(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
crawlEnded(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
crawlEnding(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlEnding(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlEnding(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is ending a crawl (for any reason)
crawlEnding(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlEnding(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlEnding(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlEnding(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlEnding(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
crawlerEndTime - Variable in class org.archive.crawler.framework.AbstractTracker
 
crawlerPauseStarted - Variable in class org.archive.crawler.framework.AbstractTracker
 
CrawlerSettings - Class in org.archive.crawler.settings
Class representing a settings file.
CrawlerSettings(SettingsHandler, String) - Constructor for class org.archive.crawler.settings.CrawlerSettings
Constructs a new CrawlerSettings object.
CrawlerSettings(SettingsHandler, String, String) - Constructor for class org.archive.crawler.settings.CrawlerSettings
Constructs a new CrawlerSettings object which is a refinement of another settings object.
CrawlerSettingsTest - Class in org.archive.crawler.settings
Test the CrawlerSettings object
CrawlerSettingsTest() - Constructor for class org.archive.crawler.settings.CrawlerSettingsTest
 
crawlerStartTime - Variable in class org.archive.crawler.framework.AbstractTracker
 
crawlerTotalPausedTime - Variable in class org.archive.crawler.framework.AbstractTracker
 
CrawlHost - Class in org.archive.crawler.datamodel
Represents a single remote "host".
CrawlHost(String) - Constructor for class org.archive.crawler.datamodel.CrawlHost
Create a new CrawlHost object.
CrawlHost(String, String) - Constructor for class org.archive.crawler.datamodel.CrawlHost
Create a new CrawlHost object.
CrawlJob - Class in org.archive.crawler.admin
A CrawlJob encapsulates a 'crawl order' with any and all information and methods needed by a CrawlJobHandler to accept and execute it.
CrawlJob() - Constructor for class org.archive.crawler.admin.CrawlJob
A shutdown Constructor.
CrawlJob(String, String, XMLSettingsHandler, CrawlJobErrorHandler, int, File) - Constructor for class org.archive.crawler.admin.CrawlJob
A constructor for jobs.
CrawlJob(String, XMLSettingsHandler, CrawlJobErrorHandler) - Constructor for class org.archive.crawler.admin.CrawlJob
A constructor for profiles.
CrawlJob(String, String, XMLSettingsHandler, CrawlJobErrorHandler, int, File, String, boolean, boolean) - Constructor for class org.archive.crawler.admin.CrawlJob
 
CrawlJob(File, CrawlJobErrorHandler) - Constructor for class org.archive.crawler.admin.CrawlJob
A constructor for reloading jobs from disk.
CrawlJob.MBeanCrawlController - Class in org.archive.crawler.admin
Subclass of CrawlController that unregisters beans when stopped.
CrawlJob.MBeanCrawlController() - Constructor for class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
CrawlJobErrorHandler - Class in org.archive.crawler.admin
An implementation of the ValueErrorHandler for the UI.
CrawlJobErrorHandler() - Constructor for class org.archive.crawler.admin.CrawlJobErrorHandler
 
CrawlJobErrorHandler(Level) - Constructor for class org.archive.crawler.admin.CrawlJobErrorHandler
 
CrawlJobHandler - Class in org.archive.crawler.admin
This class manages CrawlJobs.
CrawlJobHandler(File) - Constructor for class org.archive.crawler.admin.CrawlJobHandler
Constructor.
CrawlJobHandler(File, boolean, boolean) - Constructor for class org.archive.crawler.admin.CrawlJobHandler
Constructor allowing for optional loading of profiles and jobs.
CrawlMapper - Class in org.archive.crawler.processor
A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
CrawlMapper(String) - Constructor for class org.archive.crawler.processor.CrawlMapper
Constructor.
CrawlMapper.FilePrintWriter - Class in org.archive.crawler.processor
PrintWriter which remembers the File to which it writes.
CrawlMapper.FilePrintWriter(File) - Constructor for class org.archive.crawler.processor.CrawlMapper.FilePrintWriter
 
CrawlOrder - Class in org.archive.configuration.registry
 
CrawlOrder(String) - Constructor for class org.archive.configuration.registry.CrawlOrder
 
CrawlOrder - Class in org.archive.crawler.datamodel
Represents the 'root' of the settings hierarchy.
CrawlOrder() - Constructor for class org.archive.crawler.datamodel.CrawlOrder
Construct a CrawlOrder.
CrawlOrder.CrawlOrderConfiguration - Class in org.archive.configuration.registry
 
CrawlOrder.CrawlOrderConfiguration(String) - Constructor for class org.archive.configuration.registry.CrawlOrder.CrawlOrderConfiguration
 
CrawlOrderSubClass - Class in org.archive.configuration.registry
 
CrawlOrderSubClass(String) - Constructor for class org.archive.configuration.registry.CrawlOrderSubClass
 
CrawlOrderSubClass.CrawlOrderSubClassConfiguration - Class in org.archive.configuration.registry
 
CrawlOrderSubClass.CrawlOrderSubClassConfiguration(String) - Constructor for class org.archive.configuration.registry.CrawlOrderSubClass.CrawlOrderSubClassConfiguration
 
crawlPaused(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlPaused(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlPaused(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is actually paused (all threads are idle).
crawlPaused(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlPaused(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlPaused(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlPaused(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlPaused(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
crawlPausing(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlPausing(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlPausing(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is going to be paused.
crawlPausing(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlPausing(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlPausing(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlPausing(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlPausing(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
crawlResuming(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlResuming(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlResuming(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is resuming a crawl that had been paused.
crawlResuming(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlResuming(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlResuming(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlResuming(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlResuming(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
CrawlScope - Class in org.archive.crawler.framework
A CrawlScope instance defines which URIs are "in" a particular crawl.
CrawlScope(String) - Constructor for class org.archive.crawler.framework.CrawlScope
Constructs a new CrawlScope.
CrawlScope() - Constructor for class org.archive.crawler.framework.CrawlScope
Default constructor.
CrawlServer - Class in org.archive.crawler.datamodel
Represents a single remote "server".
CrawlServer(String) - Constructor for class org.archive.crawler.datamodel.CrawlServer
Creates a new CrawlServer object.
CrawlSettingsSAXHandler - Class in org.archive.crawler.settings
A SAX element handler that updates a CrawlerSettings object.
CrawlSettingsSAXHandler(CrawlerSettings) - Constructor for class org.archive.crawler.settings.CrawlSettingsSAXHandler
Creates a new CrawlSettingsSAXHandler.
CrawlSettingsSAXSource - Class in org.archive.crawler.settings
Class that takes a CrawlerSettings object and creates SAX events from it.
CrawlSettingsSAXSource(CrawlerSettings) - Constructor for class org.archive.crawler.settings.CrawlSettingsSAXSource
Constructs a new CrawlSettingsSAXSource.
crawlStarted(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlStarted(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlStarted(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called on crawl start.
crawlStarted(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlStarted(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlStarted(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlStarted(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlStarted(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
CrawlStateUpdater - Class in org.archive.crawler.postprocessor
A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch.
CrawlStateUpdater(String) - Constructor for class org.archive.crawler.postprocessor.CrawlStateUpdater
 
CrawlStatusListener - Interface in org.archive.crawler.event
Listen for CrawlStatus events.
CrawlSubstats - Class in org.archive.crawler.datamodel
Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (e.g. queue).
CrawlSubstats() - Constructor for class org.archive.crawler.datamodel.CrawlSubstats
 
CrawlSubstats.HasCrawlSubstats - Interface in org.archive.crawler.datamodel
 
CrawlURI - Class in org.archive.crawler.datamodel
Represents a candidate URI and the associated state it collects as it is crawled.
CrawlURI(UURI) - Constructor for class org.archive.crawler.datamodel.CrawlURI
Create a new instance of CrawlURI from a UURI.
CrawlURI(CandidateURI, long) - Constructor for class org.archive.crawler.datamodel.CrawlURI
Create a new instance of CrawlURI from a CandidateURI
crawlURIBinding - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
A binding for the CrawlURIARWrapper object
CrawlURIDispositionListener - Interface in org.archive.crawler.event
An interface for objects that want to be notified of a CrawlURI disposition (happens each time a curi has been through the processors).
CrawlUriSWFAction - Class in org.archive.crawler.extractor
SWF action that handles discovered URIs.
CrawlUriSWFAction(CrawlURI, CrawlController) - Constructor for class org.archive.crawler.extractor.CrawlUriSWFAction
 
CrawlURITest - Class in org.archive.crawler.datamodel
 
CrawlURITest() - Constructor for class org.archive.crawler.datamodel.CrawlURITest
 
create(ObjectName) - Static method in class org.archive.configuration.Pointer
 
create(CrawlerSettings, String, Class) - Method in class org.archive.crawler.datamodel.CredentialStore
Create and add to the list a credential of the passed type giving the credential the passed name.
createAlreadyIncluded() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Create a UriUniqFilter that will serve as record of already seen URIs.
createAlreadyIncluded() - Method in class org.archive.crawler.frontier.BdbFrontier
Create a UriUniqFilter that will serve as record of already seen URIs.
createAlreadyIncluded() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Create a UriUniqFilter that will serve as record of already seen URIs.
createAndAddLink(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link with the given string and context
createAndAddLinkRelativeToBase(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link with the given string and context, relative to a previously set base HREF
createAndAddLinkRelativeToVia(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link with the given string and context, relative to this CrawlURI's via UURI
createARCFile(File, boolean) - Static method in class org.archive.io.arc.ARCWriterTest
Write an arc file for other tests to use.
createARCRecord(InputStream, long) - Method in class org.archive.io.arc.ARCReader
Create new arc record.
createArcWithOneRecord(String, boolean) - Method in class org.archive.io.arc.ARCWriterTest
 
createARCWriter(String, boolean) - Method in class org.archive.io.arc.ARCWriterTest
 
createAttributeInfo() - Method in class org.archive.configuration.Configuration
 
createCandidateURI(UURI, Link) - Method in class org.archive.crawler.datamodel.CandidateURI
Utility method for creation of CandidateURIs found extracting links from this CrawlURI.
createCandidateURI(UURI, Link, int, boolean) - Method in class org.archive.crawler.datamodel.CandidateURI
Utility method for creation of CandidateURIs found extracting links from this CrawlURI.
createCDXIndexFile(String) - Static method in class org.archive.io.arc.ARCReader
Generate a CDX index file for an ARC file.
createCharSequenceFrom(InputStream, Charset) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
createCheckpointInProgressDirectory() - Method in class org.archive.crawler.framework.Checkpointer
 
createCompositeType(Map, String, String) - Static method in class org.archive.util.JmxUtils
 
createCrawlJob(CrawlJobHandler, File, String) - Static method in class org.archive.crawler.Heritrix
 
createCrawlJobBasedOn(File, String, String, String) - Method in class org.archive.crawler.Heritrix
 
createdEnvironment - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
createDiskMap(Database, StoredClassCatalog, Class, Class) - Method in class org.archive.util.CachedBdbMap
 
createExtractor() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
createFileLogger(File, String, Logger) - Static method in class org.archive.crawler.util.LogUtils
Creates a file logger that uses the heritrix.properties file logger configuration.
createFp(CharSequence) - Static method in class org.archive.crawler.util.FPMergeUriUniqFilter
Create a fingerprint from the given key
createHostFor(String) - Method in class org.archive.crawler.datamodel.ServerCache
 
createHQ(String, int) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Creates a new AdaptiveRevisitHostQueue.
createKey(CharSequence) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
Create fingerprint.
createLink(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link discovered at this URI with the given string and context
createMBeanInfo(String, String) - Method in class org.archive.configuration.Configuration
Create OpenMBeanInfo instance.
createMultiGzipMembers() - Method in class org.archive.io.GzippedInputStreamTest
 
createNewJob(File, String, String, String, int) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
createOpenMBeanAttributeInfo(OpenType, MBeanAttributeInfo, String) - Static method in class org.archive.util.JmxUtils
 
createOperationInfo() - Method in class org.archive.configuration.Configuration
 
createSeedCandidateURI(UURI) - Static method in class org.archive.crawler.datamodel.CandidateURI
 
createServerFor(String) - Method in class org.archive.crawler.datamodel.ServerCache
 
createSettingsHandler(File, String, String, String, File, CrawlJobErrorHandler, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new settings handler based on an existing job.
createSocket(String, int, InetAddress, int) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(Socket, String, int, boolean) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createUriSet() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
createUriSet() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
Credential - Class in org.archive.crawler.datamodel.credential
Credential type.
Credential(String, String) - Constructor for class org.archive.crawler.datamodel.credential.Credential
Constructor.
CredentialAvatar - Class in org.archive.crawler.datamodel.credential
A credential representation.
CredentialAvatar(Class, String) - Constructor for class org.archive.crawler.datamodel.credential.CredentialAvatar
Constructor.
CredentialAvatar(Class, String, String) - Constructor for class org.archive.crawler.datamodel.credential.CredentialAvatar
Constructor.
CredentialStore - Class in org.archive.crawler.datamodel
Front door to the credential store.
CredentialStore(String) - Constructor for class org.archive.crawler.datamodel.CredentialStore
Constructor.
CredentialStoreTest - Class in org.archive.crawler.datamodel
Test add, edit, delete from credential store.
CredentialStoreTest() - Constructor for class org.archive.crawler.datamodel.CredentialStoreTest
 
Criteria - Interface in org.archive.crawler.settings.refinements
Superclass for the refinement criteria.
criteriaIterator() - Method in class org.archive.crawler.settings.refinements.Refinement
Get a ListIterator over the criteria set for this refinement.
CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.crawler.extractor.ExtractorCSS
 
CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.extractor.RegexpCSSLinkExtractor
 
CSS_URI_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorCSS
CSS URL extractor pattern.
CSS_URI_EXTRACTOR - Static variable in class org.archive.extractor.RegexpCSSLinkExtractor
CSS URL extractor pattern.
curi - Variable in class org.archive.crawler.extractor.CrawlUriSWFAction
 
curi - Variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
The URI, for logging and error reporting.
current - Variable in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
CURRENT_LOG_SUFFIX - Static variable in class org.archive.crawler.framework.CrawlController
suffix to use on active logs
currentDocsPerSecond - Variable in class org.archive.crawler.admin.StatisticsTracker
 
currentFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
currentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
currentKBPerSec - Variable in class org.archive.crawler.admin.StatisticsTracker
 
currentKey - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
Strong reference needed to avoid disappearance of key between nextEntry() and any use of the entry
currentProcessedDocsPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
currentProcessedDocsPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns an estimate of recent document download rates based on a queue of recently seen CrawlURIs (as of last snapshot).
currentProcessedKBPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
currentProcessedKBPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Calculates an estimate of the rate, in kb, at which documents are currently being processed by the crawler.
currentRecord - Variable in class org.archive.io.arc.ARCReader
The ARCRecord currently being read.
CUSTOM - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
CUSTOM - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
CUSTOM - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
CustomSWFTags - Class in org.archive.crawler.extractor
Overrides action tags that may hold URIs so that they use the CrawlUriSWFAction action.
CustomSWFTags(SWFActions) - Constructor for class org.archive.crawler.extractor.CustomSWFTags
 
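The CSS_URI_EXTRACTOR entries above describe a pattern for pulling URLs out of CSS. As a rough illustration of that idea only, here is a minimal sketch in Java. The class name CssUrlSketch, the method extractUrls, and the regular expression itself are assumptions made for this example; they are not the actual value of CSS_URI_EXTRACTOR in ExtractorCSS or RegexpCSSLinkExtractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssUrlSketch {
    // Hypothetical pattern (an assumption, not Heritrix's CSS_URI_EXTRACTOR):
    // matches url(...) with optional single or double quotes around the target.
    static final Pattern CSS_URL = Pattern.compile(
            "url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Collects the text inside every url(...) token found in the stylesheet.
    static List<String> extractUrls(CharSequence css) {
        List<String> found = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            found.add(m.group(2)); // group 2 is the URL without quotes
        }
        return found;
    }

    public static void main(String[] args) {
        String css = "body { background: url(\"img/bg.png\"); } "
                   + ".a { background: URL( 'x.gif' ); }";
        System.out.println(extractUrls(css));
    }
}
```

A real extractor would also need to handle backslash escapes (compare CSS_BACKSLASH_ESCAPE above) and resolve each result against the stylesheet's own URI; this sketch does neither.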

D

d - Variable in class org.archive.util.BloomFilter32bit
The number of hash functions used by this filter.
d - Variable in class org.archive.util.BloomFilter32bitSplit
The number of hash functions used by this filter.
d - Variable in class org.archive.util.BloomFilter32bp2
The number of hash functions used by this filter.
d - Variable in class org.archive.util.BloomFilter32bp2Split
The number of hash functions used by this filter.
d - Variable in class org.archive.util.BloomFilter64bit
The number of hash functions used by this filter.
DataContainer - Class in org.archive.crawler.settings
This class holds the data for a ComplexType for a settings object.
DataContainer(CrawlerSettings, ComplexType) - Constructor for class org.archive.crawler.settings.DataContainer
Create a data container for a module.
DATE_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for date field.
DATE_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Date field.
db - Variable in class org.archive.util.CachedBdbMap
The BDB JE database used for this instance.
dbDir - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
 
DecideRule - Class in org.archive.crawler.deciderules
Interface for rules which, given an object to evaluate, respond with a decision: DecideRule.ACCEPT, DecideRule.REJECT, or DecideRule.PASS.
DecideRule(String) - Constructor for class org.archive.crawler.deciderules.DecideRule
Constructor.
DecideRuleSequence - Class in org.archive.crawler.deciderules
RuleSequence represents a series of Rules, which are applied in turn to give the final result.
DecideRuleSequence(String) - Constructor for class org.archive.crawler.deciderules.DecideRuleSequence
 
DecideRuleSequenceTest - Class in org.archive.crawler.deciderules
 
DecideRuleSequenceTest() - Constructor for class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
DecidingFilter - Class in org.archive.crawler.deciderules
DecidingFilter: a classic Filter which makes its accept/reject decision based on whatever DecideRules have been set up inside it.
DecidingFilter(String, String) - Constructor for class org.archive.crawler.deciderules.DecidingFilter
 
DecidingFilter(String) - Constructor for class org.archive.crawler.deciderules.DecidingFilter
 
DecidingScope - Class in org.archive.crawler.deciderules
DecidingScope: a Scope which makes its accept/reject decision based on whatever DecideRules have been set up inside it, allowing initial experimentation with new model in minimally-changed old framework.
DecidingScope(String) - Constructor for class org.archive.crawler.deciderules.DecidingScope
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.AcceptDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.DecideRule
Make decision on passed object.
decisionFor(Object) - Method in class org.archive.crawler.deciderules.DecideRuleSequence
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.PredicatedDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.PrerequisiteAcceptDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.RejectDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.SeedAcceptDecideRule
 
decode(char[], String) - Static method in class org.archive.net.LaxURI
 
decode(String, String) - Static method in class org.archive.net.LaxURI
 
decode(String) - Static method in class org.archive.util.Base32
Decodes the given Base32 String to a raw byte array.
decodeUrlLoose(byte[]) - Static method in class org.archive.net.LaxURLCodec
Decodes an array of URL safe 7-bit characters into an array of original bytes.
decrementQueuedCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Note that a number of queued Uris have been deleted.
deepestUri - Variable in class org.archive.crawler.admin.StatisticsTracker
 
deepestUri() - Method in class org.archive.crawler.admin.StatisticsTracker
Ordinal position of the 'deepest' URI eligible for crawling.
deepestUri() - Method in interface org.archive.crawler.framework.Frontier
 
deepestUri() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
deepestUri() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deepestUri() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Default setting for trust level.
DEFAULT - Static variable in class org.archive.net.LaxURLCodec
 
DEFAULT_ALSO_CHECK_VIA - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
DEFAULT_ARC_FILE_PREFIX - Static variable in interface org.archive.io.arc.ARCConstants
Default ARC file prefix.
DEFAULT_ATTR_RECOVERY_ENABLED - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_BALANCE_REPLENISH_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_BUFFER_SIZE - Static variable in class org.archive.io.RecyclingFastBufferedOutputStream
The default size of the internal buffer in bytes (16Ki).
DEFAULT_CALCULATE_ROBOTS_ONLY - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
whether to calculate robots exclusion without applying
DEFAULT_CAPACITY - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_CHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_CHECK_OUTLINKS - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_CHECK_URI - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_CHECKPOINT_COPY_BDBJE_LOGS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
DEFAULT_COMPRESS - Static variable in interface org.archive.io.arc.ARCConstants
Default as to whether we do compression of ARC files.
DEFAULT_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
 
DEFAULT_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.ImageWaitEvaluator
 
DEFAULT_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.TextWaitEvaluator
 
DEFAULT_COST_POLICY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_COUNTRY_CODE - Static variable in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
DEFAULT_DEFAULT_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_DELAY_FACTOR - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_DIGEST_METHOD - Static variable in class org.archive.io.arc.ARCReader
 
DEFAULT_DIVERSION_DIR - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_ENCODING - Static variable in class org.archive.crawler.Heritrix
Default encoding.
DEFAULT_ENCODING - Static variable in interface org.archive.io.arc.ARCConstants
Encoding to use getting bytes from strings.
DEFAULT_ERROR_PENALTY_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
DEFAULT_GROUP_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GROUP_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GZIP_HEADER_LENGTH - Static variable in interface org.archive.io.arc.ARCConstants
Length of minimal 'default' GZIP header.
DEFAULT_HOLD_QUEUES - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_HOST_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_HOST_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.ImageWaitEvaluator
 
DEFAULT_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.TextWaitEvaluator
 
DEFAULT_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_LIST_LOGIC - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
DEFAULT_LIST_LOGIC - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
 
DEFAULT_LOCAL_NAME - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_MAP_SOURCE - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
 
DEFAULT_MAX_ACTIVE - Static variable in class org.archive.io.arc.ARCWriterPool
Default maximum active number of ARCWriters in the pool.
DEFAULT_MAX_ARC_FILE_SIZE - Static variable in interface org.archive.io.arc.ARCConstants
Default maximum ARC file size.
DEFAULT_MAX_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_HOPS - Static variable in class org.archive.crawler.deciderules.TooManyHopsDecideRule
Default access so available to test code.
DEFAULT_MAX_HOST_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_OVERALL_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_PATH_DEPTH - Static variable in class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
Default maximum value.
DEFAULT_MAX_PENDING - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
DEFAULT_MAX_RETRIES - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
 
DEFAULT_MAX_TRANS_HOPS - Static variable in class org.archive.crawler.deciderules.TransclusionDecideRule
Default maximum hops.
DEFAULT_MAX_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_MAXIMUM_WAIT - Static variable in class org.archive.io.arc.ARCWriterPool
Maximum time to wait on a free ARCWriter.
DEFAULT_MIN_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MIN_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_MODE - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
DEFAULT_MONITOR_MOUNTS - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
DEFAULT_PAUSE_AT_FINISH - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_PAUSE_AT_START - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_PAUSE_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
DEFAULT_PORT - Static variable in class org.archive.crawler.SimpleHttpServer
Default web port.
DEFAULT_PREFERENCE_EMBED_HOPS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_PROFILE - Static variable in class org.archive.crawler.admin.CrawlJobHandler
Default profile name.
DEFAULT_PROFILE_NAME - Static variable in class org.archive.crawler.admin.CrawlJobHandler
Name of system property whose specification overrides default profile used.
DEFAULT_QUEUE_IGNORE_WWW - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
DEFAULT_QUEUE_TOTAL_BUDGET - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_REBUILD_ON_RECONFIG - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
DEFAULT_RECHECK_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
DEFAULT_REGULAR_EXPRESSION - Static variable in class org.archive.crawler.processor.Test
 
DEFAULT_REPETITIONS - Static variable in class org.archive.crawler.deciderules.PathologicalPathDecideRule
Default maximum repetitions.
DEFAULT_REPETITIONS - Static variable in class org.archive.crawler.filter.PathologicalPathFilter
 
DEFAULT_REREAD_SEEDS_ON_CONFIG - Static variable in class org.archive.crawler.framework.CrawlScope
 
DEFAULT_RETRY_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_ROTATION_DIGITS - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_SERVER_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_SERVER_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_SMEAR - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_SNOOZE_DEACTIVATE_MS - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_STATISTICS_REPORT_INTERVAL - Static variable in class org.archive.crawler.framework.AbstractTracker
Default period between logging stat values
DEFAULT_STRIP_REG_EXPR - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
 
DEFAULT_TOE_PRIORITY - Static variable in class org.archive.crawler.framework.ToePool
run worker thread slightly lower than usual
DEFAULT_UNCHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_USE_OVERDUE_TIME - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_USE_URI_UNIQ_FILTER - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deferredHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of deferred hosts.
definitionMap - Variable in class org.archive.crawler.settings.ComplexType
 
delete(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete the given CrawlURI from persistent store.
deleted(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that a CrawlURI has been deleted outside of the normal next()/finished() lifecycle.
deleted(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deleted(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Force logging, etc.
deleteDir(File) - Static method in class org.archive.util.FileUtils
Deletes all files and subdirectories under dir.
deleteInProcessing(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Removes a URI from the list of URIs that belong to this HQ and are currently being processed.
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the given item from the queue.
deleteJob(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
The specified job will be removed from the pending queue or aborted if currently running.
deleteMatchedItems(Predicate) - Method in class org.archive.queue.MemQueue
 
deleteMatchedItems(Predicate) - Method in interface org.archive.queue.Queue
All objects in the queue where matcher.match(object) returns true will be deleted from the queue.
deleteMatching(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteMatchingFromQueue(String, String, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete all CrawlURIs matching the given expression.
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteProfile(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
deleteSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsCache
Delete a settings object from the cache.
deleteSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsHandler
Delete a settings object from persistent storage.
deleteSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Delete a settings object from persistent storage.
deleteURIs(String) - Method in interface org.archive.crawler.framework.Frontier
Delete any URI that matches the given regular expression from the list of discovered and pending URIs.
deleteURIs(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deleteURIs(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
deleteURIsFromPending(String) - Method in class org.archive.crawler.admin.CrawlJob
Delete any URI from the frontier of the current (paused) job that matches the specified regular expression.
deleteURIsFromPending(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Delete any URI from the frontier of the current (paused) job that matches the specified regular expression.
DENYALL - Static variable in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
Deque - Interface in org.archive.queue
Double-ended queue which supports add at either end, remove from only the 'head' end.
dequeue(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the peekItem from the queue and adjusts the count.
dequeue() - Method in class org.archive.queue.MemQueue
 
dequeue() - Method in interface org.archive.queue.Queue
remove an entry from the start of the queue
deRegister(Object) - Method in interface org.archive.configuration.Registry
Unregister named object.
deRegister(String, Class<?>) - Method in interface org.archive.configuration.Registry
Unregister named object.
deRegister(String, Class<?>, String) - Method in interface org.archive.configuration.Registry
Unregister named object.
deRegister(Object) - Method in class org.archive.configuration.registry.JmxRegistry
 
deRegister(String, Class, String) - Method in class org.archive.configuration.registry.JmxRegistry
 
deRegister(String, Class) - Method in class org.archive.configuration.registry.JmxRegistry
 
deregisterJndi(ObjectName) - Static method in class org.archive.crawler.Heritrix
 
deserializeAlreadySeen(Class, File) - Method in class org.archive.crawler.frontier.BdbFrontier
 
destroy() - Method in class org.archive.crawler.admin.ui.RootFilter
 
destroy() - Method in class org.archive.crawler.Heritrix
Do inverse of construction.
detach(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Detach this credential from passed curi.
detachAll(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Detach all credentials of this type from passed curi.
DevUtils - Class in org.archive.util
Write a message and stack trace to the 'org.archive.util.DevUtils' logger.
DevUtils() - Constructor for class org.archive.util.DevUtils
 
disallows(CrawlURI, String) - Method in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
discardNewJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Discard the handler's 'new job'.
discoveredUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
discoveredUriCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Number of discovered URIs.
discoveredUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of discovered URIs.
discoveredUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
discoveredUriCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
DiskFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude FPMergeUriUniqFilter using a disk data file of raw longs as the overall FP record.
DiskFPMergeUriUniqFilter(File) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
DiskFPMergeUriUniqFilter.DataFileLongIterator - Class in org.archive.crawler.util
 
DiskFPMergeUriUniqFilter.DataFileLongIterator(DataInputStream) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Construct a long iterator reading from the given stream.
diskMap - Variable in class org.archive.util.CachedBdbMap
The Collection view of the BDB JE database used for this instance.
diskMapSize - Variable in class org.archive.util.CachedBdbMap
The number of objects stored in the BDB JE database.
disregardDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
disregardedFetchAttempts() - Method in class org.archive.crawler.admin.StatisticsTracker
Get the total number of failed fetch attempts (connection failures -> give up, etc)
disregardedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that were scheduled at one point but have been disregarded.
disregardedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
disregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
disregardedUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
diversionLogs - Variable in class org.archive.crawler.processor.CrawlMapper
Mapping of target crawlers to logs (PrintWriters)
divertLog(CandidateURI, String) - Method in class org.archive.crawler.processor.CrawlMapper
Note the given CandidateURI in the appropriate diversion log.
DNSJavaUtil - Class in org.archive.util
Utility methods based on DNSJava.
doAbort(CrawlURI, HttpMethod, String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
doCmdLineArgs(String[]) - Static method in class org.archive.crawler.Heritrix
 
docsPerSecond - Variable in class org.archive.crawler.admin.StatisticsTracker
 
document - Variable in class org.archive.crawler.extractor.PDFParser
 
documentReader - Variable in class org.archive.crawler.extractor.PDFParser
 
doFilter(ServletRequest, ServletResponse, FilterChain) - Method in class org.archive.crawler.admin.ui.RootFilter
 
doFlush() - Method in class org.archive.crawler.admin.CrawlJobHandler
If it's a HostQueuesFrontier, needs to be flushed for the queued.
doGetFileUrl(File) - Method in class org.archive.io.arc.ARCReaderFactoryTest
 
doJournalAdded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalEmitted(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalRescheduled(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
DOMAIN - Static variable in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
 
DomainScope - Class in org.archive.crawler.scope
A core CrawlScope suitable for the most common crawl needs.
DomainScope(String) - Constructor for class org.archive.crawler.scope.DomainScope
 
DomainScopeTest - Class in org.archive.crawler.scope
Test the domain scope focus filter.
DomainScopeTest() - Constructor for class org.archive.crawler.scope.DomainScopeTest
 
DomainSensitiveFrontier - Class in org.archive.crawler.frontier
Behaves like BdbFrontier (i.e., a basic mostly breadth-first frontier), but with the addition that you can set the number of documents to download on a per site basis.
DomainSensitiveFrontier(String) - Constructor for class org.archive.crawler.frontier.DomainSensitiveFrontier
 
DONT_SCHEDULE - Static variable in class org.archive.crawler.datamodel.CandidateURI
Marks URI as not schedulable.
doOneCrawl(String) - Method in class org.archive.crawler.Heritrix
Launch the crawler without a web UI and run the passed crawl only.
doOneCrawl(String, CrawlStatusListener) - Method in class org.archive.crawler.Heritrix
Launch the crawler without a web UI and run the passed crawl only.
doStripRegexMatch(String, Matcher) - Method in class org.archive.crawler.url.canonicalize.BaseRule
Run a regex that strips elements of a string.
DOT - Static variable in class org.archive.crawler.scope.DomainScope
 
DOT - Static variable in class org.archive.net.UURIFactory
 
DOT - Static variable in class org.archive.util.SURT
 
DOT_ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Dot ARC file extension.
DOT_COMPRESSED_ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Compressed dot arc file extension.
DOT_COMPRESSED_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Dot plus compressed file extension.
DOUBLE - Static variable in class org.archive.crawler.settings.SettingsHandler
 
DOUBLE_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
DoubleList - Class in org.archive.crawler.settings
List of Double values
DoubleList(String, String) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList.
DoubleList(String, String, DoubleList) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList and initializes it with the values from another DoubleList.
DoubleList(String, String, Double[]) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList and initializes it with the values from an array of Doubles.
DoubleList(String, String, double[]) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList and initializes it with the values from a double array.
doubleToString(double, int) - Static method in class org.archive.util.ArchiveUtils
Converts a double to a string.
downloadDisregards - Variable in class org.archive.crawler.admin.StatisticsTracker
 
downloadedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
downloadFailures - Variable in class org.archive.crawler.admin.StatisticsTracker
 
drainBuffer - Variable in class org.archive.io.RecordingInputStream
Reusable buffer to avoid reallocation on each readFullyUntil
dumpOutput(ARCReader, boolean) - Static method in class org.archive.io.arc.ARCReader
 
dumpReports() - Method in class org.archive.crawler.admin.StatisticsTracker
Run the reports.
dumpReports() - Method in class org.archive.crawler.framework.AbstractTracker
Dump reports, if any, on request or at crawl end.
dumpSurtPrefixSet() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Dump the current prefixes in use to configured dump file (if any)
duplicateCount - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
duplicatesAtLastSample - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 

E

EACH_ATTRIBUTE_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
EACH_ATTRIBUTE_EXTRACTOR - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
earlyInitialize(CrawlerSettings) - Method in class org.archive.crawler.settings.ComplexType
This method can be overridden in subclasses to do local initialisation.
element - Variable in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
elementContext(CharSequence, CharSequence) - Static method in class org.archive.crawler.extractor.Link
Create a suitable XPath-like context from an element name and optional attribute name.
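A minimal sketch of what such an XPath-like context could look like, modelled on the IMG/@SRC form mentioned for EMBED_HOP. The class name ElementContextSketch and the exact formatting rule (ELEMENT/@ATTRIBUTE, or just ELEMENT when no attribute is given) are assumptions for illustration, not the actual Link implementation.

```java
// Hypothetical stand-in for illustration; not org.archive.crawler.extractor.Link.
final class ElementContextSketch {

    // Assumed rule: "element/@attribute" when an attribute is given,
    // otherwise just the element name.
    static String elementContext(CharSequence element, CharSequence attribute) {
        if (attribute == null || attribute.length() == 0) {
            return element.toString();
        }
        return element + "/@" + attribute;
    }

    public static void main(String[] args) {
        System.out.println(elementContext("IMG", "SRC")); // prints IMG/@SRC
        System.out.println(elementContext("A", null));    // prints A
    }
}
```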
EMBED_HOP - Static variable in class org.archive.crawler.extractor.Link
embedded links necessary to render the page, like IMG/@SRC
EMBED_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for embeds without other context
emitted(CrawlURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
Note that a CrawlURI was emitted for processing.
emitted(CrawlURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
EMPTY - Static variable in class org.archive.util.AbstractLongFPSet
A constant used to indicate that a slot in the set storage is empty.
EMPTY_STRING - Static variable in class org.archive.net.UURIFactory
 
encode(BitSet, String, String) - Method in class org.archive.net.LaxURLCodec
Encodes a string into its URL safe form using the specified string charset.
encode(byte[]) - Static method in class org.archive.util.Base32
Encodes byte array to Base32 String.
encounteredReferences - Variable in class org.archive.crawler.extractor.PDFParser
 
end - Variable in class org.archive.io.CharSubSequence
 
END_TRANSFORMED_AUTHORITY - Static variable in class org.archive.util.SURT
 
endDocument() - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
EndedException - Exception in org.archive.crawler.framework.exceptions
Indicates a crawl has ended, either due to operator termination, frontier exhaustion, or any other reason.
EndedException(String) - Constructor for exception org.archive.crawler.framework.exceptions.EndedException
 
endElement(String, String, String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
End of an element.
endsWith(char) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Tests if this string ends with a character.
enqueue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Add the given CrawlURI, noting its addition in running count.
enqueue(Object) - Method in class org.archive.queue.MemQueue
 
enqueue(Object) - Method in interface org.archive.queue.Queue
Add an entry to the end of queue
ensureNewJobWritten(CrawlJob, String, String) - Static method in class org.archive.crawler.admin.CrawlJobHandler
Ensure order file with new name/desc is written.
ensureWriteableDirectory(String) - Static method in class org.archive.util.IoUtils
Ensure writeable directory.
ensureWriteableDirectory(List) - Static method in class org.archive.util.IoUtils
Ensure writeable directories.
ensureWriteableDirectory(File) - Static method in class org.archive.util.IoUtils
Ensure writeable directory.
entry - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
ENTRY - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
entrySet() - Method in class org.archive.util.CachedBdbMap
 
entryString(Object) - Static method in class org.archive.util.Histotable
Utility method to convert a key->LongWrapper(count) into the string "count key".
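The "count key" formatting can be sketched as follows. This is only an illustration: it uses a plain Map.Entry<String, Long> in place of the LongWrapper value, and the class name EntryStringSketch is hypothetical.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not org.archive.util.Histotable.
final class EntryStringSketch {

    // Formats an entry as "<count> <key>", count first.
    // Assumption: the count is held as a Long rather than a LongWrapper.
    static String entryString(Map.Entry<String, Long> entry) {
        return entry.getValue() + " " + entry.getKey();
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> e =
                new AbstractMap.SimpleEntry<String, Long>("text/html", 42L);
        System.out.println(entryString(e)); // prints 42 text/html
    }
}
```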
environment - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
 
eq(Object, Object) - Static method in class org.archive.crawler.settings.SoftSettingsHash
Check for equality of non-null reference x and possibly-null y.
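A check of this shape is commonly written as below. This is a sketch under the stated contract (x is never null, y may be null); the class name EqSketch is hypothetical and the body is not taken from SoftSettingsHash.

```java
// Hypothetical stand-in for illustration; not org.archive.crawler.settings.SoftSettingsHash.
final class EqSketch {

    // x must be non-null; y may be null.
    // The identity test is a fast path; equals(null) returns false by contract.
    static boolean eq(Object x, Object y) {
        return x == y || x.equals(y);
    }

    public static void main(String[] args) {
        System.out.println(eq("a", new String("a"))); // prints true
        System.out.println(eq("a", null));            // prints false
    }
}
```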
equals(Object) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory are the same.
equals(Object) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
equals(Object) - Method in class org.archive.crawler.settings.refinements.Refinement
 
equals(Object) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
equals(Object) - Method in class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
 
equals(Object) - Method in class org.archive.crawler.settings.TextField
 
equals(Object) - Method in class org.archive.crawler.settings.Type
The implementation of equals considers two Types equal if name and value are equal.
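Equality based on a name and a value can be sketched as follows. The class NamedValueSketch and its fields are hypothetical; the real Type class may hold its state differently.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration; not org.archive.crawler.settings.Type.
final class NamedValueSketch {
    private final String name;
    private final Object value;

    NamedValueSketch(String name, Object value) {
        this.name = name;
        this.value = value;
    }

    // Two instances are equal when both name and value are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NamedValueSketch)) {
            return false;
        }
        NamedValueSketch other = (NamedValueSketch) o;
        return Objects.equals(name, other.name) && Objects.equals(value, other.value);
    }

    // Kept consistent with equals, as the Object contract requires.
    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }

    public static void main(String[] args) {
        NamedValueSketch a = new NamedValueSketch("max-hops", 20);
        NamedValueSketch b = new NamedValueSketch("max-hops", 20);
        System.out.println(a.equals(b)); // prints true
    }
}
```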
equals(long) - Method in class org.archive.io.SinkHandlerLogRecord
 
equals(SinkHandlerLogRecord) - Method in class org.archive.io.SinkHandlerLogRecord
 
equals(Object) - Method in class org.archive.net.UURI
Test whether this UURI is equal to another object.
errors - Variable in class org.archive.crawler.admin.CrawlJobErrorHandler
All encountered errors
escape(String) - Static method in class org.archive.util.JavaLiterals
 
ESCAPED_AMP - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ESCAPED_AMP - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
ESCAPED_AMP - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
ESCAPED_AMP - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
ESCAPED_APOSTROPH - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_BACKSLASH - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_CIRCUMFLEX - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_LCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_LSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_PIPE - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_QUOT - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_RCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_RSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_SPACE - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_SQUOT - Static variable in class org.archive.net.UURIFactory
 
escapeForHTML(String) - Static method in class org.archive.util.TextUtils
Escapes a string so that it can be placed inside an XML/HTML attribute.
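Escaping for an attribute value typically means replacing the markup-significant characters with entities. The sketch below shows one common choice of replacements; the class name EscapeSketch and the exact character set are assumptions, and TextUtils may escape a different set.

```java
// Hypothetical stand-in for illustration; not org.archive.util.TextUtils.
final class EscapeSketch {

    // Replaces &, <, >, double quote and single quote with entities.
    // Assumption: this character set is a common default, not the library's.
    static String escapeForMarkupAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints a&lt;b &amp; &quot;c&quot;
        System.out.println(escapeForMarkupAttribute("a<b & \"c\""));
    }
}
```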
escapeForJavascript(String) - Static method in class org.archive.util.TextUtils
Escapes a string so that it can be passed as an argument to a JavaScript function in a JSP page.
escapeForMarkupAttribute(String) - Static method in class org.archive.util.TextUtils
Escapes a string so that it can be placed inside an XML/HTML attribute.
escapeWhitespace(String) - Method in class org.archive.net.UURIFactory
Escape any whitespace found.
evaluate(Object) - Method in class org.archive.crawler.deciderules.AddRedirectFromRootServerToScope
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegExpDecideRule
Evaluate passed object.
evaluate(Object) - Method in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.ExternalImplDecideRule
 
evaluate(Object) - Method in interface org.archive.crawler.deciderules.ExternalImplInterface
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
Evaluate whether given object (if CandidateURI) has hops-path matching configured regexp
evaluate(Object) - Method in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
Evaluate whether given object's string version matches configured regexps
evaluate(Object) - Method in class org.archive.crawler.deciderules.MatchesRegExpDecideRule
Evaluate whether given object's string version matches configured regexp
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotMatchesFilePatternDecideRule
Evaluate whether given object's string version does not match configured regexp (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotMatchesListRegExpDecideRule
Evaluate whether given object's string version does not match configured regexps (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotMatchesRegExpDecideRule
Evaluate whether given object's string version does not match configured regexp (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotOnDomainsDecideRule
Evaluate whether given object's URI is NOT in the set of domains -- simply reverse superclass's determination
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotOnHostsDecideRule
Evaluate whether given object's URI is NOT in the set of hosts -- simply reverse superclass's determination
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotSurtPrefixedDecideRule
Evaluate whether given object's URI is NOT in the SURT prefix set -- simply reverse superclass's determination
evaluate(Object) - Method in class org.archive.crawler.deciderules.PredicatedDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Evaluate whether given object comes from a URI which is in scope
evaluate(Object) - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Evaluate whether given object's URI is covered by the SURT prefix set
evaluate(Object) - Method in class org.archive.crawler.deciderules.TooManyHopsDecideRule
Evaluate whether given object is over the threshold number of hops.
evaluate(Object) - Method in class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
Evaluate whether given object is over the threshold number of path-segments.
evaluate(Object) - Method in class org.archive.crawler.deciderules.TransclusionDecideRule
Evaluate whether given object is within the threshold number of transitive hops.
evaluate(Object) - Method in class org.archive.util.Inverter
 
exceedsMaxHops(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if there are too many hops
exception - Variable in class org.archive.crawler.datamodel.LocalizedError
 
exceptionNext() - Method in class org.archive.io.arc.ARCReader.ARCRecordIterator
A next that throws exceptions and handles recoverable exceptions by moving to the next record.
exceptionToString(String, Throwable) - Static method in class org.archive.util.TextUtils
 
excludeAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if URI is excluded by any filters.
exec(String[]) - Static method in class org.archive.util.ProcessUtils
Runs process.
execute(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
execute(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
EXISTS_CASE_INSENSITIVE_MATCH - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
existsMaybeCaseSensitive return code for a file that exists, using a case-insensitive comparison.
EXISTS_EXACT_MATCH - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
existsMaybeCaseSensitive return code for a file that exists.
EXISTS_NOT - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
existsMaybeCaseSensitive return code for a file that does not exist.
existsMaybeCaseSensitive(File, String, File) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
Checks if a file (including directories) exists.
EXPANDED_URI_SAFE - Static variable in class org.archive.net.LaxURLCodec
A more expansive set of ASCII URI characters to consider as 'safe' to leave unencoded, based on actual browser behavior.
expected_n - Variable in class org.archive.crawler.util.BloomUriUniqFilter
 
EXPECTED_SIZE_KEY - Static variable in class org.archive.crawler.util.BloomUriUniqFilter
 
expectedModCount - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
expend(int) - Method in class org.archive.crawler.frontier.WorkQueue
Decrease the internal running budget by the given amount.
exportTo(FileWriter) - Method in class org.archive.util.SurtPrefixSet
 
ExternalGeoLocationDecideRule - Class in org.archive.crawler.deciderules
A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface.
ExternalGeoLocationDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
ExternalGeoLookupInterface - Interface in org.archive.crawler.deciderules
Interface used by ExternalGeoLocationDecideRule.
ExternalImplDecideRule - Class in org.archive.crawler.deciderules
A rule that can be configured to take alternate implementations of the ExternalImplInterface.
ExternalImplDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ExternalImplDecideRule
 
ExternalImplInterface - Interface in org.archive.crawler.deciderules
Interface used by ExternalImplDecideRule.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.Extractor
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorCSS
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorDOC
Processes a Word document and extracts any hyperlinks from it.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
extract(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorJS
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorPDF
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorSWF
 
extract(String) - Method in class org.archive.crawler.extractor.ExtractorTool
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorUniversal
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorXML
 
extract(CharSequence, UURI, UURI, List, ExtractErrorListener) - Static method in class org.archive.extractor.CharSequenceLinkExtractor
Convenience method to do default extraction.
extractAddress(ObjectName) - Static method in class org.archive.util.JmxUtils
 
extractErrorListener - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
ExtractErrorListener - Interface in org.archive.extractor
ExtractErrorListener receives exceptions that may need to be logged from inside a LinkExtractor, allowing the extraction to continue without raising an exception through hasNext()/next()/nextLink().
extractInlineCss - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
extractInlineJs - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
extractLine - Variable in class org.archive.util.iterator.RegexpLineIterator
 
Extractor - Class in org.archive.crawler.extractor
Convenience shared superclass for Extractor Processors.
Extractor(String, String) - Constructor for class org.archive.crawler.extractor.Extractor
Passthrough constructor.
ExtractorCSS - Class in org.archive.crawler.extractor
This extractor parses URIs from CSS files.
ExtractorCSS(String) - Constructor for class org.archive.crawler.extractor.ExtractorCSS
 
ExtractorDOC - Class in org.archive.crawler.extractor
This class allows the caller to extract href-style links from Word 97-format Word documents.
ExtractorDOC(String) - Constructor for class org.archive.crawler.extractor.ExtractorDOC
 
ExtractorHTML - Class in org.archive.crawler.extractor
Basic link-extraction, from an HTML content-body, using regular expressions.
ExtractorHTML(String) - Constructor for class org.archive.crawler.extractor.ExtractorHTML
 
ExtractorHTML(String, String) - Constructor for class org.archive.crawler.extractor.ExtractorHTML
 
ExtractorHTMLTest - Class in org.archive.crawler.extractor
Test HTML extractor.
ExtractorHTMLTest() - Constructor for class org.archive.crawler.extractor.ExtractorHTMLTest
 
ExtractorHTTP - Class in org.archive.crawler.extractor
Extracts URIs from HTTP response headers.
ExtractorHTTP(String) - Constructor for class org.archive.crawler.extractor.ExtractorHTTP
 
ExtractorJS - Class in org.archive.crawler.extractor
Processes JavaScript files for strings that are likely to be crawlable URIs.
ExtractorJS(String) - Constructor for class org.archive.crawler.extractor.ExtractorJS
 
ExtractorPDF - Class in org.archive.crawler.extractor
Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs
ExtractorPDF(String) - Constructor for class org.archive.crawler.extractor.ExtractorPDF
 
ExtractorSWF - Class in org.archive.crawler.extractor
Extracts URIs from SWF (Flash/Shockwave) files.
ExtractorSWF(String) - Constructor for class org.archive.crawler.extractor.ExtractorSWF
 
ExtractorTool - Class in org.archive.crawler.extractor
Run named extractors against passed ARC file.
ExtractorTool() - Constructor for class org.archive.crawler.extractor.ExtractorTool
 
ExtractorTool(String[], String) - Constructor for class org.archive.crawler.extractor.ExtractorTool
 
ExtractorUniversal - Class in org.archive.crawler.extractor
A last ditch extractor that will look at the raw byte code and try to extract anything that looks like a link.
ExtractorUniversal(String) - Constructor for class org.archive.crawler.extractor.ExtractorUniversal
Constructor
ExtractorXML - Class in org.archive.crawler.extractor
A simple extractor which finds HTTP URIs inside XML/RSS files, inside attribute values and simple elements (those with only whitespace + HTTP URI + whitespace as contents)
ExtractorXML(String) - Constructor for class org.archive.crawler.extractor.ExtractorXML
 
extractURIs() - Method in class org.archive.crawler.extractor.PDFParser
Extract URIs from all objects found in a PDF document's catalog.
extractURIs(PdfObject) - Method in class org.archive.crawler.extractor.PDFParser
Parse a PdfDictionary, looking for URIs recursively and adding them to foundURIs
extraInfo() - Static method in class org.archive.util.DevUtils
 

F

F_ADD - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_EMIT - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_FAILURE - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_RESCHEDULE - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_SUCCESS - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
failedFetchAttempts() - Method in class org.archive.crawler.admin.StatisticsTracker
Get the total number of failed fetch attempts (connection failures -> give up, etc)
failedFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that failed to process.
failedFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
failedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
failedFetchCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
failureDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
The CrawlURI has encountered a problem, and will not be retried.
fastOutputStreamHolder - Variable in class org.archive.crawler.frontier.RecyclingSerialBinding
Thread-local cache of reusable FastOutputStream
FatalConfigurationException - Exception in org.archive.crawler.framework.exceptions
 
FatalConfigurationException(String) - Constructor for exception org.archive.crawler.framework.exceptions.FatalConfigurationException
 
FatalConfigurationException() - Constructor for exception org.archive.crawler.framework.exceptions.FatalConfigurationException
 
FatalConfigurationException(String, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.FatalConfigurationException
 
FetchDNS - Class in org.archive.crawler.fetcher
Processor to resolve 'dns:' URIs.
FetchDNS(String) - Constructor for class org.archive.crawler.fetcher.FetchDNS
Create a new instance of FetchDNS.
FetchHTTP - Class in org.archive.crawler.fetcher
HTTP fetcher that uses Apache Jakarta Commons HttpClient library.
FetchHTTP(String) - Constructor for class org.archive.crawler.fetcher.FetchHTTP
Constructor.
FetchHTTP.PostRestore - Class in org.archive.crawler.fetcher
 
FetchHTTP.PostRestore(Cookie[]) - Constructor for class org.archive.crawler.fetcher.FetchHTTP.PostRestore
 
fetchNonResponses - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
fetchResponses - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
FetchStatusCodes - Interface in org.archive.crawler.datamodel
Constant flag codes to be used, in lieu of per-protocol codes (like HTTP's 200, 404, etc.), when network/internal/out-of-band conditions occur.
fetchStatusCodesToString(int) - Static method in class org.archive.crawler.datamodel.CrawlURI
Takes a status code and converts it into a human readable string.
fetchSuccesses - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
file - Variable in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
file - Variable in class org.archive.crawler.processor.CrawlMapper.FilePrintWriter
 
fileExists(File) - Method in class org.archive.crawler.selftest.SelfTestCase
Confirm passed file exists on disk under the test directory.
FILENAME_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for filename field.
FILENAME_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header filename field.
filenames - Variable in class org.archive.io.CompositeFileInputStream
 
FilePatternFilter - Class in org.archive.crawler.filter
Compares the suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, accepting matches.
FilePatternFilter(String) - Constructor for class org.archive.crawler.filter.FilePatternFilter
 
FilePatternFilterTest - Class in org.archive.crawler.filter
Tests the FilePatternFilter default pattern (all default file extensions) and separate subgroup patterns such as the images, audio, video, and miscellaneous groups.
FilePatternFilterTest() - Constructor for class org.archive.crawler.filter.FilePatternFilterTest
 
filesExist(List) - Method in class org.archive.crawler.selftest.SelfTestCase
Confirm passed files exist on disk under the test directory.
filesFoundInArc() - Method in class org.archive.crawler.selftest.SelfTestCase
Find all files that belong to this test that are mentioned in the arc.
FileUtils - Class in org.archive.util
Utility methods for manipulating files and directories.
FileUtilsTest - Class in org.archive.util
 
FileUtilsTest() - Constructor for class org.archive.util.FileUtilsTest
 
fillSeedsCache() - Method in class org.archive.crawler.scope.SeedCachingScope
Ensure seeds cache is created/filled
filter - Variable in class org.archive.crawler.filter.FilePatternFilterTest
 
Filter - Class in org.archive.crawler.framework
Base class for filter classes.
Filter(String, String) - Constructor for class org.archive.crawler.framework.Filter
Creates a new 'null' filter.
Filter(String) - Constructor for class org.archive.crawler.framework.Filter
Creates a new 'null' filter.
FILTERS - Static variable in class org.archive.crawler.admin.ui.JobConfigureUtils
 
filtersAccept(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Do all specified filters (if any) accept this CrawlURI?
filtersAccept(MapType, CrawlURI) - Method in class org.archive.crawler.framework.Processor
Do all specified filters (if any) accept this CrawlURI?
finalCleanup() - Method in class org.archive.crawler.admin.StatisticsTracker
 
finalCleanup() - Method in class org.archive.crawler.framework.AbstractTracker
Cleanup resources used, at crawl end.
finalize() - Method in class org.archive.crawler.SimpleHttpServer
 
finalize() - Method in class org.archive.io.arc.ARCReader
 
finalize() - Method in class org.archive.util.CachedBdbMap
 
finalTasks() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
finalTasks() - Method in class org.archive.crawler.framework.Processor
Classes subclassing this one should override this method to perform processor specific actions.
findFirstLineBeginning(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
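A prefix search of this kind can be sketched as below. The class name LogLineSketch is hypothetical, and two details are assumptions rather than documented LogReader behaviour: line numbers start at 1, and -1 is returned when no line matches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration; not org.archive.crawler.util.LogReader.
final class LogLineSketch {

    // Returns the 1-based number of the first line starting with prefix,
    // or -1 if there is none (both conventions are assumptions).
    static int findFirstLineBeginning(Reader reader, String prefix) {
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            int lineNumber = 0;
            while ((line = in.readLine()) != null) {
                lineNumber++;
                if (line.startsWith(prefix)) {
                    return lineNumber;
                }
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String log = "2006-01-01 start\n2006-01-02 fetch\n2006-01-02 done\n";
        // prints 2
        System.out.println(findFirstLineBeginning(new StringReader(log), "2006-01-02"));
    }
}
```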
findFirstLineBeginningFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineContaining(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContaining(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContainingFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findNextLink() - Method in class org.archive.extractor.CharSequenceLinkExtractor
Scan to the next link(s), if any, loading it into the next buffer.
findNextLink() - Method in class org.archive.extractor.RegexpCSSLinkExtractor
 
findNextLink() - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
findNextLink() - Method in class org.archive.extractor.RegexpJSLinkExtractor
 
finished(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Report a URI being processed as having finished processing.
finished(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
finished(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Note that the previously emitted CrawlURI has completed its processing (for now).
finishedFailure(UURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedFailure(CrawlURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedFailure(UURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedFailure(String) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedSuccess(CrawlURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedSuccess(UURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedSuccess(UURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedSuccess(String) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
finishedUriCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Number of URIs that have finished processing.
finishedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that have finished processing.
finishedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
finishedUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
finishFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
finishFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate).
finishFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
finishLast(HttpConnection) - Static method in class org.archive.httpclient.SingleHttpConnectionManager
 
fireCrawledURIDisregardEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController.
fireCrawledURIFailureEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController.
fireCrawledURINeedRetryEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController.
fireCrawledURISuccessfulEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController.
fireValueErrorHandlers(Constraint.FailedCheck) - Method in class org.archive.crawler.settings.SettingsHandler
Fire events on all registered ValueErrorHandler.
fixSpaceInMetadataLine(List, int) - Method in class org.archive.io.arc.ARCReader
Fix space in URLs.
FixupQueryStr - Class in org.archive.crawler.url.canonicalize
Strip any trailing question mark.
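The effect of this canonicalization rule can be sketched with a plain string operation. The class and method names below are hypothetical, and the sketch does not use the rule's actual regex-based machinery.

```java
// Hypothetical stand-in for illustration; not org.archive.crawler.url.canonicalize.FixupQueryStr.
final class FixupQueryStrSketch {

    // Removes a single trailing '?' (an empty query string); other URLs pass through.
    static String stripTrailingQuestionMark(String url) {
        return url.endsWith("?") ? url.substring(0, url.length() - 1) : url;
    }

    public static void main(String[] args) {
        // prints http://example.com/page
        System.out.println(stripTrailingQuestionMark("http://example.com/page?"));
        // prints http://example.com/page?a=b
        System.out.println(stripTrailingQuestionMark("http://example.com/page?a=b"));
    }
}
```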
FixupQueryStr(String) - Constructor for class org.archive.crawler.url.canonicalize.FixupQueryStr
 
FixupQueryStrTest - Class in org.archive.crawler.url.canonicalize
Test we strip trailing question mark.
FixupQueryStrTest() - Constructor for class org.archive.crawler.url.canonicalize.FixupQueryStrTest
 
FlashParseSelfTest - Class in org.archive.crawler.selftest
Simple selftest for flash extractor.
FlashParseSelfTest() - Constructor for class org.archive.crawler.selftest.FlashParseSelfTest
 
flattenVia() - Method in class org.archive.crawler.datamodel.CandidateURI
Method returns string version of this URI's referral URI.
flg - Variable in class org.archive.io.GzipHeader
The GZIP header FLG byte.
FLOAT - Static variable in class org.archive.crawler.settings.SettingsHandler
 
FLOAT_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
FloatList - Class in org.archive.crawler.settings
List of Float values
FloatList(String, String) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList.
FloatList(String, String, FloatList) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList and initializes it with the values from another FloatList.
FloatList(String, String, Float[]) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList and initializes it with the values from an array of Floats.
FloatList(String, String, float[]) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList and initializes it with the values from a float array.
flush() - Method in class org.archive.crawler.admin.CrawlJob
If it's a HostQueuesFrontier, it needs to be flushed for the queued.
flush() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
flush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Perform a merge of all 'pending' items to the overall fingerprint list.
flush() - Method in class org.archive.io.RecordingOutputStream
 
flush() - Method in class org.archive.io.SinkHandler
 
FLUSH_DELAY_FACTOR - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
flushProcessingURIs() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Flush any CrawlURIs in the processingUriDB into the primaryUriDB.
focusAccepts(Object) - Method in class org.archive.crawler.scope.BroadScope
Check if URI is accepted by the focus of this scope.
focusAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if URI is accepted by the focus of this scope.
focusAccepts(Object) - Method in class org.archive.crawler.scope.DomainScope
Check if a URI is part of this scope.
focusAccepts(Object) - Method in class org.archive.crawler.scope.HostScope
 
focusAccepts(Object) - Method in class org.archive.crawler.scope.PathScope
 
focusAccepts(Object) - Method in class org.archive.crawler.scope.SurtPrefixScope
Check if a URI is part of this scope.
forceAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
 
forceFetch() - Method in class org.archive.crawler.datamodel.CandidateURI
If this method returns true, this URI should be fetched even though it already has been crawled.
forceScarceMemory() - Static method in class org.archive.util.TestUtils
Temporarily exhaust memory, forcing weak/soft references to be broken.
forget(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Forget that the item was seen.
forget(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Forget the given CrawlURI.
forget(String, CandidateURI) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
forget(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
forget(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
format(LogRecord) - Method in class org.archive.crawler.io.LocalErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.RuntimeErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.StatisticsLogFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
format(Matcher, String, StringBuffer) - Method in class org.archive.crawler.url.canonicalize.RegexRule
 
format(LogRecord) - Method in class org.archive.util.OneLineSimpleLogger
 
formatBytesForDisplay(long) - Static method in class org.archive.util.ArchiveUtils
Takes an amount of bytes and formats it for display.
formatMillisecondsToConventional(long) - Static method in class org.archive.util.ArchiveUtils
Convert milliseconds value to a human-readable duration
formatMillisecondsToConventional(long, boolean) - Static method in class org.archive.util.ArchiveUtils
Convert milliseconds value to a human-readable duration
foundURIs - Variable in class org.archive.crawler.extractor.PDFParser
 
fp - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
FPMergeUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on merging FP arrays (in memory or from disk).
FPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter
 
FPMergeUriUniqFilter.PendingItem - Class in org.archive.crawler.util
Represents a long fingerprint and (possibly) its corresponding CandidateURI, awaiting the next merge in a 'pending' state.
FPMergeUriUniqFilter.PendingItem(long, CandidateURI) - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
FPUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter storing 64-bit UURI fingerprints, using an internal LongFPSet instance.
FPUriUniqFilter(LongFPSet) - Constructor for class org.archive.crawler.util.FPUriUniqFilter
Create FPUriUniqFilter wrapping given long set
FPUriUniqFilterTest - Class in org.archive.crawler.util
Test FPUriUniqFilter.
FPUriUniqFilterTest() - Constructor for class org.archive.crawler.util.FPUriUniqFilterTest
 
FRAME - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
FramesSelfTestCase - Class in org.archive.crawler.selftest
Test that the crawler can parse pages with frames in them.
FramesSelfTestCase() - Constructor for class org.archive.crawler.selftest.FramesSelfTestCase
 
freeMatcher(Matcher) - Method in class org.archive.util.PatternMatcherRecycler
Return the given Matcher to the reuse stack, if stack is not already at its maximum size.
freeReserveMemory() - Method in class org.archive.crawler.framework.CrawlController
 
from(CandidateURI, long) - Static method in class org.archive.crawler.datamodel.CrawlURI
Make a CrawlURI from the passed CandidateURI.
from(Object) - Static method in class org.archive.net.UURI
Convenience method for finding the UURI inside an Object likely to have one.
fromString(String) - Static method in class org.archive.crawler.datamodel.CandidateURI
Given a string containing a URI, then optional whitespace delimited hops-path and via info, create a CandidateURI instance.
fromURI(String) - Static method in class org.archive.util.SURT
Utility method for creating the SURT form of the URI in the given String.
Frontier - Interface in org.archive.crawler.framework
An interface for URI Frontiers.
Frontier.FrontierGroup - Interface in org.archive.crawler.framework
Generic interface representing the internal groupings of a Frontier's URIs -- usually queues.
FrontierHostStatistics - Interface in org.archive.crawler.framework
An optional interface the Frontiers can implement to provide information about specific hosts.
FrontierJournal - Interface in org.archive.crawler.frontier
Record of key Frontier happenings.
FrontierMarker - Interface in org.archive.crawler.framework
A marker is a pointer to a place somewhere inside a frontier's list of pending URIs.
FrontierScheduler - Class in org.archive.crawler.postprocessor
'Schedule' with the Frontier CandidateURIs being carried by the passed CrawlURI.
FrontierScheduler(String) - Constructor for class org.archive.crawler.postprocessor.FrontierScheduler
 

G

GenerationFileHandler - Class in org.archive.io
FileHandler with support for rotating the current file to an archival name with a specified integer suffix, and provision of a new replacement FileHandler with the current filename.
GenerationFileHandler(String, boolean, boolean) - Constructor for class org.archive.io.GenerationFileHandler
Constructor.
GenerationFileHandler(LinkedList, boolean) - Constructor for class org.archive.io.GenerationFileHandler
 
get(String) - Method in class org.archive.configuration.Configuration
 
get(String, String) - Method in interface org.archive.configuration.Registry
Get attributeName on Configurable component.
get(String, String, Class<?>) - Method in interface org.archive.configuration.Registry
Get attributeName on Configurable component.
get(String, String, Class<?>, String) - Method in interface org.archive.configuration.Registry
Get attributeName on Configurable component.
get(String, String) - Method in class org.archive.configuration.registry.JmxRegistry
 
get(String, String, Class) - Method in class org.archive.configuration.registry.JmxRegistry
 
get(String, String, Class, String) - Method in class org.archive.configuration.registry.JmxRegistry
 
get(Object) - Method in class org.archive.crawler.datamodel.CredentialStore
 
get(Object, String) - Method in class org.archive.crawler.datamodel.CredentialStore
 
get(String) - Method in interface org.archive.crawler.framework.AlertManager
 
get(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Get the next nearest item after the given key.
get(Object) - Method in class org.archive.crawler.settings.DataContainer
 
get(String) - Method in class org.archive.crawler.settings.DataContainer
 
get(int) - Method in class org.archive.crawler.settings.ListType
Returns the object stored at the index specified
get(String) - Method in class org.archive.crawler.settings.SoftSettingsHash
Returns the value to which the specified key is mapped in this weak hash map, or null if the map contains no mapping for this key.
get(String) - Static method in class org.archive.crawler.util.LogReader
Returns the entire file.
get(InputStreamReader) - Static method in class org.archive.crawler.util.LogReader
Reads entire contents of reader, returns as string.
get(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(InputStreamReader, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(long) - Method in class org.archive.io.arc.ARCReader
Get record at passed offset.
get() - Method in class org.archive.io.arc.ARCReader
 
get(String) - Static method in class org.archive.io.arc.ARCReaderFactory
Get ARCReader on passed path or url.
get(File) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(File, long) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(File, boolean, long) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(String, InputStream, boolean) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(URL, long) - Static method in class org.archive.io.arc.ARCReaderFactory
Get an ARCReader aligned at offset.
get(URL) - Static method in class org.archive.io.arc.ARCReaderFactory
Get an ARCReader.
get(long) - Method in class org.archive.io.SinkHandler
 
get(Object) - Method in class org.archive.util.CachedBdbMap
 
get12DigitDate() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmm.
get12DigitDate(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmm.
get12DigitDate(Date) - Static method in class org.archive.util.ArchiveUtils
 
get14DigitDate() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss.
get14DigitDate(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss.
get14DigitDate(Date) - Static method in class org.archive.util.ArchiveUtils
 
get17DigitDate() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS.
get17DigitDate(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS.
get17DigitDate(Date) - Static method in class org.archive.util.ArchiveUtils
 
getAbsoluteName() - Method in class org.archive.crawler.settings.ComplexType
Get the absolute name of this ComplexType.
getAcceptedIssuers() - Method in class org.archive.httpclient.ConfigurableX509TrustManager
 
getActiveToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getActiveToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getAlert(String) - Method in class org.archive.crawler.Heritrix
 
getAlerts() - Method in class org.archive.crawler.Heritrix
 
getAlertsCount() - Method in class org.archive.crawler.Heritrix
 
getAList() - Method in class org.archive.crawler.datamodel.CandidateURI
Deprecated. Public access will be deprecated. This method's access will change in the next release. Use specialized accessors instead, such as CandidateURI.getString(String).
getAll() - Method in interface org.archive.crawler.framework.AlertManager
 
getAll() - Method in class org.archive.io.SinkHandler
 
getAllLocalHostNames() - Static method in class org.archive.util.InetAddressUtil
 
getAllUnread() - Method in class org.archive.io.SinkHandler
 
getAndCheckJob(CrawlJob, HttpServletRequest, HttpServletResponse) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Check the passed CrawlJob setting.
getAnnotations() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the annotations set for this uri.
getArc() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getArcFile() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getArcFile() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getArcFile() - Method in class org.archive.io.arc.ARCWriter
Get arcFile.
getArcMaxSize() - Method in class org.archive.crawler.writer.ARCWriterProcessor
Max size we want ARC files to be (bytes).
getArcMaxSize() - Method in class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
getArcMaxSize() - Method in interface org.archive.io.arc.ARCWriterSettings
 
getArcPrefix() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getArcPrefix() - Method in class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
getArcPrefix() - Method in interface org.archive.io.arc.ARCWriterSettings
 
getArcSuffix() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getArcSuffix() - Method in class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
getArcSuffix() - Method in interface org.archive.io.arc.ARCWriterSettings
 
getAt(long) - Method in class org.archive.util.AbstractLongFPSet
Get the stored value at the given slot.
getAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getAttribute(String) - Method in class org.archive.configuration.Configuration
 
getAttribute(String) - Method in class org.archive.crawler.admin.CrawlJob
 
getAttribute(String) - Method in class org.archive.crawler.Heritrix
 
getAttribute(String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute from the crawl order.
getAttribute(String, CrawlURI) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlURI.
getAttribute(Object, String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlerSettings object.
getAttribute(String) - Method in class org.archive.util.JEApplicationMBean
 
getAttribute(Environment, String) - Method in class org.archive.util.JEMBeanHelper
Get an attribute value for the given environment.
getAttributeInfo(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Get the effective Attribute info for an element of this type from a settings object.
getAttributeInfo(String) - Method in class org.archive.crawler.settings.ComplexType
Get the Attribute info for an element of this type from the global settings.
getAttributeInfo(String) - Method in class org.archive.crawler.settings.DataContainer
 
getAttributeInfoIterator(Object) - Method in class org.archive.crawler.settings.ComplexType
Get an Iterator over all the MBeanAttributeInfo in this ComplexType.
getAttributeList(Environment) - Method in class org.archive.util.JEMBeanHelper
Get MBean attribute metadata for this environment.
getAttributeNames() - Method in class org.archive.configuration.Configuration
 
getAttributes(String[]) - Method in class org.archive.configuration.Configuration
 
getAttributes() - Method in class org.archive.configuration.Configuration
 
getAttributes(String[]) - Method in class org.archive.crawler.admin.CrawlJob
 
getAttributes(String[]) - Method in class org.archive.crawler.Heritrix
 
getAttributes(String[]) - Method in class org.archive.crawler.settings.ComplexType
 
getAttributes(String[]) - Method in class org.archive.util.JEApplicationMBean
 
getAttributeUnchecked(String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
Version of getAttributes that catches and logs exceptions and returns null on failure to fetch the attribute.
getAudience() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the audience/customer/recipient of the crawl job product from this CrawlerSettings object.
getAudience() - Method in class org.archive.crawler.settings.refinements.Refinement
 
getAuthorityMinusUserinfo() - Method in class org.archive.net.UURI
Return the authority minus userinfo (if any).
getAuthScheme(HttpMethod, CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
getAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesInputStream
Return the top auxiliary directory, from which saved files are restored.
getAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesOutputStream
Return the current auxiliary directory for storing files associated with serialized objects.
getBaos(String) - Static method in class org.archive.io.arc.ARCWriterTest
 
getBaseDomain() - Method in class org.archive.configuration.registry.JmxRegistry
 
getBaseURI() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the (HTML) Base URI used for derelativizing internal URIs.
getBdbEnvironment() - Method in class org.archive.crawler.framework.CrawlController
 
getBdbLogFileName(long) - Method in class org.archive.crawler.framework.CrawlController
 
getBdbSubDirectory(File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getBigMap(String, Class, Class) - Method in class org.archive.crawler.framework.CrawlController
Call this method to get instance of the crawler BigMap implementation.
getBit(long) - Method in class org.archive.util.BloomFilter32bit
Returns from the local bitvector the value of the bit with the specified index.
getBit(long) - Method in class org.archive.util.BloomFilter32bitSplit
Returns from the local bitvector the value of the bit with the specified index.
getBit(int) - Method in class org.archive.util.BloomFilter32bp2
Returns from the local bitvector the value of the bit with the specified index.
getBit(int) - Method in class org.archive.util.BloomFilter32bp2Split
Returns from the local bitvector the value of the bit with the specified index.
getBit(long) - Method in class org.archive.util.BloomFilter64bit
Returns from the local bitvector the value of the bit with the specified index.
getBodyOffset() - Method in class org.archive.io.arc.ARCRecord
 
getBooleanProperty(String) - Static method in class org.archive.util.PropertyUtils
 
getBufferedInput(File) - Static method in class org.archive.crawler.frontier.RecoveryJournal
Get a BufferedInputStream on the recovery file given.
getBufferedReader(File) - Static method in class org.archive.crawler.frontier.RecoveryJournal
 
getByRealm(Set, String, CrawlURI) - Static method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
Convenience method that does a lookup on the passed set using realm as the key.
getByRegExpr(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExpr(InputStreamReader, String, int, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExpr(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExpr(InputStreamReader, String, String, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExprFromSeries(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExprFromSeries(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getBytesPerFileType(String) - Method in class org.archive.crawler.admin.StatisticsTracker
Returns the accumulated number of bytes from files of a given file type.
getBytesPerHost(String) - Method in class org.archive.crawler.admin.StatisticsTracker
Returns the accumulated number of bytes downloaded from a given host.
getCacheMisses() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getCandidateURIString() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getCause() - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
getCharacterEncoding() - Method in class org.archive.util.HttpRecorder
 
getCharSequence() - Method in interface org.archive.extractor.CharSequenceProvider
 
getCheckpointCopyBdbjeLogs() - Method in class org.archive.crawler.framework.CrawlController
 
getCheckpointInProgressDirectory() - Method in class org.archive.crawler.framework.Checkpointer
 
getCheckpointRecover() - Method in class org.archive.crawler.framework.CrawlController
Get recover checkpoint.
getCheckpointRecover(CrawlOrder) - Static method in class org.archive.crawler.framework.CrawlController
 
getCheckpointsDirectory() - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getCheckpointsDisk() - Method in class org.archive.crawler.framework.CrawlController
 
getClassCatalog() - Method in class org.archive.crawler.framework.CrawlController
 
getClassCheckpointFile(File, String, Class) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFile(File, Class) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class, String) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassKey() - Method in class org.archive.crawler.datamodel.CandidateURI
Get the token (usually the hostname + port) which indicates what "class" this CrawlURI should be grouped with, for the purposes of ensuring only one item of the class is processed at once, all items of the class are held for a politeness period, etc.
getClassKey(CandidateURI) - Method in interface org.archive.crawler.framework.Frontier
 
getClassKey(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getClassKey(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
getClassKey(CrawlController, CrawlURI) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Get the String key (name) of the queue to which the CrawlURI should be assigned.
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getClassKey() - Method in class org.archive.crawler.frontier.WorkQueue
 
getClassName(String) - Static method in class org.archive.crawler.settings.SettingsHandler
 
getClasspathPath(File) - Static method in class org.archive.util.IoUtils
 
getCommandLine() - Method in class org.archive.crawler.CommandLineParser
 
getCommandLineArguments() - Method in class org.archive.crawler.CommandLineParser
 
getCommandLineOptions() - Method in class org.archive.crawler.CommandLineParser
 
getCompletedJobs() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
getComplexType() - Method in class org.archive.crawler.settings.DataContainer
Get the ComplexType for which this DataContainer keeps data.
getComplexTypeByAbsoluteName(CrawlerSettings, String) - Method in class org.archive.crawler.settings.SettingsHandler
Get a complex type by its absolute name.
getCompositeData() - Method in class org.archive.configuration.Pointer
 
getCompositeType() - Static method in class org.archive.configuration.Pointer
 
getCompoundName(String) - Static method in class org.archive.util.JndiUtils
 
getCompoundName(ObjectName) - Static method in class org.archive.util.JndiUtils
Return name to use as jndi name.
getConfdir() - Static method in class org.archive.crawler.Heritrix
Get the configuration directory.
getConfdir(boolean) - Static method in class org.archive.crawler.Heritrix
Get the configuration directory.
getConfiguration() - Method in interface org.archive.configuration.Configurable
Return a Configuration.
getConfiguration() - Method in class org.archive.configuration.registry.CrawlOrder
 
getConfiguration() - Method in class org.archive.configuration.registry.CrawlOrderSubClass
 
getConfiguration() - Method in class org.archive.configuration.registry.TestProcessor
 
getConfiguration() - Method in class org.archive.configuration.StoreElement
 
getConfiguration() - Method in class org.archive.crawler.Heritrix
 
getConfiguredImplementation(Object) - Method in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
Get implementation, if one specified.
getConfiguredImplementation(Object) - Method in class org.archive.crawler.deciderules.ExternalImplDecideRule
Get implementation, if one specified.
getConnection() - Method in class org.archive.httpclient.HttpRecorderMethod
 
getConnection(HostConfiguration) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
getConnection(HostConfiguration, long) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Deprecated. Use #getConnectionWithTimeout(HostConfiguration, long)
getConnectionWithTimeout(HostConfiguration, long) - Method in class org.archive.httpclient.SingleHttpConnectionManager
 
getConnectionWithTimeout(HostConfiguration, long) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
getConstraints() - Method in class org.archive.crawler.settings.MapType
 
getConstraints() - Method in class org.archive.crawler.settings.Type
Returns a list of constraints for the value of this type.
getContent() - Static method in class org.archive.io.arc.ARCWriterTest
 
getContent(String) - Static method in class org.archive.io.arc.ARCWriterTest
 
getContentDigest() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the retained content-digest value, if any.
getContentHandler() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getContentLength() - Method in class org.archive.crawler.datamodel.CrawlURI
For completed HTTP transactions, the length of the content-body.
getContentReplayInputStream() - Method in class org.archive.io.RecordingInputStream
 
getContentReplayInputStream() - Method in class org.archive.io.RecordingOutputStream
Return a replay stream, cued up to the beginning of content
getContentSize() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the size in bytes of this URI's content.
getContentType() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the content type of this URI.
getContentType() - Method in class org.archive.crawler.settings.MapType
Get the content type allowed for this map.
getContext() - Method in class org.archive.crawler.extractor.Link
 
getController() - Method in class org.archive.crawler.admin.CrawlJob
 
getController() - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getController() - Method in class org.archive.crawler.framework.Checkpointer
 
getController() - Method in class org.archive.crawler.framework.Processor
Get the controller object.
getController() - Method in class org.archive.crawler.framework.ToePool
 
getController() - Method in class org.archive.crawler.framework.ToeThread
Get the CrawlController associated with this thread.
getCookieValue(Cookie[], String, String) - Static method in class org.archive.crawler.admin.ui.CookieUtils
 
getCount() - Method in interface org.archive.crawler.framework.AlertManager
 
getCount() - Method in class org.archive.crawler.frontier.WorkQueue
 
getCount() - Method in class org.archive.io.SinkHandler
 
getCountryCode() - Method in class org.archive.crawler.datamodel.CrawlHost
Get country code of this host
getCrawlendReport(String, String) - Method in class org.archive.crawler.Heritrix
Return named crawl end report for job with passed uid.
getCrawlEndTime() - Method in class org.archive.crawler.framework.AbstractTracker
If crawl has ended it will return the time it ended (given by System.currentTimeMillis() at that time).
getCrawlerTotalElapsedTime() - Method in class org.archive.crawler.framework.AbstractTracker
 
getCrawlerTotalElapsedTime() - Method in interface org.archive.crawler.framework.StatisticsTracking
Total amount of time spent actively crawling so far.
getCrawlJob() - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
getCrawlJob() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getCrawlJobDir() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getCrawlOrderAttribute(String) - Method in class org.archive.crawler.admin.CrawlJob
 
getCrawlOrderAttribute(String, ComplexType) - Method in class org.archive.crawler.admin.CrawlJob
 
getCrawlOrderName() - Method in class org.archive.crawler.datamodel.CrawlOrder
Get the name of the order file.
getCrawlPauseStartedTime() - Method in class org.archive.crawler.framework.AbstractTracker
Get the time when the crawl was last paused/suspended (as given by System.currentTimeMillis() at that time).
getCrawlStartTime() - Method in class org.archive.crawler.framework.AbstractTracker
Get the starting time of the crawl (as given by System.currentTimeMillis() when the crawl started).
getCrawlStatus() - Method in class org.archive.crawler.admin.CrawlJob
 
getCrawlTotalPauseTime() - Method in class org.archive.crawler.framework.AbstractTracker
Returns the number of milliseconds that the crawl spent paused or otherwise in a nonactive state.
getCrawlURI(ARCRecord, HttpRecorder) - Method in class org.archive.crawler.extractor.ExtractorTool
 
getCrawlURI(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the CrawlURI associated with the specified URI (string) or null if no such CrawlURI is queued in this HQ.
getCrawlURIString() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getCreationTime() - Method in class org.archive.io.SinkHandlerLogRecord
 
getCredential(SettingsHandler, CrawlURI) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getCredentialDomain(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
getCredentialStore(SettingsHandler) - Static method in class org.archive.crawler.datamodel.CredentialStore
Get a credential store reference.
getCredentialTypes() - Static method in class org.archive.crawler.datamodel.CredentialStore
 
getCurrentJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
getCurrentProcessorName() - Method in class org.archive.crawler.framework.ToeThread
 
getCurrentRecord() - Method in class org.archive.io.arc.ARCReader
 
getCustomRobots(CrawlerSettings) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
Get the supplied custom robots.txt.
getData(ComplexType) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getData(String) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getDatabaseName() - Method in class org.archive.util.CachedBdbMap
 
getDataContainerRecursive(ComplexType.Context) - Method in class org.archive.crawler.settings.ComplexType
Get the active data container for this ComplexType for a specific settings object.
getDataContainerRecursive(ComplexType.Context, String) - Method in class org.archive.crawler.settings.ComplexType
Get the active data container for this ComplexType for a specific settings object.
getDate() - Method in class org.archive.io.arc.ARCRecordMetaData
Get the time when the record was harvested.
getDecideRule(Object) - Method in class org.archive.crawler.deciderules.DecidingFilter
 
getDecideRule(Object) - Method in class org.archive.crawler.deciderules.DecidingScope
 
getDefaultMessage() - Method in class org.archive.crawler.settings.Constraint
Get the default message to return if a check fails.
getDefaultNextProcessor(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Returns the next processor for the given CrawlURI in the processor chain.
getDefaultProfile() - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns the default profile.
getDefaultValue() - Method in class org.archive.crawler.settings.ComplexType
 
getDefaultValue() - Method in class org.archive.crawler.settings.ListType
 
getDefaultValue() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
getDefaultValue() - Method in class org.archive.crawler.settings.SimpleType
 
getDefaultValue() - Method in class org.archive.crawler.settings.Type
The default value for this type.
getDeferrals() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the deferral count.
getDefinition(String) - Method in class org.archive.crawler.settings.ComplexType
Get the content type definition for an attribute.
getDefinition() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the definition for the checked attribute.
getDefinition(String) - Method in class org.archive.crawler.settings.MapType
Get the content type definition for attributes of this map.
getDescription() - Method in class org.archive.crawler.settings.ComplexType
Get the description of this type. The description should be suitable for showing in a user interface.
getDescription() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the description of this CrawlerSettings object.
getDescription() - Method in class org.archive.crawler.settings.ListType
 
getDescription() - Method in interface org.archive.crawler.settings.refinements.Criteria
Returns a description of the Criteria's current settings.
getDescription() - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
 
getDescription() - Method in class org.archive.crawler.settings.refinements.Refinement
Return the description of this refinement.
getDescription() - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
 
getDescription() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
getDescription() - Method in class org.archive.crawler.settings.SimpleType
 
getDescription() - Method in class org.archive.crawler.settings.Type
Get the description of this type. The description should be suitable for showing in a user interface.
getDestination() - Method in class org.archive.crawler.extractor.Link
 
getDigest() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getDigestValue() - Method in class org.archive.io.RecordingInputStream
Return the digest value for any recorded, digested data.
getDigestValue() - Method in class org.archive.io.RecordingOutputStream
Return the digest value for any recorded, digested data.
getDirectory() - Method in class org.archive.crawler.admin.CrawlJob
Returns the path of the job's base directory.
getDirectory() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getDisk() - Method in class org.archive.crawler.framework.CrawlController
Get the 'working' directory of the current crawl.
getDisplayName() - Method in class org.archive.crawler.admin.CrawlJob
Return the combination of given name and UID most commonly used in administrative interface.
getDisplayName() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getDisposition() - Method in class org.archive.crawler.admin.SeedRecord
 
getDiversionLog(String) - Method in class org.archive.crawler.processor.CrawlMapper
Get the diversion log for a given target crawler node.
getDomainOverrides(String) - Method in class org.archive.crawler.settings.SettingsHandler
Will return a Collection of strings with domains that contain 'per' domain overrides (or their subdomains contain them).
getDomainOverrides(String) - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
getDTDHandler() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getEarliestNextURIEmitTime() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the earliest time a URI for this host could be emitted.
getElement() - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
getElementFromDefinition(String) - Method in class org.archive.crawler.settings.ComplexType
Get an element definition from this complex type.
getEmbedHopCount() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the embedded hop count.
getEntityResolver() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getEnvironmentHome() - Method in class org.archive.util.JEMBeanHelper
Return the target environment directory.
getEnvironmentIfOpen() - Method in class org.archive.util.JEMBeanHelper
Return an Environment only if the environment has already been opened in this process.
getEnvironmentOpenConfig() - Method in class org.archive.util.JEMBeanHelper
If the helper was instantiated with canConfigure==true, it shows environment configuration attributes.
getError(String) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get error for a specific attribute.
getError(String, Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get error for a specific attribute.
getErrorHandler() - Method in class org.archive.crawler.admin.CrawlJob
 
getErrorHandler() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getErrorMessage() - Method in class org.archive.crawler.admin.CrawlJob
Get the error message associated with this job.
getErrors() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get a List of all the encountered errors.
getErrors(Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get a List of all the encountered errors.
getEscapedURI() - Method in class org.archive.net.UURI
 
getFeature(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getFetchAttempts() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the number of attempts at getting the document referenced by this URI.
getFetchNonResponses() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFetchResponses() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFetchStatus() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the overall/fetch status of this CrawlURI for its current trip through the processing loop.
getFetchSuccesses() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFextra() - Method in class org.archive.io.GzipHeader
 
getFile() - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
getFile() - Method in class org.archive.crawler.processor.CrawlMapper.FilePrintWriter
 
getFile() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Gets this path as a File.
getFile() - Method in class org.archive.net.rsync.RsyncURLConnection
 
getFileDistribution() - Method in class org.archive.crawler.admin.StatisticsTracker
Returns a HashMap that contains information about distributions of encountered mime types.
getFilenameSeries() - Method in class org.archive.io.GenerationFileHandler
 
getFilesWithPrefix(File, String) - Static method in class org.archive.util.FileUtils
Get a list of all files in the directory that have the passed prefix.
getFilterOffPosition(CrawlURI) - Method in class org.archive.crawler.filter.PathDepthFilter
 
getFilterOffPosition(CrawlURI) - Method in class org.archive.crawler.filter.PathologicalPathFilter
 
getFilterOffPosition(CrawlURI) - Method in class org.archive.crawler.framework.Filter
If the filter is disabled, the value returned by this method is what filters return as their disabled setting.
getFirstARecord(Record[]) - Method in class org.archive.crawler.fetcher.FetchDNS
 
getFirstChain() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the first processor chain.
getFirstKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFirstProcessor() - Method in class org.archive.crawler.framework.ProcessorChain
Get the first processor in the chain.
getFirstProcessorChain() - Method in class org.archive.crawler.framework.CrawlController
Get the first processor chain.
getFirstWord(String) - Static method in class org.archive.util.TextUtils
 
getFlg() - Method in class org.archive.io.GzipHeader
 
getFormItems(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getFrom(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getFrom(FrontierMarker, int) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFrom() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Get the beginning of the time frame to check against.
getFromSeries(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log spread across a numbered series of files.
getFrontier() - Method in class org.archive.crawler.framework.CrawlController
 
getFrontierJournal() - Method in interface org.archive.crawler.framework.Frontier
 
getFrontierJournal() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getFrontierJournal() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getFrontierOneLine() - Method in class org.archive.crawler.admin.CrawlJob
 
getFrontierReport(String) - Method in class org.archive.crawler.admin.CrawlJob
 
getGlobalSettings() - Method in class org.archive.crawler.settings.SettingsCache
 
getGlobalSettings() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getGroup(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Get the 'frontier group' (usually queue) for the given CrawlURI.
getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getGzipHeader() - Method in class org.archive.io.GzippedInputStream
 
getHeaderFieldKeys() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getHeaderFields() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getHeaderValue(String) - Method in class org.archive.io.arc.ARCRecordMetaData
 
getHeritrixHome() - Static method in class org.archive.crawler.Heritrix
Exploit -Dheritrix.home if available to us.
getHeritrixOut() - Static method in class org.archive.crawler.Heritrix
 
getHolder() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the 'holder' for the convenience of an external facility.
getHolderCost() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the 'holderCost' for the convenience of an external facility (Frontier).
getHolderKey() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the 'holderKey' for convenience of an external facility (Frontier).
getHopType() - Method in class org.archive.crawler.extractor.Link
 
getHost() - Method in class org.archive.net.UURI
 
getHostAddress(String) - Static method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Get host address using first the heritrix cache of addresses, then, failing that, go to the dnsjava cache.
getHostAddress(String) - Static method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
Get host address using first the heritrix cache of addresses, then, failing that, go to the dnsjava cache.
getHostAddress(String) - Static method in class org.archive.util.DNSJavaUtil
Return an InetAddress for passed host.
getHostBasename() - Method in class org.archive.net.UURI
Strips www variants from the host.
getHostFor(String) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlHost associated with name.
getHostFor(CrawlURI) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlHost associated with curi.
getHostingHeritrix() - Method in class org.archive.crawler.admin.CrawlJob
 
getHostLastFinished(String) - Method in class org.archive.crawler.admin.StatisticsTracker
Returns the time (in millisec) when a URI belonging to a given host was last finished processing.
getHostName() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the host name.
getHostName() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the HQ's name.
getHQ(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Get the AdaptiveRevisitHostQueue for the given CrawlURI, creating it if necessary.
getHQ(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Get an AdaptiveRevisitHostQueue for the specified host.
getHtdocs() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getHttp() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
getHttpHeaders() - Method in class org.archive.io.arc.ARCRecord
 
getHttpMethod(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getHttpRecorder() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the http recorder associated with this uri.
getHttpRecorder() - Method in class org.archive.crawler.framework.ToeThread
Used to get the current thread's HttpRecorder instance.
getHttpRecorder() - Method in class org.archive.httpclient.HttpRecorderMethod
 
getHttpRecorder() - Static method in class org.archive.util.HttpRecorder
Get the current thread's HttpRecorder.
getHttpRecorder() - Method in interface org.archive.util.HttpRecorderMarker
 
getHttpServer() - Static method in class org.archive.crawler.Heritrix
 
getIgnoredSeeds() - Method in class org.archive.crawler.admin.CrawlJob
Utility method to get the stored list of ignored seed items (if any), from the last time the seeds were imported to the frontier.
getInflater() - Method in class org.archive.io.GzippedInputStream
 
getInFromFile(String) - Method in class org.archive.crawler.extractor.PDFParser
Read a file named 'doc' and store its bytes for later processing.
getInitialMarker(String, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Returns a URIFrontierMarker for the current, paused, job.
getInitialMarker(String, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns a URIFrontierMarker for the current, paused, job.
getInitialMarker(String, boolean) - Method in interface org.archive.crawler.framework.Frontier
Get a URIFrontierMarker initialized with the given regular expression at the 'start' of the Frontier.
getInitialMarker(String, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getInitialMarker(String, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
 
getInitialMarker(String) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Get a marker for beginning a scan over all contents.
getInputSource() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getInputStream(String) - Static method in class org.archive.crawler.util.IoUtils
 
getInputStream(File, String) - Static method in class org.archive.crawler.util.IoUtils
Get inputstream.
getInputStream(File, long) - Method in class org.archive.io.arc.ARCReader
Convenience method for constructors.
getInputStream() - Method in class org.archive.io.arc.ARCReader
 
getInputStream() - Method in class org.archive.io.GzippedInputStream
 
getInputStream() - Method in class org.archive.net.rsync.RsyncURLConnection
 
getInstance() - Static method in class org.archive.io.ReplayCharSequenceFactory
 
getInstance() - Static method in class org.archive.io.SinkHandler
 
getInstance(String) - Static method in class org.archive.net.UURIFactory
 
getInstance(String, String) - Static method in class org.archive.net.UURIFactory
 
getInstance(UURI, String) - Static method in class org.archive.net.UURIFactory
 
getInstances() - Static method in class org.archive.crawler.Heritrix
 
getInt(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getInteger(String) - Method in class org.archive.configuration.Configuration
 
getIntProperty(String, int) - Static method in class org.archive.util.PropertyUtils
 
getIntValue() - Method in class org.archive.crawler.util.StringIntPair
 
getIP() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the IP address for this host.
getIp() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getIpFetched() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the time when the IP address for this host was last looked up.
getIPHostAddress(String) - Static method in class org.archive.util.InetAddressUtil
Returns InetAddress for passed host IF it's in IPv4 quads format (e.g.
getIpTTL() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the TTL value from the dns record for this host.
getIPValidityDuration(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Get the maximum time a dns-record is valid.
getIterator(boolean) - Method in class org.archive.queue.MemQueue
 
getIterator(boolean) - Method in interface org.archive.queue.Queue
Returns an iterator for the queue.
getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getJeLogsFilter() - Static method in class org.archive.crawler.util.CheckpointUtils
 
getJmxJobName() - Method in class org.archive.crawler.admin.CrawlJob
 
getJmxObjectName() - Static method in class org.archive.crawler.Heritrix
 
getJmxObjectName(String) - Static method in class org.archive.crawler.Heritrix
 
getJmxObjectName(String, String) - Static method in class org.archive.crawler.Heritrix
 
getJndiContainerName() - Static method in class org.archive.crawler.Heritrix
 
getJndiContext() - Static method in class org.archive.crawler.Heritrix
 
getJob(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Return a job with the given UID.
getJobHandler() - Method in class org.archive.crawler.Heritrix
Get the job handler.
getJobName() - Method in class org.archive.crawler.admin.CrawlJob
Returns this job's 'name'.
getJobPriority() - Method in class org.archive.crawler.admin.CrawlJob
Get this job's level of priority.
getJobsdir() - Static method in class org.archive.crawler.Heritrix
 
getKey(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
getKey() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getKey(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getKey(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getKey() - Method in class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
 
getLastCacheMissDiff() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getLastChain() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the last processor chain.
getLastSavedTime() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the time when this CrawlerSettings was last saved to persistent storage.
getLegalValues() - Method in class org.archive.crawler.settings.ComplexType
 
getLegalValues() - Method in class org.archive.crawler.settings.ListType
The getLegalValues is not applicable for list so this method will always return null.
getLegalValues() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
getLegalValues() - Method in class org.archive.crawler.settings.SimpleType
Get the array of legal values for this Type.
getLegalValues() - Method in class org.archive.crawler.settings.Type
Get the legal values for this type.
getLegalValueType() - Method in class org.archive.crawler.settings.Type
Get the class that values of this Type must be an instance of.
getLength() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getLength() - Method in class org.archive.io.GzipHeader
 
getLevel() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
 
getLevel() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the severity level.
getLevel() - Method in class org.archive.io.SinkHandlerLogRecord
 
getLinkCount() - Method in class org.archive.crawler.extractor.CrawlUriSWFAction
 
getLinkHopCount() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the link hop count.
getListOfAllFiles() - Method in class org.archive.crawler.settings.SettingsHandler
Creates and returns a List of all files comprising the current settings framework.
getListOfAllFiles() - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
getLocalAttribute(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlerSettings object.
getLocalAttributeInfoList() - Method in class org.archive.crawler.settings.DataContainer
 
getLocalizedMessage() - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
getLog14Date() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog14Date(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog17Date() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog17Date(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLogger() - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
getLogger() - Method in class org.archive.io.arc.ARCReader
 
getLoggerName() - Method in class org.archive.io.SinkHandlerLogRecord
 
getLoggers() - Method in class org.archive.crawler.datamodel.CrawlOrder
Returns the Map of the StatisticsTracking modules that are included in the configuration that the current instance of this class is representing.
getLogin(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getLoginUri(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getLogPath(String) - Method in class org.archive.crawler.admin.CrawlJob
Returns the absolute path of the specified log.
getLogRegistrationMsg(String, MBeanServer, boolean) - Static method in class org.archive.util.JmxUtils
Return a string suitable for logging on registration.
getLogsDir() - Method in class org.archive.crawler.framework.CrawlController
 
getLogsDir() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getLogUnregistrationMsg(String, MBeanServer) - Static method in class org.archive.util.JmxUtils
 
getLogWriteInterval() - Method in class org.archive.crawler.framework.AbstractTracker
The number of seconds to wait between writing snapshot data to log file.
getLong(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getMatchDomainURI() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getMatcher(CharSequence) - Method in class org.archive.util.PatternMatcherRecycler
Get a Matcher for the internal Pattern, against the given input sequence.
getMatcher(String, CharSequence) - Static method in class org.archive.util.TextUtils
Get a matcher object for a precompiled regex pattern.
getMatchExpression() - Method in interface org.archive.crawler.framework.FrontierMarker
Returns the regular expression that this marker uses.
getMatchExpression() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
getMatchHostURI() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getMatchReturnValue(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
 
getMaxToes() - Method in class org.archive.crawler.datamodel.CrawlOrder
Returns the set number of maximum toe threads.
getMaxToWrite() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getMBeanInfo() - Method in class org.archive.configuration.Configuration
 
getMBeanInfo() - Method in class org.archive.crawler.admin.CrawlJob
 
getMBeanInfo() - Method in class org.archive.crawler.Heritrix
 
getMBeanInfo() - Method in class org.archive.crawler.settings.ComplexType
 
getMBeanInfo(Object) - Method in class org.archive.crawler.settings.ComplexType
 
getMBeanInfo() - Method in class org.archive.crawler.settings.DataContainer
 
getMBeanInfo() - Method in class org.archive.util.JEApplicationMBean
 
getMbeanName() - Method in class org.archive.crawler.admin.CrawlJob
 
getMBeanName() - Method in class org.archive.crawler.Heritrix
 
getMBeanServer() - Static method in class org.archive.util.JmxUtils
Get MBeanServer.
getMessage() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the error message.
getMessage() - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
getMetadata() - Method in class org.archive.crawler.writer.ARCWriterProcessor
Return list of metadatas to add to first arc file metadata record.
getMetaData() - Method in class org.archive.io.arc.ARCRecord
 
getMetadata() - Method in class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
getMetadata() - Method in interface org.archive.io.arc.ARCWriterSettings
 
getMetadataBody(File) - Method in class org.archive.crawler.writer.ARCWriterProcessor
Write the arc metadata body content.
getMetadataHeaderLinesTwoAndThree(String) - Method in class org.archive.io.arc.ARCWriter
 
getMetaDatas() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getMetaLine(String, String, String, long, int) - Method in class org.archive.io.arc.ARCWriter
 
getMimetype() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getModule(String) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getModule(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get a module by name.
getMtime() - Method in class org.archive.io.GzipHeader
 
getName() - Method in interface org.archive.configuration.Configurable
 
getName() - Method in class org.archive.configuration.registry.CrawlOrder
 
getName() - Method in class org.archive.configuration.registry.TestProcessor
 
getName() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getName() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getName() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the name of this CrawlerSettings object.
getName() - Method in interface org.archive.crawler.settings.refinements.Criteria
Returns the name of the Criteria type.
getName() - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
 
getName() - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
 
getName() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
getName() - Method in interface org.archive.crawler.url.CanonicalizationRule
 
getName() - Method in interface org.archive.io.arc.ARCLocation
 
getNeedReset() - Method in class org.archive.util.JEMBeanHelper
Tell the MBean if the available set of functionality has changed.
getNewAlerts() - Method in class org.archive.crawler.Heritrix
 
getNewAlertsCount() - Method in class org.archive.crawler.Heritrix
 
getNewAll() - Method in interface org.archive.crawler.framework.AlertManager
 
getNewCount() - Method in interface org.archive.crawler.framework.AlertManager
 
getNewJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Get the handler's 'new job'.
getNextCheckpoint() - Method in class org.archive.crawler.framework.Checkpointer
 
getNextCheckpointName() - Method in class org.archive.crawler.framework.Checkpointer
 
getNextDirectory(List) - Method in class org.archive.io.arc.ARCWriter
 
getNextItemNumber() - Method in interface org.archive.crawler.framework.FrontierMarker
Returns the number of the next match after the marker.
getNextItemNumber() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
getNextJobUID() - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns a unique job ID.
getNextNearestItem(DatabaseEntry, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getNextProcessorChain() - Method in class org.archive.crawler.framework.ProcessorChain
Get the processor chain that the URI should be working through after finishing this one.
getNextReadyTime() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the time when the HQ will next be ready to issue a URI.
getNoJmxName() - Method in class org.archive.crawler.Heritrix
 
getNotificationInfo(Environment) - Method in class org.archive.util.JEMBeanHelper
No notifications are supported.
getNotificationsSequenceNumber() - Static method in class org.archive.crawler.admin.CrawlJob
 
getNullOrAttribute(String, Object) - Method in class org.archive.crawler.url.canonicalize.RegexRule
 
getNumActive() - Method in class org.archive.io.arc.ARCWriterPool
 
getNumberOfJournalEntries() - Method in class org.archive.crawler.admin.CrawlJob
 
getNumIdle() - Method in class org.archive.io.arc.ARCWriterPool
 
getObject(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getObjectName() - Method in class org.archive.configuration.Pointer
 
getObjectName(String, Class) - Method in class org.archive.configuration.registry.JmxRegistry
 
getObjectName(String, Class, String) - Method in class org.archive.configuration.registry.JmxRegistry
Create an ObjectName. Default access so it can be used by unit tests.
getObjectName() - Method in class org.archive.configuration.StoreElement
 
getOffset() - Method in interface org.archive.io.arc.ARCLocation
 
getOffset() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getOpenType(String) - Static method in class org.archive.util.JmxUtils
 
getOpenType(String, OpenType) - Static method in class org.archive.util.JmxUtils
 
getOperationList(Environment) - Method in class org.archive.util.JEMBeanHelper
Get mbean operation metadata for this environment.
getOperationNames() - Method in class org.archive.configuration.Configuration
 
getOperator() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the name of operator of this crawl from this CrawlerSettings object.
getOperator() - Method in class org.archive.crawler.settings.refinements.Refinement
 
getOrCreateSettingsObject(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get or create CrawlerSettings object for a host or domain.
getOrCreateSettingsObject(String, String) - Method in class org.archive.crawler.settings.SettingsHandler
 
getOrder() - Method in class org.archive.crawler.framework.CrawlController
 
getOrder() - Method in class org.archive.crawler.settings.SettingsHandler
Get the CrawlOrder.
getOrderFile() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getOrderFile() - Method in class org.archive.crawler.settings.XMLSettingsHandler
Get the File object pointing to the order file.
getOrdinal() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the ordinal (serial number) assigned at creation.
getOrganization() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the name of the organization running this crawl from this CrawlerSettings object.
getOrganization() - Method in class org.archive.crawler.settings.refinements.Refinement
 
getOs() - Method in class org.archive.io.GzipHeader
 
getOutLinks() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getOutputDirs() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getOutputDirs() - Method in class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
getOutputDirs() - Method in interface org.archive.io.arc.ARCWriterSettings
 
getOwner() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the ComplexType owning the checked attribute.
getParams() - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Returns parameters associated with this connection manager.
getParent() - Method in class org.archive.crawler.settings.ComplexType
Get the parent of this ComplexType.
getParent() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the parent of this CrawlerSettings object.
getParent(UURI) - Method in class org.archive.crawler.settings.CrawlerSettings
Get the parent of this CrawlerSettings object.
getParentScope(String) - Method in class org.archive.crawler.settings.SettingsHandler
Strip off the leftmost part of a domain name.
getPassword(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getPath() - Method in class org.archive.net.LaxURI
 
getPathFromSeed() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getPathQuery() - Method in class org.archive.net.LaxURI
 
getPathRelativeToWorkingDirectory(String) - Method in class org.archive.crawler.settings.SettingsHandler
Transforms a relative path so that it is relative to a location that is regarded as a working dir for these settings.
getPathRelativeToWorkingDirectory(String) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Transforms a relative path so that it is relative to the location of the order file.
getPattern() - Method in class org.archive.util.PatternMatcherRecycler
 
getPayload() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getPendingJobs() - Method in class org.archive.crawler.admin.CrawlJobHandler
A List of all pending jobs.
getPendingURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Returns the frontier's URI list based on the provided marker.
getPendingURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns the frontier's URI list based on the provided marker.
getPerDomainSettings() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getPerHostSettings() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getPoolMaximumActive() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getPoolMaximumWait() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getPoolState() - Method in class org.archive.io.arc.ARCWriterPool
 
getPoolState(long) - Method in class org.archive.io.arc.ARCWriterPool
 
getPort() - Method in class org.archive.crawler.datamodel.CrawlServer
Get the port number for this server.
getPort() - Method in class org.archive.crawler.SimpleHttpServer
 
getPortNumber() - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
Get the port number that is to be checked against a URI.
getPosition() - Method in class org.archive.io.arc.ARCWriter
 
getPostprocessorChain() - Method in class org.archive.crawler.framework.CrawlController
Get the postprocessor chain.
getPredecessorCheckpoints() - Method in class org.archive.crawler.framework.Checkpointer
 
getPrefixClassKey(byte[]) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
 
getPrefixes() - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Synchronized get of prefix set to use.
getPrefixes(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Synchronized get of prefix set to use.
getPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Return the authentication URI, either absolute or relative, that serves as a prerequisite for the passed curi.
getPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getPrerequisiteUri() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the prerequisite for this URI.
getPreservedFields() - Method in class org.archive.crawler.settings.ComplexType
Get a list of attribute names that the complex type should attempt to preserve if the module is exchanged with another one.
getProcessor(Class) - Method in class org.archive.crawler.framework.ProcessorChain
Get the first processor that is of class classType or a subclass of it.
getProcessorChain(int) - Method in class org.archive.crawler.framework.ProcessorChainList
Get a processor chain by its index in the list of chains.
getProcessorChain(String) - Method in class org.archive.crawler.framework.ProcessorChainList
Get a processor chain by its name.
getProcessorChainList() - Method in class org.archive.crawler.framework.CrawlController
Get the list of processor chains.
getProcessorsReport() - Method in class org.archive.crawler.admin.CrawlJob
Get the Processors report for the running crawl.
getProfiles() - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns a List of all known profiles.
getProgressStatistics() - Method in class org.archive.crawler.admin.StatisticsTracker
 
getProgressStatistics() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
getProgressStatisticsLine(Date) - Method in class org.archive.crawler.admin.StatisticsTracker
Return one line of current progress-statistics.
getProgressStatisticsLine() - Method in class org.archive.crawler.admin.StatisticsTracker
Return one line of current progress-statistics.
getProgressStatisticsLine() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
getPropertiesInputStream() - Static method in class org.archive.crawler.Heritrix
 
getProperty(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getPropertyOrNull(String) - Static method in class org.archive.util.PropertyUtils
 
getQueueFor(CrawlURI) - Method in class org.archive.crawler.frontier.BdbFrontier
Return the work queue for the given CrawlURI's classKey.
getQueueFor(String) - Method in class org.archive.crawler.frontier.BdbFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getQueueFor(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the work queue for the given CrawlURI's classKey.
getQueueFor(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getReadReader() - Static method in class org.archive.crawler.selftest.SelfTestCase
Returns the selftest read ARCReader.
getRealm(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getRecordedInput() - Method in class org.archive.util.HttpRecorder
Return the internal RecordingInputStream
getRecordedOutput() - Method in class org.archive.util.HttpRecorder
 
getRedirectUri() - Method in class org.archive.crawler.admin.SeedRecord
 
getReference() - Method in class org.archive.crawler.settings.refinements.Refinement
Get the reference to this refinement's settings object.
getReference(ObjectName) - Static method in class org.archive.util.JndiUtils
 
getReferencedHost() - Method in class org.archive.net.UURI
Return the referenced host in the UURI, if any, also extracting the host of a DNS-lookup URI where necessary.
getRefinement(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Get a refinement with a given reference.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
Get the regular expression string to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
Use a preset if configured to do so.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
Get the regular expressions list to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.MatchesRegExpDecideRule
Get the regular expression string to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.PathologicalPathDecideRule
Construct the regexp string to be matched against the URI.
getRegexp(Object) - Method in class org.archive.crawler.filter.FilePatternFilter
 
getRegexp(Object) - Method in class org.archive.crawler.filter.PathologicalPathFilter
Construct the regexp string to be matched against the URI.
getRegexp(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
Get the regular expressions list to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.filter.URIRegExpFilter
Get the regular expression string to match the URI against.
getRegexp() - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Get the regular expression to be matched against a URI.
getRegexpFileFilter(String) - Static method in class org.archive.util.FileUtils
Get a java.io.FileFilter that filters files based on a regular expression.
getRegisteredInstance(Registry) - Method in class org.archive.configuration.Pointer
Create an instance of the pointed-to Configurable with its Configuration registered in the passed Registry.
getRelativePath() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Gets this path as a relative path from the base directory.
getReplayCharSequence() - Method in class org.archive.io.RecordingInputStream
 
getReplayCharSequence(String) - Method in class org.archive.io.RecordingInputStream
 
getReplayCharSequence() - Method in class org.archive.io.RecordingOutputStream
 
getReplayCharSequence(String) - Method in class org.archive.io.RecordingOutputStream
 
getReplayCharSequence(byte[], long, long, String, String) - Method in class org.archive.io.ReplayCharSequenceFactory
Return the appropriate ReplayCharSequence, chosen based on the passed encoding.
getReplayCharSequence() - Method in class org.archive.util.HttpRecorder
 
getReplayInputStream() - Method in class org.archive.io.RecordingInputStream
 
getReplayInputStream() - Method in class org.archive.io.RecordingOutputStream
 
getReplayInputStream() - Method in class org.archive.util.HttpRecorder
 
getReports() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getReports() - Method in class org.archive.crawler.framework.CrawlController
 
getReports() - Method in class org.archive.crawler.framework.ToePool
 
getReports() - Method in class org.archive.crawler.framework.ToeThread
 
getReports() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getReports() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
getReports() - Method in class org.archive.crawler.frontier.WorkQueue
 
getReports() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getReports() - Method in interface org.archive.util.Reporter
Get an array of report names offered by this Reporter.
getResponseContentLength() - Method in class org.archive.io.RecordingInputStream
 
getResponseContentLength() - Method in class org.archive.io.RecordingOutputStream
 
getResponseContentLength() - Method in class org.archive.util.HttpRecorder
 
getResult() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
getReverseSortedCopy(Map) - Method in class org.archive.crawler.admin.StatisticsTracker
Sort the entries of the given HashMap in descending order by their values, which must be longs wrapped with LongWrapper.
getReverseSortedHostsDistribution() - Method in class org.archive.crawler.admin.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
getRobots() - Method in class org.archive.crawler.datamodel.CrawlServer
Get the robots exclusion policy for this server.
getRobotsFetchedTime() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getRobotsHonoringPolicy() - Method in class org.archive.crawler.datamodel.CrawlOrder
This method gets the RobotsHonoringPolicy object from the orders file.
getRobotsValidityDuration(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Get the maximum time a robots.txt is valid.
getRootWebappName() - Static method in class org.archive.crawler.SimpleHttpServer
 
getRules(Object) - Method in class org.archive.crawler.deciderules.DecideRuleSequence
 
getSchedulingDirective() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getSchedulingFor(Link, boolean) - Method in class org.archive.crawler.postprocessor.LinksScoper
 
getScope(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Decide whether to use host or domain scope.
getScope() - Method in class org.archive.crawler.framework.CrawlController
 
getScope() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the scope of this CrawlerSettings object.
getScratchDisk() - Method in class org.archive.crawler.framework.CrawlController
 
getSeedCollection() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSeedFile(SettingsHandler) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
 
getSeedfile() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Dig through everything to get the crawl-global seeds file.
getSeedfile() - Method in class org.archive.crawler.framework.CrawlScope
 
getSeedForUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
Returns seed for urlString (null if seed not found).
getSeedRecordsSortedByStatusCode() - Method in class org.archive.crawler.admin.StatisticsTracker
 
getSeedRecordsSortedByStatusCode(Iterator) - Method in class org.archive.crawler.admin.StatisticsTracker
 
getSeedRecordsSortedByStatusCode() - Method in interface org.archive.crawler.framework.StatisticsTracking
Get a SeedRecord iterator for the job being monitored.
getSeeds() - Method in class org.archive.crawler.admin.StatisticsTracker
Get a seed iterator for the job being monitored.
getSeedStream(SettingsHandler) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Return seeds as a stream.
getSeedUrlToDiscoveredUrlsMap() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSelftestURL() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getSelftestURLWithTrailingSlash() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getSerialNo() - Static method in class org.archive.io.arc.ARCWriter
 
getSerialNumber() - Method in class org.archive.crawler.framework.ToeThread
 
getServer(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getServer(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getServer() - Method in class org.archive.crawler.SimpleHttpServer
 
getServerCache() - Method in class org.archive.crawler.framework.CrawlController
 
getServerDetail(MBeanServer) - Static method in class org.archive.util.JmxUtils
 
getServerFor(String) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlServer associated with name.
getServerFor(CrawlURI) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlServer associated with curi.
getServerKey(CrawlURI) - Static method in class org.archive.crawler.datamodel.CrawlServer
Get key to use doing lookup on server instances.
getServerLogging() - Method in class org.archive.crawler.SimpleHttpServer
Set up log files.
getSessionBalance() - Method in class org.archive.crawler.frontier.WorkQueue
Return current session 'activity budget balance'
getSettings() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the CrawlerSettings for the checked attribute.
getSettings() - Method in class org.archive.crawler.settings.DataContainer
Get the settings object for which this DataContainer's data are valid.
getSettings() - Method in class org.archive.crawler.settings.refinements.Refinement
Get the CrawlerSettings object this refinement refers to.
getSettings(String, String) - Method in class org.archive.crawler.settings.SettingsCache
Get the effective settings for a host.
getSettings(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object in effect for a host or domain.
getSettings(String, UURI) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object in effect for a host or domain.
getSettings() - Method in class org.archive.io.arc.ARCWriter
 
getSettings() - Method in class org.archive.io.arc.ARCWriterPool
 
getSettingsDir(String) - Method in class org.archive.crawler.datamodel.CrawlOrder
Return the full path to the directory named by key in settings.
getSettingsDir(String) - Method in class org.archive.crawler.framework.CrawlController
Return the full path to the directory named by key in settings.
getSettingsDir() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getSettingsDirectory() - Method in class org.archive.crawler.admin.CrawlJob
Returns the directory where the configuration files for this job are located.
getSettingsForHost(String) - Method in class org.archive.crawler.settings.SettingsHandler
 
getSettingsFromObject(Object, String) - Method in class org.archive.crawler.settings.ComplexType
Get settings object valid for a URI.
getSettingsFromObject(Object) - Method in class org.archive.crawler.settings.ComplexType
Get settings object valid for a URI.
getSettingsHandler() - Method in class org.archive.crawler.admin.CrawlJob
Returns the settings handler for this job.
getSettingsHandler() - Method in class org.archive.crawler.datamodel.CrawlServer
Get the settings handler.
getSettingsHandler() - Method in class org.archive.crawler.framework.CrawlController
 
getSettingsHandler() - Method in class org.archive.crawler.settings.ComplexType
 
getSettingsHandler() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the SettingsHandler this CrawlerSettings object belongs to.
getSettingsHandler() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getSettingsObject(String, String) - Method in class org.archive.crawler.settings.SettingsCache
Get a settings object.
getSettingsObject(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object for a host or domain.
getSettingsObject(String, String) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object for a host/domain and a particular refinement.
getShortArcFileName(ARCRecordMetaData) - Static method in class org.archive.io.arc.ARCReader
 
getShortMessage() - Method in class org.archive.io.SinkHandlerLogRecord
 
getShutdownThread(boolean, int, String) - Static method in class org.archive.crawler.Heritrix
 
getSingleInstance() - Static method in class org.archive.crawler.Heritrix
 
getSink() - Method in class org.archive.util.ProcessUtils.StreamGobbler
 
getSize() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the size of the HQ.
getSize() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Returns the number of URIs in all the HQs in this list.
getSize() - Method in class org.archive.io.RecordingInputStream
 
getSize() - Method in class org.archive.io.RecordingOutputStream
 
getSize() - Method in class org.archive.io.ReplayInputStream
Total size of stream content.
getSizeBytes() - Method in interface org.archive.util.BloomFilter
The amount of memory in bytes consumed by the bloom bitfield.
getSizeBytes() - Method in class org.archive.util.BloomFilter32bit
 
getSizeBytes() - Method in class org.archive.util.BloomFilter32bitSplit
 
getSizeBytes() - Method in class org.archive.util.BloomFilter32bp2
 
getSizeBytes() - Method in class org.archive.util.BloomFilter32bp2Split
 
getSizeBytes() - Method in class org.archive.util.BloomFilter64bit
 
getSlotState(long) - Method in class org.archive.util.AbstractLongFPSet
Check the state of a slot in the storage.
getSlotState(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getSocketFactory() - Static method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
 
getSorted() - Method in class org.archive.util.Histotable
 
getSortedDirContent(File, FilenameFilter) - Static method in class org.archive.util.FileUtils
 
getSource() - Method in class org.archive.crawler.extractor.Link
 
getStackTrace() - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
getStartKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
getState() - Method in class org.archive.crawler.framework.CrawlController
 
getState() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the current state of the HQ.
getStateByName() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Same as getState() except this method returns a human readable name for the state instead of its constant integer value.
getStateDisk() - Method in class org.archive.crawler.framework.CrawlController
 
getStateJobFile(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
Find the state.job file in the job directory.
getStatistics() - Method in class org.archive.crawler.framework.CrawlController
 
getStatisticsTracking() - Method in class org.archive.crawler.admin.CrawlJob
 
getStatus() - Method in class org.archive.crawler.admin.CrawlJob
Get the current status of this CrawlJob.
getStatus() - Method in class org.archive.crawler.Heritrix
 
getStatusCode() - Method in class org.archive.crawler.admin.SeedRecord
 
getStatusCode() - Method in class org.archive.io.arc.ARCRecord
Return status code for this record.
getStatusCode() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getStatusCodeDistribution() - Method in class org.archive.crawler.admin.StatisticsTracker
Return a HashMap representing the distribution of status codes for successfully fetched curis, where key -> val represents (string)code -> (integer)count.
getStderr() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
getStdout() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
getStep() - Method in class org.archive.crawler.framework.ToeThread
 
getString(String) - Method in class org.archive.configuration.Configuration
 
getString(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getStringValue() - Method in class org.archive.crawler.util.StringIntPair
 
getSubContext(String) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubContext(CompoundName) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubDir(String) - Static method in class org.archive.crawler.Heritrix
Get and check for existence of expected subdir.
getSubDir(String, boolean) - Static method in class org.archive.crawler.Heritrix
Get and optionally check for existence of subdir.
getSubstats() - Method in class org.archive.crawler.datamodel.CrawlHost
 
getSubstats() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getSubstats() - Method in interface org.archive.crawler.datamodel.CrawlSubstats.HasCrawlSubstats
 
getSubstats() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
 
getSubstats() - Method in class org.archive.crawler.frontier.WorkQueue
 
getSuccessBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getSuccessfullyCrawledUrls() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSurtAuthority(String) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getSurtForm() - Method in class org.archive.net.UURI
 
getTestName() - Method in class org.archive.crawler.selftest.SelfTestCase
Calculates test name by stripping SelfTest from current class name.
getThreadNumber() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the number of the ToeThread responsible for processing this uri.
getThreadOneLine() - Method in class org.archive.crawler.admin.CrawlJob
 
getThreadsReport() - Method in class org.archive.crawler.admin.CrawlJob
Get the CrawlController's ToeThreads report for the running crawl.
getThrown() - Method in class org.archive.io.SinkHandlerLogRecord
 
getThrownToString() - Method in class org.archive.io.SinkHandlerLogRecord
 
getTimestamp() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getTmpDir() - Method in class org.archive.util.TmpDirTestCase
 
getTo() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Get the end of the time frame to check against.
getToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getToePool() - Method in class org.archive.crawler.framework.CrawlController
 
getTopHQ() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
getTopLevelModule(String) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getTotalBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getTotalExpenditure() - Method in class org.archive.crawler.frontier.WorkQueue
Return the tally of all expenditures on this queue.
getTransHops() - Method in class org.archive.crawler.datamodel.CandidateURI
Tally up the number of transitive (non-simple-link) hops at the end of this CandidateURI's pathFromSeed.
getType() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getType(Object) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
Get the policy-type.
getType() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
getTypeName(String) - Static method in class org.archive.crawler.settings.SettingsHandler
 
getUID() - Method in class org.archive.crawler.admin.CrawlJob
Returns this job's unique ID (UID) that was issued by the CrawlJobHandler when this job was first created.
getUid(ObjectName) - Static method in class org.archive.util.JmxUtils
Returns the UID portion of the name key property of an object name representing a "CrawlService.Job" bean.
getUncheckedAttribute(Object, String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlerSettings object.
getUncheckedAttribute(Object, String) - Method in class org.archive.crawler.settings.MapType
 
getUnMatchedURI() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getUnreadCount() - Method in class org.archive.io.SinkHandler
 
getUri() - Method in class org.archive.crawler.admin.SeedRecord
 
getUri() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
getUri() - Method in class org.archive.crawler.util.BloomUriUniqFilterTest
 
getUri() - Method in class org.archive.crawler.util.FPUriUniqFilterTest
 
getURI() - Method in class org.archive.net.LaxURI
 
getUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
The total number of URIs queued in all the HQs belonging to this list.
getURIs() - Method in class org.archive.crawler.extractor.PDFParser
Get a list of URIs retrieved from the PDF during the extractURIs operation.
getURIsList(FrontierMarker, int, boolean) - Method in interface org.archive.crawler.framework.Frontier
Returns a list of all uncrawled URIs starting from a specified marker until numberOfMatches is reached.
getURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
Return list of urls.
getURIString() - Method in class org.archive.crawler.datamodel.CandidateURI
Deprecated. Use CandidateURI.toString().
getURL(String, String) - Method in class org.archive.crawler.extractor.CrawlUriSWFAction
Overwrite handling of discovered URIs.
getUrl() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getUserAgent(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getUserAgent() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the user agent to use for crawling this URI.
getUserAgents(CrawlerSettings) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
If policy-type is most favored crawler of set, then this method gets a list of all user agents in that set.
getUURI() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getUURI(String) - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
getValue() - Method in class org.archive.crawler.settings.ComplexType
Returns this object.
getValue() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the value of the checked attribute.
getValue() - Method in class org.archive.crawler.settings.ListType
Returns this object.
getValue() - Method in class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
 
getVersion() - Method in class org.archive.crawler.CommandLineParser
 
getVersion() - Static method in class org.archive.crawler.Heritrix
Get the heritrix version.
getVersion() - Method in class org.archive.io.arc.ARCReader
Returns version of this ARC file.
getVersion() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getVia() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getViaContext() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getWakeTime() - Method in class org.archive.crawler.frontier.WorkQueue
 
getWarsdir() - Static method in class org.archive.crawler.Heritrix
 
getWebappPath(String) - Method in class org.archive.crawler.SimpleHttpServer
Get path to named webapp.
getWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getXfl() - Method in class org.archive.io.GzipHeader
 
getXMLReader() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
globalSettings() - Method in class org.archive.crawler.settings.ComplexType
Get the global settings object (aka order).
gotoEOR(ARCRecord) - Method in class org.archive.io.arc.ARCReader
Skip over any trailing newlines at the end of the record so we're lined up ready to read the next.
gotoEOR(int) - Method in class org.archive.io.GzippedInputStream
Exhaust current GZIP member content.
gotoEOR() - Method in class org.archive.io.GzippedInputStream
Exhaust current GZIP member content.
GUI_PORT - Static variable in class org.archive.util.JmxUtils
 
gzip(byte[]) - Static method in class org.archive.io.GzippedInputStream
Gzip passed bytes.
GZIP_HEADER_BEGIN - Static variable in interface org.archive.io.arc.ARCConstants
Start of a GZIP header that uses default deflater.
GZIP_SUFFIX - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
GzipHeader - Class in org.archive.io
Read in the GZIP header.
GzipHeader() - Constructor for class org.archive.io.GzipHeader
Shutdown constructor.
GzipHeader(InputStream) - Constructor for class org.archive.io.GzipHeader
Constructor.
GZIPMEMBER_COUNT - Static variable in class org.archive.io.GzippedInputStreamTest
Number of records in gzip member file.
gzipMemberSeek(long) - Method in class org.archive.io.GzippedInputStream
Seek to a gzip member.
gzipMemberSeek() - Method in class org.archive.io.GzippedInputStream
 
GzippedInputStream - Class in org.archive.io
Subclass of GZIPInputStream that can handle a stream made of multiple concatenated GZIP members/records.
GzippedInputStream(InputStream) - Constructor for class org.archive.io.GzippedInputStream
 
GzippedInputStream(InputStream, int) - Constructor for class org.archive.io.GzippedInputStream
 
GzippedInputStreamTest - Class in org.archive.io
 
GzippedInputStreamTest() - Constructor for class org.archive.io.GzippedInputStreamTest
 
GzippedInputStreamTest.RepositionableRandomAccessInputStream - Class in org.archive.io
 
GzippedInputStreamTest.RepositionableRandomAccessInputStream(File) - Constructor for class org.archive.io.GzippedInputStreamTest.RepositionableRandomAccessInputStream
 
GzippedInputStreamTest.RepositionableRandomAccessInputStream(File, long) - Constructor for class org.archive.io.GzippedInputStreamTest.RepositionableRandomAccessInputStream
 

H

handle401(HttpMethod, CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
Server is looking for basic/digest auth credentials (RFC2617).
handleAddProxyConnectionHeader(HttpMethod) - Method in class org.archive.httpclient.HttpRecorderMethod
If a 'Proxy-Connection' header has been added to the request, it'll be of a 'keep-alive' type.
handleJobAction(CrawlJobHandler, HttpServletRequest, HttpServletResponse, String, String, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Handle job action.
handlePrerequisite(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
The CrawlURI has a prerequisite; apply scoping and update Link to CandidateURI in manner analogous to outlink handling.
handlePrerequisites(CrawlURI) - Method in class org.archive.crawler.postprocessor.FrontierScheduler
 
Handler - Class in org.archive.net.rsync
A protocol handler that uses native rsync client to do copy.
Handler() - Constructor for class org.archive.net.rsync.Handler
 
handleValueError(Constraint.FailedCheck) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
 
handleValueError(Constraint.FailedCheck) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
handleValueError(Constraint.FailedCheck) - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
handleValueError(Constraint.FailedCheck) - Method in interface org.archive.crawler.settings.ValueErrorHandler
 
hasAttributes() - Method in class org.archive.crawler.settings.DataContainer
 
hasBdbjeLogs() - Method in class org.archive.crawler.datamodel.Checkpoint
 
hasBeenLinkExtracted() - Method in class org.archive.crawler.datamodel.CrawlURI
If true then a link extractor has already claimed this CrawlURI and performed link extraction on the document content.
hasBeenLookedUp() - Method in class org.archive.crawler.datamodel.CrawlHost
Return true if the IP for this host has been looked up.
hasCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlServer
 
hasCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlURI
 
hasError() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Has there been an error with severity (level) equal to or higher than this handler's set level.
hasError(Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Has there been an error with severity (level) equal to or higher than specified.
hash(String) - Static method in class org.archive.crawler.settings.SoftSettingsHash
Make hash value from a String.
HASH_COUNT_KEY - Static variable in class org.archive.crawler.util.BloomUriUniqFilter
 
hashCode() - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory have the same hash code.
hashCode() - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
hashCode() - Method in class org.archive.crawler.settings.TextField
 
hashSet - Variable in class org.archive.crawler.util.MemUriUniqFilter
 
hasNext() - Method in interface org.archive.crawler.framework.FrontierMarker
Returns false if no more URIs can be found matching the expression beyond those already covered.
hasNext() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
hasNext() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
hasNext() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
hasNext() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Test whether any items remain; loads next item into holding 'next' field.
hasNext() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
hasNext() - Method in class org.archive.io.arc.ARCReader.ARCRecordIterator
 
hasNext() - Method in class org.archive.util.iterator.CompositeIterator
 
hasNext() - Method in class org.archive.util.iterator.LookaheadIterator
Test whether any items remain; loads next item into holding 'next' field.
hasPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
hasPrerequisiteUri() - Method in class org.archive.crawler.datamodel.CrawlURI
 
hasRefinements() - Method in class org.archive.crawler.settings.CrawlerSettings
Returns true if this settings object has refinements attached to it.
hasRfc2617CredentialAvatar() - Method in class org.archive.crawler.datamodel.CrawlURI
 
hasScheme(String) - Static method in class org.archive.net.UURI
Test if passed String has likely URI scheme prefix.
hasSupportedScheme(String) - Static method in class org.archive.net.UURIFactory
Test of whether passed String has an allowed URI scheme.
haveSeen(int, int) - Method in class org.archive.crawler.extractor.PDFParser
Indicates, based on a PDFObject's generation/id pair, whether the parser has already encountered this object (or a reference to it) so we don't infinitely loop on circuits within the PDF.
HEADER_FIELD_SEPARATOR - Static variable in interface org.archive.io.arc.ARCConstants
ARC header field separator character.
HEADER_PREDICTS_CHANGED - Static variable in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
HEADER_PREDICTS_MISSING - Static variable in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
HEADER_PREDICTS_UNCHANGED - Static variable in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
headerFields - Variable in class org.archive.io.arc.ARCRecordMetaData
Map of record header fields.
height() - Method in interface org.archive.queue.Stack
Number of items in the Stack.
Heritrix - Class in org.archive.crawler
Main class for Heritrix crawler.
Heritrix() - Constructor for class org.archive.crawler.Heritrix
Constructor.
Heritrix(boolean) - Constructor for class org.archive.crawler.Heritrix
 
Heritrix(String, boolean) - Constructor for class org.archive.crawler.Heritrix
Constructor.
Heritrix(String, boolean, CrawlJobHandler) - Constructor for class org.archive.crawler.Heritrix
Constructor.
HeritrixHttpMethodRetryHandler - Class in org.archive.crawler.fetcher
Retry handler that tries ten times to establish connection and then once established, if a GET method, tries ten times to get response (If POST, it tries once only).
HeritrixHttpMethodRetryHandler() - Constructor for class org.archive.crawler.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixHttpMethodRetryHandler(int) - Constructor for class org.archive.crawler.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixProtocolSocketFactory - Class in org.archive.crawler.fetcher
Version of protocol socket factory that tries to get IP from heritrix IP cache.
HeritrixSSLProtocolSocketFactory - Class in org.archive.crawler.fetcher
Implementation of the commons-httpclient SSLProtocolSocketFactory so we can return SSLSockets whose trust manager is ConfigurableX509TrustManager.
HeritrixSSLProtocolSocketFactory() - Constructor for class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
HeritrixSSLProtocolSocketFactory(String) - Constructor for class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
Constructor.
HIGH - Static variable in class org.archive.crawler.datamodel.CandidateURI
High scheduling priority.
HIGHEST - Static variable in class org.archive.crawler.datamodel.CandidateURI
Highest scheduling priority.
highestEncounteredLevel - Variable in class org.archive.crawler.admin.CrawlJobErrorHandler
 
Histotable - Class in org.archive.util
Collect and report frequency information.
Histotable() - Constructor for class org.archive.util.Histotable
 
holder - Variable in class org.archive.crawler.datamodel.CrawlURI
 
holderCost - Variable in class org.archive.crawler.datamodel.CrawlURI
spot for an integer cost to be placed by external facility (frontier).
holderKey - Variable in class org.archive.crawler.datamodel.CrawlURI
 
holeyUrl(String, boolean, String) - Method in class org.archive.io.arc.ARCWriterTest
 
honoringPolicy - Variable in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
honorRobots - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
HopsFilter - Class in org.archive.crawler.filter
Accepts (returns true) for all CandidateURIs passed in with a link-hop-count greater than the max-link-hops value.
HopsFilter(String) - Constructor for class org.archive.crawler.filter.HopsFilter
 
HopsPathMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp.
HopsPathMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
Usual constructor.
HOST - Static variable in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
 
HOST - Static variable in class org.archive.util.JmxUtils
 
HOST_DEFERRED - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has been deferred for some amount of time, will become ready once that time has elapsed.
HOST_INACTIVE - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has been encountered and all available URIs for it have been processed already.
HOST_INPROCESS - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has URIs currently being processed.
HOST_READY - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has URIs ready to be emitted.
HOST_UNKNOWN - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has not been encountered by the Frontier, or has been encountered but has been inactive so long that it has expired.
hostName - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Name of the host that this AdaptiveRevisitHostQueue represents
HOSTNAME_VARIABLE - Static variable in class org.archive.crawler.writer.ARCWriterProcessor
Value to interpolate with actual hostname.
HostnameQueueAssignmentPolicy - Class in org.archive.crawler.frontier
QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI.
HostnameQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
hosts - Variable in class org.archive.crawler.datamodel.ServerCache
hostname -> CrawlHost.
hostsBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
 
HostScope - Class in org.archive.crawler.scope
A core CrawlScope suitable for the most common crawl needs.
HostScope(String) - Constructor for class org.archive.crawler.scope.HostScope
 
hostsDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of hosts.
hostsLastFinished - Variable in class org.archive.crawler.admin.StatisticsTracker
 
hostStatus(String) - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Get the status of a host.
HQSTATE_BUSY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ has maximum number of CrawlURI currently being processed.
HQSTATE_EMPTY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ contains no queued CrawlURI elements.
HQSTATE_READY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ has a CrawlURI ready for processing
HQSTATE_SNOOZED - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ is in a suspended state until it can be woken back up
HtmlFormCredential - Class in org.archive.crawler.datamodel.credential
Credential that holds all needed to do a GET/POST to an HTML form.
HtmlFormCredential(String) - Constructor for class org.archive.crawler.datamodel.credential.HtmlFormCredential
Constructor.
HTTP - Static variable in class org.archive.net.UURIFactory
 
HTTP_PORT - Static variable in class org.archive.net.UURIFactory
 
HTTP_SCHEME - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
HTTP_SCHEME - Static variable in class org.archive.net.LaxURI
 
HTTP_SCHEME_SLASHES - Static variable in class org.archive.net.UURIFactory
Pattern that looks for case of three or more slashes after the scheme.
HTTPContentDigest - Class in org.archive.crawler.extractor
A processor for calculating custom HTTP content digests in place of the default (if any) computed by the HTTP fetcher processors.
HTTPContentDigest(String) - Constructor for class org.archive.crawler.extractor.HTTPContentDigest
Constructor
HTTPMidFetchUnchangedFilter - Class in org.archive.crawler.filter
A mid fetch filter for HTTP fetcher processors.
HTTPMidFetchUnchangedFilter(String) - Constructor for class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
Constructor
HTTPMidFetchUnchangedFilter(String, String) - Constructor for class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
Constructor
HttpRecorder - Class in org.archive.util
Pairs together a RecordingInputStream and RecordingOutputStream to capture exactly a single HTTP transaction.
HttpRecorder() - Constructor for class org.archive.util.HttpRecorder
Constructor with limited access.
HttpRecorder(File, String, int, int) - Constructor for class org.archive.util.HttpRecorder
Create an HttpRecorder.
HttpRecorder(File, String) - Constructor for class org.archive.util.HttpRecorder
Create an HttpRecorder.
HttpRecorderGetMethod - Class in org.archive.httpclient
Override of GetMethod that marks the passed HttpRecorder w/ the transition from HTTP head to body and that forces a close on the http connection.
HttpRecorderGetMethod(String, HttpRecorder) - Constructor for class org.archive.httpclient.HttpRecorderGetMethod
 
HttpRecorderMarker - Interface in org.archive.util
A marker interface to denote a class with a gettable HttpRecorder.
httpRecorderMethod - Variable in class org.archive.httpclient.HttpRecorderGetMethod
Instance of http recorder method.
HttpRecorderMethod - Class in org.archive.httpclient
This class encapsulates the specializations supplied by the overrides HttpRecorderGetMethod and HttpRecorderPostMethod.
HttpRecorderMethod(HttpRecorder) - Constructor for class org.archive.httpclient.HttpRecorderMethod
 
httpRecorderMethod - Variable in class org.archive.httpclient.HttpRecorderPostMethod
Instance of http recorder method.
HttpRecorderPostMethod - Class in org.archive.httpclient
Override of PostMethod that marks the passed HttpRecorder w/ the transition from HTTP head to body and that forces a close on the responseConnection.
HttpRecorderPostMethod(String, HttpRecorder) - Constructor for class org.archive.httpclient.HttpRecorderPostMethod
 
HTTPS - Static variable in class org.archive.net.UURIFactory
 
HTTPS_PORT - Static variable in class org.archive.net.UURIFactory
 
HTTPS_SCHEME - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
HTTPS_SCHEME - Static variable in class org.archive.net.LaxURI
 

I

IFRAME - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
IGNORE - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ignored - Variable in class org.archive.crawler.scope.SeedFileIterator
 
IGNORED_SCHEME - Static variable in class org.archive.net.UURIFactory
 
IGNORED_SEEDS_FILENAME - Static variable in class org.archive.crawler.frontier.AbstractFrontier
file collecting report of ignored seed-file entries (if any)
ignoreLine - Variable in class org.archive.util.iterator.RegexpLineIterator
 
ignoreUnexpectedHTML - Variable in class org.archive.crawler.extractor.ExtractorHTML
 
illegalElementError(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
IMAGES - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
IMAGES - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
IMAGES_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
IMAGES_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
ImageWaitEvaluator - Class in org.archive.crawler.postprocessor
A specialized ContentBasedWaitEvaluator.
ImageWaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.ImageWaitEvaluator
Constructor
importFrom(Reader) - Method in class org.archive.util.SurtPrefixSet
Read a set of SURT prefixes from a reader source; keep sorted and with redundant entries removed.
importFromMixed(Reader, boolean) - Method in class org.archive.util.SurtPrefixSet
Import SURT prefixes from a file with mixed URI and SURT prefix format.
importFromUris(Reader) - Method in class org.archive.util.SurtPrefixSet
 
importRecoverLog(String, boolean) - Method in interface org.archive.crawler.framework.Frontier
Recover earlier state by reading a recovery log.
importRecoverLog(String, boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
importRecoverLog(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Method is not supported by this Frontier implementation.
importRecoverLog(String, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
This method is not supported by this Frontier implementation
importRecoverLog(File, Frontier, boolean) - Static method in class org.archive.crawler.frontier.RecoveryJournal
Utility method for scanning a recovery journal and applying it to a Frontier.
importUri(String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Schedule a uri.
importUri(String, boolean, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Schedule a uri.
importUri(String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Schedule a uri.
importUri(String, boolean, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Schedule a uri.
importUris(String, String, String) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(String, String, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(String, String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(InputStream, String, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(InputStream, String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(String, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
importUris(String, String, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
importUris(InputStream, String, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
IMPROPERESC - Static variable in class org.archive.net.UURIFactory
 
IMPROPERESC_REPLACE - Static variable in class org.archive.net.UURIFactory
 
in - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
in - Variable in class org.archive.io.arc.ARCReader
ARC file input stream.
inactiveHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of inactive hosts.
inactiveQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All 'inactive' queues, not yet in active rotation.
incrementConsecutiveConnectionErrors() - Method in class org.archive.crawler.datamodel.CrawlServer
 
incrementDeferrals() - Method in class org.archive.crawler.datamodel.CrawlURI
Increment the deferral count.
incrementDisregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of disregarded URIs.
incrementFailedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of failed URIs.
incrementFetchAttempts() - Method in class org.archive.crawler.datamodel.CrawlURI
Increment the number of attempts at getting the document referenced by this URI.
incrementHostCounters(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
incrementMapCount(Map, String) - Static method in class org.archive.crawler.admin.StatisticsTracker
Increment a counter for a key in a given HashMap.
incrementMapCount(Map, String, long) - Static method in class org.archive.crawler.admin.StatisticsTracker
Increment a counter for a key in a given HashMap by an arbitrary amount.
incrementQueuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementQueuedUriCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementSessionBalance(int) - Method in class org.archive.crawler.frontier.WorkQueue
Increase the internal running budget to be used before deactivating the queue
incrementSucceededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of successfully fetched URIs.
index - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
indexFor(int, int) - Static method in class org.archive.crawler.settings.SoftSettingsHash
Return index for hash code h.
indexOf(Object) - Method in class org.archive.crawler.settings.ListType
 
indexOfCurrentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
InetAddressUtil - Class in org.archive.util
InetAddress utility.
init(FilterConfig) - Method in class org.archive.crawler.admin.ui.RootFilter
 
initCause(Throwable) - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
InitializationException - Exception in org.archive.crawler.framework.exceptions
InitializationExceptions should be thrown when there is a problem with the crawl's initialization, such as file creation problems, etc.
InitializationException() - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
InitializationException(String) - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
InitializationException(String, Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
InitializationException(Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
initialize(Registry) - Method in interface org.archive.configuration.Configurable
Called soon after construction.
initialize() - Method in class org.archive.configuration.Configuration
 
initialize() - Method in class org.archive.configuration.registry.CrawlOrder.CrawlOrderConfiguration
 
initialize(Registry) - Method in class org.archive.configuration.registry.CrawlOrder
 
initialize() - Method in class org.archive.configuration.registry.CrawlOrderSubClass.CrawlOrderSubClassConfiguration
 
initialize(String, Store) - Method in interface org.archive.configuration.Registry
Initialize the registry.
initialize(String, Store) - Method in class org.archive.configuration.registry.JmxRegistry
 
initialize(Registry) - Method in class org.archive.configuration.registry.TestProcessor
 
initialize(CrawlController) - Method in class org.archive.crawler.admin.StatisticsTracker
 
initialize() - Method in class org.archive.crawler.extractor.PDFParser
Initialize opens the document for reading.
initialize(CrawlController) - Static method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Initialize this factory.
initialize(CrawlController) - Method in class org.archive.crawler.framework.AbstractTracker
Sets up the Logger (including logInterval) and registers with the CrawlController for CrawlStatus and CrawlURIDisposition events.
initialize(CrawlController, String) - Method in class org.archive.crawler.framework.Checkpointer
 
initialize(SettingsHandler) - Method in class org.archive.crawler.framework.CrawlController
Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl.
initialize(CrawlController) - Method in class org.archive.crawler.framework.CrawlScope
Initialize is called just before the crawler starts to run.
initialize(CrawlController) - Method in interface org.archive.crawler.framework.Frontier
Initialize the Frontier.
initialize(CrawlController) - Method in interface org.archive.crawler.framework.StatisticsTracking
Do initialization.
initialize(CrawlController) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.BdbFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initializes the Frontier, given the supplied CrawlController.
initialize(Registry) - Method in class org.archive.crawler.Heritrix
 
initialize(CrawlController) - Method in class org.archive.crawler.scope.SurtPrefixScope
 
initialize(String, CrawlJob, File, File) - Static method in class org.archive.crawler.selftest.SelfTestCase
Static initializer.
initialize() - Method in class org.archive.crawler.settings.SettingsHandler
Initialize the SettingsHandler.
initialize() - Method in class org.archive.crawler.settings.XMLSettingsHandler
Initialize the SettingsHandler.
initialize(File) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Initialize the SettingsHandler from a source.
initialize(int) - Method in class org.archive.crawler.SimpleHttpServer
Initialize the server.
initialize(Environment) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Method shared by constructors.
initialize(int, int) - Method in class org.archive.crawler.util.BloomUriUniqFilter
Initializer shared by constructors.
initialize(String) - Method in class org.archive.io.arc.ARCReader
Convenience method used by subclass constructors.
initialize(Environment, Class, Class, StoredClassCatalog) - Method in class org.archive.util.CachedBdbMap
Call this method when you have an instance created with the default constructor, or a deserialized instance that you want to reconnect with an extant bdbje environment.
initializeInstance() - Method in class org.archive.util.CachedBdbMap
Do any instance setup.
initialTasks() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
initialTasks() - Method in class org.archive.crawler.framework.Processor
Classes subclassing this one should override this method to perform processor specific actions.
initialTasks() - Method in class org.archive.crawler.framework.Scoper
 
initialTasks() - Method in class org.archive.crawler.processor.CrawlMapper
 
initialTasks() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
initQueue() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
inner - Variable in class org.archive.io.CharSubSequence
 
inner - Variable in class org.archive.util.iterator.TransformingIteratorWrapper
 
innerAccepts(Object) - Method in class org.archive.crawler.deciderules.DecidingFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.deciderules.DecidingScope
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.ContentTypeRegExpFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.HopsFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.OrFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.PathDepthFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.SurtPrefixFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.TransclusionFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.URIRegExpFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.framework.Filter
Classes subclassing this one should override this method to perform their custom determination of whether or not to accept the object given to it.
innerAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Returns whether the given object (typically a CandidateURI) falls within this scope.
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.Constraint
The method all subclasses should implement to do the actual checking.
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.LegalValueListConstraint
 
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.LegalValueTypeConstraint
 
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.RegularExpressionConstraint
 
innerFinished(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
innerHasNext() - Method in class org.archive.io.arc.ARCReader.ARCRecordIterator
 
innerNext() - Method in class org.archive.io.arc.ARCReader.ARCRecordIterator
 
innerPredicate - Variable in class org.archive.util.Inverter
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.ChangeEvaluator
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.Extractor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorHTTP
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.HTTPContentDigest
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchDNS
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CrawlStateUpdater
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.FrontierScheduler
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Notes a CrawlURI's content size in its running tally.
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.WaitEvaluator
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.Test
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.writer.ARCWriterProcessor
Takes a CrawlURI and generates an arc record, writing it to disk.
innerProcess(CrawlURI) - Method in class org.archive.crawler.writer.MirrorWriterProcessor
 
innerRejectProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
 
innerSchedule(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
inProcessHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of hosts with URIs in process.
inProcessing - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Number of URIs belonging to this queue that are being processed at the moment.
inProcessing(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns true if this HQ has a CrawlURI matching the uri string currently being processed.
inProcessQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues from which a URI is outstanding
input - Variable in class org.archive.crawler.scope.SeedFileIterator
 
inputWrap(InputStream) - Method in class org.archive.util.HttpRecorder
Wrap the provided stream with the internal RecordingInputStream. It's safe to call multiple times.
insertItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
insertItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Insert the given curi, whether it is already present or not.
instanceMain(String[]) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
instanceMain(String[]) - Method in class org.archive.util.BenchmarkBlooms
 
InstancePerThread - Interface in org.archive.crawler.datamodel
indicates that a processor should have an instance per ToeThread
instantiateModuleTypeFromClassName(String, String) - Static method in class org.archive.crawler.settings.SettingsHandler
Instantiate a new ModuleType given its name and className.
INTEGER - Static variable in class org.archive.crawler.settings.SettingsHandler
Datatypes supported by the settings framework
INTEGER_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
IntegerList - Class in org.archive.crawler.settings
List of Integer values
IntegerList(String, String) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList.
IntegerList(String, String, IntegerList) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList and initializes it with the values from another IntegerList.
IntegerList(String, String, Integer[]) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList and initializes it with the values from an array of Integers.
IntegerList(String, String, int[]) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList and initializes it with the values from an int array.
interrupt(String) - Method in class org.archive.crawler.Heritrix
 
invalidateARCWriter(ARCWriter) - Method in class org.archive.io.arc.ARCWriterPool
 
InvalidFrontierMarkerException - Exception in org.archive.crawler.framework.exceptions
An exception that is thrown when there is an attempt to use a URIFrontierMarker that has become invalid.
InvalidFrontierMarkerException() - Constructor for exception org.archive.crawler.framework.exceptions.InvalidFrontierMarkerException
 
InvalidJobFileException - Exception in org.archive.crawler.admin
An exception that is thrown when a program encounters a jobfile that is corrupt or otherwise incomplete or invalid.
InvalidJobFileException(String) - Constructor for exception org.archive.crawler.admin.InvalidJobFileException
 
Inverter - Class in org.archive.util
A predicate that inverts another.
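As a rough illustration of an inverting predicate, the sketch below wraps another predicate and negates its result. It assumes `java.util.function.Predicate` for self-containment; the Predicate type used by the actual Inverter class is not shown in this index, and `InverterSketch` is a hypothetical name.

```java
import java.util.function.Predicate;

// Hypothetical sketch; not the source of org.archive.util.Inverter.
final class InverterSketch<T> implements Predicate<T> {
    // The wrapped predicate whose answer is negated.
    private final Predicate<T> innerPredicate;

    InverterSketch(Predicate<T> innerPredicate) {
        this.innerPredicate = innerPredicate;
    }

    @Override
    public boolean test(T value) {
        // Return the opposite of whatever the inner predicate decides.
        return !innerPredicate.test(value);
    }
}
```

For example, wrapping `String::isEmpty` yields a predicate that accepts only non-empty strings.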
Inverter(Predicate) - Constructor for class org.archive.util.Inverter
 
invoke(String, Object[], String[]) - Method in class org.archive.configuration.Configuration
 
invoke(String, Object[], String[]) - Method in class org.archive.crawler.admin.CrawlJob
 
invoke(String, Object[], String[]) - Method in class org.archive.crawler.Heritrix
 
invoke(String, Object[], String[]) - Method in class org.archive.crawler.settings.ComplexType
 
invoke(String, Object[], String[]) - Method in class org.archive.util.JEApplicationMBean
 
invoke(Environment, String, Object[], String[]) - Method in class org.archive.util.JEMBeanHelper
Invoke an operation for the given environment.
IoUtils - Class in org.archive.crawler.util
Logging utils.
IoUtils() - Constructor for class org.archive.crawler.util.IoUtils
 
IoUtils - Class in org.archive.util
I/O Utility methods.
IoUtils() - Constructor for class org.archive.util.IoUtils
 
IoUtilsTest - Class in org.archive.crawler.util
Test IoUtils.
IoUtilsTest() - Constructor for class org.archive.crawler.util.IoUtilsTest
 
IoUtilsTest - Class in org.archive.util
 
IoUtilsTest() - Constructor for class org.archive.util.IoUtilsTest
 
IP_ADDRESS - Static variable in class org.archive.crawler.extractor.ExtractorUniversal
Matches any string that begins with http:// or https:// followed by something that looks like an IP address (four numbers, none longer than 3 chars, separated by 3 dots).
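A pattern of the kind described might look like the following sketch. The regular expression and the names `IpUriSketch`, `IP_URI`, and `looksLikeIpUri` are assumptions for illustration; the actual value of the IP_ADDRESS constant is not given in this index.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; this is not the actual value of ExtractorUniversal.IP_ADDRESS.
final class IpUriSketch {
    // http:// or https://, then four groups of 1-3 digits separated by three dots.
    // The negative lookahead keeps the last group from running past 3 digits.
    static final Pattern IP_URI = Pattern.compile(
        "^https?://\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}(?!\\d).*$",
        Pattern.CASE_INSENSITIVE);

    static boolean looksLikeIpUri(String candidate) {
        return IP_URI.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIpUri("http://192.168.0.1/index.html"));
        System.out.println(looksLikeIpUri("http://example.org/"));
    }
}
```

The sketch only checks the shape of the address; it does not verify that each number is at most 255.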
IP_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header IP field.
IP_NEVER_EXPIRES - Static variable in class org.archive.crawler.datamodel.CrawlHost
Flag value indicating always-valid IP
IP_NEVER_LOOKED_UP - Static variable in class org.archive.crawler.datamodel.CrawlHost
Flag value indicating an IP has not yet been looked up
IPQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses target IP as basis for queue-assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy.
IPQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
IPV4_QUADS - Static variable in class org.archive.util.InetAddressUtil
ipv4 address.
isActive() - Method in class org.archive.crawler.framework.ToeThread
Is this thread processing a URI, not paused or waiting for a URI?
isAtBeginning() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointErrors() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointFailed() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointing() - Method in class org.archive.crawler.admin.CrawlJob
 
isCheckpointing() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointing() - Method in class org.archive.crawler.framework.CrawlController
 
isCheckpointRecover(CrawlOrder) - Static method in class org.archive.crawler.framework.CrawlController
 
isCheckpointRecover() - Method in class org.archive.crawler.framework.CrawlController
 
isCommandLine() - Static method in class org.archive.crawler.Heritrix
 
isComplexType() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this attribute refers to a ComplexType.
isCompressed() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
isCompressed() - Method in class org.archive.io.arc.ARCReader
 
isCompressed(File) - Static method in class org.archive.io.arc.ARCUtils
 
isCompressed() - Method in class org.archive.io.arc.ARCWriter.ARCWriterSettingsImpl
 
isCompressed() - Method in interface org.archive.io.arc.ARCWriterSettings
 
isConnectionStaleCheckingEnabled() - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Deprecated. Use HttpConnectionParams.isStaleCheckingEnabled(), HttpConnectionManager.getParams().
isContentToProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
 
isCrawling() - Method in class org.archive.crawler.admin.CrawlJob
 
isCrawling() - Method in class org.archive.crawler.admin.CrawlJobHandler
Is a crawl job being crawled?
isDevelopment() - Static method in class org.archive.crawler.Heritrix
 
isDigest() - Method in class org.archive.io.arc.ARCReader
 
isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
isEmpty(Object) - Method in class org.archive.crawler.filter.OrFilter
 
isEmpty() - Method in interface org.archive.crawler.framework.Frontier
Returns true if the frontier contains no more URIs to crawl.
isEmpty() - Method in class org.archive.crawler.frontier.AbstractFrontier
Frontier is empty only if all queues are empty and no URIs are in-process
isEmpty() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
isEmpty() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
isEmpty() - Method in class org.archive.crawler.settings.ListType
Returns true if this list contains no elements.
isEmpty(Object) - Method in class org.archive.crawler.settings.MapType
Returns true if this map is empty.
isEmpty() - Method in interface org.archive.queue.Queue
is the queue empty?
isEmpty() - Method in interface org.archive.queue.Stack
 
isEnabled(Object) - Method in interface org.archive.crawler.url.CanonicalizationRule
 
isEnabled(Object) - Method in class org.archive.crawler.url.canonicalize.BaseRule
 
isEveryTime() - Method in class org.archive.crawler.datamodel.credential.Credential
 
isEveryTime() - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
isEveryTime() - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
isExpectedMimeType(String, String) - Method in class org.archive.crawler.framework.Processor
 
isExpertSetting() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this Type should only show up in expert mode in UI.
isExpertSetting() - Method in class org.archive.crawler.settings.Type
Returns true if this Type should only show up in expert mode in UI.
isHeld() - Method in class org.archive.crawler.frontier.WorkQueue
Whether the queue is already in a lifecycle stage -- such as ready, in-progress, snoozed -- and thus should not be redundantly inserted to readyClassQueues
isHtmlExpectedHere(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorHTML
Test whether this HTML is so unexpected (eg in place of a GIF URI) that it shouldn't be scanned for links.
isHttpTransaction() - Method in class org.archive.crawler.datamodel.CrawlURI
Return true if this is a http transaction.
isHttpTransactionContentToProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
 
isInitialized() - Method in class org.archive.crawler.settings.ComplexType
Returns true if this ComplexType is initialized.
isInScope(CandidateURI) - Method in class org.archive.crawler.framework.Scoper
Schedule the given CandidateURI with the Frontier.
isInScope(CandidateURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
isIpExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Return true if ip should be looked up.
isListLogicOR(Object) - Method in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
isListLogicOR(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
 
isLocation() - Method in class org.archive.crawler.datamodel.CandidateURI
 
isNew() - Method in class org.archive.crawler.admin.CrawlJob
Is this a new job?
isOpen() - Method in class org.archive.io.RecordingInputStream
 
isOpen() - Method in class org.archive.io.RecordingOutputStream
 
isOpenType(Class) - Static method in class org.archive.util.JmxUtils
 
isOpenType(String) - Static method in class org.archive.util.JmxUtils
 
isOverBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Check whether queue has temporarily or permanently exceeded its budget.
isOverridden(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Returns true if an element is overridden for this settings object.
isOverrideable() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this attribute could be overridden in per settings.
isOverrideable() - Method in class org.archive.crawler.settings.Type
Is this an 'overrideable' setting.
isOverrideLogger(Object) - Method in class org.archive.crawler.framework.Scoper
 
isParseHttpHeaders() - Method in class org.archive.io.arc.ARCReader
 
isPaused() - Method in class org.archive.crawler.framework.CrawlController
Tell if the controller is paused
isPersistentAlistMember(String) - Method in class org.archive.crawler.datamodel.CrawlURI
 
isPointer(CompositeData) - Static method in class org.archive.configuration.Pointer
Test if a CompositeData is a Pointer.
isPost() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns true if this URI should be fetched by sending a HTTP POST request.
isPost(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
isPost(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
isPost(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
isPrerequisite() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns true if this CrawlURI is a prerequisite.
isPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
isPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
isPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
isProfile() - Method in class org.archive.crawler.admin.CrawlJob
Set if the job is considered to be a profile
isRead() - Method in class org.archive.io.SinkHandlerLogRecord
 
isReadOnly() - Method in class org.archive.crawler.admin.CrawlJob
Is job read only?
isRefinement() - Method in class org.archive.crawler.settings.CrawlerSettings
Returns true if this settings object is a refinement.
isRegistered(String, Class<?>) - Method in interface org.archive.configuration.Registry
 
isRegistered(String, Class<?>, String) - Method in interface org.archive.configuration.Registry
 
isRegistered(String, Class) - Method in class org.archive.configuration.registry.JmxRegistry
 
isRegistered(String, Class, String) - Method in class org.archive.configuration.registry.JmxRegistry
 
isRetired() - Method in class org.archive.crawler.frontier.WorkQueue
 
isRobotsExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Is the robots policy expired.
isRunning() - Method in class org.archive.crawler.admin.CrawlJob
Returns true if the job is being crawled.
isRunning() - Method in class org.archive.crawler.admin.CrawlJobHandler
Is the crawler accepting crawl jobs to run?
isRunning() - Method in class org.archive.crawler.framework.CrawlController
 
isSameHost(UURI, UURI) - Method in class org.archive.crawler.framework.CrawlScope
 
isSeed() - Method in class org.archive.crawler.datamodel.CandidateURI
 
isSeed(Object) - Method in class org.archive.crawler.framework.CrawlScope
Check if a URI is in the seeds.
isSingleInstance() - Static method in class org.archive.crawler.Heritrix
 
isStarted() - Method in class org.archive.crawler.Heritrix
 
isStrict() - Method in class org.archive.io.arc.ARCReader
 
isStrict() - Method in class org.archive.io.arc.ARCRecord
 
isSuccess() - Method in class org.archive.crawler.datamodel.CrawlURI
Ask this URI if it was a success or not.
isTransient() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this attribute should be hidden from UI and not be serialized to persistent storage.
isTransient() - Method in class org.archive.crawler.settings.Type
Returns true if this ComplexType should be saved to persistent storage.
isType(Object, int) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
Check if policy is of a certain type.
isValid() - Method in class org.archive.crawler.datamodel.Checkpoint
 
isValid() - Method in class org.archive.io.arc.ARCReader
Test ARC is valid.
isValidLoginPasswordString(String) - Static method in class org.archive.crawler.Heritrix
Test string is valid login/password string.
isValidRobots() - Method in class org.archive.crawler.datamodel.CrawlServer
If true then valid robots.txt information has been retrieved.
isWithinRefinementBounds(UURI) - Method in interface org.archive.crawler.settings.refinements.Criteria
Check if a uri is within the bounds of this criteria.
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
 
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.Refinement
Check if a URI is within the bounds of every criteria set for this refinement.
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
 
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
iterateRecords(ARCReader) - Method in class org.archive.io.arc.ARCWriterTest
 
iterator(Object) - Method in class org.archive.crawler.datamodel.CredentialStore
 
iterator(Object) - Method in class org.archive.crawler.filter.OrFilter
 
iterator() - Method in class org.archive.crawler.framework.ProcessorChain
Get an iterator over the processors in this chain.
iterator() - Method in class org.archive.crawler.framework.ProcessorChainList
Get an iterator over the processor chains.
iterator(Object) - Method in class org.archive.crawler.settings.ComplexType
Get an Iterator over all the attributes in this ComplexType.
iterator() - Method in class org.archive.crawler.settings.ListType
Returns an iterator over the elements in this list in proper sequence.
iterator() - Method in class org.archive.crawler.settings.SoftSettingsHash
 
iterator() - Method in class org.archive.io.arc.ARCReader
 
iterator() - Method in class org.archive.io.GzippedInputStream
Returns a GZIP Member Iterator.
iterators - Variable in class org.archive.util.iterator.CompositeIterator
 

J

JavaLiterals - Class in org.archive.util
Utility functions to escape or unescape Java literal strings.
JavaLiterals() - Constructor for class org.archive.util.JavaLiterals
 
JAVASCRIPT - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
JAVASCRIPT - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
JAVASCRIPT_LIKELY_URI_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
JAVASCRIPT_STRING_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
JAVASCRIPT_STRING_EXTRACTOR - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
JEApplicationMBean - Class in org.archive.util
JEApplicationMBean is an example of how a JE application can incorporate JE monitoring into its existing MBean.
JEApplicationMBean(Environment) - Constructor for class org.archive.util.JEApplicationMBean
Instantiate a JEApplicationMBean
JEMBeanHelper - Class in org.archive.util
JEMBeanHelper is a utility class for the MBean implementation which wants to add management of a JE environment to its capabilities.
JEMBeanHelper(EnvironmentConfig, File, boolean) - Constructor for class org.archive.util.JEMBeanHelper
Instantiate a helper, specifying environment home and open capabilities.
JMX_PORT - Static variable in class org.archive.util.JmxUtils
 
JmxRegistry - Class in org.archive.configuration.registry
Implementation of Configuration Registry that uses JMX Agent.
JmxRegistry() - Constructor for class org.archive.configuration.registry.JmxRegistry
 
JmxRegistryTest - Class in org.archive.configuration.registry
 
JmxRegistryTest() - Constructor for class org.archive.configuration.registry.JmxRegistryTest
 
JmxUtils - Class in org.archive.util
Static utility used by JMX.
JmxUtils() - Constructor for class org.archive.util.JmxUtils
 
JmxUtilsTest - Class in org.archive.util
 
JmxUtilsTest() - Constructor for class org.archive.util.JmxUtilsTest
 
JndiUtils - Class in org.archive.util
JNDI utilities.
JndiUtils() - Constructor for class org.archive.util.JndiUtils
 
JOB - Static variable in class org.archive.util.JmxUtils
 
JobConfigureUtils - Class in org.archive.crawler.admin.ui
Utility methods used configuring jobs in the admin UI.
JobConfigureUtils() - Constructor for class org.archive.crawler.admin.ui.JobConfigureUtils
 
JS_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for js-discovered urls without other context
JSSTRING - Static variable in class org.archive.crawler.extractor.CrawlUriSWFAction
 

K

KEY - Static variable in class org.archive.util.JmxUtils
 
keys() - Method in class org.archive.crawler.datamodel.CandidateURI
 
keySet() - Method in class org.archive.util.CachedBdbMap
As this keySet is a union of the cache and diskMap KeySets, it does not support the remove operations typical of keySet views on underlying maps.
kickUpdate() - Method in class org.archive.crawler.admin.CrawlJob
Forward a 'kick' update to current controller if any.
kickUpdate() - Method in class org.archive.crawler.admin.CrawlJobHandler
Forward a 'kick' update to current job if any.
kickUpdate() - Method in class org.archive.crawler.deciderules.DecideRule
Respond to a settings update, refreshing any internal settings-derived state.
kickUpdate() - Method in class org.archive.crawler.deciderules.DecideRuleSequence
 
kickUpdate() - Method in class org.archive.crawler.deciderules.DecidingFilter
Note that configuration updates may be necessary.
kickUpdate() - Method in class org.archive.crawler.deciderules.DecidingScope
Note that configuration updates may be necessary.
kickUpdate() - Method in class org.archive.crawler.deciderules.PathologicalPathDecideRule
Repetitions may have changed; refresh constructedRegexp
kickUpdate() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Re-read prefixes after an update.
kickUpdate() - Method in class org.archive.crawler.filter.OrFilter
Note that configuration updates may be necessary.
kickUpdate() - Method in class org.archive.crawler.filter.SurtPrefixFilter
Re-read prefixes after a settings update.
kickUpdate() - Method in class org.archive.crawler.framework.CrawlController
While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings.
kickUpdate() - Method in class org.archive.crawler.framework.CrawlScope
Take note of a situation (such as settings edit) where involved reconfiguration (such as reading from external files) may be necessary.
kickUpdate() - Method in class org.archive.crawler.framework.Filter
 
kickUpdate() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should consider updating configuration info that may have changed in external files.
kickUpdate() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
kickUpdate() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
kickUpdate() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Accommodate any changes in settings.
kickUpdate() - Method in class org.archive.crawler.scope.ClassicScope
Take note of a situation (such as settings edit) where involved reconfiguration (such as reading from external files) may be necessary.
kickUpdate() - Method in class org.archive.crawler.scope.SurtPrefixScope
Re-read prefixes after an update.
kill() - Method in class org.archive.crawler.framework.ToeThread
Terminates a thread.
killThread(int, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Kills a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.CrawlController
Kills a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.ToePool
Kills specified thread.

L

lastCacheMiss - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
lastCacheMissDiff - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
lastIndexOf(Object) - Method in class org.archive.crawler.settings.ListType
 
lastLogPointTime - Variable in class org.archive.crawler.framework.AbstractTracker
Timestamp of when this logger last wrote something to the log
lastMaxBandwidthKB - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
lastPagesFetchedCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
lastProcessedBytesCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
lastReturned - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
launch() - Method in class org.archive.crawler.Heritrix
Launch the crawler for a web UI.
launch(String, boolean) - Method in class org.archive.crawler.Heritrix
Launch the crawler for a web UI.
lax(BitSet) - Method in class org.archive.net.LaxURI
Given a BitSet -- typically one of the URI superclass's predefined static variables -- possibly replace it with a more-lax version to better match the character sets actually left unencoded in web browser requests
lax_abs_path - Static variable in class org.archive.net.LaxURI
 
lax_rel_segment - Static variable in class org.archive.net.LaxURI
 
LaxURI - Class in org.archive.net
URI subclass which allows partial/inconsistent encoding, matching the URIs which will be relayed in requests from popular web browsers (esp.
LaxURI(String, boolean, String) - Constructor for class org.archive.net.LaxURI
 
LaxURI(URI, URI) - Constructor for class org.archive.net.LaxURI
 
LaxURI(String, boolean) - Constructor for class org.archive.net.LaxURI
 
LaxURI() - Constructor for class org.archive.net.LaxURI
 
LaxURLCodec - Class in org.archive.net
 
LaxURLCodec(String) - Constructor for class org.archive.net.LaxURLCodec
 
LCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
LCURBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 
LEGAL_LIST_LOGIC - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
LEGAL_LIST_LOGIC - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
 
LegalValueListConstraint - Class in org.archive.crawler.settings
A constraint that checks that an attribute value matches one of the items in the list of legal values.
LegalValueListConstraint(Level, String) - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint.
LegalValueListConstraint(String) - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint using default severity level (Level.WARNING).
LegalValueListConstraint(Level) - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint using default error message.
LegalValueListConstraint() - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint using default severity level (Level.WARNING) and default error message.
LegalValueTypeConstraint - Class in org.archive.crawler.settings
A constraint that checks that an attribute value is of the right type
LegalValueTypeConstraint(Level, String) - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint.
LegalValueTypeConstraint(String) - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint using default severity level (Level.WARNING).
LegalValueTypeConstraint(Level) - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint using default error message.
LegalValueTypeConstraint() - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint using default severity level (Level.WARNING) and default error message.
length() - Method in class org.archive.crawler.settings.TextField
 
length() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Gets the length of this string.
length() - Method in class org.archive.io.CharSubSequence
 
length - Variable in class org.archive.io.GzipHeader
Total length of the gzip header.
length() - Method in class org.archive.net.UURI
 
length() - Method in class org.archive.queue.MemQueue
 
length() - Method in interface org.archive.queue.Queue
get the number of elements in the queue
LENGTH_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for length field.
LENGTH_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header length field.
lengthTooLong(String, boolean, boolean) - Method in class org.archive.io.arc.ARCWriterTest
 
lengthTooShort(String, boolean, boolean) - Method in class org.archive.io.arc.ARCWriterTest
 
level - Variable in class org.archive.crawler.admin.CrawlJobErrorHandler
 
LEVELS_AS_ARRAY - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
All the levels of trust as an array from babe-in-the-wood to strict.
LIKELY_URI_PATH - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
LIKELY_URI_PATH - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
LINE_SEPARATOR - Static variable in interface org.archive.io.arc.ARCConstants
ARC file line separator character.
linePos - Variable in class org.archive.util.PaddingStringBuffer
 
LineReadingIterator - Class in org.archive.util.iterator
Utility class providing an Iterator interface over line-oriented text input, as a thin wrapper over a BufferedReader.
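The idea of exposing line-oriented input through an Iterator can be sketched as below. `LineIteratorSketch` is a hypothetical name and the look-ahead strategy is an assumption; the real LineReadingIterator may handle buffering and I/O errors differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the source of org.archive.util.iterator.LineReadingIterator.
final class LineIteratorSketch implements Iterator<String> {
    private final BufferedReader reader;
    private String next;   // look-ahead line; null when nothing is buffered
    private boolean done;  // true once the reader has reported end of input

    LineIteratorSketch(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (next == null && !done) {
            try {
                next = reader.readLine();
            } catch (IOException e) {
                // Iterator methods cannot throw checked exceptions, so wrap it.
                throw new UncheckedIOException(e);
            }
            if (next == null) {
                done = true;
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = next;
        next = null; // consume the buffered line
        return line;
    }
}
```

Reading one line ahead in `hasNext()` lets the iterator answer correctly without consuming a line twice.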
LineReadingIterator(BufferedReader) - Constructor for class org.archive.util.iterator.LineReadingIterator
 
LINK - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
Link - Class in org.archive.crawler.extractor
Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found).
Link(CharSequence, CharSequence, CharSequence, char) - Constructor for class org.archive.crawler.extractor.Link
Create a Link with the given fields.
LINK - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
LinkExtractor - Interface in org.archive.extractor
LinkExtractor is a general interface for classes which, when given an InputStream and Charset, can scan for Links and return them via an Iterator interface.
linkExtractorFinished() - Method in class org.archive.crawler.datamodel.CrawlURI
Note that link extraction has been performed on this CrawlURI.
LinksScoper - Class in org.archive.crawler.postprocessor
Determine which extracted links are within scope.
LinksScoper(String) - Constructor for class org.archive.crawler.postprocessor.LinksScoper
 
listIterator() - Method in class org.archive.crawler.settings.ListType
 
listIterator(int) - Method in class org.archive.crawler.settings.ListType
 
ListType - Class in org.archive.crawler.settings
Super type for all lists.
ListType(String, String) - Constructor for class org.archive.crawler.settings.ListType
Constructs a new ListType.
listUsedFiles(List) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
listUsedFiles(List) - Method in class org.archive.crawler.framework.CrawlScope
 
listUsedFiles(List) - Method in class org.archive.crawler.settings.ModuleType
Those Modules that use files on disk should list them all when this method is called.
load(Store, String) - Method in class org.archive.configuration.registry.JmxRegistry
 
load(Store, String) - Method in interface org.archive.configuration.Registry
Load (or reload) settings from store.
load() - Method in interface org.archive.configuration.Store
 
load(String) - Method in interface org.archive.configuration.Store
 
load() - Method in class org.archive.configuration.store.SerializeStore
 
load(String) - Method in class org.archive.configuration.store.SerializeStore
 
load(File) - Method in class org.archive.configuration.store.SerializeStore
 
load(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
loadCookies(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
Load cookies from a file before the first fetch.
loadCookies() - Method in class org.archive.crawler.fetcher.FetchHTTP
Load cookies from the file specified in the order file.
loadFactor - Variable in class org.archive.util.AbstractLongFPSet
The load factor, as a fraction.
loadJob(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
Loads a job given a specific job file.
loadMap() - Method in class org.archive.crawler.processor.CrawlMapper
Retrieve and parse the mapping specification from a local path or HTTP URL.
loadOptions(String) - Static method in class org.archive.crawler.admin.CrawlJobHandler
Loads options from a file.
loadProfile(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
Load one profile.
loadProperties() - Static method in class org.archive.crawler.Heritrix
Load the heritrix.properties file.
loadSeeds() - Method in interface org.archive.crawler.framework.Frontier
Request that the Frontier load (or reload) crawl seeds, typically by contacting the Scope.
loadSeeds() - Method in class org.archive.crawler.frontier.AbstractFrontier
Load up the seeds.
loadSeeds() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Loads the seeds
LocalErrorFormatter - Class in org.archive.crawler.io
 
LocalErrorFormatter() - Constructor for class org.archive.crawler.io.LocalErrorFormatter
 
localErrors - Variable in class org.archive.crawler.framework.CrawlController
This logger is for job-scoped logging, specifically errors which happen and are handled within a particular processor.
LocalizedError - Class in org.archive.crawler.datamodel
 
LocalizedError(String, Throwable, String) - Constructor for class org.archive.crawler.datamodel.LocalizedError
 
localName - Variable in class org.archive.crawler.processor.CrawlMapper
name of the enclosing crawler (URIs mapped here stay put)
LOCATION_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Location field.
log(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Log to the main crawl.log
LOG_ERROR - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
LOG_TIMESTAMP - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
logGeneration - Variable in class org.archive.crawler.processor.CrawlMapper
Truncated timestamp prefix for diversion logs; when current time doesn't match, it's time to close all current logs.
logger - Static variable in class org.archive.crawler.datamodel.CredentialStoreTest
 
logger - Static variable in class org.archive.crawler.extractor.AggressiveExtractorHTML
 
logger - Variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
logger - Static variable in class org.archive.crawler.url.canonicalize.RegexRule
 
logger - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Logging instance.
logger - Static variable in class org.archive.httpclient.HttpRecorderGetMethod
 
logger - Static variable in class org.archive.httpclient.HttpRecorderMethod
 
logger - Variable in class org.archive.io.arc.ARCReader
 
logger - Static variable in class org.archive.io.arc.ARCWriterPool
Logger instance used by this class.
logger - Static variable in class org.archive.io.RecordingInputStream
 
logger - Static variable in class org.archive.io.ReplayCharSequenceFactory
Logger.
logger - Static variable in class org.archive.util.DevUtils
 
logger - Static variable in class org.archive.util.HttpRecorder
 
logLocalizedErrors(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Take note of any processor-local errors that have been entered into the CrawlURI.
LOGNAME_RECOVER - Static variable in interface org.archive.crawler.frontier.FrontierJournal
 
logNote(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
logProgressStatistics(String) - Method in class org.archive.crawler.framework.CrawlController
Log to the progress statistics log.
LogReader - Class in org.archive.crawler.util
This class contains a variety of methods for reading log files (or other text files containing repeated lines with similar information).
LogReader() - Constructor for class org.archive.crawler.util.LogReader
 
logStdErr(Level, String) - Static method in class org.archive.io.arc.ARCReader
Log on stderr.
logUriError(URIException, UURI, CharSequence) - Method in class org.archive.crawler.framework.CrawlController
Log a URIException from deep inside other components to the crawl's shared log.
LogUtils - Class in org.archive.crawler.util
Logging utils.
LogUtils() - Constructor for class org.archive.crawler.util.LogUtils
 
LONG - Static variable in class org.archive.crawler.settings.SettingsHandler
 
LONG_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
longerThan(int) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Tests if this path is longer than a given value.
longestActiveQueue - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
LongFPSet - Interface in org.archive.util.fingerprint
Set for holding primitive long fingerprints.
LongFPSetCache - Class in org.archive.util.fingerprint
Like a MemLongFPSet, but with fixed capacity and maximum size.
LongFPSetCache() - Constructor for class org.archive.util.fingerprint.LongFPSetCache
 
LongFPSetCache(int, float) - Constructor for class org.archive.util.fingerprint.LongFPSetCache
 
LongFPSetCacheTest - Class in org.archive.util.fingerprint
JUnit test suite for LongFPSetCache
LongFPSetCacheTest(String) - Constructor for class org.archive.util.fingerprint.LongFPSetCacheTest
Create a new LongFPSetCacheTest object
LongFPSetTestCase - Class in org.archive.util.fingerprint
JUnit test suite for LongFPSet.
LongFPSetTestCase(String) - Constructor for class org.archive.util.fingerprint.LongFPSetTestCase
Create a new LongFPSetTestCase object
longIntoByteArray(long, byte[], int) - Static method in class org.archive.util.ArchiveUtils
Copy the raw bytes of a long into a byte array, starting at the specified offset.
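A minimal sketch of what copying a long into a byte array at an offset can look like. This is not the ArchiveUtils source: the class name LongBytesSketch is hypothetical, and the big-endian byte order (most significant byte first) is an assumption, since the index entry does not state the order.

```java
import java.util.Arrays;

final class LongBytesSketch {
    // Hypothetical stand-in for the method described above.
    // Assumption: bytes are written most significant first (big-endian).
    static void longIntoByteArray(long value, byte[] dest, int offset) {
        for (int i = 0; i < 8; i++) {
            // Shift the wanted byte into the low 8 bits, then truncate.
            dest[offset + i] = (byte) (value >>> (56 - 8 * i));
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10];
        longIntoByteArray(0x0102030405060708L, buf, 1);
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
        System.out.println(Arrays.toString(buf));
    }
}
```

The caller is responsible for supplying an array with at least eight bytes available from the offset; otherwise the loop throws ArrayIndexOutOfBoundsException.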
LongList - Class in org.archive.crawler.settings
List of Long values
LongList(String, String) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList.
LongList(String, String, LongList) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList and initializes it with the values from another LongList.
LongList(String, String, Long[]) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList and initializes it with the values from an array of Long.
LongList(String, String, long[]) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList and initializes it with the values from an array of long.
longValue - Variable in class org.archive.util.LongWrapper
 
LongWrapper - Class in org.archive.util
Wraps a long.
LongWrapper(long) - Constructor for class org.archive.util.LongWrapper
 
lookahead() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Check if there's a next by trying to read it.
lookahead() - Method in class org.archive.util.iterator.LineReadingIterator
Loads next line into lookahead spot
lookahead() - Method in class org.archive.util.iterator.LookaheadIterator
 
lookahead() - Method in class org.archive.util.iterator.TransformingIteratorWrapper
 
LookaheadIterator - Class in org.archive.util.iterator
Superclass for Iterators which must probe ahead to know if a 'next' exists, and thus have a cached next between a call to hasNext() and next().
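The lookahead pattern described above can be sketched as follows. This is an illustration, not the Heritrix class: LookaheadSketch and LookaheadDemo are hypothetical names, and the boolean return type of lookahead() and the use of null to mean "nothing cached" are assumptions.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

abstract class LookaheadSketch<T> implements Iterator<T> {
    // Cached element; null means nothing is cached (assumption of this sketch).
    protected T next;

    // Subclasses try to load the following element into 'next'
    // and report whether one was found.
    protected abstract boolean lookahead();

    @Override
    public boolean hasNext() {
        // Only probe the source when no element is already cached.
        return next != null || lookahead();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = next;
        next = null; // clear the cache so the following call probes again
        return result;
    }
}

final class LookaheadDemo {
    // Example subclass: yields only the even values, which requires
    // probing ahead to know whether another element exists.
    static Iterator<Integer> evens(final int... values) {
        return new LookaheadSketch<Integer>() {
            private int pos = 0;

            @Override
            protected boolean lookahead() {
                while (pos < values.length) {
                    int v = values[pos++];
                    if (v % 2 == 0) {
                        next = v;
                        return true;
                    }
                }
                return false;
            }
        };
    }
}
```

Because hasNext() only probes when the cache is empty, calling it repeatedly before next() does not skip elements.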
LookaheadIterator() - Constructor for class org.archive.util.iterator.LookaheadIterator
 
lookup(Object) - Method in interface org.archive.crawler.deciderules.ExternalGeoLookupInterface
 
LOOSE - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Trust any valid cert including self-signed certificates.
LowDiskPauseProcessor - Class in org.archive.crawler.postprocessor
Processor module which uses 'df -k', where available and with the expected output format (on Linux), to monitor available disk space and pause the crawl if free space on monitored filesystems falls below certain thresholds.
LowDiskPauseProcessor(String) - Constructor for class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
LowercaseRule - Class in org.archive.crawler.url.canonicalize
Lowercases the URL.
LowercaseRule(String) - Constructor for class org.archive.crawler.url.canonicalize.LowercaseRule
 
LowercaseRuleTest - Class in org.archive.crawler.url.canonicalize
Unit test lowercase rule.
LowercaseRuleTest() - Constructor for class org.archive.crawler.url.canonicalize.LowercaseRuleTest
 
LSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
LSQRBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 

M

m - Variable in class org.archive.util.BloomFilter32bit
The number of bits in this filter.
m - Variable in class org.archive.util.BloomFilter32bitSplit
The number of bits in this filter.
m - Variable in class org.archive.util.BloomFilter32bp2
The number of bits in this filter.
m - Variable in class org.archive.util.BloomFilter32bp2Split
The number of bits in this filter.
m - Variable in class org.archive.util.BloomFilter64bit
The number of bits in this filter.
main(String[]) - Static method in class org.archive.configuration.registry.JmxRegistryTest
 
main(String[]) - Static method in class org.archive.crawler.datamodel.Robotstxt
 
main(String[]) - Static method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
main(String[]) - Static method in class org.archive.crawler.extractor.ExtractorTool
 
main(String[]) - Static method in class org.archive.crawler.extractor.PDFParser
 
main(String[]) - Static method in class org.archive.crawler.frontier.RecoveryJournalTest
 
main(String[]) - Static method in class org.archive.crawler.Heritrix
Launch program.
main(String[]) - Static method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
main(String[]) - Static method in class org.archive.crawler.util.BenchmarkUriUniqFilters
Test the UriUniqFilter implementation (MemUriUniqFilter, BloomUriUniqFilter, or BdbUriUniqFilter) named in first argument against the file of one-per-line URIs named in the second argument.
main(String[]) - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
main(String[]) - Static method in class org.archive.io.arc.ARCReader
Command-line interface to ARCReader.
main(String[]) - Static method in class org.archive.io.GzippedInputStreamTest
 
main(String[]) - Static method in class org.archive.net.rsync.Handler
Main dumps rsync file to STDOUT.
main(String[]) - Static method in class org.archive.queue.MemQueueTest
run all the tests for MemQueueTest
main(String[]) - Static method in class org.archive.queue.QueueCat
 
main(String[]) - Static method in class org.archive.util.ArchiveUtilsTest
run all the tests for ArchiveUtilsTest
main(String[]) - Static method in class org.archive.util.Base32
For testing, take a command-line argument in Base32, decode, print in hex, encode, print
main(String[]) - Static method in class org.archive.util.BenchmarkBlooms
 
main(String[]) - Static method in class org.archive.util.CachedBdbMapTest
 
main(String[]) - Static method in class org.archive.util.fingerprint.LongFPSetCacheTest
run all the tests for LongFPSetCacheTest
main(String[]) - Static method in class org.archive.util.fingerprint.MemLongFPSetTest
run all the tests for MemLongFPSetTest
main(String[]) - Static method in class org.archive.util.JndiUtils
Testing code.
main(String[]) - Static method in class org.archive.util.OneLineSimpleLogger
Test this logger.
main(String[]) - Static method in class org.archive.util.PaddingStringBufferTest
run all the tests for PaddingStringBufferTest
main(String[]) - Static method in class org.archive.util.SURT
Allow class to be used as a command-line tool for converting URL lists (or naked host or host/path fragments implied to be HTTP URLs) to SURT form.
main(String[]) - Static method in class org.archive.util.SurtPrefixSet
Allow class to be used as a command-line tool for converting URL lists (or naked host or host/path fragments implied to be HTTP URLs) to implied SURT prefix form.
main(String[]) - Static method in class org.archive.util.SurtPrefixSetTest
run all the tests for SurtPrefixSetTest
main(String[]) - Static method in class org.archive.util.SURTTest
run all the tests for SURTTest
main(String[]) - Static method in class org.archive.util.TextUtilsTest
run all the tests for TextUtilsTest
mainPart - Variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
The main part of this segment.
makeARCLocal(URLConnection) - Static method in class org.archive.io.arc.ARCReaderFactory
 
makeJobsTabularData(List) - Method in class org.archive.crawler.Heritrix
 
makeLongFPSet() - Method in class org.archive.util.fingerprint.LongFPSetCacheTest
 
makeLongFPSet() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
 
makeLongFPSet() - Method in class org.archive.util.fingerprint.MemLongFPSetTest
 
makeMetaline(String, String, String, String, String) - Method in class org.archive.io.arc.ARCWriter
 
makeQueue() - Method in class org.archive.queue.MemQueueTest
 
makeQueue() - Method in class org.archive.queue.QueueTestBase
The abstract subclass constructor.
makeSpace() - Method in class org.archive.util.AbstractLongFPSet
Make additional space to keep the load under the target loadFactor level.
makeSpace() - Method in class org.archive.util.fingerprint.LongFPSetCache
 
makeSpace() - Method in class org.archive.util.fingerprint.MemLongFPSet
 
MANIFEST_CONFIG_FILE - Static variable in class org.archive.crawler.framework.CrawlController
abbreviation label for config files in manifest
MANIFEST_LOG_FILE - Static variable in class org.archive.crawler.framework.CrawlController
abbreviation label for log files in manifest
MANIFEST_REPORT - Static variable in class org.archive.crawler.framework.CrawlController
 
MANIFEST_REPORT_FILE - Static variable in class org.archive.crawler.framework.CrawlController
abbreviation label for report files in manifest
map - Variable in class org.archive.crawler.processor.CrawlMapper
Mapping of classKey ranges (as represented by their start) to crawlers (by abstract name/filename)
MAP - Static variable in class org.archive.crawler.settings.SettingsHandler
 
MapType - Class in org.archive.crawler.settings
This class represents a container of settings.
MapType(String, String) - Constructor for class org.archive.crawler.settings.MapType
Construct a new MapType object.
MapType(String, String, Class) - Constructor for class org.archive.crawler.settings.MapType
Construct a new MapType object.
MapTypeTest - Class in org.archive.crawler.settings
JUnit tests for MapType
MapTypeTest() - Constructor for class org.archive.crawler.settings.MapTypeTest
 
mark(int) - Method in class org.archive.io.RandomAccessInputStream
 
mark(int) - Method in class org.archive.io.ReplayInputStream
 
mark(int) - Method in class org.archive.io.RepositionableInputStream
 
markAsSeed() - Method in class org.archive.crawler.datamodel.CrawlURI
Mark this uri as being a seed.
markAsSeen(int, int) - Method in class org.archive.crawler.extractor.PDFParser
Note that an object (id/generation pair) has been seen by this parser so that it can be handled differently when it is encountered again.
markContentBegin(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderMethod
 
markContentBegin() - Method in class org.archive.io.RecordingInputStream
 
markContentBegin() - Method in class org.archive.io.RecordingOutputStream
Remember the current position as the start of the "response body".
markContentBegin() - Method in class org.archive.util.HttpRecorder
Mark current position as the point where the HTTP headers end.
markPrerequisite(String, ProcessorChain) - Method in class org.archive.crawler.datamodel.CrawlURI
Do all actions associated with setting a CrawlURI as requiring a prerequisite.
markSupported() - Method in class org.archive.io.arc.ARCRecord
 
markSupported() - Method in class org.archive.io.RandomAccessInputStream
 
markSupported() - Method in class org.archive.io.ReplayInputStream
 
MASSAGEHOST_PATTERN - Static variable in class org.archive.net.UURI
 
match(Class) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
match(Class, String) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
matcherStack - Variable in class org.archive.extractor.RegexpJSLinkExtractor
 
matches(String, CharSequence) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the matches method of the String class.
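One way to get the benefit of a precompiled pattern is to cache compiled Pattern objects by their regex string. The sketch below assumes that approach; PatternCacheSketch is a hypothetical class, and the caching strategy of the real TextUtils is not stated in this index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

final class PatternCacheSketch {
    // Compiled patterns keyed by regex source, shared across threads.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Same argument shape as the entry above; the body is an assumption.
    static boolean matches(String regex, CharSequence input) {
        // Compile at most once per distinct regex, then reuse.
        Pattern p = CACHE.computeIfAbsent(regex, Pattern::compile);
        // matches() requires the whole input to match, like String.matches.
        return p.matcher(input).matches();
    }
}
```

By contrast, String.matches compiles the regex on every call, which adds up when the same expression is applied to many URIs.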
MatchesFilePatternDecideRule - Class in org.archive.crawler.deciderules
Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches.
MatchesFilePatternDecideRule(String) - Constructor for class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
Usual constructor.
MatchesListRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps.
MatchesListRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
Usual constructor.
MatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp.
MatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.MatchesRegExpDecideRule
Usual constructor.
MAX_ATTR_VAL_LENGTH - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
MAX_INT_CHAR_WIDTH - Static variable in class org.archive.util.ArchiveUtils
 
MAX_METADATA_LINE_LENGTH - Static variable in interface org.archive.io.arc.ARCConstants
Maximum length for a metadata line.
MAX_OUTLINKS - Static variable in class org.archive.crawler.datamodel.CrawlURI
Protection against outlink overflow.
MAX_URL_LENGTH - Static variable in class org.archive.net.UURI
Consider URIs too long for IE as illegal.
maxEmbedHops - Variable in class org.archive.crawler.filter.TransclusionFilter
 
maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Returns the maximum number of different keys this policy can create.
maxLinkHops - Variable in class org.archive.crawler.filter.HopsFilter
 
MaxLinkHopsSelfTest - Class in org.archive.crawler.selftest
Test the max-link-hops setting.
MaxLinkHopsSelfTest() - Constructor for class org.archive.crawler.selftest.MaxLinkHopsSelfTest
 
maxPathDepth - Variable in class org.archive.crawler.filter.PathDepthFilter
 
maxPending - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
size at which to force flush of pending items
maxReferralHops - Variable in class org.archive.crawler.filter.TransclusionFilter
 
maxSegLen - Variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
The maximum number of characters allowed in one file system path segment.
maxSpeculativeHops - Variable in class org.archive.crawler.filter.TransclusionFilter
 
maxTransHops - Variable in class org.archive.crawler.filter.HopsFilter
 
maxTransHops - Variable in class org.archive.crawler.filter.TransclusionFilter
 
MBEAN_SERVER_DELEGATE - Static variable in class org.archive.util.JmxUtils
 
MEDIUM - Static variable in class org.archive.crawler.datamodel.CandidateURI
Medium priority.
MemFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude all-in-memory FP-merging UriUniqFilter.
MemFPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
MemLongFPSet - Class in org.archive.util.fingerprint
Open-addressing in-memory hash set for holding primitive long fingerprints.
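For readers unfamiliar with open addressing, here is a compact sketch of a set of primitive long values that uses linear probing and grows when a load factor is exceeded. It is not the MemLongFPSet implementation: the class name, the meaning given to the constructor arguments, the hash function, and the doubling policy are all assumptions made for illustration.

```java
final class LongSetSketch {
    private long[] slots;      // stored values
    private boolean[] used;    // marks occupied slots
    private int count;
    private final float loadFactor;

    // Assumption: first argument is a minimum capacity, rounded up to a power of two.
    LongSetSketch(int initialCapacity, float loadFactor) {
        if (!(loadFactor > 0f && loadFactor < 1f)) {
            throw new IllegalArgumentException("loadFactor must be in (0, 1)");
        }
        int capacity = 2;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        this.slots = new long[capacity];
        this.used = new boolean[capacity];
        this.loadFactor = loadFactor;
    }

    // Returns true if the value was newly added, false if already present.
    boolean add(long fp) {
        if (contains(fp)) {
            return false;
        }
        if (count + 1 > slots.length * loadFactor) {
            makeSpace();
        }
        insert(fp);
        return true;
    }

    boolean contains(long fp) {
        int i = indexFor(fp);
        while (used[i]) {
            if (slots[i] == fp) {
                return true;
            }
            i = (i + 1) & (slots.length - 1); // linear probe, wrapping around
        }
        return false;
    }

    int size() {
        return count;
    }

    // Places a value known to be absent into the first free probe slot.
    private void insert(long fp) {
        int i = indexFor(fp);
        while (used[i]) {
            i = (i + 1) & (slots.length - 1);
        }
        used[i] = true;
        slots[i] = fp;
        count++;
    }

    // Doubles the table and reinserts every stored value.
    private void makeSpace() {
        long[] oldSlots = slots;
        boolean[] oldUsed = used;
        slots = new long[oldSlots.length * 2];
        used = new boolean[slots.length];
        count = 0;
        for (int i = 0; i < oldSlots.length; i++) {
            if (oldUsed[i]) {
                insert(oldSlots[i]);
            }
        }
    }

    private int indexFor(long fp) {
        // Fold the high bits into the low bits, then mask to the table size.
        return (int) (fp ^ (fp >>> 32)) & (slots.length - 1);
    }
}
```

Keeping the load factor strictly below 1 guarantees at least one empty slot, so every probe sequence terminates.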
MemLongFPSet() - Constructor for class org.archive.util.fingerprint.MemLongFPSet
 
MemLongFPSet(int, float) - Constructor for class org.archive.util.fingerprint.MemLongFPSet
 
MemLongFPSetTest - Class in org.archive.util.fingerprint
JUnit test suite for MemLongFPSet
MemLongFPSetTest(String) - Constructor for class org.archive.util.fingerprint.MemLongFPSetTest
Create a new MemLongFPSetTest object
MemQueue - Class in org.archive.queue
An in-memory implementation of a Queue.
MemQueue() - Constructor for class org.archive.queue.MemQueue
Create a new, empty MemQueue
MemQueueTest - Class in org.archive.queue
JUnit test suite for MemQueue
MemQueueTest(String) - Constructor for class org.archive.queue.MemQueueTest
Create a new MemQueueTest object
MemUriUniqFilter - Class in org.archive.crawler.util
A purely in-memory UriUniqFilter based on a HashSet, which remembers every full URI string it sees.
MemUriUniqFilter() - Constructor for class org.archive.crawler.util.MemUriUniqFilter
 
mergeDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
mergeDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
message(String, int) - Method in class org.archive.crawler.CommandLineParser
Print message and then exit.
messageArguments - Variable in class org.archive.crawler.settings.Constraint.FailedCheck
 
MIDFETCH_ATTR_FILTERS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
Filters to apply mid-fetch, just after receipt of the response headers before we start to download body.
MIMETYPE_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for mimetype field.
MIMETYPE_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header mimetype field.
mimeTypeBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
 
mimeTypeDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of the file types we see (mime type -> count)
MimetypeUtils - Class in org.archive.util
Class of mimetype utilities.
MimetypeUtils() - Constructor for class org.archive.util.MimetypeUtils
 
MimetypeUtilsTest - Class in org.archive.util
 
MimetypeUtilsTest() - Constructor for class org.archive.util.MimetypeUtilsTest
 
MINIMAL_GZIP_HEADER_LENGTH - Static variable in class org.archive.io.GzipHeader
Length of minimal GZIP header.
MINIMUM_RECORD_LENGTH - Static variable in interface org.archive.io.arc.ARCConstants
Minimum possible record length.
MirrorWriterProcessor - Class in org.archive.crawler.writer
Processor module that writes the results of successful fetches to files on disk.
MirrorWriterProcessor(String) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor
 
MirrorWriterProcessor.DirSegment - Class in org.archive.crawler.writer
This class represents one directory segment (component) of a URI path.
MirrorWriterProcessor.DirSegment(String, int, int, int, boolean, CrawlURI, Map, String, String, Set) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.DirSegment
Creates a DirSegment.
MirrorWriterProcessor.EndSegment - Class in org.archive.crawler.writer
This class represents the last segment (component) of a URI path.
MirrorWriterProcessor.EndSegment(String, int, int, int, boolean, CrawlURI, Map, String, String, String, int, boolean) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.EndSegment
Creates an EndSegment.
MirrorWriterProcessor.LumpyString - Class in org.archive.crawler.writer
This class represents a dynamically growable string consisting of substrings ("lumps") that are treated atomically.
MirrorWriterProcessor.LumpyString(String, int, int, int, int, Map, String) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Creates a LumpyString.
MirrorWriterProcessor.PathSegment - Class in org.archive.crawler.writer
This class represents one segment (component) of a URI path.
MirrorWriterProcessor.PathSegment(int, boolean, CrawlURI) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
Creates a new PathSegment.
MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter - Class in org.archive.crawler.writer
This class implements a FilenameFilter that matches by name, ignoring case.
MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter(String) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter
Creates a CaseInsensitiveFilenameFilter.
MirrorWriterProcessor.URIToFileReturn - Class in org.archive.crawler.writer
This class is returned by uriToFile.
MirrorWriterProcessor.URIToFileReturn(String, String, int) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Creates a URIToFileReturn.
MISC - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
MISC - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
MISC_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
MISC_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
mkdirs() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Creates all directories in this path as needed.
ModuleAttributeInfo - Class in org.archive.crawler.settings
 
ModuleAttributeInfo(Type) - Constructor for class org.archive.crawler.settings.ModuleAttributeInfo
Construct a new instance of ModuleAttributeInfo.
ModuleAttributeInfo(ModuleAttributeInfo) - Constructor for class org.archive.crawler.settings.ModuleAttributeInfo
 
ModuleType - Class in org.archive.crawler.settings
Superclass of all modules that should be configurable.
ModuleType(String, String) - Constructor for class org.archive.crawler.settings.ModuleType
Creates a new ModuleType.
ModuleType(String) - Constructor for class org.archive.crawler.settings.ModuleType
Every subclass should implement this constructor
MOST_FAVORED - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
MOST_FAVORED_SET - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
MOTHER - Static variable in class org.archive.util.JmxUtils
Key for the name of the Heritrix instance hosting a Job.
moveElementDown(String) - Method in class org.archive.crawler.settings.DataContainer
Move an attribute down one place in the list.
moveElementDown(CrawlerSettings, String) - Method in class org.archive.crawler.settings.MapType
Move an attribute down one place in the list.
moveElementUp(String) - Method in class org.archive.crawler.settings.DataContainer
Move an attribute up one place in the list.
moveElementUp(CrawlerSettings, String) - Method in class org.archive.crawler.settings.MapType
Move an attribute up one place in the list.
moveToNextGzipMember() - Method in class org.archive.io.GzippedInputStream
 
MULTIPLE_SLASHES - Static variable in class org.archive.net.UURIFactory
Pattern that looks for case of two or more slashes in a path.
multiThreadMode() - Method in class org.archive.crawler.framework.CrawlController
Go back to regular multi-thread mode, where all ToeThreads may proceed at once.
mustBeCrawling() - Method in class org.archive.crawler.admin.CrawlJob
 

N

NAME - Static variable in class org.archive.util.JmxUtils
 
NAME_KEY - Static variable in class org.archive.configuration.Configuration
 
NAVLINK_HOP - Static variable in class org.archive.crawler.extractor.Link
navigation links, like A/@HREF
NAVLINK_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for navlink urls without other context
NBSP - Static variable in class org.archive.net.UURIFactory
 
needsImmediateScheduling() - Method in class org.archive.crawler.datamodel.CandidateURI
 
needsPromptRetry(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Checks if a recently completed CrawlURI that did not finish successfully needs to be retried immediately (processed again as soon as politeness allows).
needsRetrying(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Checks if a recently completed CrawlURI that did not finish successfully needs to be retried (processed again after some time elapses).
needsRetrying(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Checks if a recently completed CrawlURI that did not finish successfully needs to be retried (processed again after some time elapses).
needsSoonScheduling() - Method in class org.archive.crawler.datamodel.CandidateURI
 
newCount - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newDefaultInstance() - Static method in class org.archive.extractor.CharSequenceLinkExtractor
 
newDefaultInstance() - Static method in class org.archive.extractor.RegexpCSSLinkExtractor
 
newDefaultInstance() - Static method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
newDefaultInstance() - Static method in class org.archive.extractor.RegexpJSLinkExtractor
 
newFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
newFpsFile - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newJob(CrawlJob, String, String, String, String, int) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new job.
newJob(File, String, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new job.
NEWLINE - Static variable in class org.archive.net.UURIFactory
 
newline() - Method in class org.archive.util.PaddingStringBuffer
Forces a new line in the buffer.
newProfile(CrawlJob, String, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new profile.
next() - Method in interface org.archive.crawler.framework.Frontier
Get the next URI that should be processed.
next() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
next() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the 'top' URI in the AdaptiveRevisitHostQueue.
next() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the next CrawlURI to be processed (and presumably visited/fetched) by a worker thread.
next() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
next() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
The common parts of next() across different types of iterators
next - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
next() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Return the next item.
next - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
next() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
next - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
next() - Method in class org.archive.io.arc.ARCReader.ARCRecordIterator
Tries to move to next record if we get ARCReader.RecoverableIOException.
next() - Method in class org.archive.util.iterator.CompositeIterator
 
next - Variable in class org.archive.util.iterator.LookaheadIterator
 
next() - Method in class org.archive.util.iterator.LookaheadIterator
Return the next item.
nextEntry() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
nextFlushAllowableAfter - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
time-based throttle on flush-merge operations
nextIsValid - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
nextItemNumber - Variable in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
nextKey - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
Strong reference needed to avoid disappearance of key between hasNext and next
nextLink() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
nextLink() - Method in interface org.archive.extractor.LinkExtractor
Alternative to Iterator.next() which returns type Link.
nextLong() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
nextOrdinal - Variable in class org.archive.crawler.frontier.AbstractFrontier
ordinal numbers to assign to created CrawlURIs
nextProcessor() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the next processor to process this URI.
nextProcessorChain() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the processor chain that should be processing this URI after the current chain is finished with it.
nextReadyTime - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Time (in milliseconds) when the HQ will next be ready to issue a URI for processing.
nextSerialNumber - Variable in class org.archive.crawler.framework.ToePool
 
NO_TYPE_MIMETYPE - Static variable in class org.archive.util.MimetypeUtils
The 'no-type' content-type.
noChangeExpected(String) - Method in class org.archive.net.UURIFactoryTest
 
NoGzipMagicException - Exception in org.archive.io
 
NoGzipMagicException() - Constructor for exception org.archive.io.NoGzipMagicException
 
NON_HTML_PATH_EXTENSION - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
NON_HTML_PATH_EXTENSION - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
NONWHITESPACE_ENTRY_TRAILING_COMMENT - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
NoopUriUniqFilter - Class in org.archive.crawler.util
A UriUniqFilter that doesn't actually provide any uniqueness filter on presented items: all are passed through.
NoopUriUniqFilter() - Constructor for class org.archive.crawler.util.NoopUriUniqFilter
 
NORMAL - Static variable in class org.archive.crawler.datamodel.CandidateURI
Normal/low priority.
NORMAL - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Normal jsse behavior.
note(String) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Note item as seen, without passing through to receiver.
note(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
note(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
noteAboutToEmit(CrawlURI, WorkQueue) - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform fixups on a CrawlURI about to be returned via next().
noteAboutToEmit(CrawlURI, WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
noteAccess(long) - Method in class org.archive.util.fingerprint.LongFPSetCache
 
noteError(int) - Method in class org.archive.crawler.frontier.WorkQueue
Note an error and assess an extra penalty.
noteExhausted() - Method in class org.archive.crawler.scope.SeedFileIterator
Clean-up when hasNext() has returned null: close open files.
noteExhausted() - Method in class org.archive.util.iterator.TransformingIteratorWrapper
Any cleanup to occur when hasNext() is about to return false
noteExtractError(IOException, UURI, CharSequence) - Method in interface org.archive.extractor.ExtractErrorListener
Callback to report an extraction error.
noteStart() - Method in class org.archive.crawler.framework.AbstractTracker
Notify tracker that crawl has begun.
noteStart() - Method in interface org.archive.crawler.framework.StatisticsTracking
Start the tracker's crawl timing.
NotMatchesFilePatternDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp.
NotMatchesFilePatternDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotMatchesFilePatternDecideRule
Usual constructor.
NotMatchesListRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied regexp.
NotMatchesListRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotMatchesListRegExpDecideRule
Usual constructor.
NotMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied regexp.
NotMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotMatchesRegExpDecideRule
Usual constructor.
NotOnDomainsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set.
NotOnDomainsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotOnDomainsDecideRule
Usual constructor.
NotOnHostsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set.
NotOnHostsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotOnHostsDecideRule
Usual constructor.
NotSurtPrefixedDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set.
NotSurtPrefixedDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotSurtPrefixedDecideRule
Usual constructor.
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter32bit
The number of weights used to create hash functions.
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter32bitSplit
The number of weights used to create hash functions.
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter32bp2
The number of weights used to create hash functions.
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter32bp2Split
The number of weights used to create hash functions.
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter64bit
The number of weights used to create hash functions.
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorHTML
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorHTTP
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorJS
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorPDF
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorSWF
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorUniversal
 
numberOfCURIsHandled - Variable in class org.archive.crawler.processor.Test
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorHTML
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorHTTP
 
numberOfLinksExtracted - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorPDF
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorSWF
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorUniversal
 
numberOfLinksExtracted - Variable in class org.archive.crawler.processor.Test
 

O

OBJECT - Static variable in class org.archive.crawler.settings.SettingsHandler
 
ObjectPlusFilesInputStream - Class in org.archive.io
Enhanced ObjectInputStream with support for restoring files that had been saved, in parallel with object serialization.
ObjectPlusFilesInputStream(InputStream, File) - Constructor for class org.archive.io.ObjectPlusFilesInputStream
Instantiate over the given stream and using the supplied auxiliary storage directory.
ObjectPlusFilesOutputStream - Class in org.archive.io
Enhanced ObjectOutputStream which maintains (a stack of) auxiliary directories and offers convenience methods for serialized objects to save their related disk files alongside their serialized version.
ObjectPlusFilesOutputStream(OutputStream, File) - Constructor for class org.archive.io.ObjectPlusFilesOutputStream
Constructor
objectToEntry(Object, DatabaseEntry) - Method in class org.archive.crawler.frontier.RecyclingSerialBinding
Copies superclass simply to allow different source for FastOutputStream.
OCCUPIED_SUFFIX - Static variable in class org.archive.io.arc.ARCWriter
Suffix given to files currently being written by Heritrix.
OFFSET_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for offset field.
OFFSET_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Offset field.
oldFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
OnDomainsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set.
OnDomainsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.OnDomainsDecideRule
Usual constructor.
oneLineReportThreads() - Method in class org.archive.crawler.framework.CrawlController
 
OneLineSimpleLogger - Class in org.archive.util
Logger that writes entry on one line with less verbose date.
OneLineSimpleLogger() - Constructor for class org.archive.util.OneLineSimpleLogger
 
OnHostsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set.
OnHostsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.OnHostsDecideRule
Usual constructor.
OP_CHECKPOINT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_CLEAN - Static variable in class org.archive.util.JEMBeanHelper
 
OP_CLOSE - Static variable in class org.archive.util.JEApplicationMBean
This MBean provides a close operation to release the JE environment.
OP_DB_NAMES - Static variable in class org.archive.util.JEMBeanHelper
 
OP_DB_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_ENV_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_ENV_STAT_STR - Static variable in class org.archive.util.JEMBeanHelper
 
OP_EVICT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_LOCK_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_LOCK_STAT_STR - Static variable in class org.archive.util.JEMBeanHelper
 
OP_OPEN - Static variable in class org.archive.util.JEApplicationMBean
This MBean provides an open operation to open the JE environment.
OP_SYNC - Static variable in class org.archive.util.JEMBeanHelper
 
OP_TXN_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
open(Environment, DatabaseConfig) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
OPEN - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Trust anything given us.
open(InputStream) - Method in class org.archive.io.RecordingInputStream
 
open() - Method in class org.archive.io.RecordingOutputStream
Wrap the given stream, both recording and passing along any data written to this RecordingOutputStream.
open(OutputStream) - Method in class org.archive.io.RecordingOutputStream
Wrap the given stream, both recording and passing along any data written to this RecordingOutputStream.
openConnection(URL) - Method in class org.archive.net.rsync.Handler
 
openDatabase(Environment, String) - Method in class org.archive.util.CachedBdbMap
 
openDbCount - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
 
ORDER_FILE_NAME - Static variable in class org.archive.crawler.admin.CrawlJobHandler
 
ordinal - Variable in class org.archive.crawler.datamodel.CrawlURI
Monotonically increasing number within a crawl; useful for tending towards breadth-first ordering.
OrFilter - Class in org.archive.crawler.filter
OrFilter allows any number of other filters to be set up inside it, as child elements.
OrFilter(String, String) - Constructor for class org.archive.crawler.filter.OrFilter
 
OrFilter(String) - Constructor for class org.archive.crawler.filter.OrFilter
 
org.archive.configuration - package org.archive.configuration
Provides application configuration.
org.archive.configuration.registry - package org.archive.configuration.registry
 
org.archive.configuration.store - package org.archive.configuration.store
 
org.archive.crawler - package org.archive.crawler
Introduction to Heritrix.
org.archive.crawler.admin - package org.archive.crawler.admin
Contains classes that the web UI uses to monitor and control crawls.
org.archive.crawler.admin.ui - package org.archive.crawler.admin.ui
 
org.archive.crawler.datamodel - package org.archive.crawler.datamodel
 
org.archive.crawler.datamodel.credential - package org.archive.crawler.datamodel.credential
Contains html form login and basic and digest credentials used by Heritrix logging into sites.
org.archive.crawler.deciderules - package org.archive.crawler.deciderules
Provides classes for a simple decision rules framework.
org.archive.crawler.event - package org.archive.crawler.event
 
org.archive.crawler.extractor - package org.archive.crawler.extractor
 
org.archive.crawler.fetcher - package org.archive.crawler.fetcher
 
org.archive.crawler.filter - package org.archive.crawler.filter
 
org.archive.crawler.framework - package org.archive.crawler.framework
 
org.archive.crawler.framework.exceptions - package org.archive.crawler.framework.exceptions
 
org.archive.crawler.frontier - package org.archive.crawler.frontier
 
org.archive.crawler.io - package org.archive.crawler.io
 
org.archive.crawler.postprocessor - package org.archive.crawler.postprocessor
 
org.archive.crawler.prefetch - package org.archive.crawler.prefetch
 
org.archive.crawler.processor - package org.archive.crawler.processor
 
org.archive.crawler.scope - package org.archive.crawler.scope
 
org.archive.crawler.selftest - package org.archive.crawler.selftest
Provides the client-side aspect of the heritrix integration self test.
org.archive.crawler.settings - package org.archive.crawler.settings
Provides classes for the settings framework.
org.archive.crawler.settings.refinements - package org.archive.crawler.settings.refinements
 
org.archive.crawler.url - package org.archive.crawler.url
 
org.archive.crawler.url.canonicalize - package org.archive.crawler.url.canonicalize
 
org.archive.crawler.util - package org.archive.crawler.util
 
org.archive.crawler.writer - package org.archive.crawler.writer
 
org.archive.extractor - package org.archive.extractor
 
org.archive.httpclient - package org.archive.httpclient
Provides specializations on apache jakarta commons httpclient.
org.archive.io - package org.archive.io
 
org.archive.io.arc - package org.archive.io.arc
ARC file reading and writing.
org.archive.net - package org.archive.net
 
org.archive.net.rsync - package org.archive.net.rsync
 
org.archive.queue - package org.archive.queue
 
org.archive.util - package org.archive.util
 
org.archive.util.fingerprint - package org.archive.util.fingerprint
 
org.archive.util.iterator - package org.archive.util.iterator
 
os - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The underlying output stream.
outLinks - Variable in class org.archive.crawler.datamodel.CrawlURI
all discovered outbound Links (navlinks, embeds, etc.)
outlinks(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorTool
 
outlinksSize() - Method in class org.archive.crawler.datamodel.CrawlURI
 
outOfScope(CandidateURI) - Method in class org.archive.crawler.framework.Scoper
Called when a CandidateUri is ruled out of scope.
outOfScope(CandidateURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
 
outOfScope(CandidateURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
Called when a CandidateUri is ruled out of scope.
output(String, boolean, String, boolean) - Static method in class org.archive.io.arc.ARCReader
Write out the arcfile.
outputARCRecord(ARCReader, ARCRecord, String) - Static method in class org.archive.io.arc.ARCReader
Output passed record using passed format specifier.
outputARCRecordCdx(ARCRecord) - Static method in class org.archive.io.arc.ARCReader
 
outputTemplate - Variable in class org.archive.util.iterator.RegexpLineIterator
 
outputWrap(OutputStream) - Method in class org.archive.util.HttpRecorder
Wrap the provided stream with the internal RecordingOutputStream. It's safe to call multiple times.
overMaxRetries(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
OverrideTest - Class in org.archive.crawler.settings
Test the concept of overrides.
OverrideTest() - Constructor for class org.archive.crawler.settings.OverrideTest
 

P

PaddingStringBuffer - Class in org.archive.util
StringBuffer-like utility which can add spaces to reach a certain column.
PaddingStringBuffer() - Constructor for class org.archive.util.PaddingStringBuffer
Create a new PaddingStringBuffer
PaddingStringBufferTest - Class in org.archive.util
JUnit test suite for PaddingStringBuffer
PaddingStringBufferTest(String) - Constructor for class org.archive.util.PaddingStringBufferTest
Create a new PaddingStringBufferTest object
padTo(int, int) - Static method in class org.archive.util.ArchiveUtils
Convert an int to a String, and pad it to pad spaces.
padTo(String, int) - Static method in class org.archive.util.ArchiveUtils
Pad the given String to pad characters wide by pre-pending spaces.
padTo(String, int, char) - Static method in class org.archive.util.ArchiveUtils
Pad the given String to pad characters wide by pre-pending padChar.
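The three static padTo variants above describe left-padding: a value is widened to pad characters by prepending a fill character. A minimal sketch of that behaviour follows. The class name PadSketch is hypothetical, and returning an already-wide-enough input unchanged is an assumption, since the index does not say what ArchiveUtils does in that case.

```java
// Hypothetical illustration; not the ArchiveUtils source.
final class PadSketch {
    private PadSketch() {}

    /** Left-pad s with padChar until it is pad characters wide. */
    static String padTo(String s, int pad, char padChar) {
        // Assumption: strings already wide enough pass through untouched.
        if (s.length() >= pad) {
            return s;
        }
        StringBuilder sb = new StringBuilder(pad);
        for (int i = s.length(); i < pad; i++) {
            sb.append(padChar);
        }
        return sb.append(s).toString();
    }

    /** Left-pad with spaces. */
    static String padTo(String s, int pad) {
        return padTo(s, pad, ' ');
    }

    /** Convert an int to a String, then left-pad with spaces. */
    static String padTo(int i, int pad) {
        return padTo(Integer.toString(i), pad);
    }
}
```

For example, PadSketch.padTo(42, 4) yields the digits right-aligned in a four-character field, which is the typical use when lining up report columns.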
padTo(int) - Method in class org.archive.util.PaddingStringBuffer
Pad to a given column.
parse(BufferedReader, LinkedList, Map) - Static method in class org.archive.crawler.datamodel.Robotstxt
 
parse(InputSource) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
parse(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
parse12DigitDate(String) - Static method in class org.archive.util.ArchiveUtils
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmm.
parse14DigitDate(String) - Static method in class org.archive.util.ArchiveUtils
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmss.
parse17DigitDate(String) - Static method in class org.archive.util.ArchiveUtils
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmssSSS.
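As an illustration of what parsing such a stamp involves, here is a minimal sketch for the 14-digit case (four-digit year, then month, day, hour, minute, second) built on java.text.SimpleDateFormat. The class name ArcDateSketch, the UTC interpretation, and the choice to report bad input with IllegalArgumentException are all assumptions. The index does not state how ArchiveUtils handles time zones or errors.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical illustration; not the ArchiveUtils source.
final class ArcDateSketch {
    private ArcDateSketch() {}

    /** Parse a 14-digit stamp (year..second), interpreted as UTC (assumption). */
    static Date parse14DigitDate(String stamp) {
        if (stamp == null || !stamp.matches("\\d{14}")) {
            throw new IllegalArgumentException("expected 14 digits: " + stamp);
        }
        // SimpleDateFormat is not thread-safe, so build one per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // reject impossible values such as month 13
        try {
            return fmt.parse(stamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a valid date: " + stamp, e);
        }
    }
}
```

The 12- and 17-digit variants would differ only in the digit count and the pattern string (dropping the seconds, or appending milliseconds).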
parseArcFilename(String) - Static method in class org.archive.io.arc.ARCUtils
 
parseAuthority(String, boolean) - Method in class org.archive.net.LaxURI
Coalesce the _host and _authority fields where possible.
PASS - Static variable in class org.archive.crawler.deciderules.DecideRule
 
patchLogging() - Static method in class org.archive.crawler.Heritrix
If the user hasn't altered the default logging parameters, tighten them up somewhat: some of our libraries are way too verbose at the INFO or WARNING levels.
PathDepthFilter - Class in org.archive.crawler.filter
Accepts all urls passed in with a path depth less than or equal to the max-path-depth value.
PathDepthFilter(String) - Constructor for class org.archive.crawler.filter.PathDepthFilter
 
PathologicalPathDecideRule - Class in org.archive.crawler.deciderules
Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (eg http://example.com/a/a/a/boo.html == 3 '/a' segments)
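The idea of counting identical, consecutive path segments can be sketched as below. This is only an illustration of the check described above, not the Heritrix implementation (which may well use a regular expression instead). The class name RepeatedSegmentSketch, both method names, and the maxRepetitions threshold are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration; not the Heritrix implementation.
final class RepeatedSegmentSketch {
    private RepeatedSegmentSketch() {}

    /** Length of the longest run of identical, consecutive path segments. */
    static int longestRun(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null) {
            return 0; // opaque URIs such as mailto: have no path
        }
        int longest = 0;
        int run = 0;
        String previous = null;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // ignore empty pieces, e.g. before the leading slash
            }
            run = segment.equals(previous) ? run + 1 : 1;
            longest = Math.max(longest, run);
            previous = segment;
        }
        return longest;
    }

    /** True when the longest run exceeds the allowed number of repetitions. */
    static boolean isPathological(String uri, int maxRepetitions) {
        return longestRun(uri) > maxRepetitions;
    }
}
```

With the example URI from the description, http://example.com/a/a/a/boo.html, longestRun counts three consecutive 'a' segments.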
PathologicalPathDecideRule(String) - Constructor for class org.archive.crawler.deciderules.PathologicalPathDecideRule
Constructs a new PathologicalPathDecideRule.
PathologicalPathFilter - Class in org.archive.crawler.filter
Checks if a URI contains a repeated pattern.
PathologicalPathFilter(String) - Constructor for class org.archive.crawler.filter.PathologicalPathFilter
Constructs a new PathologicalPathFilter.
PathologicalPathFilterTest - Class in org.archive.crawler.filter
 
PathologicalPathFilterTest() - Constructor for class org.archive.crawler.filter.PathologicalPathFilterTest
 
PathScope - Class in org.archive.crawler.scope
A core CrawlScope suitable for the most common crawl needs.
PathScope(String) - Constructor for class org.archive.crawler.scope.PathScope
 
pattern - Variable in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
PatternMatcherRecycler - Class in org.archive.util
Utility class to retain a compiled Pattern and multiple corresponding Matcher instances for reuse.
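The reuse pattern this class description refers to can be sketched with the standard java.util.regex API, where Matcher.reset(CharSequence) rebinds an existing Matcher to new input instead of allocating another one. The class name MatcherPoolSketch and the acquire/release method names are hypothetical and are not taken from PatternMatcherRecycler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the org.archive.util source.
final class MatcherPoolSketch {
    private final Pattern pattern;
    private final Deque<Matcher> idle = new ArrayDeque<>();

    MatcherPoolSketch(Pattern pattern) {
        this.pattern = pattern;
    }

    /** Hand out a Matcher over input, reusing an idle instance when one exists. */
    synchronized Matcher acquire(CharSequence input) {
        Matcher m = idle.poll();
        // Matcher.reset(CharSequence) rebinds an existing matcher to new input.
        return (m == null) ? pattern.matcher(input) : m.reset(input);
    }

    /** Return a Matcher to the pool once the caller is finished with it. */
    synchronized void release(Matcher m) {
        m.reset(""); // drop the reference to the caller's input
        idle.push(m);
    }
}
```

A caller would acquire a Matcher, run find() or matches(), and release it afterwards. The Pattern is compiled once, and Matcher objects are only created when the pool is empty.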
PatternMatcherRecycler(Pattern) - Constructor for class org.archive.util.PatternMatcherRecycler
 
pause() - Method in class org.archive.crawler.admin.CrawlJob
 
pause() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should not release any URIs, instead holding all threads, until instructed otherwise.
pause() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
pause() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
pauseJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Cause the current job to pause.
PDFParser - Class in org.archive.crawler.extractor
Supports PDF parsing operations.
PDFParser(String) - Constructor for class org.archive.crawler.extractor.PDFParser
 
PDFParser(byte[]) - Constructor for class org.archive.crawler.extractor.PDFParser
 
peek() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the URI with the earliest time of next processing.
peek(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Return the topmost queue item -- and remember it, such that even later higher-priority inserts don't change it.
peek() - Method in class org.archive.queue.MemQueue
 
peek() - Method in interface org.archive.queue.Queue
Give the top object in the queue, leaving it in place to be returned by future peek() or dequeue() invocations.
peek() - Method in interface org.archive.queue.Stack
 
peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Returns first item from queue (does not delete)
pend(long, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Place the given FP/CandidateURI pair into the pending set, awaiting a merge to determine if it's actually accepted.
pendDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pendDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pending() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Count of items added, but not yet filtered in or out.
pending() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pending() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
pendingSet - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
items awaiting merge. TODO: consider only sorting just pre-merge. TODO: consider using a fastutil long->Object class. TODO: consider actually writing items to disk file, as in Najork/Heydon.
pendingUris - Variable in class org.archive.crawler.frontier.BdbFrontier
all URIs scheduled to be crawled
PERCENT_SIGN - Static variable in class org.archive.net.UURIFactory
 
percentOfDiscoveredUrisCompleted() - Method in class org.archive.crawler.admin.StatisticsTracker
This returns the number of completed URIs as a percentage of the total number of URIs encountered (should be inverse to the discovery curve)
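The arithmetic behind such a figure is a simple ratio. A minimal sketch, assuming whole-number percentages and a result of 0 before any URI has been discovered. Neither detail is stated in the index, and the class and method names are hypothetical.

```java
// Hypothetical illustration; not the StatisticsTracker source.
final class CompletionSketch {
    private CompletionSketch() {}

    /** Completed URIs as a whole-number percentage of all discovered URIs. */
    static int percentCompleted(long completed, long discovered) {
        if (discovered <= 0) {
            return 0; // assumption: report 0% before anything is discovered
        }
        // Multiply before dividing so integer division keeps precision.
        return (int) (100 * completed / discovered);
    }
}
```

For instance, 25 completed out of 100 discovered gives 25, and 1 out of 3 truncates to 33.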
performHeritrixShutDown() - Static method in class org.archive.crawler.Heritrix
Exit program.
performHeritrixShutDown(int) - Static method in class org.archive.crawler.Heritrix
Exit program.
PIPE - Static variable in class org.archive.net.UURIFactory
 
PIPE_PATTERN - Static variable in class org.archive.net.UURIFactory
 
Pointer - Class in org.archive.configuration
Utility class to build Configuration Pointers.
Pointer(ObjectName) - Constructor for class org.archive.configuration.Pointer
 
Pointer(CompositeData) - Constructor for class org.archive.configuration.Pointer
 
policyFor(CrawlerSettings, BufferedReader, RobotsHonoringPolicy) - Static method in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
politenessDelayFor(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Update any scheduling structures with the new information in this CrawlURI.
pop() - Method in interface org.archive.queue.Stack
Remove and return item from top of Stack
popAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesInputStream
Discard the top auxiliary directory.
popAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesOutputStream
Remove the top subdirectory.
populate(CrawlURI, HttpClient, HttpMethod, String) - Method in class org.archive.crawler.datamodel.credential.Credential
 
populate(CrawlURI, HttpClient, HttpMethod, String) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
populate(CrawlURI, HttpClient, HttpMethod, String) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
PortnumberCriteria - Class in org.archive.crawler.settings.refinements
A refinement criterion that checks if a URI matches a specific port number.
PortnumberCriteria() - Constructor for class org.archive.crawler.settings.refinements.PortnumberCriteria
Create a new instance of PortnumberCriteria.
PortnumberCriteria(String) - Constructor for class org.archive.crawler.settings.refinements.PortnumberCriteria
Create a new instance of PortnumberCriteria.
PORTREGEX - Static variable in class org.archive.net.UURIFactory
Authority port number regex.
pos - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The current position in the buffer.
position(long) - Method in class org.archive.io.GzippedInputStream
Seek to passed offset.
position() - Method in class org.archive.io.GzippedInputStream
 
position() - Method in class org.archive.io.RandomAccessInputStream
 
position(long) - Method in class org.archive.io.RandomAccessInputStream
 
position(long) - Method in class org.archive.io.RepositionableInputStream
 
position() - Method in class org.archive.io.RepositionableInputStream
 
postDeregister() - Method in class org.archive.configuration.Configuration
 
postDeregister() - Method in class org.archive.crawler.admin.CrawlJob
 
postDeregister() - Method in class org.archive.crawler.Heritrix
 
postRegister(Boolean) - Method in class org.archive.configuration.Configuration
 
postRegister(Boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
postRegister(Boolean) - Method in class org.archive.crawler.Heritrix
 
postRestoreTasks - Variable in class org.archive.io.ObjectPlusFilesInputStream
 
power - Variable in class org.archive.util.BloomFilter32bp2
the power-of-two that m is
power - Variable in class org.archive.util.BloomFilter32bp2Split
the power-of-two that m is
PreconditionEnforcer - Class in org.archive.crawler.prefetch
Ensures the preconditions for a fetch -- such as DNS lookup or acquiring a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
PreconditionEnforcer(String) - Constructor for class org.archive.crawler.prefetch.PreconditionEnforcer
 
preDeregister() - Method in class org.archive.configuration.Configuration
 
preDeregister() - Method in class org.archive.crawler.admin.CrawlJob
 
preDeregister() - Method in class org.archive.crawler.Heritrix
 
PredicatedDecideRule - Class in org.archive.crawler.deciderules
Rule which applies the configured decision only if a test evaluates to true.
PredicatedDecideRule(String) - Constructor for class org.archive.crawler.deciderules.PredicatedDecideRule
 
prefixFromPlain(String) - Static method in class org.archive.util.SurtPrefixSet
Given a plain URI or hostname/hostname+path, deduce an implied SURT prefix from it.
PreJ15Utils - Class in org.archive.util
A collection of utility methods doing things that are easier in Java 1.5.
PreJ15Utils() - Constructor for class org.archive.util.PreJ15Utils
 
preNext(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
prepareHeritrixShutDown() - Static method in class org.archive.crawler.Heritrix
Prepares for program shutdown.
prepend(char) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Prepends one character, as a lump, to this string.
preRegister(MBeanServer, ObjectName) - Method in class org.archive.configuration.Configuration
 
preRegister(MBeanServer, ObjectName) - Method in class org.archive.crawler.admin.CrawlJob
 
preRegister(MBeanServer, ObjectName) - Method in class org.archive.crawler.Heritrix
 
PREREQ_HOP - Static variable in class org.archive.crawler.extractor.Link
implied prerequisite links, like dns or robots
PREREQ_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for prerequisite without other context
PrerequisiteAcceptDecideRule - Class in org.archive.crawler.deciderules
Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position).
PrerequisiteAcceptDecideRule(String) - Constructor for class org.archive.crawler.deciderules.PrerequisiteAcceptDecideRule
 
Preselector - Class in org.archive.crawler.prefetch
If set to recheck the crawl's scope, gives a yes/no on whether a CrawlURI should be processed at all.
Preselector(String) - Constructor for class org.archive.crawler.prefetch.Preselector
Constructor.
primaryKeyBinding - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
A binding for the serialization of the primary key (URI string)
primaryUriDB - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Database containing the URI priority queue, indexed by the URI string.
printOutSeeds(SettingsHandler, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Print complete seeds list on passed in PrintWriter.
printOutSeeds(SettingsHandler, Writer) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Print complete seeds list on passed in PrintWriter.
printStackTrace() - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
printStackTrace(PrintStream) - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
printStackTrace(PrintWriter) - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
printUsage(PrintWriter, int, String) - Method in class org.archive.crawler.CommandLineParser.HeritrixHelpFormatter
 
printUsage(PrintWriter, int, String, Options) - Method in class org.archive.crawler.CommandLineParser.HeritrixHelpFormatter
 
PRIORITY_AVERAGE - Static variable in class org.archive.crawler.admin.CrawlJob
average
PRIORITY_CRITICAL - Static variable in class org.archive.crawler.admin.CrawlJob
highest
PRIORITY_HIGH - Static variable in class org.archive.crawler.admin.CrawlJob
high
PRIORITY_LOW - Static variable in class org.archive.crawler.admin.CrawlJob
low
PRIORITY_MINIMAL - Static variable in class org.archive.crawler.admin.CrawlJob
lowest
process(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Perform processing on the given CrawlURI.
processBdbLogs(File, String) - Method in class org.archive.crawler.framework.CrawlController
 
processedBytesAfterLastEmittedURI - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
processedDocsPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
processedDocsPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns the number of documents that have been processed per second over the life of the crawl (as of last snapshot)
processedKBPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
processedKBPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Calculates the rate at which data, in KB, has been processed over the life of the crawl (as of last snapshot).
processedSeedsRecords - Variable in class org.archive.crawler.admin.StatisticsTracker
Record of seeds' latest actions.
processEmbed(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processEmbed(CrawlURI, CharSequence, CharSequence, char) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processEmbed(CharSequence, CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processGeneralTag(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processGeneralTag(CharSequence, CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processingCleanup() - Method in class org.archive.crawler.datamodel.CrawlURI
Clean up after a run through the processing chain.
processingUriDB - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
A database containing those URIs that are currently being processed.
processLink(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Handle generic HREF cases.
processLink(CharSequence, CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processMeta(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Process metadata tags.
processMeta(CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
Processor - Class in org.archive.crawler.framework
Base class for URI processing classes.
Processor(String, String) - Constructor for class org.archive.crawler.framework.Processor
 
PROCESSOR_PTR_ATTRIBUTE_NAME - Static variable in class org.archive.configuration.registry.TestProcessor
 
ProcessorChain - Class in org.archive.crawler.framework
This class groups together a number of processors that logically fit together.
ProcessorChain(MapType) - Constructor for class org.archive.crawler.framework.ProcessorChain
Construct a new processor chain.
ProcessorChainList - Class in org.archive.crawler.framework
A list of all the ProcessorChains.
ProcessorChainList(CrawlOrder) - Constructor for class org.archive.crawler.framework.ProcessorChainList
Constructs a new ProcessorChainList.
processorCount() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the total number of all processors in all the chains.
PROCESSORS_REPORT - Static variable in class org.archive.crawler.framework.CrawlController
 
processScript(CrawlURI, CharSequence, int) - Method in class org.archive.crawler.extractor.AggressiveExtractorHTML
 
processScript(CrawlURI, CharSequence, int) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processScript(CharSequence, int) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processScriptCode(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processScriptCode(CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processStyle(CrawlURI, CharSequence, int) - Method in class org.archive.crawler.extractor.ExtractorHTML
Process style text.
processStyle(CharSequence, int) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processStyleCode(CrawlURI, CharSequence, CrawlController) - Static method in class org.archive.crawler.extractor.ExtractorCSS
 
ProcessUtils - Class in org.archive.util
Class to run an external process.
ProcessUtils() - Constructor for class org.archive.util.ProcessUtils
 
ProcessUtils.ProcessResult - Class in org.archive.util
Data structure to hold result of a process exec.
ProcessUtils.ProcessResult(String[], int, String, String) - Constructor for class org.archive.util.ProcessUtils.ProcessResult
 
ProcessUtils.StreamGobbler - Class in org.archive.util
Thread to gobble up an output stream.
ProcessUtils.StreamGobbler(InputStream, String) - Constructor for class org.archive.util.ProcessUtils.StreamGobbler
 
processXml(CrawlURI, CharSequence, CrawlController) - Static method in class org.archive.crawler.extractor.ExtractorXML
 
profileLog - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
profileLog(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
profileLog - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
profileLog(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
PROFILES_DIR_NAME - Static variable in class org.archive.crawler.admin.CrawlJobHandler
Name of the profiles directory.
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.admin.StatisticsTracker
 
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.framework.AbstractTracker
A method for logging current crawler state.
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.framework.CrawlController
Called whenever a progress statistics logging event occurs.
progressStatisticsLegend() - Method in class org.archive.crawler.framework.AbstractTracker
 
progressStatisticsLegend() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
progressStatisticsLegend(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
progressStatisticsLegend(PrintWriter) - Method in interface org.archive.util.ProgressStatisticsReporter
 
progressStatisticsLine(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
progressStatisticsLine(PrintWriter) - Method in interface org.archive.util.ProgressStatisticsReporter
 
ProgressStatisticsReporter - Interface in org.archive.util
 
PropertyUtils - Class in org.archive.util
 
PropertyUtils() - Constructor for class org.archive.util.PropertyUtils
 
PTR_ARRAY_TYPE - Static variable in class org.archive.configuration.Configuration
Make a Pointer ArrayType used later in subclass definitions.
publish(LogRecord) - Method in class org.archive.io.SinkHandler
 
push(Object) - Method in interface org.archive.queue.Stack
Add object to top of Stack
pushAuxiliaryDirectory(String) - Method in class org.archive.io.ObjectPlusFilesInputStream
Push another default storage directory for use until popped.
pushAuxiliaryDirectory(String) - Method in class org.archive.io.ObjectPlusFilesOutputStream
Add another subdirectory for any file-capture needs during the current serialization.
put(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Put the given CrawlURI in at the appropriate place.
put(Object, Object) - Method in class org.archive.crawler.settings.DataContainer
 
put(String, MBeanAttributeInfo, Object) - Method in class org.archive.crawler.settings.DataContainer
 
put(String, CrawlerSettings) - Method in class org.archive.crawler.settings.SoftSettingsHash
Associates the specified settings object with the specified key in this hash.
put(SoftSettingsHash.SettingsEntry) - Method in class org.archive.crawler.settings.SoftSettingsHash
 
put(Object, Object) - Method in class org.archive.util.CachedBdbMap
 
putInt(String, int) - Method in class org.archive.crawler.datamodel.CandidateURI
 
putLong(String, long) - Method in class org.archive.crawler.datamodel.CandidateURI
 
putObject(String, Object) - Method in class org.archive.crawler.datamodel.CandidateURI
 
putSettings(String, CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsCache
Add a settings object to the cache.
putString(String, String) - Method in class org.archive.crawler.datamodel.CandidateURI
 

Q

QUERY_SAFE - Static variable in class org.archive.net.LaxURLCodec
 
Queue - Interface in org.archive.queue
An Abstract queue.
queue - Variable in class org.archive.queue.QueueTestBase
the queue object to be tested
queueAssignmentPolicy - Variable in class org.archive.crawler.frontier.AbstractFrontier
Policy for assigning CrawlURIs to named queues
QueueAssignmentPolicy - Class in org.archive.crawler.frontier
Establishes a mapping from CrawlURIs to String keys (queue names).
QueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.QueueAssignmentPolicy
 
QueueCat - Class in org.archive.queue
Command-line tool that displays serialized object streams in a line-oriented format.
QueueCat() - Constructor for class org.archive.queue.QueueCat
 
queuedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
queuedUriCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Number of URIs queued up and waiting for processing.
queuedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs queued up and waiting for processing.
queuedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
queuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
(non-Javadoc)
queuedUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
QueueTestBase - Class in org.archive.queue
JUnit test suite for Queue.
QueueTestBase(String) - Constructor for class org.archive.queue.QueueTestBase
Create a new QueueTestBase object
quickCache - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
cache of most recently seen FPs
quickContains(long) - Method in class org.archive.util.AbstractLongFPSet
Low-cost, non-definitive (except when true) contains test.
quickContains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
quickContains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Do a contains() check that doesn't require laggy activity (eg disk IO).
quickContains(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
quickContainsKey(Object) - Method in class org.archive.util.CachedBdbMap
 
quickContainsValue(Object) - Method in class org.archive.util.CachedBdbMap
 
quickDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
quickDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
QUOT - Static variable in class org.archive.net.UURIFactory
 
QuotaEnforcer - Class in org.archive.crawler.prefetch
A simple quota enforcer.
QuotaEnforcer(String) - Constructor for class org.archive.crawler.prefetch.QuotaEnforcer
Constructor.

R

raAppend(int, String) - Method in class org.archive.util.PaddingStringBuffer
Append a string, right-aligned to the given column.
raAppend(int, int) - Method in class org.archive.util.PaddingStringBuffer
Append an int, right-aligned to the given column.
raAppend(int, long) - Method in class org.archive.util.PaddingStringBuffer
Append a long, right-aligned to the given column.
raf - Variable in class org.archive.io.RandomAccessOutputStream
 
RandomAccessInputStream - Class in org.archive.io
Wraps a RandomAccessFile with an InputStream interface.
RandomAccessInputStream(RandomAccessFile) - Constructor for class org.archive.io.RandomAccessInputStream
Constructor.
RandomAccessInputStream(File) - Constructor for class org.archive.io.RandomAccessInputStream
Constructor.
RandomAccessInputStream(File, long) - Constructor for class org.archive.io.RandomAccessInputStream
Constructor.
RandomAccessInputStream(RandomAccessFile, boolean, long) - Constructor for class org.archive.io.RandomAccessInputStream
 
RandomAccessOutputStream - Class in org.archive.io
Wraps a RandomAccessFile with an OutputStream interface.
RandomAccessOutputStream(RandomAccessFile) - Constructor for class org.archive.io.RandomAccessOutputStream
Wrap the given RandomAccessFile
RANGE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
RANGE_PREFIX - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
RCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
RCURBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 
read(String) - Method in interface org.archive.crawler.framework.AlertManager
 
read() - Method in class org.archive.io.arc.ARCRecord
 
read(byte[], int, int) - Method in class org.archive.io.arc.ARCRecord
 
read() - Method in class org.archive.io.CompositeFileInputStream
 
read(byte[], int, int) - Method in class org.archive.io.CompositeFileInputStream
 
read(byte[]) - Method in class org.archive.io.CompositeFileInputStream
 
read() - Method in class org.archive.io.RandomAccessInputStream
 
read(byte[], int, int) - Method in class org.archive.io.RandomAccessInputStream
 
read(byte[]) - Method in class org.archive.io.RandomAccessInputStream
 
read() - Method in class org.archive.io.RecordingInputStream
 
read(byte[], int, int) - Method in class org.archive.io.RecordingInputStream
 
read(byte[]) - Method in class org.archive.io.RecordingInputStream
 
read() - Method in class org.archive.io.ReplayInputStream
 
read(byte[], int, int) - Method in class org.archive.io.ReplayInputStream
 
read(byte[]) - Method in class org.archive.io.RepositionableInputStream
 
read(byte[], int, int) - Method in class org.archive.io.RepositionableInputStream
 
read() - Method in class org.archive.io.RepositionableInputStream
 
read(long) - Method in class org.archive.io.SinkHandler
 
read - Variable in class org.archive.io.SinkHandlerLogRecord
 
readAlert(String) - Method in class org.archive.crawler.Heritrix
 
readByte(InputStream) - Method in class org.archive.io.GzipHeader
Read a byte.
readByte(InputStream, CRC32) - Method in class org.archive.io.GzipHeader
Read a byte.
readByte(InputStream, CRC32, byte[], int, int) - Method in class org.archive.io.GzipHeader
Read a byte.
reader - Variable in class org.archive.util.iterator.LineReadingIterator
 
readFileAsString(File) - Static method in class org.archive.util.FileUtils
Utility method to read an entire file as a String.
readFully() - Method in class org.archive.io.RecordingInputStream
 
readFullyAsString(InputStream) - Static method in class org.archive.util.IoUtils
Read the entire stream to EOF, returning what's read as a String.
readFullyOrUntil(long, long, long) - Method in class org.archive.io.RecordingInputStream
Read all of a stream (or read until we time out or have read to the max).
readFullyTo(OutputStream) - Method in class org.archive.io.ReplayInputStream
 
readFullyToFile(InputStream, File) - Static method in class org.archive.util.IoUtils
Read the entire stream to EOF into the passed file.
readFullyToFile(InputStream, File, byte[]) - Static method in class org.archive.util.IoUtils
Read the entire stream to EOF into the passed file.
readHeader(InputStream) - Method in class org.archive.io.GzipHeader
Read in gzip header.
readHeader() - Method in class org.archive.io.GzippedInputStream
Read in the gzip header.
readMaxValues(Object) - Method in class org.archive.crawler.filter.TransclusionFilter
 
readObjectFromFile(Class, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
readObjectFromFile(Class, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
readPrefixes() - Method in class org.archive.crawler.deciderules.OnDomainsDecideRule
Patch the SURT prefix set so that it only includes host-enforcing prefixes
readPrefixes() - Method in class org.archive.crawler.deciderules.OnHostsDecideRule
Patch the SURT prefix set so that it only includes host-enforcing prefixes
readPrefixes(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Patch the SURT prefix set so that it only includes the appropriate prefixes.
readPrefixes() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
readResponseBody(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
readResponseBody(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
readSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsHandler
Read the CrawlerSettings object from persistent storage.
readSettingsObject(CrawlerSettings, File) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Read the CrawlerSettings object from a specific file.
readSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
readUuri(String) - Method in class org.archive.crawler.datamodel.CandidateURI
Read a UURI from a String, handling a null or URIException
readValid() - Method in class org.archive.crawler.datamodel.Checkpoint
 
readyClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues whose first item may be handed out.
readyHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of hosts that have a URI ready for processing.
receive(CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter.HasUriReceiver
 
receive(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
receive(CandidateURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Accept the given CandidateURI for scheduling, as it has passed the alreadyIncluded filter.
receive(CandidateURI) - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
receive(CandidateURI) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
receive(CandidateURI) - Method in class org.archive.crawler.util.BloomUriUniqFilterTest
 
receive(CandidateURI) - Method in class org.archive.crawler.util.FPUriUniqFilterTest
 
receiver - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
receiver - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
RecorderIOException - Exception in org.archive.io
 
RecorderIOException() - Constructor for exception org.archive.io.RecorderIOException
 
RecorderIOException(String) - Constructor for exception org.archive.io.RecorderIOException
 
RecorderLengthExceededException - Exception in org.archive.io
Indicates a length exception thrown by the Recorder.
RecorderLengthExceededException() - Constructor for exception org.archive.io.RecorderLengthExceededException
 
RecorderLengthExceededException(String) - Constructor for exception org.archive.io.RecorderLengthExceededException
 
RecorderTimeoutException - Exception in org.archive.io
Indicates a timeout thrown by the RecordingInputStream.
RecorderTimeoutException() - Constructor for exception org.archive.io.RecorderTimeoutException
 
RecorderTimeoutException(String) - Constructor for exception org.archive.io.RecorderTimeoutException
 
RecordingInputStream - Class in org.archive.io
Stream which records all data read from it, which it acquires from a wrapped input stream.
RecordingInputStream(int, String) - Constructor for class org.archive.io.RecordingInputStream
Create a new RecordingInputStream.
RecordingInputStreamTest - Class in org.archive.io
Test cases for RecordingInputStream.
RecordingInputStreamTest() - Constructor for class org.archive.io.RecordingInputStreamTest
 
RecordingOutputStream - Class in org.archive.io
An output stream that records all writes to a wrapped output stream.
RecordingOutputStream(int, String) - Constructor for class org.archive.io.RecordingOutputStream
Create a new RecordingOutputStream.
RecordingOutputStreamTest - Class in org.archive.io
Test cases for RecordingOutputStream.
RecordingOutputStreamTest() - Constructor for class org.archive.io.RecordingOutputStreamTest
 
recover(CrawlController) - Method in class org.archive.crawler.framework.Checkpointer
Call when recovering from a checkpoint.
RECOVER_LOG - Static variable in class org.archive.crawler.admin.CrawlJobHandler
String to indicate recovery should be based on the recovery log, not based on checkpointing.
RecoveryJournal - Class in org.archive.crawler.frontier
Helper class for managing a simple Frontier change-events journal which is useful for recovering from crawl problems.
RecoveryJournal(String, String) - Constructor for class org.archive.crawler.frontier.RecoveryJournal
Create a new recovery journal at the given location
RecoveryJournalTest - Class in org.archive.crawler.frontier
 
RecoveryJournalTest() - Constructor for class org.archive.crawler.frontier.RecoveryJournalTest
 
RecoveryLogMapper - Class in org.archive.crawler.util
 
RecoveryLogMapper(String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
Normal constructor - if not-found seeds are encountered while loading recoverLogFileName, throws SeedUrlNotFoundException.
RecoveryLogMapper(String, String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
Constructor to use if you want to allow not-found seeds, logging them to seedNotFoundLogFileName.
recycleMatcher(Matcher) - Static method in class org.archive.util.TextUtils
 
RecyclingFastBufferedOutputStream - Class in org.archive.io
Lightweight, unsynchronised, aligned output stream buffering class.
RecyclingFastBufferedOutputStream(OutputStream, byte[]) - Constructor for class org.archive.io.RecyclingFastBufferedOutputStream
Creates a new fast buffered output stream by wrapping a given output stream, using a given buffer
RecyclingFastBufferedOutputStream(OutputStream, int) - Constructor for class org.archive.io.RecyclingFastBufferedOutputStream
Creates a new fast buffered output stream by wrapping a given output stream with a given buffer size.
RecyclingFastBufferedOutputStream(OutputStream) - Constructor for class org.archive.io.RecyclingFastBufferedOutputStream
Creates a new fast buffered output stream by wrapping a given output stream with a buffer of RecyclingFastBufferedOutputStream.DEFAULT_BUFFER_SIZE bytes.
RecyclingSerialBinding - Class in org.archive.crawler.frontier
A SerialBinding that recycles a single FastOutputStream per thread, avoiding reallocation of the internal buffer for either repeated serializations or because of mid-serialization expansions.
RecyclingSerialBinding(ClassCatalog, Class) - Constructor for class org.archive.crawler.frontier.RecyclingSerialBinding
Constructor.
REFER_HOP - Static variable in class org.archive.crawler.extractor.Link
referral/redirect links, like header 'Location:' on a 301/302 response
referentField - Static variable in class org.archive.util.CachedBdbMap
Reference to the Reference#referent Field.
REFERER - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
RefinedScope - Class in org.archive.crawler.scope
Superclass for Scopes which make use of "additional focus" to add items by pattern, or want to swap in an alternative transitive filter.
RefinedScope(String) - Constructor for class org.archive.crawler.scope.RefinedScope
 
Refinement - Class in org.archive.crawler.settings.refinements
This class acts as a mapping between refinement criteria and a settings object.
Refinement(CrawlerSettings, String) - Constructor for class org.archive.crawler.settings.refinements.Refinement
Create a new instance of Refinement
Refinement(CrawlerSettings, String, String) - Constructor for class org.archive.crawler.settings.refinements.Refinement
Create a new instance of Refinement
refinementsIterator() - Method in class org.archive.crawler.settings.CrawlerSettings
Get a ListIterator over the refinements for this settings object.
refQueue - Variable in class org.archive.util.CachedBdbMap
 
refreshHostToSettings() - Method in class org.archive.crawler.settings.SettingsCache
Make sure that no host string points to the wrong settings.
refreshSeeds() - Method in class org.archive.crawler.framework.CrawlScope
Refresh seeds.
refreshSeeds() - Method in class org.archive.crawler.scope.SeedCachingScope
 
refund(int) - Method in class org.archive.crawler.frontier.WorkQueue
A URI should not have been charged against queue (eg it was disregarded); return the amount expended
RegexpCSSLinkExtractor - Class in org.archive.extractor
This extractor parses URIs from CSS-type files.
RegexpCSSLinkExtractor() - Constructor for class org.archive.extractor.RegexpCSSLinkExtractor
 
RegexpHTMLLinkExtractor - Class in org.archive.extractor
Basic link-extraction, from an HTML content-body, using regular expressions.
RegexpHTMLLinkExtractor() - Constructor for class org.archive.extractor.RegexpHTMLLinkExtractor
 
RegexpJSLinkExtractor - Class in org.archive.extractor
Uses regular expressions to find likely URIs inside Javascript.
RegexpJSLinkExtractor() - Constructor for class org.archive.extractor.RegexpJSLinkExtractor
 
RegexpLineIterator - Class in org.archive.util.iterator
Utility class providing an Iterator interface over line-oriented text input.
RegexpLineIterator(Iterator, String, String, String) - Constructor for class org.archive.util.iterator.RegexpLineIterator
 
RegexRule - Class in org.archive.crawler.url.canonicalize
General conversion rule.
RegexRule(String) - Constructor for class org.archive.crawler.url.canonicalize.RegexRule
 
RegexRule(String, String, String) - Constructor for class org.archive.crawler.url.canonicalize.RegexRule
 
RegexRuleTest - Class in org.archive.crawler.url.canonicalize
Test the regex rule.
RegexRuleTest() - Constructor for class org.archive.crawler.url.canonicalize.RegexRuleTest
 
register(String, Class, String, Configuration) - Method in class org.archive.configuration.registry.JmxRegistry
 
register(String, Class, Configuration) - Method in class org.archive.configuration.registry.JmxRegistry
 
register(String, Class<?>, Configuration) - Method in interface org.archive.configuration.Registry
Register a configuration object.
register(String, Class<?>, String, Configuration) - Method in interface org.archive.configuration.Registry
Register a configuration object.
registeredCrawlURIDispositionListeners - Variable in class org.archive.crawler.framework.CrawlController
 
registerFinishTask(Runnable) - Method in class org.archive.io.ObjectPlusFilesInputStream
Register a task to be done when the ObjectPlusFilesInputStream is closed.
registerHeritrix(Heritrix, String, boolean) - Static method in class org.archive.crawler.Heritrix
Register Heritrix with JNDI, JMX, and with the static hashtable of all Heritrix instances known to this JVM.
registerJndi(ObjectName) - Static method in class org.archive.crawler.Heritrix
 
registerMBean(Object, String, String) - Static method in class org.archive.crawler.Heritrix
 
registerMBean(MBeanServer, Object, String, String) - Static method in class org.archive.crawler.Heritrix
 
registerMBean(MBeanServer, Object, ObjectName) - Static method in class org.archive.crawler.Heritrix
 
registerValueErrorHandler(ValueErrorHandler) - Method in class org.archive.crawler.settings.SettingsHandler
Register an instance of ValueErrorHandler.
Registration - Interface in org.archive.configuration
 
Registry - Interface in org.archive.configuration
Registry of application Configurations.
RegularExpressionConstraint - Class in org.archive.crawler.settings
A constraint that checks that a value matches a regular expression.
RegularExpressionConstraint(String, Level, String) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint.
RegularExpressionConstraint(String, String) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint using default severity level (Level.WARNING).
RegularExpressionConstraint(String, Level) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint using the default error message.
RegularExpressionConstraint(String) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint.
RegularExpressionCriteria - Class in org.archive.crawler.settings.refinements
A refinement criterion that tests whether a URI matches a regular expression.
RegularExpressionCriteria() - Constructor for class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Create a new instance of RegularExpressionCriteria.
RegularExpressionCriteria(String) - Constructor for class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Create a new instance of RegularExpressionCriteria initializing it with a regular expression.
REJECT - Static variable in class org.archive.crawler.deciderules.DecideRule
 
RejectDecideRule - Class in org.archive.crawler.deciderules
Rule which answers REJECT to everything evaluated.
RejectDecideRule(String) - Constructor for class org.archive.crawler.deciderules.RejectDecideRule
 
release() - Method in class org.archive.queue.MemQueue
 
release() - Method in interface org.archive.queue.Queue
release any OS/IO resources associated with Queue
release() - Method in interface org.archive.queue.Stack
Release any OS resources, if necessary.
releaseConnection(HttpConnection) - Method in class org.archive.httpclient.SingleHttpConnectionManager
 
releaseConnection(HttpConnection) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
releaseContinuePermission() - Method in class org.archive.crawler.framework.CrawlController
Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode).
RELEVANT_TAG_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
RELEVANT_TAG_EXTRACTOR - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
Compiled relevant tag extractor.
relocate(long, long, long) - Method in class org.archive.util.AbstractLongFPSet
 
relocate(long, long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
remaining() - Method in class org.archive.io.ReplayInputStream
 
remove(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
remove(CrawlerSettings, Credential) - Method in class org.archive.crawler.datamodel.CredentialStore
Delete the credential name.
remove(CrawlerSettings, String) - Method in class org.archive.crawler.datamodel.CredentialStore
Delete the credential name.
remove(String) - Method in interface org.archive.crawler.framework.AlertManager
 
remove() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
remove(int) - Method in class org.archive.crawler.settings.ListType
 
remove(Object) - Method in class org.archive.crawler.settings.ListType
 
remove() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
remove(String) - Method in class org.archive.crawler.settings.SoftSettingsHash
Removes the settings object identified by the key from this hash if present.
remove() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
remove() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
remove() - Method in class org.archive.io.arc.ARCReader.ARCRecordIterator
 
remove(long) - Method in class org.archive.io.SinkHandler
 
remove(long) - Method in class org.archive.util.AbstractLongFPSet
 
remove(Object) - Method in class org.archive.util.CachedBdbMap
 
remove(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
remove(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Remove a fingerprint from the set, if it is there
remove() - Method in class org.archive.util.iterator.CompositeIterator
 
remove() - Method in class org.archive.util.iterator.LookaheadIterator
 
removeAlert(String) - Method in class org.archive.crawler.Heritrix
 
removeAlistPersistentMember(Object) - Static method in class org.archive.crawler.datamodel.CrawlURI
 
removeAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
removeAt(long) - Method in class org.archive.util.AbstractLongFPSet
Remove the value at the given index, relocating its successors as necessary.
removeCredentialAvatar(CredentialAvatar) - Method in class org.archive.crawler.datamodel.CrawlURI
Remove the passed credential avatar from this crawl uri.
removeCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlURI
Remove all credential avatars from this crawl uri.
removeElement(String) - Method in class org.archive.crawler.settings.DataContainer
Remove an attribute from the DataContainer.
removeElement(CrawlerSettings, String) - Method in class org.archive.crawler.settings.MapType
Remove an attribute from the map.
removeRefinement(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Remove a refinement from this settings object.
reopen(Environment) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Call after deserializing an instance of this class.
reorder() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Method is called whenever something has been done that might have changed the value of the 'published' time of next ready.
reorder(AdaptiveRevisitHostQueue) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
This method reorders the host queues.
replaceAll(String, CharSequence, String) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the replaceAll method of the String class.
replaceFirst(String, CharSequence, String) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the replaceFirst method of the String class.
replaceOutlinks(Collection) - Method in class org.archive.crawler.datamodel.CrawlURI
Replace the current collection of links with the passed list.
ReplayCharSequence - Interface in org.archive.io
CharSequence interface with addition of a ReplayCharSequence.close() method.
ReplayCharSequenceFactory - Class in org.archive.io
Factory that returns a ReplayCharSequence view on to a recording stream.
ReplayCharSequenceFactoryTest - Class in org.archive.io
Test the ReplayCharSequence factory.
ReplayCharSequenceFactoryTest() - Constructor for class org.archive.io.ReplayCharSequenceFactoryTest
 
ReplayInputStream - Class in org.archive.io
Replays the bytes recorded from a RecordingInputStream or RecordingOutputStream.
ReplayInputStream(byte[], long, long, String) - Constructor for class org.archive.io.ReplayInputStream
Constructor.
ReplayInputStream(byte[], long, String) - Constructor for class org.archive.io.ReplayInputStream
Constructor.
report() - Method in class org.archive.crawler.extractor.AggressiveExtractorHTML
 
report() - Method in class org.archive.crawler.extractor.ExtractorCSS
 
report() - Method in class org.archive.crawler.extractor.ExtractorDOC
 
report() - Method in class org.archive.crawler.extractor.ExtractorHTML
 
report() - Method in class org.archive.crawler.extractor.ExtractorHTTP
 
report() - Method in class org.archive.crawler.extractor.ExtractorJS
 
report() - Method in class org.archive.crawler.extractor.ExtractorPDF
Provide a human-readable textual summary of this Processor's state.
report() - Method in class org.archive.crawler.extractor.ExtractorSWF
 
report() - Method in class org.archive.crawler.extractor.ExtractorUniversal
 
report() - Method in class org.archive.crawler.extractor.ExtractorXML
 
report() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
report() - Method in class org.archive.crawler.framework.Processor
Compiles and returns a report (in human readable form) about the status of the processor.
report() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns a report detailing the status of this HQ.
report() - Method in class org.archive.crawler.processor.Test
 
Reporter - Interface in org.archive.util
 
reportManifestTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
reportProcessorsTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
Compiles and returns a human readable report on the active processors.
reports - Variable in class org.archive.crawler.framework.CrawlController
Logger to hold job summary report.
REPORTS - Static variable in class org.archive.crawler.framework.CrawlController
 
REPORTS - Static variable in class org.archive.crawler.framework.ToePool
 
REPORTS - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.datamodel.CandidateURI
 
reportTo(PrintWriter) - Method in class org.archive.crawler.datamodel.CandidateURI
 
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
Compiles and returns a report on its status.
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
This method compiles a human readable report on the status of the frontier at the time of the call.
reportTo(String, PrintWriter) - Method in interface org.archive.util.Reporter
Make a report of the given name to the passed-in Writer; if the name is null, give the default report.
reportTo(PrintWriter) - Method in interface org.archive.util.Reporter
Make a default report to the passed-in Writer.
RepositionableInputStream - Class in org.archive.io
Wrapper around an InputStream to make a primitive Repositionable stream.
RepositionableInputStream(InputStream) - Constructor for class org.archive.io.RepositionableInputStream
 
RepositionableInputStream(InputStream, int) - Constructor for class org.archive.io.RepositionableInputStream
 
RepositionableInputStreamTest - Class in org.archive.io
 
RepositionableInputStreamTest() - Constructor for class org.archive.io.RepositionableInputStreamTest
 
requestCrawlCheckpoint() - Method in class org.archive.crawler.framework.CrawlController
Request a checkpoint.
requestCrawlPause() - Method in class org.archive.crawler.framework.CrawlController
Stop the crawl temporarily.
requestCrawlResume() - Method in class org.archive.crawler.framework.CrawlController
Resume crawl from paused state
requestCrawlStart() - Method in class org.archive.crawler.framework.CrawlController
Operator requested crawl begin
requestCrawlStop() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
requestCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Operator requested for crawl to stop.
requestCrawlStop(String) - Method in class org.archive.crawler.framework.CrawlController
Operator requested for crawl to stop.
requestFlush() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Request that any pending items be added/dropped.
requestFlush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
requestFlush() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
REQUIRED_VERSION_1_HEADER_FIELDS - Static variable in interface org.archive.io.arc.ARCConstants
Version 1 required metadata fields.
reschedule(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Put near top of relevant hostQueue (but behind anything recently scheduled 'high').
rescheduled(CrawlURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
rescheduled(CrawlURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
reset() - Method in class org.archive.extractor.CharSequenceLinkExtractor
Discard all state.
reset() - Method in interface org.archive.extractor.LinkExtractor
Discard all state and release any used resources.
reset() - Method in class org.archive.extractor.RegexpCSSLinkExtractor
 
reset() - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
Discard all state.
reset() - Method in class org.archive.extractor.RegexpJSLinkExtractor
 
reset() - Method in class org.archive.io.RandomAccessInputStream
 
reset() - Method in class org.archive.io.ReplayInputStream
 
reset() - Method in class org.archive.io.RepositionableInputStream
 
reset() - Method in class org.archive.util.PaddingStringBuffer
reset the buffer back to empty
resetConsecutiveConnectionErrors() - Method in class org.archive.crawler.datamodel.CrawlServer
 
resetDeferrals() - Method in class org.archive.crawler.datamodel.CrawlURI
Reset deferrals counter.
resetFetchAttempts() - Method in class org.archive.crawler.datamodel.CrawlURI
Reset fetchAttempts counter.
resetInflater() - Method in class org.archive.io.GzippedInputStream
Move to next gzip member in the file.
resetSerialNo() - Static method in class org.archive.io.arc.ARCWriter
Reset the serial number.
resetState() - Method in class org.archive.crawler.extractor.PDFParser
Reinitialize the object as though a new one were created.
resetState(byte[]) - Method in class org.archive.crawler.extractor.PDFParser
Reset the object and initialize it with a new byte array (the document).
resetState(String) - Method in class org.archive.crawler.extractor.PDFParser
Reinitialize the object as though a new one were created, complete with a valid pointer to a document that can be read
resize(int) - Method in class org.archive.crawler.settings.SoftSettingsHash
Rehashes the contents of this hash into a new HashMap instance with a larger capacity.
resolve(String) - Method in class org.archive.net.UURI
 
resolve(String, boolean) - Method in class org.archive.net.UURI
 
resolve(String, boolean, String) - Method in class org.archive.net.UURI
 
responseBodyStart - Variable in class org.archive.io.ReplayInputStream
Where the response body starts, if marked
restoreFile(File) - Method in class org.archive.io.ObjectPlusFilesInputStream
Restore a file from storage, using the name and length info on the serialization stream and the file from the current auxiliary directory, to the given File.
restoreFileTo(File) - Method in class org.archive.io.ObjectPlusFilesInputStream
Restore a file from storage, using the name and length info on the serialization stream and the file from the current auxiliary directory, to the given File.
restoreStatisticsTracker(MapType, String) - Method in class org.archive.crawler.framework.CrawlController
 
resume() - Method in class org.archive.crawler.admin.CrawlJob
 
resume(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Resumes this WorkQueue.
resumeJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Cause the current job to resume crawling if it was paused.
retainAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
retire() - Method in class org.archive.crawler.framework.ToeThread
Request that this thread retire (exit cleanly) at the earliest opportunity.
retiredQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
'retired' queues, no longer considered for activation.
retryDelayFor(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Return a suitable value to wait before retrying the given URI.
retryMethod(HttpMethod, IOException, int) - Method in class org.archive.crawler.fetcher.HeritrixHttpMethodRetryHandler
 
returnARCWriter(ARCWriter) - Method in class org.archive.io.arc.ARCWriterPool
 
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.filter.OrFilter
 
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.filter.PathDepthFilter
 
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.filter.URIRegExpFilter
 
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.framework.Filter
Checks to see if filter functionality should be inverted for this curi.
rewind() - Method in class org.archive.io.arc.ARCReader
Rewinds stream to start of the arc file.
RFC2396REGEX - Static variable in class org.archive.net.UURIFactory
RFC 2396-inspired regex.
Rfc2617Credential - Class in org.archive.crawler.datamodel.credential
A Basic/Digest auth RFC2617 credential.
Rfc2617Credential(String) - Constructor for class org.archive.crawler.datamodel.credential.Rfc2617Credential
Constructor.
ROBOTS_NOT_FETCHED - Static variable in class org.archive.crawler.datamodel.CrawlServer
 
RobotsExclusionPolicy - Class in org.archive.crawler.datamodel
expiry handled outside, in CrawlServer
RobotsExclusionPolicy(CrawlerSettings, LinkedList, HashMap, RobotsHonoringPolicy) - Constructor for class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
RobotsExclusionPolicy(int) - Constructor for class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
robotsFetched - Variable in class org.archive.crawler.datamodel.CrawlServer
 
RobotsHonoringPolicy - Class in org.archive.crawler.datamodel
This class represents the policy by which robots.txt files are to be honored.
RobotsHonoringPolicy(String) - Constructor for class org.archive.crawler.datamodel.RobotsHonoringPolicy
Creates a new instance of RobotsHonoringPolicy.
RobotsHonoringPolicy() - Constructor for class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
Robotstxt - Class in org.archive.crawler.datamodel
 
Robotstxt() - Constructor for class org.archive.crawler.datamodel.Robotstxt
 
robotstxtChecksum - Variable in class org.archive.crawler.datamodel.CrawlServer
 
RobotstxtTest - Class in org.archive.crawler.datamodel
 
RobotstxtTest() - Constructor for class org.archive.crawler.datamodel.RobotstxtTest
 
RootFilter - Class in org.archive.crawler.admin.ui
Filter that redirects accesses to 'index.jsp'.
RootFilter() - Constructor for class org.archive.crawler.admin.ui.RootFilter
 
rootUriMatch(CrawlController, CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Test whether the passed curi matches this credential's rootUri.
rotate(String, String) - Method in class org.archive.io.GenerationFileHandler
Move the current file to a new filename with the storeSuffix in place of the activeSuffix, then continue logging to a new file under the original filename.
rotateLogFiles(String) - Method in class org.archive.crawler.framework.CrawlController
 
RSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
RSQRBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 
RsyncURLConnection - Class in org.archive.net.rsync
Rsync URL connection.
RsyncURLConnection(URL) - Constructor for class org.archive.net.rsync.RsyncURLConnection
 
run() - Method in class org.archive.crawler.fetcher.FetchHTTP.PostRestore
 
run() - Method in class org.archive.crawler.framework.AbstractTracker
Start thread.
run() - Method in class org.archive.crawler.framework.ToeThread
(non-Javadoc)
run() - Method in class org.archive.util.ProcessUtils.StreamGobbler
 
runExtractor(UURI) - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
runExtractor(UURI, String) - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
runFrontierRecover(String) - Method in class org.archive.crawler.framework.CrawlController
 
runTest(String) - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
runTestWriting(long) - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
RuntimeErrorFormatter - Class in org.archive.crawler.io
Runtime exception log formatter.
RuntimeErrorFormatter() - Constructor for class org.archive.crawler.io.RuntimeErrorFormatter
 
runtimeErrors - Variable in class org.archive.crawler.framework.CrawlController
This logger contains unexpected runtime errors.

S

S_BLOCKED_BY_CUSTOM_PROCESSOR - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Blocked by custom prefetcher processor.
S_BLOCKED_BY_QUOTA - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Blocked due to exceeding an established quota.
S_BLOCKED_BY_USER - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
blocked from fetch by user setting.
S_CONNECT_FAILED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
HTTP connect failed
S_CONNECT_LOST - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
HTTP connect broken
S_DEEMED_CHAFF - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
'chaff' detection of traps/content of negligible value applied
S_DEFERRED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
temporary status assigned to URIs awaiting preconditions; appearance in logs is a bug
S_DELETED_BY_USER - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
deleted from frontier by user
S_DNS_SUCCESS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
DNS success
S_DOMAIN_PREREQUISITE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
DNS prerequisite failed, precluding attempt
S_DOMAIN_UNRESOLVABLE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
DNS lookup failed
S_GETBYNAME_SUCCESS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
InetAddress.getByName success
S_OTHER_PREREQUISITE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Other prerequisite failed, precluding attempt
S_OUT_OF_SCOPE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
out-of-scope upon reexamination (only when scope changes during crawl)
S_PREREQUISITE_UNSCHEDULABLE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Prerequisite could not be scheduled, precluding attempt
S_PROCESSING_THREAD_KILLED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Processing thread was killed
S_ROBOTS_PRECLUDED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
robots rules precluded fetch
S_ROBOTS_PREREQUISITE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Robots prerequisite failed, precluding attempt
S_RUNTIME_EXCEPTION - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Unexpected runtime exception; see runtime-errors.log
S_SERIOUS_ERROR - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
severe java 'Error' conditions (OutOfMemoryError, StackOverflowError, etc.) during URI processing
S_TIMEOUT - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
HTTP timeout (before any meaningful response received)
S_TOO_MANY_EMBED_HOPS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
overstepped embed/trans hops
S_TOO_MANY_LINK_HOPS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
overstepped link hops
S_TOO_MANY_RETRIES - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
multiple retries all failed
S_UNATTEMPTED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
fetch never tried (perhaps protocol unsupported or illegal URI)
S_UNFETCHABLE_URI - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
URI recognized as unsupported or illegal
S_UNQUEUEABLE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
URI could not be queued in Frontier; when URIs are properly filtered for format, should never occur
sameDomainAs(CandidateURI) - Method in class org.archive.crawler.datamodel.CandidateURI
Compares the domain of this CandidateURI with that of another CandidateURI
save(Store, String) - Method in class org.archive.configuration.registry.JmxRegistry
 
save(Store, String) - Method in interface org.archive.configuration.Registry
Save current state of settings.
save(Iterator<StoreElement>) - Method in interface org.archive.configuration.Store
 
save(String, Iterator<StoreElement>) - Method in interface org.archive.configuration.Store
 
save(Iterator<StoreElement>) - Method in class org.archive.configuration.store.SerializeStore
 
save(String, Iterator<StoreElement>) - Method in class org.archive.configuration.store.SerializeStore
 
saveCookies() - Method in class org.archive.crawler.fetcher.FetchHTTP
Saves cookies to the file specified in the order file.
saveCookies(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
Saves cookies to a file.
saveHostStats(String, long) - Method in class org.archive.crawler.admin.StatisticsTracker
 
saveIgnoredItems(String, File) - Static method in class org.archive.crawler.frontier.AbstractFrontier
Dump ignored seed items (if any) to disk; delete file otherwise.
scanCheckpoints() - Method in class org.archive.crawler.admin.CrawlJob
Read all the checkpoints found in the job's checkpoints directory into Checkpoint instances
schedule(CandidateURI) - Method in interface org.archive.crawler.framework.Frontier
Schedules a CandidateURI.
schedule(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
schedule(CandidateURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Arrange for the given CandidateURI to be visited, if it is not already scheduled/completed.
schedule(CandidateURI) - Method in class org.archive.crawler.postprocessor.FrontierScheduler
Schedule the given CandidateURI with the Frontier.
ScopePlusOneDecideRule - Class in org.archive.crawler.deciderules
Rule allows one level of discovery beyond configured scope.
ScopePlusOneDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Constructor.
Scoper - Class in org.archive.crawler.framework
Base class for Scopers.
Scoper(String, String) - Constructor for class org.archive.crawler.framework.Scoper
Constructor.
scratchDir - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
scratchDirFor(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
Utility method to return a scratch dir for the given key's temp files.
secondaryUriDB - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Secondary index into the primary DB, URIs indexed by the time when they can next be processed again.
secondsSinceEpoch(String) - Static method in class org.archive.util.ArchiveUtils
 
seed - Variable in class org.archive.crawler.datamodel.CrawlURITest
 
SEED_DISPOSITION_DISREGARD - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Seed was disregarded
SEED_DISPOSITION_FAILURE - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Failed to crawl seed
SEED_DISPOSITION_NOT_PROCESSED - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Seed has not been processed
SEED_DISPOSITION_RETRY - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Failed to crawl seed, will retry
SEED_DISPOSITION_SUCCESS - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Seed successfully crawled
SeedAcceptDecideRule - Class in org.archive.crawler.deciderules
Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true).
SeedAcceptDecideRule(String) - Constructor for class org.archive.crawler.deciderules.SeedAcceptDecideRule
 
SeedCachingScope - Class in org.archive.crawler.scope
A CrawlScope that caches its seed list for the convenience of scope-tests that are based on the seeds.
SeedCachingScope(String) - Constructor for class org.archive.crawler.scope.SeedCachingScope
 
SeedCachingScopeTest - Class in org.archive.crawler.scope
Test SeedCachingScope.
SeedCachingScopeTest() - Constructor for class org.archive.crawler.scope.SeedCachingScopeTest
 
SeedFileIterator - Class in org.archive.crawler.scope
Iterator wrapper for seeds file on disk.
SeedFileIterator(BufferedReader) - Constructor for class org.archive.crawler.scope.SeedFileIterator
Construct a SeedFileIterator over the input available from the supplied BufferedReader.
SeedFileIterator(BufferedReader, Writer) - Constructor for class org.archive.crawler.scope.SeedFileIterator
Construct a SeedFileIterator over the input available from the supplied BufferedReader, reporting any nonblank noncomment entries which don't generate a valid seed to the supplied Writer.
SeedFileIteratorTest - Class in org.archive.crawler.scope
Test SeedFileIterator.
SeedFileIteratorTest() - Constructor for class org.archive.crawler.scope.SeedFileIteratorTest
 
SeedListener - Interface in org.archive.crawler.scope
Implemented by components which want notifications of seed list changes from a Scope.
seedListeners - Variable in class org.archive.crawler.framework.CrawlScope
 
SeedRecord - Class in org.archive.crawler.admin
Record of all interesting info about the most-recent processing of a specific seed.
SeedRecord(CrawlURI, String) - Constructor for class org.archive.crawler.admin.SeedRecord
Create a record from the given CrawlURI and disposition string
SeedRecord(String, String) - Constructor for class org.archive.crawler.admin.SeedRecord
Constructor for when a CrawlURI is unavailable, such as when considering seeds not yet passed through as CrawlURIs.
seeds - Variable in class org.archive.crawler.scope.SeedCachingScope
 
seedsEdittableSize(SettingsHandler) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Test whether seeds file is of a size that's reasonable to edit in an HTML textarea.
seedsIterator() - Method in class org.archive.crawler.framework.CrawlScope
Gets an iterator over all configured seeds.
seedsIterator(Writer) - Method in class org.archive.crawler.framework.CrawlScope
Gets an iterator over all configured seeds.
seedsIterator() - Method in class org.archive.crawler.scope.SeedCachingScope
 
SeedUrlNotFoundException - Exception in org.archive.crawler.util
 
SeedUrlNotFoundException(String) - Constructor for exception org.archive.crawler.util.SeedUrlNotFoundException
 
selftest(String, int) - Static method in class org.archive.crawler.Heritrix
Run the selftest
SELFTEST - Static variable in class org.archive.crawler.selftest.SelfTestCase
Suffix for selftest classes.
SelfTestCase - Class in org.archive.crawler.selftest
Base class for integrated selftest unit tests.
SelfTestCase() - Constructor for class org.archive.crawler.selftest.SelfTestCase
 
SelfTestCase(String) - Constructor for class org.archive.crawler.selftest.SelfTestCase
 
SelfTestCrawlJobHandler - Class in org.archive.crawler.selftest
An override to gain access to end-of-crawljob message.
SelfTestCrawlJobHandler(File, String, String) - Constructor for class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
sendCheckpointEvent(File) - Method in class org.archive.crawler.framework.CrawlController
Send the checkpoint event.
sendCrawlStateChangeEvent(Object, String) - Method in class org.archive.crawler.framework.CrawlController
Send crawl change event to all listeners.
sendToQueue(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Send a CrawlURI to the appropriate subqueue.
SERIALIZED_CLASS_SUFFIX - Static variable in class org.archive.crawler.util.CheckpointUtils
 
serializeDeserialize(Object) - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
SerializeStore - Class in org.archive.configuration.store
Simple Store that Serializes passed StoreElements to disk.
SerializeStore(File) - Constructor for class org.archive.configuration.store.SerializeStore
 
SerializeStoreTest - Class in org.archive.configuration.store
 
SerializeStoreTest() - Constructor for class org.archive.configuration.store.SerializeStoreTest
 
seriousError(String) - Method in interface org.archive.crawler.frontier.FrontierJournal
Add a line noting a serious crawl error.
seriousError(String) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
SERVER_DELEGATE_STR - Static variable in class org.archive.util.JmxUtils
 
ServerCache - Class in org.archive.crawler.datamodel
Server and Host cache.
ServerCache() - Constructor for class org.archive.crawler.datamodel.ServerCache
Constructor.
ServerCache(SettingsHandler) - Constructor for class org.archive.crawler.datamodel.ServerCache
This constructor creates a ServerCache that is all memory-based using Hashtables.
ServerCache(CrawlController) - Constructor for class org.archive.crawler.datamodel.ServerCache
 
ServerCacheTest - Class in org.archive.crawler.datamodel
Test the BigMapServerCache
ServerCacheTest() - Constructor for class org.archive.crawler.datamodel.ServerCacheTest
 
serverInetAddr - Variable in class org.archive.crawler.fetcher.FetchDNS
 
servers - Variable in class org.archive.crawler.datamodel.ServerCache
hostname[:port] -> CrawlServer.
SERVICE - Static variable in class org.archive.util.JmxUtils
 
set(int, Double) - Method in class org.archive.crawler.settings.DoubleList
Replaces the element at the specified position in this list with the specified element.
set(int, Float) - Method in class org.archive.crawler.settings.FloatList
Replaces the element at the specified position in this list with the specified element.
set(int, Integer) - Method in class org.archive.crawler.settings.IntegerList
Replaces the element at the specified position in this list with the specified element.
set(int, Object) - Method in class org.archive.crawler.settings.ListType
Replaces the element at the specified position in this list with the specified element.
set(int, Long) - Method in class org.archive.crawler.settings.LongList
Replaces the element at the specified position in this list with the specified element.
set(int, String) - Method in class org.archive.crawler.settings.StringList
Replaces the element at the specified position in this list with the specified element.
setActive(WorkQueueFrontier, boolean) - Method in class org.archive.crawler.frontier.WorkQueue
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setAList(AList) - Method in class org.archive.crawler.datamodel.CandidateURI
Called when making a copy of another CandidateURI.
setAsOrder(SettingsHandler) - Method in class org.archive.crawler.settings.ComplexType
 
setAt(long, long) - Method in class org.archive.util.AbstractLongFPSet
Set the stored value at the given slot.
setAt(long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
setAttribute(Attribute) - Method in class org.archive.configuration.Configuration
 
setAttribute(Attribute) - Method in class org.archive.crawler.admin.CrawlJob
 
setAttribute(Attribute) - Method in class org.archive.crawler.Heritrix
 
setAttribute(Attribute) - Method in class org.archive.crawler.settings.ComplexType
Set the value of a specific attribute of the ComplexType.
setAttribute(CrawlerSettings, Attribute) - Method in class org.archive.crawler.settings.ComplexType
Set the value of a specific attribute of the ComplexType.
setAttribute(Attribute) - Method in class org.archive.util.JEApplicationMBean
 
setAttribute(Environment, Attribute) - Method in class org.archive.util.JEMBeanHelper
Set an attribute value for the given environment.
setAttributes(AttributeList) - Method in class org.archive.configuration.Configuration
 
setAttributes(AttributeList) - Method in class org.archive.crawler.admin.CrawlJob
 
setAttributes(AttributeList) - Method in class org.archive.crawler.Heritrix
 
setAttributes(AttributeList) - Method in class org.archive.crawler.settings.ComplexType
 
setAttributes(AttributeList) - Method in class org.archive.util.JEApplicationMBean
 
setAudience(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the recipient/customer for the crawl job product.
setAudience(String) - Method in class org.archive.crawler.settings.refinements.Refinement
 
setAuthentication(String, String, String) - Method in class org.archive.crawler.SimpleHttpServer
Set up a realm on the server named for the webapp and add to the passed webapp's context.
setAuthentication(String, String, String, String, String) - Method in class org.archive.crawler.SimpleHttpServer
 
SetBasedUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on an underlying UriSet (essentially a Set).
SetBasedUriUniqFilter() - Constructor for class org.archive.crawler.util.SetBasedUriUniqFilter
 
setBaseURI(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the (HTML) Base URI used for derelativizing internal URIs.
setBdbjeBkgrdThreads(EnvironmentConfig, List, String) - Method in class org.archive.crawler.framework.CrawlController
 
setBit(long) - Method in class org.archive.util.BloomFilter32bit
Changes the bit with index bitIndex in local bitvector.
setBit(long) - Method in class org.archive.util.BloomFilter32bitSplit
Changes the bit with index bitIndex in local bitvector.
setBit(int) - Method in class org.archive.util.BloomFilter32bp2
Changes the bit with index bitIndex in local bitvector.
setBit(int) - Method in class org.archive.util.BloomFilter32bp2Split
Changes the bit with index bitIndex in local bitvector.
setBit(long) - Method in class org.archive.util.BloomFilter64bit
Changes the bit with index bitIndex in local bitvector.
setCapacity(int) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
setCharacterEncoding(String) - Method in class org.archive.util.HttpRecorder
 
setCheckpointErrors(boolean) - Method in class org.archive.crawler.framework.Checkpointer
 
setClassKey(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setConnection(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderMethod
 
setConnectionStaleCheckingEnabled(boolean) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Deprecated. Use HttpConnectionParams.setStaleCheckingEnabled(boolean), HttpConnectionManager.getParams().
setConsoleHandler() - Static method in class org.archive.util.OneLineSimpleLogger
 
setContentDigest(byte[]) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the retained content-digest value (usu.
setContentHandler(ContentHandler) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setContentSize(long) - Method in class org.archive.crawler.datamodel.CrawlURI
 
setContentType(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Set a fetched uri's content type.
setController(CrawlController) - Method in class org.archive.crawler.datamodel.CrawlOrder
 
setCount() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setCountryCode(String) - Method in class org.archive.crawler.datamodel.CrawlHost
Set country code for this host.
setCrawlJob(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
setCrawlOrderAttribute(String, ComplexType, Attribute) - Method in class org.archive.crawler.admin.CrawlJob
 
setCredentialDomain(CrawlerSettings, String) - Method in class org.archive.crawler.datamodel.credential.Credential
 
setDefaultNextProcessor(Processor) - Method in class org.archive.crawler.framework.Processor
Set the default next processor in the chain.
setDefaultProfile(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
Set the default profile.
setDescription(String) - Method in class org.archive.crawler.settings.ComplexType
Set the description of this ComplexType. The description should be suitable for showing in a user interface.
setDescription(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the description of this CrawlerSettings object.
setDescription(String) - Method in class org.archive.crawler.settings.refinements.Refinement
Set the description for this refinement.
setDestination(UriUniqFilter.HasUriReceiver) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Receiver of uniq URIs.
setDestination(UriUniqFilter.HasUriReceiver) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setDestination(UriUniqFilter.HasUriReceiver) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setDigest(boolean) - Method in class org.archive.io.arc.ARCReader
 
setDigest(String) - Method in class org.archive.io.arc.ARCRecordMetaData
 
setDigest(MessageDigest) - Method in class org.archive.io.RecordingInputStream
Sets a digest function which may be applied to recorded data.
setDigest(String) - Method in class org.archive.io.RecordingOutputStream
Sets a digest function which may be applied to recorded data.
setDigest(MessageDigest) - Method in class org.archive.io.RecordingOutputStream
Sets a digest function which may be applied to recorded data.
setDocumentLocator(Locator) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
setDTDHandler(DTDHandler) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setEarliestNextURIEmitTime(long) - Method in class org.archive.crawler.datamodel.CrawlHost
Set the earliest time a URI for this host could be emitted.
setElement(String) - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
Set the name of the element that was being parsed when this exception occurred.
setEntityResolver(EntityResolver) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setErrorHandler(ErrorHandler) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setErrorMessage(String) - Method in class org.archive.crawler.admin.CrawlJob
Set an error message for this job.
setErrorReportingLevel(Level) - Method in class org.archive.crawler.settings.SettingsHandler
Set the level for which notification of failed constraints will be fired.
setExpertSetting(boolean) - Method in class org.archive.crawler.settings.Type
Set if this Type should only show up in expert mode in UI.
setFeature(String, boolean) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setFetchStatus(int) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the overall/fetch status of this CrawlURI for its current trip through the processing loop.
setFile(String) - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
Store the name of the configuration file that was being parsed when this exception occurred.
setFile(File) - Method in class org.archive.net.rsync.RsyncURLConnection
 
setForceFetch(boolean) - Method in class org.archive.crawler.datamodel.CandidateURI
Method to signal that this URI should be fetched even though it already has been crawled.
setFrom(String) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Set the beginning of the time frame to check against.
setGetBit(long) - Method in class org.archive.util.BloomFilter32bitSplit
Sets the bit with index bitIndex in local bitvector -- returning the old value.
setGetBit(int) - Method in class org.archive.util.BloomFilter32bp2Split
Sets the bit with index bitIndex in local bitvector -- returning the old value.
setHeld() - Method in class org.archive.crawler.frontier.WorkQueue
Set isHeld to true
setHolder(Object) - Method in class org.archive.crawler.datamodel.CrawlURI
Remember a 'holder' to which some enclosing/queueing facility has assigned this CrawlURI.
setHolderCost(int) - Method in class org.archive.crawler.datamodel.CrawlURI
Remember a 'holderCost' which some enclosing/queueing facility has assigned this CrawlURI
setHolderKey(Object) - Method in class org.archive.crawler.datamodel.CrawlURI
Remember a 'holderKey' which some enclosing/queueing facility has assigned this CrawlURI.
setHttpRecorder(HttpRecorder) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the http recorder to be associated with this uri.
setIntValue(int) - Method in class org.archive.crawler.util.StringIntPair
 
setIP(InetAddress, long) - Method in class org.archive.crawler.datamodel.CrawlHost
Set the IP address for this host.
setIsSeed(boolean) - Method in class org.archive.crawler.datamodel.CandidateURI
Set the isSeed attribute of this URI.
setJobPriority(int) - Method in class org.archive.crawler.admin.CrawlJob
Set this job's level of priority.
setLastSavedTime(Date) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the time when this CrawlerSettings was last saved to persistent storage.
setLegalValues(Object[]) - Method in class org.archive.crawler.settings.SimpleType
Set the array of legal values for this type.
setLegalValueType(Class) - Method in class org.archive.crawler.settings.Type
Set the class values of this Type must be an instance of.
setLevel(Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
 
setMaxPending(int) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setName(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the name of this CrawlerSettings object.
setNew(boolean) - Method in class org.archive.crawler.admin.CrawlJob
Set if the job is considered a new job or not.
setNextChain(ProcessorChain) - Method in class org.archive.crawler.framework.ProcessorChain
Set the processor chain that the URI should be working through after finishing this one.
setNextProcessor(Processor) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the next processor to process this URI.
setNextProcessorChain(ProcessorChain) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the next processor chain to process this URI.
setNextReadyTime(long) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Updates nextReadyTime (if smaller) with the supplied value
setNumberOfJournalEntries(int) - Method in class org.archive.crawler.admin.CrawlJob
 
setOperator(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the operator of this crawl job.
setOperator(String) - Method in class org.archive.crawler.settings.refinements.Refinement
 
setOrder(CrawlOrder) - Method in class org.archive.crawler.framework.CrawlController
 
setOrganization(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the name of the organization that is running this crawl.
setOrganization(String) - Method in class org.archive.crawler.settings.refinements.Refinement
 
setOverrideable(boolean) - Method in class org.archive.crawler.settings.Type
Set if this Type should be overrideable.
setOwner(AdaptiveRevisitQueueList) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Set the AdaptiveRevisitQueueList object that contains this HQ.
setParams(HttpConnectionManagerParams) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Assigns parameters for this connection manager.
setParseHttpHeaders(boolean) - Method in class org.archive.io.arc.ARCReader
 
setPathFromSeed(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setPortNumber(String) - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
Set the port number that is to be checked against a URI.
setPost(boolean) - Method in class org.archive.crawler.datamodel.CrawlURI
Set whether this URI should be fetched by sending an HTTP POST request.
setPrerequisite(boolean) - Method in class org.archive.crawler.datamodel.CrawlURI
Set if this CrawlURI is itself a prerequisite URI.
setPrerequisiteUri(Object) - Method in class org.archive.crawler.datamodel.CrawlURI
Set a prerequisite for this URI.
setPreservedFields(String[]) - Method in class org.archive.crawler.settings.ComplexType
Set a list of attribute names that the complex type should attempt to preserve if the module is exchanged with another one.
setProfileLog(File) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Set a File to receive a log for replay profiling.
setProfileLog(File) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setProfileLog(File) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setProperty(String, Object) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setRead() - Method in class org.archive.io.SinkHandlerLogRecord
Mark alert as seen (That is, isNew() no longer returns true).
setReadOnly() - Method in class org.archive.crawler.admin.CrawlJob
Once called no changes can be made to the settings for this job.
setReference(String) - Method in class org.archive.crawler.settings.refinements.Refinement
Set the reference to this refinement's settings object.
setRefinement(boolean) - Method in class org.archive.crawler.settings.CrawlerSettings
Mark this settings object as a refinement.
setRegexp(String) - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Set the regular expression to be matched against a URI.
setRemove(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setRetired(boolean) - Method in class org.archive.crawler.frontier.WorkQueue
Set the retired status of this queue.
setRobots(RobotsExclusionPolicy) - Method in class org.archive.crawler.datamodel.CrawlServer
Set the robots exclusion policy for this server.
setRunning(boolean) - Method in class org.archive.crawler.admin.CrawlJob
Set if job is being crawled.
setSchedulingDirective(int) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setSerialNo(int) - Static method in class org.archive.io.arc.ARCWriter
Call when recovering from checkpointing.
setSessionBalance(int) - Method in class org.archive.crawler.frontier.WorkQueue
Set the session 'activity budget balance' to the given value
setSettingsHandler(SettingsHandler) - Method in class org.archive.crawler.datamodel.CrawlServer
Set the settings handler to be used by this server.
setSha1Digest() - Method in class org.archive.io.RecordingInputStream
Convenience method for setting SHA1 digest.
setSha1Digest() - Method in class org.archive.io.RecordingOutputStream
Convenience method for setting SHA1 digest.
setSize(int) - Method in class org.archive.crawler.framework.ToePool
Change the number of ToeThreads.
setStackTrace(StackTraceElement[]) - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
setStartKey(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
setStatus(String) - Method in class org.archive.crawler.admin.CrawlJob
Set the status of this CrawlJob.
setStatusCode(String) - Method in class org.archive.io.arc.ARCRecordMetaData
 
setStrict(boolean) - Method in class org.archive.io.arc.ARCReader
 
setStrict(boolean) - Method in class org.archive.io.arc.ARCRecord
 
setStringValue(String) - Method in class org.archive.crawler.util.StringIntPair
 
setThreadNumber(int) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the number of the ToeThread responsible for processing this uri.
settings - Variable in class org.archive.crawler.settings.ComplexType.Context
 
SettingsCache - Class in org.archive.crawler.settings
This class keeps a map of host names to settings objects.
SettingsCache(CrawlerSettings) - Constructor for class org.archive.crawler.settings.SettingsCache
Creates a new instance of the settings cache
SettingsFrameworkTestCase - Class in org.archive.crawler.settings
Set up a couple of settings to test different functions of the settings framework.
SettingsFrameworkTestCase() - Constructor for class org.archive.crawler.settings.SettingsFrameworkTestCase
 
settingsHandler - Variable in class org.archive.crawler.admin.CrawlJob
 
settingsHandler - Variable in class org.archive.crawler.datamodel.ServerCache
 
settingsHandler - Variable in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
SettingsHandler - Class in org.archive.crawler.settings
An instance of this class holds a hierarchy of settings.
SettingsHandler() - Constructor for class org.archive.crawler.settings.SettingsHandler
Create a new SettingsHandler object.
settingsHandler - Variable in class org.archive.crawler.url.canonicalize.RegexRuleTest
 
settingsHandler - Variable in class org.archive.crawler.url.CanonicalizerTest
 
settingsToFilename(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Resolves the filename for a settings object into a file path.
setTo(String) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Set the end of the time frame to check against.
setToResponseBodyStart() - Method in class org.archive.io.ReplayInputStream
 
setTotalBudget(long) - Method in class org.archive.crawler.frontier.WorkQueue
Set the total expenditure level allowable before queue is considered inherently 'over-budget'.
setTransient(boolean) - Method in class org.archive.crawler.settings.Type
Set to false if this attribute should not be serialized to persistent storage.
setType(Object) - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
setUnresolvable(CrawlURI, CrawlHost) - Method in class org.archive.crawler.fetcher.FetchDNS
 
setUp() - Method in class org.archive.configuration.registry.JmxRegistryTest
 
setUp() - Method in class org.archive.configuration.store.SerializeStoreTest
 
setUp() - Method in class org.archive.crawler.datamodel.CrawlURITest
 
setUp() - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
setUp() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
setUp() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
setUp() - Method in class org.archive.crawler.frontier.RecoveryJournalTest
 
setUp() - Method in class org.archive.crawler.scope.DomainScopeTest
 
setUp() - Method in class org.archive.crawler.scope.SeedCachingScopeTest
 
setUp() - Method in class org.archive.crawler.selftest.SelfTestCase
 
setUp() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
setUp() - Method in class org.archive.crawler.settings.MapTypeTest
 
setUp() - Method in class org.archive.crawler.settings.OverrideTest
 
setUp() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
setUp() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
setUp() - Method in class org.archive.crawler.url.canonicalize.RegexRuleTest
 
setUp() - Method in class org.archive.crawler.url.CanonicalizerTest
 
setUp() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
setUp() - Method in class org.archive.crawler.util.BloomUriUniqFilterTest
 
setUp() - Method in class org.archive.crawler.util.FPUriUniqFilterTest
 
setup(UURI, UURI, InputStream, Charset, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
setup(UURI, UURI, CharSequence, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
setup(UURI, CharSequence, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
Convenience method for when source and base are same.
setup(UURI, InputStream, Charset, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
setup(UURI, UURI, InputStream, Charset, ExtractErrorListener) - Method in interface org.archive.extractor.LinkExtractor
Set up the LinkExtractor to operate on the given stream and charset, considering the given contextURI as the initial 'base' URI for resolving relative URIs.
setup(UURI, InputStream, Charset, ExtractErrorListener) - Method in interface org.archive.extractor.LinkExtractor
Convenience version of above for common case where source and base are same.
setUp() - Method in class org.archive.io.arc.ARCWriterTest
 
setUp() - Method in class org.archive.io.GzippedInputStreamTest
 
setUp() - Method in class org.archive.io.RecordingInputStreamTest
 
setUp() - Method in class org.archive.io.RecordingOutputStreamTest
 
setUp() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
setUp() - Method in class org.archive.io.RepositionableInputStreamTest
 
setUp() - Method in class org.archive.io.SinkHandlerTest
 
setUp() - Method in class org.archive.queue.QueueTestBase
 
setUp() - Method in class org.archive.util.CachedBdbMapTest
 
setUp() - Method in class org.archive.util.FileUtilsTest
 
setUp() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
 
setUp() - Method in class org.archive.util.PaddingStringBufferTest
 
setUp() - Method in class org.archive.util.TmpDirTestCase
 
setupCheckpointRecover() - Method in class org.archive.crawler.framework.CrawlController
Does setup of checkpoint recover.
setupForCrawlStart() - Method in class org.archive.crawler.admin.CrawlJob
 
setupPool() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
setURI() - Method in class org.archive.net.LaxURI
Coalesce _scheme to existing instances, where appropriate.
setUserAgent(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the user agent to use when crawling this URI.
setVia(UURI) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setWakeTime(long) - Method in class org.archive.crawler.frontier.WorkQueue
 
shouldBeForgotten(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Some URIs, if they recur, deserve another chance at consideration: they might not be too many hops away via another path, or the scope may have been updated to allow them passage.
shouldCloseConnection(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
shouldCloseConnection(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
shouldManifest() - Method in class org.archive.io.GenerationFileHandler
 
shouldMasquerade(CrawlURI) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
This method returns true if the crawler should masquerade as the user agent whose restrictions it opted to use.
shouldPause - Variable in class org.archive.crawler.frontier.AbstractFrontier
should the frontier hold any threads asking for URIs?
shouldRetire() - Method in class org.archive.crawler.framework.ToeThread
Whether this thread should cleanly retire at the earliest opportunity.
shouldrun - Variable in class org.archive.crawler.framework.AbstractTracker
 
shouldTerminate - Variable in class org.archive.crawler.frontier.AbstractFrontier
should the frontier send an EndedException to any threads asking for URIs?
shutdown(int) - Static method in class org.archive.crawler.Heritrix
Shutdown all running heritrix instances and the JVM.
shutdown() - Static method in class org.archive.crawler.Heritrix
 
sigquitSelf() - Static method in class org.archive.util.DevUtils
Send this JVM process a SIGQUIT; giving a thread dump and possibly a heap histogram (if using -XX:+PrintClassHistogram).
SimpleHttpServer - Class in org.archive.crawler
Wrapper for embedded Jetty server.
SimpleHttpServer() - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleHttpServer(int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleHttpServer(String, String, int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleHttpServer(List, int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleType - Class in org.archive.crawler.settings
A type that holds a Java type.
SimpleType(String, String, Object) - Constructor for class org.archive.crawler.settings.SimpleType
Create a new instance of SimpleType.
SimpleType(String, String, Object, Object[]) - Constructor for class org.archive.crawler.settings.SimpleType
Create a new instance of SimpleType.
SimpleTypeTest - Class in org.archive.crawler.settings
Testing of the SimpleType
SimpleTypeTest() - Constructor for class org.archive.crawler.settings.SimpleTypeTest
 
SingleHttpConnectionManager - Class in org.archive.httpclient
An HttpClient-compatible HttpConnection "manager" that actually just gives out a new connection each time -- skipping the overhead of connection management, since we already throttle our crawler with external mechanisms.
SingleHttpConnectionManager() - Constructor for class org.archive.httpclient.SingleHttpConnectionManager
 
singleLineLegend() - Method in class org.archive.crawler.datamodel.CandidateURI
 
singleLineLegend() - Method in class org.archive.crawler.framework.CrawlController
 
singleLineLegend() - Method in class org.archive.crawler.framework.ToePool
 
singleLineLegend() - Method in class org.archive.crawler.framework.ToeThread
 
singleLineLegend() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
singleLineLegend() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
singleLineLegend() - Method in class org.archive.crawler.frontier.WorkQueue
 
singleLineLegend() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
singleLineLegend() - Method in interface org.archive.util.Reporter
Return a legend for the single-line summary report as a String.
singleLineReport() - Method in class org.archive.crawler.datamodel.CandidateURI
 
singleLineReport() - Method in class org.archive.crawler.framework.CrawlController
 
singleLineReport() - Method in class org.archive.crawler.framework.ToePool
 
singleLineReport() - Method in class org.archive.crawler.framework.ToeThread
 
singleLineReport() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
singleLineReport() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
singleLineReport() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
singleLineReport() - Method in class org.archive.crawler.frontier.WorkQueue
 
singleLineReport(Reporter) - Static method in class org.archive.util.ArchiveUtils
Utility method to get a String singleLineReport from Reporter
singleLineReport() - Method in interface org.archive.util.Reporter
Return a short single-line summary report as a String.
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.datamodel.CandidateURI
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
singleLineReportTo(PrintWriter) - Method in interface org.archive.util.Reporter
Make a single-line summary report to the passed-in writer
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.AcceptDecideRule
 
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.DecideRule
If this rule is "one-way" -- can only return a single possible decision other than PASS -- return that decision.
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.RejectDecideRule
 
singleThreadMode() - Method in class org.archive.crawler.framework.CrawlController
Go to single thread mode, where only one ToeThread may proceed at a time.
SinkHandler - Class in org.archive.io
A handler that keeps an in-memory vector of all events deemed loggable by configuration.
SinkHandler() - Constructor for class org.archive.io.SinkHandler
 
SinkHandlerLogRecord - Class in org.archive.io
Version of LogRecord used by SinkHandler.
SinkHandlerLogRecord() - Constructor for class org.archive.io.SinkHandlerLogRecord
 
SinkHandlerLogRecord(LogRecord) - Constructor for class org.archive.io.SinkHandlerLogRecord
 
SinkHandlerTest - Class in org.archive.io
 
SinkHandlerTest() - Constructor for class org.archive.io.SinkHandlerTest
 
size() - Method in class org.archive.crawler.framework.ProcessorChain
Get the number of processors in this chain.
size() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the number of processor chains.
size - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Size of queue.
size() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
size() - Method in class org.archive.crawler.settings.DataContainer
 
size() - Method in class org.archive.crawler.settings.ListType
Get the number of elements in this list.
size(Object) - Method in class org.archive.crawler.settings.MapType
Get the number of elements in this map.
size() - Method in class org.archive.crawler.settings.SoftSettingsHash
Returns the number of key-value mappings in this map.
size() - Method in interface org.archive.util.BloomFilter
The number of character sequences in the filter.
size() - Method in class org.archive.util.BloomFilter32bit
The number of character sequences in the filter.
size() - Method in class org.archive.util.BloomFilter32bitSplit
The number of character sequences in the filter.
size() - Method in class org.archive.util.BloomFilter32bp2
The number of character sequences in the filter.
size() - Method in class org.archive.util.BloomFilter32bp2Split
The number of character sequences in the filter.
size() - Method in class org.archive.util.BloomFilter64bit
The number of character sequences in the filter.
size() - Method in class org.archive.util.CachedBdbMap
 
skip(int) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
skip(long) - Method in class org.archive.io.arc.ARCRecord
 
skip(long) - Method in class org.archive.io.CompositeFileInputStream
 
skip(long) - Method in class org.archive.io.RandomAccessInputStream
 
skipHttpHeader() - Method in class org.archive.io.arc.ARCRecord
Skip over the HTTP header if one is present.
skipToProcessor(ProcessorChain, Processor) - Method in class org.archive.crawler.datamodel.CrawlURI
Set which processor should be the next processor to process this uri instead of using the default next processor.
skipToProcessorChain(ProcessorChain) - Method in class org.archive.crawler.datamodel.CrawlURI
Set which processor chain should be processing this uri next.
slash - Static variable in class org.archive.crawler.filter.PathDepthFilter
 
SLASH - Static variable in class org.archive.net.UURIFactory
 
SLASHDOTDOTSLASH - Static variable in class org.archive.net.UURIFactory
 
slots - Variable in class org.archive.util.fingerprint.MemLongFPSet
 
smear - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
snapshotAppendOnlyFile(File) - Method in class org.archive.io.ObjectPlusFilesOutputStream
Store a snapshot of an object's supporting file to the current auxiliary directory.
snoozedClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues held in snoozed state, sorted by wake time.
SoftSettingsHash - Class in org.archive.crawler.settings
 
SoftSettingsHash(int) - Constructor for class org.archive.crawler.settings.SoftSettingsHash
Constructs a new, empty SoftSettingsHash with the given initial capacity.
SoftSettingsHash.EntryIterator - Class in org.archive.crawler.settings
Iterator over all elements in hash.
SoftSettingsHash.EntryIterator() - Constructor for class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
SoftSettingsHash.SettingsEntry - Class in org.archive.crawler.settings
The entries in this hash extend SoftReference, using the host string as the key.
SoftSettingsHash.SettingsEntry(String, CrawlerSettings, ReferenceQueue, int, SoftSettingsHash.SettingsEntry) - Constructor for class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
Create new entry.
sorted - Variable in class org.archive.util.Histotable
 
Sorts - Class in org.archive.crawler.util
 
Sorts() - Constructor for class org.archive.crawler.util.Sorts
 
sortStringIntHashMap(HashMap) - Static method in class org.archive.crawler.util.Sorts
 
source - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
sourceContent - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
SPACE - Static variable in class org.archive.net.UURIFactory
 
spawn(int) - Method in class org.archive.crawler.framework.Processor
 
SPECULATIVE_HOP - Static variable in class org.archive.crawler.extractor.Link
speculative/aggressively extracted links, perhaps embed or nav, as in javascript
SPECULATIVE_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for speculative/aggressively extracted urls without other context
split(String, CharSequence) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the split method of the String class.
SQUOT - Static variable in class org.archive.net.UURIFactory
 
Stack - Interface in org.archive.queue
Simple Stack: supports add and remove at top.
STANDARD_REPORT - Static variable in class org.archive.crawler.framework.ToePool
 
STANDARD_REPORT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
standardReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
start() - Method in interface org.archive.crawler.framework.Frontier
Request that Frontier allow crawling to begin.
start() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
start() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
start() - Method in class org.archive.crawler.Heritrix
Start Heritrix.
start - Variable in class org.archive.io.CharSubSequence
 
startCrawler() - Method in class org.archive.crawler.admin.CrawlJobHandler
Allow jobs to be crawled.
startCrawling() - Method in class org.archive.crawler.Heritrix
 
startDigest() - Method in class org.archive.io.RecordingInputStream
 
startDigest() - Method in class org.archive.io.RecordingOutputStream
Starts digesting recorded data, if a MessageDigest has been set.
startDocument() - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
startElement(String, String, String, Attributes) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
Start of an element.
startEmbeddedWebserver(int, String) - Static method in class org.archive.crawler.Heritrix
Start up the embedded Jetty webserver instance.
startKey - Variable in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
startNextJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Start next crawl job.
startNextJobInternal() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
startServer() - Method in class org.archive.crawler.SimpleHttpServer
Start the server.
startsWith(byte[], byte[]) - Static method in class org.archive.util.ArchiveUtils
Verify that the array begins with the prefix.
state - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Last known state of HQ -- ALL methods should use getState() to read this value, never read it directly.
statistics - Variable in class org.archive.crawler.framework.CrawlController
 
StatisticsLogFormatter - Class in org.archive.crawler.io
 
StatisticsLogFormatter() - Constructor for class org.archive.crawler.io.StatisticsLogFormatter
 
StatisticsTracker - Class in org.archive.crawler.admin
This is an implementation of the AbstractTracker.
StatisticsTracker(String) - Constructor for class org.archive.crawler.admin.StatisticsTracker
 
StatisticsTracking - Interface in org.archive.crawler.framework
An interface for objects that want to collect statistics on running crawls.
STATUS_ABORTED - Static variable in class org.archive.crawler.admin.CrawlJob
Job was terminated by user input while crawling
STATUS_CHECKPOINTING - Static variable in class org.archive.crawler.admin.CrawlJob
Job is being checkpointed.
STATUS_CREATED - Static variable in class org.archive.crawler.admin.CrawlJob
Initial value.
STATUS_DELETED - Static variable in class org.archive.crawler.admin.CrawlJob
Job was deleted by user, will not be displayed in UI.
STATUS_FINISHED - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally having completed its crawl.
STATUS_FINISHED_ABNORMAL - Static variable in class org.archive.crawler.admin.CrawlJob
Something went very wrong
STATUS_FINISHED_DATA_LIMIT - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally when the specified amount of data (MB) had been downloaded
STATUS_FINISHED_DOCUMENT_LIMIT - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally when the specified number of documents had been fetched.
STATUS_FINISHED_TIME_LIMIT - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally when the specified time limit was hit.
STATUS_MISCONFIGURED - Static variable in class org.archive.crawler.admin.CrawlJob
Job could not be launched due to an InitializationException
STATUS_PAUSED - Static variable in class org.archive.crawler.admin.CrawlJob
Job was temporarily stopped.
STATUS_PENDING - Static variable in class org.archive.crawler.admin.CrawlJob
Job has been successfully submitted to a CrawlJobHandler
STATUS_PREPARING - Static variable in class org.archive.crawler.admin.CrawlJob
 
STATUS_PROFILE - Static variable in class org.archive.crawler.admin.CrawlJob
Job is actually a profile
STATUS_RUNNING - Static variable in class org.archive.crawler.admin.CrawlJob
Job is being crawled
STATUS_WAITING_FOR_PAUSE - Static variable in class org.archive.crawler.admin.CrawlJob
Job is going to be temporarily stopped after active threads are finished.
STATUSCODE_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for statuscode field.
statusCodeDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of fetch status codes
stop() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
stop() - Method in class org.archive.crawler.Heritrix
Stop Heritrix.
stopCrawler() - Method in class org.archive.crawler.admin.CrawlJobHandler
Stop future jobs from being crawled.
stopCrawling() - Method in class org.archive.crawler.admin.CrawlJob
 
stopCrawling() - Method in class org.archive.crawler.Heritrix
 
stopServer() - Method in class org.archive.crawler.SimpleHttpServer
Stop the running server.
Store - Interface in org.archive.configuration
 
STORE_DIR_KEY - Static variable in interface org.archive.configuration.Store
System property key of where on disk to persist.
StoreElement - Class in org.archive.configuration
 
StoreElement(Configuration, ObjectName) - Constructor for class org.archive.configuration.StoreElement
 
STR_ARRAY_TYPE - Static variable in class org.archive.configuration.Configuration
Make a String ArrayType used later in subclass definitions.
STRICT - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Strict trust.
strictAdd(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
An internal method for adding URIs to the queue.
STRING - Static variable in class org.archive.crawler.settings.SettingsHandler
 
STRING_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
STRING_URI_DETECTOR - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
STRING_URI_DETECTOR - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
StringIntPair - Class in org.archive.crawler.util
 
StringIntPair(String, int) - Constructor for class org.archive.crawler.util.StringIntPair
 
StringIntPairComparator - Class in org.archive.crawler.util
 
StringIntPairComparator() - Constructor for class org.archive.crawler.util.StringIntPairComparator
 
StringList - Class in org.archive.crawler.settings
List of String values.
StringList(String, String) - Constructor for class org.archive.crawler.settings.StringList
Creates a new StringList.
StringList(String, String, StringList) - Constructor for class org.archive.crawler.settings.StringList
Creates a new StringList and initializes it with the values from another StringList.
StringList(String, String, String[]) - Constructor for class org.archive.crawler.settings.StringList
Creates a new StringList and initializes it with the values from an array of Strings.
strings - Variable in class org.archive.extractor.RegexpJSLinkExtractor
 
StringToType(String, String) - Static method in class org.archive.crawler.settings.SettingsHandler
Convert a String object to an object of typeName.
stripExtension(String, String) - Static method in class org.archive.io.arc.ARCReader
 
StripSessionIDs - Class in org.archive.crawler.url.canonicalize
Strip known session ids.
StripSessionIDs(String) - Constructor for class org.archive.crawler.url.canonicalize.StripSessionIDs
 
StripSessionIDsTest - Class in org.archive.crawler.url.canonicalize
Test stripping of session ids.
StripSessionIDsTest() - Constructor for class org.archive.crawler.url.canonicalize.StripSessionIDsTest
 
stripToMinimal() - Method in class org.archive.crawler.datamodel.CrawlURI
Remove all attributes set on this uri.
StripUserinfoRule - Class in org.archive.crawler.url.canonicalize
Strip any 'userinfo' found on http/https URLs.
StripUserinfoRule(String) - Constructor for class org.archive.crawler.url.canonicalize.StripUserinfoRule
 
StripUserinfoRuleTest - Class in org.archive.crawler.url.canonicalize
Test stripping of userinfo from a URL.
StripUserinfoRuleTest() - Constructor for class org.archive.crawler.url.canonicalize.StripUserinfoRuleTest
 
StripWWWRule - Class in org.archive.crawler.url.canonicalize
Strip any 'www' found on http/https URLs, IF they have some path/query component (content after third slash).
StripWWWRule(String) - Constructor for class org.archive.crawler.url.canonicalize.StripWWWRule
 
StripWWWRuleTest - Class in org.archive.crawler.url.canonicalize
Test stripping 'www' if present.
StripWWWRuleTest() - Constructor for class org.archive.crawler.url.canonicalize.StripWWWRuleTest
 
SUB_DOMAIN - Static variable in class org.archive.configuration.registry.JmxRegistryTest
 
SUBACTION - Static variable in class org.archive.crawler.admin.ui.JobConfigureUtils
 
subList(int, int) - Method in class org.archive.crawler.settings.ListType
 
subSequence(int, int) - Method in class org.archive.crawler.settings.TextField
 
subSequence(int, int) - Method in class org.archive.io.CharSubSequence
 
subSequence(int, int) - Method in class org.archive.net.UURI
 
subset(CrawlURI, Class) - Method in class org.archive.crawler.datamodel.CredentialStore
Return set made up of all credentials of the passed type.
subset(CrawlURI, Class, String) - Method in class org.archive.crawler.datamodel.CredentialStore
Return set made up of all credentials of the passed type.
substats - Variable in class org.archive.crawler.datamodel.CrawlHost
 
substats - Variable in class org.archive.crawler.datamodel.CrawlServer
 
substats - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
 
substats - Variable in class org.archive.crawler.frontier.WorkQueue
Substats for all CrawlURIs in this group
succeededFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of successfully processed URIs.
succeededFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
succeededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
(non-Javadoc)
succeededFetchCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
successBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
successDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
The CrawlURI has been successfully crawled.
successfullyFetchedCount() - Method in class org.archive.crawler.admin.StatisticsTracker
 
successfullyFetchedCount() - Method in interface org.archive.crawler.framework.StatisticsTracking
Number of successfully processed URIs.
suite(String, CrawlJob, File, File) - Static method in class org.archive.crawler.selftest.AllSelfTestCases
Run all known tests in the selftest suite.
suite(String, CrawlJob, File, File, List) - Static method in class org.archive.crawler.selftest.AllSelfTestCases
Run list of passed tests.
suite() - Static method in class org.archive.crawler.util.BdbUriUniqFilterTest
return the suite of tests for BdbUriUniqFilterTest
suite() - Static method in class org.archive.httpclient.ConfigurableX509TrustManagerTest
Select tests to run.
suite() - Static method in class org.archive.queue.MemQueueTest
return the suite of tests for MemQueueTest
suite() - Static method in class org.archive.util.ArchiveUtilsTest
return the suite of tests for ArchiveUtilsTest
suite() - Static method in class org.archive.util.fingerprint.LongFPSetCacheTest
return the suite of tests for LongFPSetCacheTest
suite() - Static method in class org.archive.util.fingerprint.MemLongFPSetTest
return the suite of tests for MemLongFPSetTest
suite() - Static method in class org.archive.util.PaddingStringBufferTest
return the suite of tests for PaddingStringBufferTest
suite() - Static method in class org.archive.util.SurtPrefixSetTest
return the suite of tests for SurtPrefixSetTest
suite() - Static method in class org.archive.util.SURTTest
return the suite of tests for SURTTest
suite() - Static method in class org.archive.util.TextUtilsTest
return the suite of tests for TextUtilsTest
SupplementaryLinksScoper - Class in org.archive.crawler.postprocessor
Run CandidateURI links carried in the passed CrawlURI through a filter and 'handle' rejections.
SupplementaryLinksScoper(String) - Constructor for class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
SURT - Class in org.archive.util
Sort-friendly URI Reordering Transform.
SURT() - Constructor for class org.archive.util.SURT
 
SurtAuthorityQueueAssignmentPolicy - Class in org.archive.crawler.frontier
SurtAuthorityQueueAssignmentPolicy based on the surt form of hostname.
SurtAuthorityQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
SurtPrefixedDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set.
SurtPrefixedDecideRule(String) - Constructor for class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Usual constructor.
surtPrefixes - Variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
surtPrefixes - Variable in class org.archive.crawler.filter.SurtPrefixFilter
 
surtPrefixes - Variable in class org.archive.crawler.scope.SurtPrefixScope
 
SurtPrefixFilter - Class in org.archive.crawler.filter
A filter which tests a URI against a set of SURT prefixes, and if the URI's prefix is in the set, returns the chosen true/false accepts value.
SurtPrefixFilter(String) - Constructor for class org.archive.crawler.filter.SurtPrefixFilter
 
SurtPrefixScope - Class in org.archive.crawler.scope
A specialized CrawlScope suitable for the most common crawl needs.
SurtPrefixScope(String) - Constructor for class org.archive.crawler.scope.SurtPrefixScope
 
SurtPrefixSet - Class in org.archive.util
Specialized TreeSet for keeping a set of String prefixes.
SurtPrefixSet() - Constructor for class org.archive.util.SurtPrefixSet
 
SurtPrefixSetTest - Class in org.archive.util
 
SurtPrefixSetTest(String) - Constructor for class org.archive.util.SurtPrefixSetTest
Create a new SurtPrefixSetTest object
SURTTest - Class in org.archive.util
JUnit test suite for SURT
SURTTest(String) - Constructor for class org.archive.util.SURTTest
Create a new SURTTest object
suspend(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Suspends this WorkQueue.
sweepHand - Variable in class org.archive.util.fingerprint.LongFPSetCache
 
sync() - Method in class org.archive.util.CachedBdbMap
Sync in-memory map entries to backing disk store.
syncDirectories(File, FilenameFilter, File) - Static method in class org.archive.util.FileUtils
Use for case where files are being added to src.

T

tagDefineButton(int, Vector) - Method in class org.archive.crawler.extractor.CustomSWFTags
 
tagDefineButton2(int, boolean, Vector) - Method in class org.archive.crawler.extractor.CustomSWFTags
 
tagDoAction() - Method in class org.archive.crawler.extractor.CustomSWFTags
 
tags - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
tail(String) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail' command
tail(String, int) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail -n' command
tail(RandomAccessFile, int) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail -n' command
tally(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
tally(Object) - Method in class org.archive.util.Histotable
Record one more occurrence of the given object key.
tallyCurrentPause() - Method in class org.archive.crawler.framework.AbstractTracker
For a current pause (if any), add paused time to total and reset
targetSize - Variable in class org.archive.crawler.framework.ToePool
 
tearDown() - Method in class org.archive.configuration.registry.JmxRegistryTest
 
tearDown() - Method in class org.archive.configuration.store.SerializeStoreTest
 
tearDown() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
tearDown() - Method in class org.archive.crawler.frontier.RecoveryJournalTest
 
tearDown() - Method in class org.archive.crawler.scope.SeedCachingScopeTest
 
tearDown() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
tearDown() - Method in class org.archive.crawler.settings.MapTypeTest
 
tearDown() - Method in class org.archive.crawler.settings.OverrideTest
 
tearDown() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
tearDown() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
tearDown() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
tearDown() - Method in class org.archive.io.arc.ARCWriterTest
 
tearDown() - Method in class org.archive.io.GzippedInputStreamTest
 
tearDown() - Method in class org.archive.io.RepositionableInputStreamTest
 
tearDown() - Method in class org.archive.queue.QueueTestBase
 
tearDown() - Method in class org.archive.util.CachedBdbMapTest
 
tearDown() - Method in class org.archive.util.FileUtilsTest
 
tearDown() - Method in class org.archive.util.TmpDirTestCase
 
terminate() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should end the crawl, giving any worker ToeThread that asks for a next() an EndedException.
terminate() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
terminate() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
terminateCurrentJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
Test - Class in org.archive.crawler.processor
A very simple extractor.
Test(String) - Constructor for class org.archive.crawler.processor.Test
 
test2kURI() - Method in class org.archive.net.UURIFactoryTest
Test for [ 1012520 ] UURI.length() > 2k.
testAbsolute() - Method in class org.archive.net.UURIFactoryTest
 
testACCEPT() - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
testAccepts() - Method in class org.archive.crawler.filter.PathologicalPathFilterTest
 
testACCEPTWins() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testAdd() - Method in class org.archive.util.fingerprint.ArrayLongFPCacheTest
 
testAdd() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check that we can add fingerprints
testAddComplexType() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
testAdded() - Method in class org.archive.crawler.frontier.RecoveryJournalTest
 
testAdding() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
testAdding() - Method in class org.archive.crawler.util.BloomUriUniqFilterTest
 
testAdding() - Method in class org.archive.crawler.util.FPUriUniqFilterTest
 
testAddRemoveSizeGlobal() - Method in class org.archive.crawler.settings.MapTypeTest
Test different aspects of manipulating a MapType for the global settings.
testAddRemoveSizeHost() - Method in class org.archive.crawler.settings.MapTypeTest
Test different aspects of manipulating a MapType for the per domain settings.
testAnchors() - Method in class org.archive.net.UURIFactoryTest
A UURI should always be without a 'fragment' segment, which is unused and irrelevant for network fetches.
testAppendInt() - Method in class org.archive.util.PaddingStringBufferTest
check that append(int) works
testAppendLong() - Method in class org.archive.util.PaddingStringBufferTest
check that append(long) works
testAppendString() - Method in class org.archive.util.PaddingStringBufferTest
test that append(String) works correctly
testARCWriterPool() - Method in class org.archive.io.arc.ARCWriterPoolTest
 
testArrayToLong() - Method in class org.archive.util.ArchiveUtilsTest
 
testAtSymbolInPath() - Method in class org.archive.util.SURTTest
 
testAuth() - Method in class org.archive.crawler.selftest.AuthSelfTest
Test the max-link-hops setting is being respected.
testBackgroundImageExtraction() - Method in class org.archive.crawler.selftest.BackgroundImageExtractionSelfTestCase
Read ARC file for the background image and the file that contained it.
testBackingDbGetsUpdated() - Method in class org.archive.util.CachedBdbMapTest
 
testBad12Date() - Method in class org.archive.util.ArchiveUtilsTest
check that parse12DigitDate doesn't accept a bad date
testBad14Date() - Method in class org.archive.util.ArchiveUtilsTest
check that parse14DigitDate doesn't accept a bad date
testBad17Date() - Method in class org.archive.util.ArchiveUtilsTest
check that parse17DigitDate doesn't accept a bad date
testBadBaseResolve() - Method in class org.archive.net.UURIFactoryTest
 
testByteArrayEquals() - Method in class org.archive.util.ArchiveUtilsTest
check that byteArrayEquals() works
testCalculateInsertKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueuesTest
Basic sanity checks for calculateInsertKey() -- ensure ordinal, cost, and schedulingDirective have the intended effects, for ordinal values up through 1/4th of the maximum (about 2^61).
testCandidateURIWithLoadedAList() - Method in class org.archive.crawler.datamodel.CrawlURITest
 
testCanonicalize() - Method in class org.archive.crawler.url.canonicalize.FixupQueryStrTest
 
testCanonicalize() - Method in class org.archive.crawler.url.canonicalize.LowercaseRuleTest
 
testCanonicalize() - Method in class org.archive.crawler.url.canonicalize.RegexRuleTest
 
testCanonicalize() - Method in class org.archive.crawler.url.canonicalize.StripSessionIDsTest
 
testCanonicalize() - Method in class org.archive.crawler.url.canonicalize.StripUserinfoRuleTest
 
testCanonicalize() - Method in class org.archive.crawler.url.canonicalize.StripWWWRuleTest
 
testCanonicalize() - Method in class org.archive.crawler.url.CanonicalizerTest
 
testCharset() - Method in class org.archive.crawler.selftest.CharsetSelfTest
Look for last file in link chain.
testCheckARCFileSize() - Method in class org.archive.io.arc.ARCWriterTest
 
testCheckARCFileSizeCompressed() - Method in class org.archive.io.arc.ARCWriterTest
 
testCheckParameters() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testCommaTruncate() - Method in class org.archive.util.MimetypeUtilsTest
 
testCompressedARCFile(File) - Static method in class org.archive.io.arc.ARCUtils
Check file is compressed and in ARC GZIP format.
testCompressedARCFile(File, boolean) - Static method in class org.archive.io.arc.ARCUtils
Check file is compressed and in ARC GZIP format.
testCompressedARCStream(InputStream) - Static method in class org.archive.io.arc.ARCUtils
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedRepositionalStream(RepositionableStream) - Static method in class org.archive.io.arc.ARCUtils
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedStream(InputStream) - Static method in class org.archive.io.arc.ARCUtils
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedStream() - Method in class org.archive.io.arc.ARCUtilsTest
 
testContains() - Method in class org.archive.util.fingerprint.ArrayLongFPCacheTest
 
testContains() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check that contains() does what we expect
testCopyFile() - Method in class org.archive.util.FileUtilsTest
 
testCopyFiles() - Method in class org.archive.util.FileUtilsTest
 
testCopySettings() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
Test the copying of the entire settings directory.
testCount() - Method in class org.archive.util.fingerprint.LongFPSetCacheTest
This is a cache buffer, which does not grow, but chucks out old values.
testCount() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check count works ok
testCountOfMembers() - Method in class org.archive.io.GzippedInputStreamTest
 
testCrawlOrder() - Method in class org.archive.configuration.registry.JmxRegistryTest
 
testCrawlURIKeys() - Method in class org.archive.crawler.datamodel.ServerCacheTest
 
testCreateCompositeType() - Method in class org.archive.util.JmxUtilsTest
 
testCreateKey() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
testCreateKeyCollisions() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
Verify that two URIs which gave colliding hashes, when previously the last 40 bits of the composite did not sufficiently vary with certain inputs, no longer collide.
testCredentials() - Method in class org.archive.crawler.datamodel.CredentialStoreTest
 
testCurlies() - Method in class org.archive.net.UURIFactoryTest
 
testDefault() - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
testDeleteSettingsObject() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
testDequeue() - Method in class org.archive.queue.QueueTestBase
test that dequeue works
testDequeueEmptyQueue() - Method in class org.archive.queue.QueueTestBase
check what happens when we dequeue on empty
testDnsHost() - Method in class org.archive.net.UURIFactoryTest
 
testDoubleEncoding() - Method in class org.archive.net.UURIFactoryTest
Test for doubly-encoded sequences.
testDoubleToString() - Method in class org.archive.util.ArchiveUtilsTest
test doubleToString()
testEmbedSrc() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
Test a particular construct that was suspicious in the No10GovUk crawl.
testEmptySequence() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testEquals() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testEscapeEncoding() - Method in class org.archive.net.UURIFactoryTest
 
testEscaping() - Method in class org.archive.net.UURIFactoryTest
 
testEscapingNotNecessary() - Method in class org.archive.net.UURIFactoryTest
 
testFilesFound() - Method in class org.archive.crawler.selftest.BadURIsStopPageParsingSelfTest
 
testFilesFound() - Method in class org.archive.crawler.selftest.FlashParseSelfTest
 
testFilesInArc(List) - Method in class org.archive.crawler.selftest.SelfTestCase
Test that all files in the passed list were found in the ARC.
testFilesInArc(List, List) - Method in class org.archive.crawler.selftest.SelfTestCase
Test that all files in the passed list were found in the ARC.
testFoo() - Method in class org.archive.util.fingerprint.MemLongFPSetTest
 
testForget() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
testForget() - Method in class org.archive.crawler.util.FPUriUniqFilterTest
 
testFormatBytesForDisplay() - Method in class org.archive.util.ArchiveUtilsTest
 
testFrames() - Method in class org.archive.crawler.selftest.FramesSelfTestCase
Verify that all frames and their contents are found by the crawler.
testFtpUris() - Method in class org.archive.net.UURIFactoryTest
 
testGapError() - Method in class org.archive.io.arc.ARCWriterTest
 
testGeneral() - Method in class org.archive.crawler.scope.SeedCachingScopeTest
 
testGeneral() - Method in class org.archive.crawler.scope.SeedFileIteratorTest
 
testGetArcfileName() - Method in class org.archive.io.arc.ARCUtilsTest
 
testGetAttribute() - Method in class org.archive.crawler.settings.MapTypeTest
 
testGetClasspathPath() - Method in class org.archive.util.IoUtilsTest
 
testGetConstraints() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testGetDefaultValue() - Method in class org.archive.crawler.settings.MapTypeTest
 
testGetDefaultValue() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testGetDescription() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testGetFileURL() - Method in class org.archive.io.arc.ARCReaderFactoryTest
Test File URL.
testGetFirstWord() - Method in class org.archive.util.TextUtilsTest
 
testGetInputStreamFileFileString() - Method in class org.archive.crawler.util.IoUtilsTest
 
testGetLegalValues() - Method in class org.archive.crawler.settings.MapTypeTest
 
testGetLegalValues() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testGetLegalValueType() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testGetModule() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
testGetName() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testGetPathOrURL() - Method in class org.archive.io.arc.ARCReaderFactoryTest
Test path or url.
testGetReplayCharSequenceByteOffset() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testGetReplayCharSequenceByteZeroOffset() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testGetReplayCharSequenceMultiByteOffset() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testGetReplayCharSequenceMultiByteZeroOffset() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testGetSettings() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
testGetSettingsObject() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
testGetValue() - Method in class org.archive.crawler.settings.MapTypeTest
 
testGetXXDigitDate() - Method in class org.archive.util.ArchiveUtilsTest
check the getXXDigitDate() methods produce valid dates
testGetXXDigitDateLong() - Method in class org.archive.util.ArchiveUtilsTest
check that getXXDigitDate(long) does the right thing
testGzipMagic(InputStream) - Method in class org.archive.io.GzipHeader
Test gzip magic is next in the stream.
testGzipMagic(InputStream, CRC32) - Method in class org.archive.io.GzipHeader
Test gzip magic is next in the stream.
testHasScheme() - Method in class org.archive.net.UURITest
 
testHeritrixTrustStore() - Method in class org.archive.httpclient.ConfigurableX509TrustManagerTest
Test heritrix trust store.
testHolds() - Method in class org.archive.crawler.datamodel.ServerCacheTest
 
testHopLimit(int, char, String, String) - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testHops() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testHostEncodedChars() - Method in class org.archive.net.UURIFactoryTest
Test for NPE in java.net.URI.encode
testHostWithDigit() - Method in class org.archive.net.UURIFactoryTest
Test for java.net.URI#getHost fails when leading digit.
testHostWithLessThan() - Method in class org.archive.net.UURIFactoryTest
Test for [ 962892 ] UURI accepting/creating unUsable URIs (bad hosts).
testHostWithPeriod() - Method in class org.archive.net.UURIFactoryTest
Test for doing separate DNS lookup for same host
testHostWithUnderscores() - Method in class org.archive.net.UURIFactoryTest
Test for java.net.URI chokes on hosts_with_underscores.
testHQ() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueueTest
 
testHrefWhitespace() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
Test a whitespace issue found in href.
testHyphenInHost() - Method in class org.archive.crawler.scope.SeedFileIteratorTest
 
testIdn() - Method in class org.archive.net.UURIFactoryTest
 
testImportFromUris() - Method in class org.archive.util.SurtPrefixSetTest
 
testInnerProcess() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
 
testInScope() - Method in class org.archive.crawler.scope.DomainScopeTest
 
testInvalidate() - Method in class org.archive.io.arc.ARCWriterPoolTest
 
testIsWithinRefinementBounds() - Method in class org.archive.crawler.settings.refinements.TimespanCriteriaTest
 
testLengthTooLongCompressed() - Method in class org.archive.io.arc.ARCWriterTest
 
testLengthTooLongCompressedStrict() - Method in class org.archive.io.arc.ARCWriterTest
 
testLengthTooShortCompressed() - Method in class org.archive.io.arc.ARCWriterTest
 
testLengthTooShortCompressedStrict() - Method in class org.archive.io.arc.ARCWriterTest
 
testListAttributes() - Method in class org.archive.crawler.settings.MapTypeTest
 
testLogging() - Method in class org.archive.io.SinkHandlerTest
 
testLooseConfigurableX509TrustManagerTest() - Method in class org.archive.httpclient.ConfigurableX509TrustManagerTest
Test the configurable trust manager set to LOOSE.
testMarkReset() - Method in class org.archive.io.RecordingOutputStreamTest
Test mark and reset.
testMatcherRecycling() - Method in class org.archive.util.TextUtilsTest
 
testMatchesFilePattern() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testMaxLinkHops() - Method in class org.archive.crawler.selftest.MaxLinkHopsSelfTest
Test the max-link-hops setting is being respected.
testMisc() - Method in class org.archive.util.SurtPrefixSetTest
 
testMisc() - Method in class org.archive.util.SURTTest
 
testMoveElementDown() - Method in class org.archive.crawler.settings.MapTypeTest
 
testMoveElementUp() - Method in class org.archive.crawler.settings.MapTypeTest
 
testname() - Method in class org.archive.io.RepositionableInputStreamTest
 
testNewline() - Method in class org.archive.util.PaddingStringBufferTest
test the newline()
testNormalConfigurableX509TrustManagerTest() - Method in class org.archive.httpclient.ConfigurableX509TrustManagerTest
Test configurable trust manager set to NORMAL.
testNoScheme() - Method in class org.archive.crawler.scope.SeedCachingScopeTest
 
testNoScheme() - Method in class org.archive.net.UURIFactoryTest
 
testNote() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
 
testNote() - Method in class org.archive.crawler.util.BloomUriUniqFilterTest
 
testNote() - Method in class org.archive.crawler.util.FPUriUniqFilterTest
 
testNotMatchesFilePattern() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testNotRegex() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testNullFormat() - Method in class org.archive.crawler.url.canonicalize.RegexRuleTest
 
testOpenConfigurableX509TrustManagerTest() - Method in class org.archive.httpclient.ConfigurableX509TrustManagerTest
Test the configurable trust manager set to OPEN.
testOutOfScope() - Method in class org.archive.crawler.scope.DomainScopeTest
 
testOverridingOfGlobalAttribute() - Method in class org.archive.crawler.settings.OverrideTest
 
testOverridingOfNonGlobalAttribute() - Method in class org.archive.crawler.settings.OverrideTest
 
testPadTo() - Method in class org.archive.util.PaddingStringBufferTest
first check that padTo works ok, since all depends on it
testPadToInt() - Method in class org.archive.util.ArchiveUtilsTest
check that padTo(int) works
testPadToString() - Method in class org.archive.util.ArchiveUtilsTest
check that padTo(String) works
testPageParse() - Method in class org.archive.crawler.extractor.ExtractorHTMLTest
Test single net or local filesystem page parse.
testParseRobots() - Method in class org.archive.crawler.datamodel.RobotstxtTest
 
testParseXXDigitDate() - Method in class org.archive.util.ArchiveUtilsTest
Check that parseXXDigitDate() works
testPASS() - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
testPathologicalPath() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testPatterns() - Method in class org.archive.crawler.filter.FilePatternFilterTest
Tests the FilePatternFilter default pattern (all default file extensions) and the separate subgroup patterns, such as the images, audio, video, and miscellaneous groups.
testPercentEscaping() - Method in class org.archive.net.UURIFactoryTest
 
testPort() - Method in class org.archive.net.UURIFactoryTest
Test for Constraining java URI class.
testPort0080is80() - Method in class org.archive.net.UURIFactoryTest
 
testPrerequisite() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
TestProcessor - Class in org.archive.configuration.registry
Empty processor.
TestProcessor(String) - Constructor for class org.archive.configuration.registry.TestProcessor
 
testProcessorPtrs() - Method in class org.archive.configuration.registry.JmxRegistryTest
 
testQueue() - Method in class org.archive.queue.QueueTestBase
test that queue puts things on, and they stay there :)
testRaAppend() - Method in class org.archive.util.PaddingStringBufferTest
test that raAppend(String) works in the simple cases
testRaAppendInt() - Method in class org.archive.util.PaddingStringBufferTest
check that raAppend(int) works
testRaAppendLong() - Method in class org.archive.util.PaddingStringBufferTest
check that raAppend(long) works
testRaAppendWithExactLengthString() - Method in class org.archive.util.PaddingStringBufferTest
check it all works with the length == the length of the string
testRaAppendWithTooLongString() - Method in class org.archive.util.PaddingStringBufferTest
check what happens when we right append, but the string is longer than the space
testRandomAccess() - Method in class org.archive.io.arc.ARCWriterTest
 
testReadFullyOrUntil() - Method in class org.archive.io.RecordingInputStreamTest
Test readFullyOrUntil soft (no exception) and hard (exception) length cutoffs.
testReadWriteRefinements() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
testReadWriteStore() - Method in class org.archive.configuration.store.SerializeStoreTest
 
testRegex() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testREJECT() - Method in class org.archive.crawler.deciderules.ConfiguredDecideRuleTest
 
testREJECTWins() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testRelative() - Method in class org.archive.net.UURIFactoryTest
 
testRelativeDblPathSlashes() - Method in class org.archive.net.UURIFactoryTest
 
testRelativeEmpty() - Method in class org.archive.net.UURIFactoryTest
Test that an empty uuri does the right thing -- that we get back the base.
testRelativeURIWithTwoSlashes() - Method in class org.archive.net.UURIFactoryTest
 
testRemove() - Method in class org.archive.util.fingerprint.ArrayLongFPCacheTest
 
testRemove() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
test remove() works as expected
testReplacement() - Method in class org.archive.util.fingerprint.ArrayLongFPCacheTest
 
testReplayCharSequenceByteToString() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testReplayCharSequenceByteToStringMulti() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testRequiredField(Map, String) - Method in class org.archive.io.arc.ARCRecordMetaData
Test required field is present in hash.
testReset() - Method in class org.archive.util.PaddingStringBufferTest
check the reset method clears the buffer
testReuse() - Method in class org.archive.io.RecordingOutputStreamTest
Test reusing instance of RecordingOutputStream.
testRFC2396Relative() - Method in class org.archive.net.UURIFactoryTest
Tests from rfc2396 with amendments to accommodate differences intentionally added to make our URI handling like IE's.
testScopePlusOne() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testSecondsSinceEpochCalculation() - Method in class org.archive.util.ArchiveUtilsTest
 
testSerialization() - Method in class org.archive.crawler.datamodel.CandidateURITest
 
testSerialization() - Method in class org.archive.crawler.datamodel.CrawlURITest
Test serialization/deserialization works.
testSerializingSimpleModuleType() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
testSerializingStringAttributeModuleType() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
testSerializingTextField() - Method in class org.archive.crawler.settings.CrawlerSettingsTest
 
testSessionid() - Method in class org.archive.crawler.url.canonicalize.RegexRuleTest
 
testSetLegalValues() - Method in class org.archive.crawler.settings.SimpleTypeTest
 
testShiftjis() - Method in class org.archive.io.ReplayCharSequenceFactoryTest
 
testSimpleProcessor() - Method in class org.archive.configuration.registry.JmxRegistryTest
 
testSingleACCEPT() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testSinglePASS() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testSingleREJECT() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testSpaceDoubleEncoding() - Method in class org.archive.net.UURIFactoryTest
Test space plus encoding ([ 1010966 ] crawl.log has URIs with spaces in them).
testSpaceInHost() - Method in class org.archive.net.UURIFactoryTest
Test for java.net.URI parses %20 but getHost null. See [ 927940 ] java.net.URI parses %20 but getHost null.
testSpaceInURL() - Method in class org.archive.io.arc.ARCWriterTest
 
testStartsWithColon() - Method in class org.archive.net.UURIFactoryTest
Ensure that URI strings beginning with a colon are treated the same as browsers do (as relative, rather than as absolute with zero-length scheme).
testStraightTruncate() - Method in class org.archive.util.MimetypeUtilsTest
 
testStrayPercents() - Method in class org.archive.net.UURIFactoryTest
Ensure that stray '%' characters do not prevent UURI instances from being created, and are reasonably escaped when encountered.
testSyncDirectories() - Method in class org.archive.util.FileUtilsTest
 
testTabInURL() - Method in class org.archive.io.arc.ARCWriterTest
 
testThreeSlashes() - Method in class org.archive.net.UURIFactoryTest
Test for syntax errors stop page parsing.
testTilde() - Method in class org.archive.net.UURIFactoryTest
 
testTooLongAfterEscaping() - Method in class org.archive.net.UURIFactoryTest
 
testTooManyPathSegments() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testTrailingEncodedSpace() - Method in class org.archive.net.UURIFactoryTest
 
testTrailingPercents() - Method in class org.archive.net.UURIFactoryTest
Ensure that stray trailing '%' characters do not prevent UURI instances from being created, and are reasonably escaped when encountered.
testTransclusion() - Method in class org.archive.crawler.deciderules.DecideRuleSequenceTest
 
testTrimSpaceNBSP() - Method in class org.archive.net.UURIFactoryTest
 
testTwoDots() - Method in class org.archive.net.UURIFactoryTest
Two dots for igor.
testUncompressedARCFile(File) - Static method in class org.archive.io.arc.ARCUtils
Check file is uncompressed ARC file.
testUnderscoreMakesPortParseFail() - Method in class org.archive.net.UURIFactoryTest
 
testUserinfo() - Method in class org.archive.net.UURIFactoryTest
Preserve userinfo capitalization.
TestUtils - Class in org.archive.util
Utility methods useful in testing situations.
TestUtils() - Constructor for class org.archive.util.TestUtils
 
testWhitespaceEscaped() - Method in class org.archive.net.UURIFactoryTest
 
testWhitespaceTruncate() - Method in class org.archive.util.MimetypeUtilsTest
 
testWithZero() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check we can call add/remove/contains() with 0 as a value
testWritebytearray() - Method in class org.archive.io.RecordingOutputStreamTest
Method to test for void write(byte []).
testWriteint() - Method in class org.archive.io.RecordingOutputStreamTest
Method to test for void write(int).
testWriteRecord() - Method in class org.archive.io.arc.ARCWriterTest
 
testWriteRecordCompressed() - Method in class org.archive.io.arc.ARCWriterTest
 
testWriteSettingsObjectCrawlerSettings() - Method in class org.archive.crawler.settings.XMLSettingsHandlerTest
 
testWriting() - Method in class org.archive.crawler.util.BdbUriUniqFilterTest
Time import of recovery log.
testWriting() - Method in class org.archive.crawler.util.BloomUriUniqFilterTest
Test inserting.
testWriting() - Method in class org.archive.crawler.util.FPUriUniqFilterTest
Test inserting and removing.
testZeroPadInteger() - Static method in class org.archive.util.ArchiveUtilsTest
 
TEXT - Static variable in class org.archive.crawler.settings.SettingsHandler
 
TEXT - Static variable in class org.archive.io.GzippedInputStreamTest
 
TextField - Class in org.archive.crawler.settings
Class to hold values for text fields.
TextField(String) - Constructor for class org.archive.crawler.settings.TextField
Constructs a new TextField object.
TextUtils - Class in org.archive.util
 
TextUtils() - Constructor for class org.archive.util.TextUtils
 
TextUtilsTest - Class in org.archive.util
JUnit test suite for TextUtils
TextUtilsTest(String) - Constructor for class org.archive.util.TextUtilsTest
Create a new TextUtilsTest object
TextWaitEvaluator - Class in org.archive.crawler.postprocessor
A specialized ContentBasedWaitEvaluator.
TextWaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.TextWaitEvaluator
Constructor
threadCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Get the total number of ToeThreads (sleeping and active)
ThreadLocalHttpConnectionManager - Class in org.archive.httpclient
A simple, but thread-safe HttpClient HttpConnectionManager.
ThreadLocalHttpConnectionManager() - Constructor for class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
TimespanCriteria - Class in org.archive.crawler.settings.refinements
A refinement criterion that checks if a URI is requested within a specific time frame.
TimespanCriteria(String, String) - Constructor for class org.archive.crawler.settings.refinements.TimespanCriteria
Create a new instance of TimespanCriteria.
TimespanCriteriaTest - Class in org.archive.crawler.settings.refinements
 
TimespanCriteriaTest() - Constructor for class org.archive.crawler.settings.refinements.TimespanCriteriaTest
 
TIMESTAMP - Static variable in class org.archive.crawler.settings.SettingsHandler
 
TIMESTAMP12 - Static variable in class org.archive.util.ArchiveUtils
Arc-style date stamp in the format yyyyMMddHHmm and UTC time zone.
TIMESTAMP14 - Static variable in class org.archive.util.ArchiveUtils
Arc-style date stamp in the format yyyyMMddHHmmss and UTC time zone.
TIMESTAMP14ISO8601Z - Static variable in class org.archive.util.ArchiveUtils
Log-style date stamp in the format yyyy-MM-dd'T'HH:mm:ss'Z'. UTC time zone is assumed.
TIMESTAMP17 - Static variable in class org.archive.util.ArchiveUtils
Arc-style date stamp in the format yyyyMMddHHmmssSSS and UTC time zone.
TIMESTAMP17ISO8601Z - Static variable in class org.archive.util.ArchiveUtils
Log-style date stamp in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'. UTC time zone is assumed.
timestamp17ToCalendar(String) - Static method in class org.archive.util.ArchiveUtils
Convert 17-digit date format timestamps (as found in crawl.log, for example) into a GregorianCalendar object.
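
The 17-digit format described above can be illustrated with the standard java.text.SimpleDateFormat class. This is only a sketch under stated assumptions, not the ArchiveUtils implementation: the class name Timestamp17Sketch and the helper parse17 are hypothetical, and only the pattern string yyyyMMddHHmmssSSS and the UTC time zone come from the TIMESTAMP17 entry.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class Timestamp17Sketch {
    // Pattern as given in the TIMESTAMP17 description.
    static final String PATTERN_17 = "yyyyMMddHHmmssSSS";

    /** Hypothetical helper: parse a 17-digit UTC timestamp into a GregorianCalendar. */
    static GregorianCalendar parse17(String stamp) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat format = new SimpleDateFormat(PATTERN_17, Locale.ROOT);
        format.setTimeZone(utc);
        format.setLenient(false); // reject out-of-range fields such as month 13
        GregorianCalendar calendar = new GregorianCalendar(utc);
        try {
            calendar.setTime(format.parse(stamp));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a 17-digit timestamp: " + stamp, e);
        }
        return calendar;
    }

    public static void main(String[] args) {
        GregorianCalendar c = parse17("20060102150405123");
        System.out.println(c.get(Calendar.YEAR) + " ms=" + c.get(Calendar.MILLISECOND));
    }
}
```

SimpleDateFormat is not thread-safe, so the sketch creates a new instance per call rather than sharing one.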
TIMESTAMP_INTERVAL - Variable in class org.archive.crawler.frontier.RecoveryJournal
 
TLDs - Static variable in class org.archive.crawler.extractor.ExtractorUniversal
Matches any string that begins with a TLD (no .) followed by a '/' slash or end of string.
TmpDirTestCase - Class in org.archive.util
Base class for TestCases that want access to a tmp dir for the writing of files.
TmpDirTestCase() - Constructor for class org.archive.util.TmpDirTestCase
 
TmpDirTestCase(String) - Constructor for class org.archive.util.TmpDirTestCase
 
toArray() - Method in class org.archive.crawler.settings.ListType
 
toArray(Object[]) - Method in class org.archive.crawler.settings.ListType
 
toeEnded() - Method in class org.archive.crawler.framework.CrawlController
Note that a ToeThread ended, possibly completing the crawl-stop.
toePaused() - Method in class org.archive.crawler.framework.CrawlController
Note that a ToeThread reached paused condition, possibly completing the crawl-pause.
ToePool - Class in org.archive.crawler.framework
A collection of ToeThreads.
ToePool(CrawlController) - Constructor for class org.archive.crawler.framework.ToePool
Constructor.
ToeThread - Class in org.archive.crawler.framework
One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
ToeThread(ToePool, int) - Constructor for class org.archive.crawler.framework.ToeThread
Create a ToeThread
TOKENIZED_PREFIX - Static variable in interface org.archive.io.arc.ARCConstants
Tokenized field prefix.
TooManyHopsDecideRule - Class in org.archive.crawler.deciderules
Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold.
TooManyHopsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.TooManyHopsDecideRule
Usual constructor.
TooManyPathSegmentsDecideRule - Class in org.archive.crawler.deciderules
Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold.
TooManyPathSegmentsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
Usual constructor.
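
The path-segment count described for TooManyPathSegmentsDecideRule can be sketched as follows. This is an illustration of the counting rule as worded above (all '/' characters, minus the first '//'), not the rule's actual code; the class and method names are hypothetical and the URI is treated as a plain string.

```java
public class PathSegmentCountSketch {
    /** Count '/' characters in the URI string, not counting the first "//". */
    static int countPathSegments(String uri) {
        int count = 0;
        for (int i = 0; i < uri.length(); i++) {
            if (uri.charAt(i) == '/') {
                count++;
            }
        }
        // Discount the two slashes of the first "//", if there is one.
        if (uri.contains("//")) {
            count -= 2;
        }
        return count;
    }

    /** Hypothetical check: true when the count is over the threshold. */
    static boolean tooManyPathSegments(String uri, int maxSegments) {
        return countPathSegments(uri) > maxSegments;
    }

    public static void main(String[] args) {
        System.out.println(countPathSegments("http://example.com/a/b/c"));
    }
}
```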
topLevelModules() - Method in class org.archive.crawler.settings.CrawlerSettings
 
toString() - Method in class org.archive.crawler.datamodel.CandidateURI
 
toString() - Method in class org.archive.crawler.datamodel.CrawlHost
 
toString() - Method in class org.archive.crawler.datamodel.CrawlServer
 
toString() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
toString() - Method in class org.archive.crawler.framework.CrawlScope
 
toString() - Method in class org.archive.crawler.framework.Filter
 
toString() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Returns a human-readable string for the failed check.
toString() - Method in class org.archive.crawler.settings.TextField
 
toString() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Converts this LumpyString to a String.
toString() - Method in exception org.archive.io.arc.ARCReader.RecoverableIOException
 
toString() - Method in class org.archive.io.arc.ARCRecordMetaData
 
toString() - Method in class org.archive.io.CharSubSequence
 
toString() - Method in class org.archive.io.SinkHandlerLogRecord
 
toString() - Method in class org.archive.net.UURI
Override to cache result
toString() - Method in class org.archive.util.PaddingStringBuffer
 
toString() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
totalBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
totalBytesWritten() - Method in class org.archive.crawler.admin.StatisticsTracker
 
totalBytesWritten() - Method in interface org.archive.crawler.framework.Frontier
Total number of bytes contained in all URIs that have been processed.
totalBytesWritten() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns the total number of uncompressed bytes written to disk.
totalBytesWritten() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
totalBytesWritten() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
totalCount() - Method in class org.archive.crawler.admin.StatisticsTracker
 
totalCount() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
totalKBPerSec - Variable in class org.archive.crawler.admin.StatisticsTracker
 
totalProcessedBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
 
totalProcessedBytes - Variable in class org.archive.crawler.frontier.AbstractFrontier
Used when bandwidth constraints are in effect.
totals - Variable in class org.archive.util.Histotable
 
TRAILING_ESCAPED_SPACE - Static variable in class org.archive.net.UURIFactory
 
TransclusionDecideRule - Class in org.archive.crawler.deciderules
Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI.getPathFromSeed()) ends with at least one, but not more than the given number of, non-navlink hops (navlink hops being those marked 'L').
TransclusionDecideRule(String) - Constructor for class org.archive.crawler.deciderules.TransclusionDecideRule
Usual constructor.
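
The acceptance test described for TransclusionDecideRule can be sketched as a count of trailing non-'L' characters in the hopsPath string. The class and method names and the maxTransHops parameter are hypothetical. The 'E' in the example path is only a placeholder for some non-navlink hop type, since any character other than 'L' is counted.

```java
public class TransclusionSketch {
    /** Count how many hops at the end of the path are not navlink ('L') hops. */
    static int trailingNonNavlinkHops(String hopsPath) {
        int count = 0;
        for (int i = hopsPath.length() - 1; i >= 0 && hopsPath.charAt(i) != 'L'; i--) {
            count++;
        }
        return count;
    }

    /** Accept when the path ends with at least one, but at most maxTransHops, non-'L' hops. */
    static boolean accepts(String hopsPath, int maxTransHops) {
        int trailing = trailingNonNavlinkHops(hopsPath);
        return trailing >= 1 && trailing <= maxTransHops;
    }

    public static void main(String[] args) {
        System.out.println(accepts("LLE", 3));
    }
}
```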
TransclusionFilter - Class in org.archive.crawler.filter
Filter which accepts CandidateURI/CrawlURI instances which contain more than zero but fewer than max-trans-hops entries at the end of their discovery path.
TransclusionFilter(String) - Constructor for class org.archive.crawler.filter.TransclusionFilter
 
transform(Object) - Method in class org.archive.crawler.scope.SeedFileIterator
 
transform(Object) - Method in class org.archive.util.iterator.RegexpLineIterator
Loads next item into lookahead spot, if available.
transform(Object) - Method in class org.archive.util.iterator.TransformingIteratorWrapper
 
TRANSFORMED_HOST_DELIM - Static variable in class org.archive.util.SURT
 
TransformingIteratorWrapper - Class in org.archive.util.iterator
Superclass for Iterators which transform and/or filter results from a wrapped Iterator.
TransformingIteratorWrapper() - Constructor for class org.archive.util.iterator.TransformingIteratorWrapper
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.BroadScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.DomainScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.HostScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.PathScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.RefinedScope
 
transitiveFilter - Variable in class org.archive.crawler.scope.DomainScope
 
transitiveFilter - Variable in class org.archive.crawler.scope.HostScope
 
transitiveFilter - Variable in class org.archive.crawler.scope.PathScope
 
transitiveFilter - Variable in class org.archive.crawler.scope.RefinedScope
 
TRIMMED_ENTRY_TRAILING_COMMENT - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
trimToMax(int) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
If necessary, trims this string to a maximum length.
TRUE_FALSE_LEGAL_VALUES - Static variable in class org.archive.configuration.Configuration
 
truncate(String) - Static method in class org.archive.util.MimetypeUtils
Truncate passed mimetype.
TRUNCATION_REGEX - Static variable in class org.archive.util.MimetypeUtils
Truncation regex.
Type - Class in org.archive.crawler.settings
Interface implemented by all element types.
Type(String, Object) - Constructor for class org.archive.crawler.settings.Type
Creates a new instance of Type.
TYPE - Static variable in class org.archive.util.JmxUtils
 
TYPE_KEY - Static variable in class org.archive.configuration.Configuration
 

U

unbindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
UNCALCULATED - Static variable in class org.archive.crawler.datamodel.CrawlURI
 
unescape(String) - Static method in class org.archive.util.JavaLiterals
 
UnitCostAssignmentPolicy - Class in org.archive.crawler.frontier
A CostAssignment policy that uses a constant value of 1 for all CrawlURIs.
UnitCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
unpause() - Method in interface org.archive.crawler.framework.Frontier
Resumes the release of URIs to crawl, allowing worker ToeThreads to proceed.
unpause() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
unpause() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
unpeek() - Method in class org.archive.crawler.frontier.WorkQueue
Forgive the peek, allowing a subsequent peek to return a different item.
unpeek() - Method in class org.archive.queue.MemQueue
 
unpeek() - Method in interface org.archive.queue.Queue
Releases queue from the obligation to return in the next peek()/dequeue() the same object as returned by any previous peek().
unregisterHeritrix(Heritrix) - Static method in class org.archive.crawler.Heritrix
 
unregisterMBean() - Method in class org.archive.crawler.admin.CrawlJob
 
unregisterMBean(MBeanServer, String, String) - Static method in class org.archive.crawler.Heritrix
 
unregisterMBean(MBeanServer, ObjectName) - Static method in class org.archive.crawler.Heritrix
 
unregisterValueErrorHandler(ValueErrorHandler) - Method in class org.archive.crawler.settings.SettingsHandler
Unregister an instance of ValueErrorHandler.
unsetAttribute(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Unset an attribute on a per host level.
unzip(File, File) - Static method in class org.archive.crawler.util.IoUtils
Use ant to unjar.
unzip(File, File, boolean) - Static method in class org.archive.crawler.util.IoUtils
Use ant to unjar.
update(CrawlURI, boolean, long) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Update CrawlURI that has completed processing.
update(CrawlURI, boolean, long, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Update CrawlURI that has completed processing.
update(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Update the given CrawlURI, which should already be present.
updateGeneration(String) - Method in class org.archive.crawler.processor.CrawlMapper
Close and mark as finished all existing diversion logs, and arrange for new logs to use the new generation prefix.
updateRecoveryPaths(File, SettingsHandler, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
updateRobots(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlServer
Update the robots exclusion policy.
uri - Variable in class org.archive.crawler.settings.ComplexType.Context
 
URI_HEX_ENCODING - Static variable in class org.archive.net.UURIFactory
First percent sign in string followed by two hex chars.
URI_SPLITTER - Static variable in class org.archive.util.SURT
 
UriErrorFormatter - Class in org.archive.crawler.io
Formatter for 'uri-errors.log', of URIs so malformed they could not be instantiated.
UriErrorFormatter() - Constructor for class org.archive.crawler.io.UriErrorFormatter
 
uriErrors - Variable in class org.archive.crawler.framework.CrawlController
Special log for URI format problems, wherever they may occur.
URIListRegExpFilter - Class in org.archive.crawler.filter
Compares passed object -- a CrawlURI, UURI, or String -- against regular expressions, accepting matches.
URIListRegExpFilter(String) - Constructor for class org.archive.crawler.filter.URIListRegExpFilter
 
uriProcessing - Variable in class org.archive.crawler.framework.CrawlController
Crawl progress logger.
UriProcessingFormatter - Class in org.archive.crawler.io
Formatter for 'crawl.log'.
UriProcessingFormatter() - Constructor for class org.archive.crawler.io.UriProcessingFormatter
 
URIRegExpFilter - Class in org.archive.crawler.filter
Compares passed object -- a CrawlURI, UURI, or String -- against a regular expression, accepting matches.
URIRegExpFilter(String) - Constructor for class org.archive.crawler.filter.URIRegExpFilter
 
URIRegExpFilter(String, String) - Constructor for class org.archive.crawler.filter.URIRegExpFilter
 
URIRegExpFilter(String, String, String) - Constructor for class org.archive.crawler.filter.URIRegExpFilter
 
uris - Variable in class org.archive.extractor.RegexpCSSLinkExtractor
 
UriUniqFilter - Interface in org.archive.crawler.datamodel
A UriUniqFilter passes URI objects to a destination (receiver) if the passed URI object has not been previously seen.
UriUniqFilter.HasUriReceiver - Interface in org.archive.crawler.datamodel
URIs that have not been seen before 'visit' this 'Visitor'.
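
The pass-if-unseen behaviour described for UriUniqFilter can be sketched with an in-memory set. This does not reproduce the interface's real methods. The class name, the add method, and the use of java.util.function.Consumer as the receiver are all assumptions made for illustration. A HashSet stands in for whatever store an actual implementation uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class SeenOnceFilterSketch {
    private final Set<String> seen = new HashSet<>();
    private final Consumer<String> receiver;

    SeenOnceFilterSketch(Consumer<String> receiver) {
        this.receiver = receiver;
    }

    /** Forward the URI to the receiver only the first time it is offered. */
    boolean add(String uri) {
        if (seen.add(uri)) {   // Set.add returns false for an already-present element
            receiver.accept(uri);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        SeenOnceFilterSketch filter = new SeenOnceFilterSketch(received::add);
        filter.add("http://example.com/");
        filter.add("http://example.com/"); // duplicate, not forwarded
        System.out.println(received.size());
    }
}
```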
URL_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for url field.
URL_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header URL field.
usage() - Method in class org.archive.crawler.CommandLineParser
Print usage then exit.
usage(int) - Method in class org.archive.crawler.CommandLineParser
Print usage then exit.
usage(String, int) - Method in class org.archive.crawler.CommandLineParser
Print message then usage then exit.
UTF8 - Static variable in class org.archive.io.arc.ARCWriter
 
UURI - Class in org.archive.net
Usable URI.
UURI() - Constructor for class org.archive.net.UURI
Shut down access to the default constructor.
UURI(String, boolean, String) - Constructor for class org.archive.net.UURI
 
UURI(UURI, UURI) - Constructor for class org.archive.net.UURI
 
UURI(String, boolean) - Constructor for class org.archive.net.UURI
 
UURIFactory - Class in org.archive.net
Factory that returns UURIs.
UURIFactoryTest - Class in org.archive.net
Test UURIFactory for proper UURI creation across variety of important/tricky cases.
UURIFactoryTest() - Constructor for class org.archive.net.UURIFactoryTest
 
UURITest - Class in org.archive.net
 
UURITest() - Constructor for class org.archive.net.UURITest
 

V

valence - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Number of simultaneous connections permitted to this host.
VALID_DF_OUTPUT - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
validate() - Method in class org.archive.io.arc.ARCReader
Validate the arcFile.
validate(int) - Method in class org.archive.io.arc.ARCReader
Validate the arcFile.
validate(char[], BitSet) - Method in class org.archive.net.LaxURI
 
validate(char[], int, int, BitSet) - Method in class org.archive.net.LaxURI
 
validateMetaLine(String) - Method in class org.archive.io.arc.ARCWriter
Test that the metadata line is valid before writing.
VALIDITY_STAMP_FILENAME - Static variable in class org.archive.crawler.datamodel.Checkpoint
Name of file written with timestamp into valid checkpoints.
validityCheck(UURI) - Method in class org.archive.net.UURIFactory
Check the generated UURI.
validRobots - Variable in class org.archive.crawler.datamodel.CrawlServer
 
value - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
ValueErrorHandler - Interface in org.archive.crawler.settings
If a ValueErrorHandler is registered with a SettingsHandler, only constraints with level Level.SEVERE will throw an InvalidAttributeValueException.
values - Variable in class org.archive.util.fingerprint.MemLongFPSet
 
VERSION_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header version field.
VIDEO - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
VIDEO - Static variable in class org.archive.crawler.filter.FilePatternFilter
 
VIDEO_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
VIDEO_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
 

W

WagCostAssignmentPolicy - Class in org.archive.crawler.frontier
A CostAssignmentPolicy based on some wild guesses of kinds of URIs that should be deferred into the (potentially never-crawled) future.
WagCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.WagCostAssignmentPolicy
 
WaitEvaluator - Class in org.archive.crawler.postprocessor
A processor that determines when a URI should be revisited next.
WaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.WaitEvaluator
Constructor
WaitEvaluator(String, String, Long, Long, Long, Double, Double) - Constructor for class org.archive.crawler.postprocessor.WaitEvaluator
Constructor
wakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Wake any queues sitting in the snoozed queue whose time has come.
wakeUpTime - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Time (in milliseconds) when each URI 'slot' becomes available again.
warnHandle(Throwable, String) - Static method in class org.archive.util.DevUtils
Log a warning message to the logger 'org.archive.util.DevUtils' made of the passed 'note' and a stack trace based on the passed exception.
WebappLifecycle - Class in org.archive.crawler
Calls start and stop of Heritrix when Heritrix is bundled as a webapp.
WebappLifecycle() - Constructor for class org.archive.crawler.WebappLifecycle
 
WHITESPACE - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
WHITESPACE - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
WHITESPACE - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
WHITESPACE - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
workaroundCopyFile(File, File) - Static method in class org.archive.util.FileUtils
 
WorkQueue - Class in org.archive.crawler.frontier
A single queue of related URIs to visit, grouped by a classKey (typically "hostname:port" or similar)
WorkQueue(String) - Constructor for class org.archive.crawler.frontier.WorkQueue
 
workQueueDataOnDisk() - Method in class org.archive.crawler.frontier.BdbFrontier
 
workQueueDataOnDisk() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Returns true if the WorkQueue implementation of this Frontier stores its workload on disk instead of relying on serialization mechanisms.
WorkQueueFrontier - Class in org.archive.crawler.frontier
A common Frontier base using several queues to hold pending URIs.
WorkQueueFrontier(String, String) - Constructor for class org.archive.crawler.frontier.WorkQueueFrontier
Create the CommonFrontier
wrapInputStreamWithHttpRecord(File, String, InputStream, String) - Static method in class org.archive.util.HttpRecorder
Record the input stream for later playback by an extractor, etc.
write(CrawlURI, int, InputStream, String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
write(String, String, String, long, int, ByteArrayOutputStream) - Method in class org.archive.io.arc.ARCWriter
Write a record to ARC file.
write(String, String, String, long, int, InputStream) - Method in class org.archive.io.arc.ARCWriter
Write a record to ARC file.
write(String, String, String, long, int, ReplayInputStream) - Method in class org.archive.io.arc.ARCWriter
Write a record to ARC file.
write(int) - Method in class org.archive.io.RandomAccessOutputStream
 
write(byte[], int, int) - Method in class org.archive.io.RandomAccessOutputStream
 
write(byte[]) - Method in class org.archive.io.RandomAccessOutputStream
 
write(int) - Method in class org.archive.io.RecordingOutputStream
 
write(byte[]) - Method in class org.archive.io.RecordingOutputStream
 
write(byte[], int, int) - Method in class org.archive.io.RecordingOutputStream
 
write(int) - Method in class org.archive.io.RecyclingFastBufferedOutputStream
 
write(byte[], int, int) - Method in class org.archive.io.RecyclingFastBufferedOutputStream
 
writeAttribute(String, String, ComplexType, CrawlerSettings, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Write out attribute.
writeCrawlReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeDns(CrawlURI) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
writeEscapedForHTML(String, JspWriter) - Static method in class org.archive.util.TextUtils
Utility method for writing a (potentially large) String to a JspWriter, escaping it for HTML display, without constructing another large String of the whole content.
writeFrontierReport(String, PrintWriter) - Method in class org.archive.crawler.admin.CrawlJob
Write the requested frontier report to the given PrintWriter
writeFrontierReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
Write the Frontier's 'nonempty' report (if available)
writeHostsReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeHttp(CrawlURI) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
writeManifestReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeMimetypesReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeNewOrderFile(ComplexType, CrawlerSettings, HttpServletRequest, boolean) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
This method updates a ComplexType with information passed to it by an HttpServletRequest.
writeObjectToFile(Object, File) - Static method in class org.archive.crawler.util.CheckpointUtils
Utility function to serialize an object to a file in current checkpoint dir.
writeObjectToFile(Object, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
writeProcessorsReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeRandomHTTPRecord(ARCWriter, int) - Method in class org.archive.io.arc.ARCWriterTest
 
writeReader(Reader, Writer) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Print complete seeds list on passed in PrintWriter.
writeRecord(ARCWriter, String, String, int, ByteArrayOutputStream) - Static method in class org.archive.io.arc.ARCWriterTest
 
writeReportFile(String, String) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeReportToString(Reporter, String) - Static method in class org.archive.util.ArchiveUtils
Compose the requested report into a String.
writeResponseCodeReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeSeedsReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsHandler
Write the CrawlerSettings object to persistent storage.
writeSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
writeSettingsObject(CrawlerSettings, File) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Write a CrawlerSettings object to a specified file.
writeThreadsReport(String, PrintWriter) - Method in class org.archive.crawler.admin.CrawlJob
Write the requested threads report to the given PrintWriter
writeValidity() - Method in class org.archive.crawler.framework.Checkpointer
 

X

XML_ATTRIBUTE_CLASS - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ATTRIBUTE_FROM - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ATTRIBUTE_NAME - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ATTRIBUTE_TO - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_AUDIENCE - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_CONTENTMATCHES - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_CONTROLLER - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_DATE - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_DESCRIPTION - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_LIMITS - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_META - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_NAME - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_NEW_OBJECT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_OBJECT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_OPERATOR - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_ORGANIZATION - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_PORTNUMBER - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_REFERENCE - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_REFINEMENT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_REFINEMENTLIST - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_TIMESPAN - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_URIMATCHES - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ROOT_HOST_SETTINGS - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ROOT_ORDER - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ROOT_REFINEMENT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_SCHEMA - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_URI_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorXML
 
XMLSettingsHandler - Class in org.archive.crawler.settings
A SettingsHandler which uses XML files as persistent storage.
XMLSettingsHandler(File) - Constructor for class org.archive.crawler.settings.XMLSettingsHandler
Create a new XMLSettingsHandler object.
XMLSettingsHandlerTest - Class in org.archive.crawler.settings
Tests the handling of settings files.
XMLSettingsHandlerTest() - Constructor for class org.archive.crawler.settings.XMLSettingsHandlerTest
 

Z

ZeroCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy considering all URIs costless -- essentially disabling budgeting features.
ZeroCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 
zeroPadInteger(int) - Static method in class org.archive.util.ArchiveUtils
 

Copyright © 2003-2006 Internet Archive. All Rights Reserved.