org.archive.crawler.frontier
Interface URIWorkQueue

All Known Implementing Classes:
KeyedQueue

public interface URIWorkQueue

A single queue of related URIs to visit. Typically grouped by hostname:port.

Author:
gojomo

Field Summary
static java.lang.Object BUSY
          BUSY: on hold until one or more URIs in progress are finished
static java.lang.Object DISCARDED
          DISCARDED: discarded because empty (not irreversible)
static java.lang.Object EMPTY
          EMPTY: eligible to supply URIs, but without any to supply
static java.lang.Object FROZEN
          FROZEN: not considered as URI source until operator intervention
static java.lang.Object INACTIVE
          INACTIVE: not considered as URI source until activated by policy
static java.lang.Object READY
          READY: eligible and able to supply a new work URI on demand
static java.lang.Object SNOOZED
          SNOOZED: on hold until a specific time interval has passed
 
Method Summary
 void activate()
          Move queue from INACTIVE to an active state (presumably READY or EMPTY; no ACTIVE constant is defined)
 boolean checkEmpty()
          Update READY/EMPTY state after preceding queue edit operations.
 void deactivate()
          Move queue from READY or EMPTY state to INACTIVE
 long deleteMatchedItems(org.apache.commons.collections.Predicate matcher)
          Delete items matching the supplied criterion.
 CrawlURI dequeue()
          Remove an item in the default manner
 void discard()
          Move queue from READY or EMPTY to DISCARDED
 void enqueue(CrawlURI curi)
          Add an item in the default manner
 void freeze()
          Move queue from READY or EMPTY state to FROZEN
 java.lang.String getClassKey()
          The 'classKey' identifier common to items in this queue
 java.util.List getInProcessItems()
           
 java.util.Iterator getIterator(boolean inCacheOnly)
          Iterate over all available (non-frozen) items.
 java.lang.String getLastDequeued()
           
 java.lang.String getLastQueued()
           
 java.lang.String getSortFallback()
          A fallback sort criterion, used to ensure a total and consistent ordering when in scheduled order.
 java.lang.Object getState()
           
 long getWakeTime()
           
 boolean isDiscardable()
          Whether this queue may be completely discarded.
 boolean isEmpty()
           
 long length()
           
 void noteInProcess(CrawlURI o)
          Note that the given item is 'in process'; move queue from READY or EMPTY to BUSY if appropriate and remember in-process item.
 void noteProcessDone(CrawlURI o)
          Note that the given item's processing has completed; forget the in-process item and move queue from BUSY to READY or EMPTY state if appropriate
 void setValence(int v)
          Set 'valence', the number of simultaneous items to allow in process before becoming BUSY
 void setWakeTime(long w)
           
 void snooze()
          Move queue from READY or EMPTY state to SNOOZED
 void unfreeze()
          Move queue from FROZEN state to INACTIVE
 void wake()
          Move queue from SNOOZED state to READY or EMPTY
 

Field Detail

INACTIVE

static final java.lang.Object INACTIVE
INACTIVE: not considered as URI source until activated by policy


READY

static final java.lang.Object READY
READY: eligible and able to supply a new work URI on demand


FROZEN

static final java.lang.Object FROZEN
FROZEN: not considered as URI source until operator intervention


BUSY

static final java.lang.Object BUSY
BUSY: on hold until one or more URIs in progress are finished


SNOOZED

static final java.lang.Object SNOOZED
SNOOZED: on hold until a specific time interval has passed


EMPTY

static final java.lang.Object EMPTY
EMPTY: eligible to supply URIs, but without any to supply


DISCARDED

static final java.lang.Object DISCARDED
DISCARDED: discarded because empty (not irreversible)
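
The state constants are declared as plain java.lang.Object values, so callers presumably compare them by identity (==) rather than by value. The following is a minimal sketch of that pattern, assuming a hypothetical StateSketch holder class; the real constant values are not shown on this page.

```java
// Hypothetical stand-in for the URIWorkQueue state constants.
// The names match the Field Summary; the values are an assumption.
final class StateSketch {
    static final Object READY = new Object();
    static final Object EMPTY = new Object();
    static final Object BUSY = new Object();
    static final Object SNOOZED = new Object();
    static final Object FROZEN = new Object();
    static final Object INACTIVE = new Object();
    static final Object DISCARDED = new Object();

    // True for the two states the Field Summary describes as "eligible".
    static boolean isEligible(Object state) {
        return state == READY || state == EMPTY;
    }

    // True only when the queue can hand out a URI right now.
    static boolean canSupply(Object state) {
        return state == READY;
    }
}
```

Identity comparison works here because each constant is a distinct singleton object.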

Method Detail

getClassKey

java.lang.String getClassKey()
The 'classKey' identifier common to items in this queue

Returns:
The classKey string.

getState

java.lang.Object getState()
Returns:
The state of this queue.

isEmpty

boolean isEmpty()
Returns:
True if this queue has no ready-to-try URIs. (NOTE: it may still hold 'frozen' off-to-side URIs.)

length

long length()
Returns:
Total number of available items. (Does not include any 'frozen' items.)
See Also:
Queue.length()

activate

void activate()
Move queue from INACTIVE to an active state (presumably READY or EMPTY; no ACTIVE constant is defined)


deactivate

void deactivate()
Move queue from READY or EMPTY state to INACTIVE


freeze

void freeze()
Move queue from READY or EMPTY state to FROZEN


unfreeze

void unfreeze()
Move queue from FROZEN state to INACTIVE


snooze

void snooze()
Move queue from READY or EMPTY state to SNOOZED


wake

void wake()
Move queue from SNOOZED state to READY or EMPTY


discard

void discard()
Move queue from READY or EMPTY to DISCARDED
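
The transition methods above (activate through discard) can be read as a small state machine. The sketch below is one possible reading, not the KeyedQueue source. It assumes an enum in place of the Object constants, assumes that activate() lands in READY or EMPTY (no ACTIVE constant is listed), and assumes that an illegal transition throws IllegalStateException, which this page does not specify.

```java
// Hypothetical sketch of the documented transitions; not the KeyedQueue source.
final class TransitionSketch {
    // An enum stands in for the interface's Object constants (an assumption).
    enum State { INACTIVE, READY, EMPTY, BUSY, SNOOZED, FROZEN, DISCARDED }

    private State state = State.INACTIVE;
    private final long available; // assumed count of available items

    TransitionSketch(long available) { this.available = available; }

    State getState() { return state; }

    // READY when there is something to supply, EMPTY otherwise.
    private State readyOrEmpty() {
        return available > 0 ? State.READY : State.EMPTY;
    }

    // Rejecting an illegal transition with an exception is an assumption.
    private void require(String op, State... allowed) {
        for (State s : allowed) {
            if (state == s) return;
        }
        throw new IllegalStateException(op + " not allowed from " + state);
    }

    void activate()   { require("activate", State.INACTIVE); state = readyOrEmpty(); }
    void deactivate() { require("deactivate", State.READY, State.EMPTY); state = State.INACTIVE; }
    void freeze()     { require("freeze", State.READY, State.EMPTY); state = State.FROZEN; }
    void unfreeze()   { require("unfreeze", State.FROZEN); state = State.INACTIVE; }
    void snooze()     { require("snooze", State.READY, State.EMPTY); state = State.SNOOZED; }
    void wake()       { require("wake", State.SNOOZED); state = readyOrEmpty(); }
    void discard()    { require("discard", State.READY, State.EMPTY); state = State.DISCARDED; }
}
```

Note that unfreeze() returns the queue to INACTIVE, so a frozen queue needs a further activate() before it can supply URIs again.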


checkEmpty

boolean checkEmpty()
Update READY/EMPTY state after preceding queue edit operations.

Returns:
true if state changed, false otherwise

noteInProcess

void noteInProcess(CrawlURI o)
Note that the given item is 'in process'; move queue from READY or EMPTY to BUSY if appropriate and remember in-process item.

Parameters:
o - the item now in process

noteProcessDone

void noteProcessDone(CrawlURI o)
Note that the given item's processing has completed; forget the in-process item and move queue from BUSY to READY or EMPTY state if appropriate

Parameters:
o - the item whose processing has completed

getInProcessItems

java.util.List getInProcessItems()
Returns:
The remembered items in process (set with noteInProcess()).

setValence

void setValence(int v)
Set 'valence', the number of simultaneous items to allow in process before becoming BUSY

Parameters:
v - the number of items allowed in process simultaneously
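
To illustrate how valence interacts with noteInProcess() and noteProcessDone(), here is a minimal sketch. It assumes a hypothetical ValenceSketch class, uses Strings in place of CrawlURI, assumes a default valence of 1, and keeps the available-item count fixed for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of valence-based BUSY tracking; not the KeyedQueue source.
final class ValenceSketch {
    enum State { READY, EMPTY, BUSY }

    private final List<String> inProcess = new ArrayList<>();
    private final long available; // assumed fixed count of available items
    private int valence = 1;      // assumed default
    private State state;

    ValenceSketch(long available) {
        this.available = available;
        this.state = readyOrEmpty();
    }

    private State readyOrEmpty() {
        return available > 0 ? State.READY : State.EMPTY;
    }

    void setValence(int v) { valence = v; }
    State getState() { return state; }
    List<String> getInProcessItems() { return new ArrayList<>(inProcess); }

    // Remember the item; become BUSY once 'valence' items are in process.
    void noteInProcess(String uri) {
        inProcess.add(uri);
        if (inProcess.size() >= valence) state = State.BUSY;
    }

    // Forget the item; leave BUSY once there is spare capacity again.
    void noteProcessDone(String uri) {
        inProcess.remove(uri);
        if (state == State.BUSY && inProcess.size() < valence) {
            state = readyOrEmpty();
        }
    }
}
```

With a valence of 2, the queue in this sketch stays READY after the first in-process item and turns BUSY only on the second.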

getWakeTime

long getWakeTime()
Returns:
time when queue should wake

setWakeTime

void setWakeTime(long w)
Parameters:
w - time to wake, when snoozed
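
A caller that manages several snoozed queues might use the wake time to decide which ones to wake. The sketch below assumes the wake time is an absolute timestamp in milliseconds and uses a hypothetical SnoozeSketch class keyed by class key; neither detail is confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper that tracks snoozed queues by wake time.
final class SnoozeSketch {
    // Minimal record of a snoozed queue: its class key plus its wake time.
    static final class Snoozed {
        final String classKey;
        final long wakeTime; // assumed to be an absolute time in milliseconds

        Snoozed(String classKey, long wakeTime) {
            this.classKey = classKey;
            this.wakeTime = wakeTime;
        }
    }

    // Earliest wake time first.
    private final PriorityQueue<Snoozed> snoozed =
        new PriorityQueue<>(Comparator.comparingLong((Snoozed s) -> s.wakeTime));

    void snooze(String classKey, long wakeTime) {
        snoozed.add(new Snoozed(classKey, wakeTime));
    }

    // Removes and returns the class keys of all queues due at or before 'now'.
    List<String> wakeDue(long now) {
        List<String> woken = new ArrayList<>();
        while (!snoozed.isEmpty() && snoozed.peek().wakeTime <= now) {
            woken.add(snoozed.poll().classKey);
        }
        return woken;
    }
}
```

In real use the caller would pass something like System.currentTimeMillis() as 'now' and call wake() on each returned queue.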

getSortFallback

java.lang.String getSortFallback()
Returns a fallback sort criterion, used to ensure a total and consistent ordering when in scheduled order.

Returns:
Fallback sort.
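
The reason a fallback criterion matters can be shown with a sorted set: if two queues compare as equal on the primary key, a TreeSet treats them as duplicates and keeps only one. The sketch below assumes the primary key is the wake time and that the fallback is a unique string such as the class key; both are illustrative assumptions, and FallbackSortSketch is a hypothetical class.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch of ordering with a fallback sort key.
final class FallbackSortSketch {
    static final class Entry {
        final long wakeTime;
        final String sortFallback; // assumed unique per queue, e.g. the class key

        Entry(long wakeTime, String sortFallback) {
            this.wakeTime = wakeTime;
            this.sortFallback = sortFallback;
        }
    }

    // Primary order by wake time; ties broken by the fallback string.
    static final Comparator<Entry> ORDER =
        Comparator.comparingLong((Entry e) -> e.wakeTime)
                  .thenComparing((Entry e) -> e.sortFallback);

    static TreeSet<Entry> newSchedule() {
        return new TreeSet<>(ORDER);
    }
}
```

Without the thenComparing step, two entries sharing a wake time would collapse into a single element of the set.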

enqueue

void enqueue(CrawlURI curi)
Add an item in the default manner

Parameters:
curi - the item to add

dequeue

CrawlURI dequeue()
Remove an item in the default manner

Returns:
Item removed.

getLastQueued

java.lang.String getLastQueued()
Returns:
the last enqueued URI; useful for assessing queue state.

getLastDequeued

java.lang.String getLastDequeued()
Returns:
the last dequeued URI; useful for assessing queue state.
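
As a rough illustration of how enqueue(), dequeue(), getLastQueued() and getLastDequeued() could fit together, the sketch below assumes that the "default manner" is first-in, first-out and that dequeue() returns null when nothing is available. Neither is stated on this page. LastSeenSketch is a hypothetical class, and Strings stand in for CrawlURI.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical FIFO sketch that remembers the last queued and dequeued URIs.
final class LastSeenSketch {
    private final Deque<String> items = new ArrayDeque<>();
    private String lastQueued;
    private String lastDequeued;

    // Add at the tail (FIFO order is an assumption).
    void enqueue(String uri) {
        items.addLast(uri);
        lastQueued = uri;
    }

    // Remove from the head; returns null when empty (an assumption).
    String dequeue() {
        String uri = items.pollFirst();
        if (uri != null) lastDequeued = uri;
        return uri;
    }

    String getLastQueued() { return lastQueued; }
    String getLastDequeued() { return lastDequeued; }
    boolean isEmpty() { return items.isEmpty(); }
    long length() { return items.size(); }
}
```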

isDiscardable

boolean isDiscardable()
Whether this queue may be completely discarded. It may be discarded only if it holds no available or frozen items and is not SNOOZED or FROZEN (states that imply information which would be lost on discard).

Returns:
True if discardable.
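
The discard rule described above reduces to a simple predicate. This is a minimal sketch under the assumption that the item counts and state are passed in directly; DiscardableSketch and its parameters are hypothetical.

```java
// Hypothetical sketch of the documented discard rule.
final class DiscardableSketch {
    enum State { READY, EMPTY, BUSY, SNOOZED, FROZEN, INACTIVE, DISCARDED }

    static boolean isDiscardable(long availableItems, long frozenItems, State state) {
        // No available items and no frozen items may remain.
        boolean noItems = availableItems == 0 && frozenItems == 0;
        // SNOOZED and FROZEN carry state information that a discard would lose.
        boolean holdsStateInfo = state == State.SNOOZED || state == State.FROZEN;
        return noItems && !holdsStateInfo;
    }
}
```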

getIterator

java.util.Iterator getIterator(boolean inCacheOnly)
Iterate over all available (non-frozen) items.

Parameters:
inCacheOnly -
Returns:
An iterator.
See Also:
Queue.getIterator(boolean)

deleteMatchedItems

long deleteMatchedItems(org.apache.commons.collections.Predicate matcher)
Delete items matching the supplied criterion.

Parameters:
matcher - predicate selecting the items to delete
Returns:
Count of items deleted.
See Also:
Queue.deleteMatchedItems(org.apache.commons.collections.Predicate)
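
A sketch of predicate-based deletion follows. To stay self-contained it declares a local Matcher interface that mirrors the single evaluate(Object) method of org.apache.commons.collections.Predicate, rather than importing that library. DeleteMatchedSketch is hypothetical, and Strings stand in for CrawlURI.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of deleteMatchedItems; not the KeyedQueue source.
final class DeleteMatchedSketch {
    // Local stand-in with the same shape as the Commons Collections Predicate.
    interface Matcher {
        boolean evaluate(Object item);
    }

    private final List<String> items = new ArrayList<>();

    void enqueue(String uri) { items.add(uri); }
    long length() { return items.size(); }

    // Removes every item the matcher accepts and returns how many were removed.
    long deleteMatchedItems(Matcher matcher) {
        long deleted = 0;
        for (Iterator<String> it = items.iterator(); it.hasNext();) {
            if (matcher.evaluate(it.next())) {
                it.remove(); // safe removal during iteration
                deleted++;
            }
        }
        return deleted;
    }
}
```

Since edits like this can empty a queue, a caller would presumably follow up with checkEmpty() to refresh the READY/EMPTY state.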


Copyright © 2003-2005 Internet Archive. All Rights Reserved.