java.lang.Object
  org.archive.crawler.frontier.KeyedQueue
public class KeyedQueue
Ordered collection of work items with the same "classKey". The collection itself has a state, which may reflect where it is stored or what can be done with the contained items.
For easy access to several locations in the main collection, it is held between 2 data structures: a top stack and a bottom queue. (These in turn may be disk-backed.)
Also maintains a collection 'off to the side' of 'frozen' items.
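The "top stack plus bottom queue" arrangement can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the class TwoEndedWorkList and its method names are hypothetical, the real innerQ is a TieredQueue that may be disk-backed, and its actual ordering policy may differ. The sketch assumes that items pushed on top are handed out before items appended at the bottom.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for illustration only; not the Heritrix implementation. */
class TwoEndedWorkList<T> {
    // Items to hand out first, most recently pushed first (LIFO).
    private final Deque<T> topStack = new ArrayDeque<>();
    // Items waiting in arrival order (FIFO).
    private final Deque<T> bottomQueue = new ArrayDeque<>();

    /** Place an item at the head of the collection. */
    void pushTop(T item) {
        topStack.push(item);
    }

    /** Place an item at the tail of the collection. */
    void addBottom(T item) {
        bottomQueue.addLast(item);
    }

    /** Returns the next item, or null when both structures are empty. */
    T dequeue() {
        if (!topStack.isEmpty()) {
            return topStack.pop();
        }
        return bottomQueue.pollFirst();
    }

    boolean isEmpty() {
        return topStack.isEmpty() && bottomQueue.isEmpty();
    }

    long length() {
        return (long) topStack.size() + bottomQueue.size();
    }
}
```

Splitting the collection this way gives constant-time access to both ends without scanning, which is one plausible reading of "easy access to several locations".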
About KeyedQueue states:
All KeyedQueues begin INACTIVE. A call to activate() will render them READY (if not empty of eligible URIs) or EMPTY otherwise.
A noteInProcess() puts the KeyedQueue into IN_PROCESS state. A matching noteProcessDone() puts the KeyedQueue back into READY or EMPTY.
A freeze() may be issued to any READY or EMPTY queue to put it into FROZEN state. Only an unfreeze() will move the queue to INACTIVE state.
A deactivate() may be issued to any READY or EMPTY queue to put it into INACTIVE state.
A snooze() may be issued to any READY or EMPTY queue to put it into SNOOZED state.
A discard() may be issued to any EMPTY queue to put it into the DISCARDED state. A queue never leaves the discarded state; if a queue of its hostname is needed again, a new one is created.
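The transitions listed above can be summarised as a small state machine. The following is only a sketch under stated assumptions: the class QueueStateSketch, its State enum, the setEligibleItems helper, and the choice to throw IllegalStateException on a disallowed call are all hypothetical, and the real class stores its state as a java.lang.Object using the constants inherited from URIWorkQueue. The wake() transition is taken from the method summary.

```java
import java.util.EnumSet;

/** Hypothetical model of the documented transitions; not the Heritrix source. */
class QueueStateSketch {
    enum State { INACTIVE, READY, EMPTY, IN_PROCESS, FROZEN, SNOOZED, DISCARDED }

    private static final EnumSet<State> READY_OR_EMPTY =
            EnumSet.of(State.READY, State.EMPTY);

    private State state = State.INACTIVE; // all queues begin INACTIVE
    private int eligibleItems;            // stand-in for the real queue contents

    QueueStateSketch(int eligibleItems) {
        this.eligibleItems = eligibleItems;
    }

    State getState() { return state; }

    /** Hypothetical helper so the sketch can simulate items coming and going. */
    void setEligibleItems(int n) { eligibleItems = n; }

    void activate()        { move("activate", EnumSet.of(State.INACTIVE), readyOrEmpty()); }
    void deactivate()      { move("deactivate", READY_OR_EMPTY, State.INACTIVE); }
    void noteInProcess()   { move("noteInProcess", READY_OR_EMPTY, State.IN_PROCESS); }
    void noteProcessDone() { move("noteProcessDone", EnumSet.of(State.IN_PROCESS), readyOrEmpty()); }
    void freeze()          { move("freeze", READY_OR_EMPTY, State.FROZEN); }
    void unfreeze()        { move("unfreeze", EnumSet.of(State.FROZEN), State.INACTIVE); }
    void snooze()          { move("snooze", READY_OR_EMPTY, State.SNOOZED); }
    void wake()            { move("wake", EnumSet.of(State.SNOOZED), readyOrEmpty()); }
    // DISCARDED is terminal: no transition lists it as a source state.
    void discard()         { move("discard", EnumSet.of(State.EMPTY), State.DISCARDED); }

    private State readyOrEmpty() {
        return eligibleItems > 0 ? State.READY : State.EMPTY;
    }

    // Assumption: an illegal call fails loudly; the real class may behave differently.
    private void move(String op, EnumSet<State> allowedFrom, State to) {
        if (!allowedFrom.contains(state)) {
            throw new IllegalStateException(op + "() not allowed in state " + state);
        }
        state = to;
    }
}
```

In this model a queue with one item goes INACTIVE, READY, IN_PROCESS, and then EMPTY once the item is consumed, after which discard() is permitted.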
| Field Summary | |
|---|---|
| (package private) java.lang.String | classKey - common string 'key' of included items (typically hostname) |
| (package private) CrawlServer | crawlServer - Associated CrawlServer instance, held to keep CrawlServer from being cache-flushed |
| (package private) TieredQueue | innerQ |
| (package private) java.util.ArrayList | inProcessItems - items in progress |
| (package private) int | inProcessLoad |
| (package private) java.lang.Object | state - current state; see above values |
| (package private) int | valence - maximum simultaneous plain URIs to allow in-process at a time |
| (package private) long | wakeTime - ms time to wake, if snoozed |
| Fields inherited from interface org.archive.crawler.frontier.URIWorkQueue |
|---|
| BUSY, DISCARDED, EMPTY, FROZEN, INACTIVE, READY, SNOOZED |
| Constructor Summary | |
|---|---|
| KeyedQueue(java.lang.String key, CrawlServer server, java.io.File scratchDir, int maxMemLoad) | |
| Method Summary | |
|---|---|
| void | activate() - Move queue from INACTIVE to ACTIVE state |
| boolean | checkEmpty() - Update READY/EMPTY state after preceding queue edit operations. |
| void | deactivate() - Move queue from READY or EMPTY state to INACTIVE |
| long | deleteMatchedItems(org.apache.commons.collections.Predicate matcher) - Delete items matching the supplied criterion. |
| CrawlURI | dequeue() - Remove an item in the default manner |
| void | discard() - Move queue from READY or EMPTY to DISCARDED |
| void | enqueue(CrawlURI curi) - Add an item in the default manner |
| boolean | equals(java.lang.Object o) - The only equals() that matters for KeyedQueues is object equivalence. |
| void | freeze() - Move queue from READY or EMPTY state to FROZEN |
| java.lang.String | getClassKey() - The 'classKey' identifier common to items in this queue |
| java.util.List | getInProcessItems() |
| java.util.Iterator | getIterator(boolean inCacheOnly) - Iterate over all available (non-frozen) items. |
| java.lang.String | getLastDequeued() |
| java.lang.String | getLastQueued() |
| java.lang.String | getSortFallback() - To ensure total and consistent ordering when in scheduled order, a fallback sort criterion |
| java.lang.Object | getState() |
| long | getWakeTime() |
| boolean | isDiscardable() - May this KeyedQueue be completely discarded. |
| boolean | isEmpty() |
| long | length() |
| void | noteInProcess(CrawlURI o) - Note that the given item is 'in process'; move queue from READY or EMPTY to IN_PROCESS and remember in-process item. |
| void | noteProcessDone(CrawlURI o) - Note that the given item's processing has completed; forget the in-process item and move queue from BUSY or READY to READY or EMPTY state if necessary |
| CrawlURI | peek() |
| void | setMaximumMemoryLoad(int i) |
| void | setValence(int v) - Set 'valence', the number of simultaneous items to allow in process before becoming BUSY |
| void | setWakeTime(long w) - Should take care not to mutate this value while queue is inside a sorted queue. |
| void | snooze() - Move queue from READY or EMPTY state to SNOOZED |
| void | unfreeze() - Move queue from FROZEN state to INACTIVE |
| void | unpeek() |
| void | wake() - Move queue from SNOOZED state to READY or EMPTY |
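The relationship between setValence(int), noteInProcess(CrawlURI) and noteProcessDone(CrawlURI) can be illustrated with a simplified counter. This is a sketch only: ValenceSketch, its isBusy() method, and the use of plain strings in place of CrawlURI are hypothetical, and it assumes every item counts equally toward the limit, which may not match how the real class uses inProcessLoad.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a valence limit; not the Heritrix source. */
class ValenceSketch {
    // Items currently being processed (strings stand in for CrawlURI).
    private final List<String> inProcessItems = new ArrayList<>();
    // Maximum number of items allowed in process at once.
    private int valence;

    ValenceSketch(int valence) {
        this.valence = valence;
    }

    void setValence(int v) {
        valence = v;
    }

    /** Remember that an item has been handed out for processing. */
    void noteInProcess(String item) {
        inProcessItems.add(item);
    }

    /** Forget an item whose processing has completed. */
    void noteProcessDone(String item) {
        inProcessItems.remove(item);
    }

    /** Assumption: the queue is busy once in-process items reach the valence. */
    boolean isBusy() {
        return inProcessItems.size() >= valence;
    }

    List<String> getInProcessItems() {
        return inProcessItems;
    }
}
```

With a valence of 2, the sketch reports busy only after a second item is noted in process, and stops reporting busy as soon as either item completes.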
| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
CrawlServer crawlServer
long wakeTime
java.lang.String classKey
java.lang.Object state
int valence
java.util.ArrayList inProcessItems
int inProcessLoad
TieredQueue innerQ
| Constructor Detail |
|---|
public KeyedQueue(java.lang.String key,
CrawlServer server,
java.io.File scratchDir,
int maxMemLoad)
throws java.io.IOException
key - A unique identifier used to distinguish files related to this object's disk-based data structures (it will be part of their file names and must therefore be a legal filename).
server - Server instance this queue is for.
scratchDir - Directory where disk-based data structures will be created.
maxMemLoad - Maximum number of items to keep in memory.
java.io.IOException - When it fails to create disk-based data structures.
| Method Detail |
|---|
public java.lang.String getClassKey()
getClassKey in interface URIWorkQueue

public java.lang.Object getState()
getState in interface URIWorkQueue

public void activate()
activate in interface URIWorkQueue

public void deactivate()
deactivate in interface URIWorkQueue

public void freeze()
freeze in interface URIWorkQueue

public void unfreeze()
unfreeze in interface URIWorkQueue

public void snooze()
snooze in interface URIWorkQueue

public void wake()
wake in interface URIWorkQueue

public void discard()
discard in interface URIWorkQueue

public void noteInProcess(CrawlURI o)
noteInProcess in interface URIWorkQueue
o -

public void noteProcessDone(CrawlURI o)
noteProcessDone in interface URIWorkQueue
o -

public boolean checkEmpty()
checkEmpty in interface URIWorkQueue

public long getWakeTime()
getWakeTime in interface URIWorkQueue

public void setWakeTime(long w)
setWakeTime in interface URIWorkQueue
w - time to wake, when snoozed

public java.lang.String getSortFallback()
getSortFallback in interface URIWorkQueue

public boolean equals(java.lang.Object o)
equals in class java.lang.Object
See also: Object.equals(java.lang.Object)

public void enqueue(CrawlURI curi)
enqueue in interface URIWorkQueue
curi -
See also: Queue.enqueue(java.lang.Object)

public boolean isEmpty()
isEmpty in interface URIWorkQueue
See also: Queue.isEmpty()

public CrawlURI dequeue()
dequeue in interface URIWorkQueue
See also: Queue.dequeue()

public long length()
length in interface URIWorkQueue
See also: Queue.length()

public java.util.Iterator getIterator(boolean inCacheOnly)
getIterator in interface URIWorkQueue
inCacheOnly -
See also: Queue.getIterator(boolean)

public long deleteMatchedItems(org.apache.commons.collections.Predicate matcher)
deleteMatchedItems in interface URIWorkQueue
matcher -
See also: Queue.deleteMatchedItems(org.apache.commons.collections.Predicate)

public java.util.List getInProcessItems()
getInProcessItems in interface URIWorkQueue

public boolean isDiscardable()
isDiscardable in interface URIWorkQueue

public void setValence(int v)
setValence in interface URIWorkQueue
v -

public java.lang.String getLastQueued()
getLastQueued in interface URIWorkQueue

public java.lang.String getLastDequeued()
getLastDequeued in interface URIWorkQueue

public CrawlURI peek()

public void unpeek()

public void setMaximumMemoryLoad(int i)
i -
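The caution on setWakeTime(long), that the value should not be mutated while the queue sits inside a sorted queue, reflects a general property of sorted collections: changing an element's sort key in place leaves it filed under its old position. The sketch below shows one safe pattern (remove, mutate, re-insert) using java.util.TreeSet. The classes SnoozedEntry and SnoozeOrderSketch and the reschedule helper are hypothetical, and using classKey as the tie-breaker is only an assumption about what getSortFallback() might provide.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Hypothetical stand-in for a snoozed queue; not the Heritrix source. */
class SnoozedEntry {
    final String classKey; // assumed tie-breaker, in the spirit of getSortFallback()
    long wakeTime;         // ms time to wake

    SnoozedEntry(String classKey, long wakeTime) {
        this.classKey = classKey;
        this.wakeTime = wakeTime;
    }
}

class SnoozeOrderSketch {
    // Order by wake time, falling back to classKey so the ordering is total.
    static final Comparator<SnoozedEntry> ORDER =
            Comparator.comparingLong((SnoozedEntry e) -> e.wakeTime)
                      .thenComparing(e -> e.classKey);

    /**
     * Change an entry's wake time without corrupting the sorted set:
     * remove it while its old key is still intact, mutate, then re-insert.
     */
    static void reschedule(TreeSet<SnoozedEntry> snoozed, SnoozedEntry entry, long newWakeTime) {
        boolean wasPresent = snoozed.remove(entry);
        entry.wakeTime = newWakeTime;
        if (wasPresent) {
            snoozed.add(entry);
        }
    }
}
```

A caller would build the set with new TreeSet<>(SnoozeOrderSketch.ORDER) and route every wake-time change through reschedule, so the element is never inside the set at the moment its key changes.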