| Packages that use CanonicalizationRule | |
|---|---|
| org.archive.crawler.url.canonicalize | |
| Uses of CanonicalizationRule in org.archive.crawler.url.canonicalize |
|---|
| Classes in org.archive.crawler.url.canonicalize that implement CanonicalizationRule | |
|---|---|
| class | BaseRule: Base of all rules applied when canonicalizing a URL that are configurable via the Heritrix settings system. |
| class | FixupQueryStr: Strip any trailing question mark. |
| class | LowercaseRule: Lowercases the URL. |
| class | RegexRule: General conversion rule. |
| class | StripExtraSlashes |
| class | StripSessionCFIDs: Strip ColdFusion session IDs. |
| class | StripSessionIDs: Strip known session IDs. |
| class | StripUserinfoRule: Strip any 'userinfo' found on http/https URLs. |
| class | StripWWWNRule: Strip any 'www[0-9]*' found on http/https URLs, IF they have some path/query component (content after third slash). |
| class | StripWWWRule: Strip any 'www' found on http/https URLs, IF they have some path/query component (content after third slash). |
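
The rule descriptions above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes or the CanonicalizationRule interface itself: the class `CanonicalizeSketch`, its method names, and the regular expressions are hypothetical stand-ins written only to mirror the documented behavior of LowercaseRule, FixupQueryStr, StripUserinfoRule, and StripWWWRule, and the order in which they are chained is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical stand-ins for four of the rules listed above.
// None of these names or patterns come from Heritrix itself.
final class CanonicalizeSketch {

    // Assumed pattern: userinfo is everything between the scheme and '@',
    // provided no '/' occurs before the '@'.
    private static final Pattern USERINFO =
            Pattern.compile("^(https?://)[^/@]*@(.*)$", Pattern.CASE_INSENSITIVE);

    // Assumed pattern: strip "www." only when there is content
    // after the third slash (some path or query component).
    private static final Pattern WWW =
            Pattern.compile("^(https?://)www\\.([^/]+/.+)$", Pattern.CASE_INSENSITIVE);

    // Mirrors "Lowercases the URL": the whole string, path included.
    static String lowercase(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    // Mirrors "Strip any trailing question mark".
    static String fixupQueryStr(String url) {
        return url.endsWith("?") ? url.substring(0, url.length() - 1) : url;
    }

    // Mirrors "Strip any 'userinfo' found on http/https URLs".
    static String stripUserinfo(String url) {
        return USERINFO.matcher(url).replaceFirst("$1$2");
    }

    // Mirrors "Strip any 'www' found on http/https URLs" with a path/query.
    static String stripWww(String url) {
        return WWW.matcher(url).replaceFirst("$1$2");
    }

    // Applies each rule in list order, feeding one rule's output to the next.
    static String canonicalize(String url) {
        List<UnaryOperator<String>> rules = List.of(
                CanonicalizeSketch::lowercase,
                CanonicalizeSketch::fixupQueryStr,
                CanonicalizeSketch::stripUserinfo,
                CanonicalizeSketch::stripWww);
        String result = url;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints "http://example.com/index.html"
        System.out.println(canonicalize("HTTP://User:Secret@WWW.Example.com/Index.html?"));
    }
}
```

In this sketch the order of the chain matters: the 'www' pattern expects `www.` directly after the scheme, so userinfo has to be stripped first. A URL with nothing after the third slash, such as `http://www.example.com/`, is left unchanged by `stripWww`, matching the "IF they have some path/query component" condition.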