A
addTitle
- Variable in class org.openimaj.web.readability.
Readability
Anchor
- Class in
org.openimaj.web.readability
Class to represent a simple HTML anchor tag.
Anchor(String, String)
- Constructor for class org.openimaj.web.readability.
Anchor
Constructs an anchor with the given text and href.
article_contentType
- Variable in class org.openimaj.web.readability.
Readability
article_date
- Variable in class org.openimaj.web.readability.
Readability
article_date_string
- Variable in class org.openimaj.web.readability.
Readability
articleContent
- Variable in class org.openimaj.web.readability.
Readability
articleTitle
- Variable in class org.openimaj.web.readability.
Readability
augmentDocument(Document)
- Static method in class org.openimaj.web.readability.
Readability
Iterates through all the ELEMENT nodes in a document and gives them ids if they don't already have them.
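As a rough illustration of the id-assignment idea described for augmentDocument(Document), the sketch below walks every element of a JDK org.w3c.dom document and adds an id where none exists. It is not the OpenIMAJ implementation: the class name AugmentSketch, the helper names, and the "gen-N" id scheme are assumptions made for this example, and the input must be well-formed XML because the JDK parser is not an HTML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AugmentSketch {
    /** Gives every element lacking an id attribute a generated one; returns how many ids were added. */
    static int assignMissingIds(Document doc) {
        NodeList all = doc.getElementsByTagName("*"); // all ELEMENT nodes, in document order
        int added = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (!e.hasAttribute("id")) {
                // Hypothetical id scheme; it does not guard against clashes with existing ids.
                e.setAttribute("id", "gen-" + i);
                added++;
            }
        }
        return added;
    }

    /** Parses well-formed XML/XHTML and wraps checked exceptions so callers stay simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body id=\"main\"><p>one</p><p>two</p></body></html>");
        System.out.println(assignMissingIds(doc)); // html and both p elements get ids: prints 3
    }
}
```

Running the method a second time on the same document adds nothing, since every element then already carries an id.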
C
clean(Element, String)
- Method in class org.openimaj.web.readability.
Readability
Clean a node of all elements of type "tag".
cleanConditionally(Element, String)
- Method in class org.openimaj.web.readability.
Readability
Clean an element of all tags of type "tag" if they look fishy.
cleanHeaders(Element)
- Method in class org.openimaj.web.readability.
Readability
Clean out spurious headers from an Element.
cleanStyles()
- Method in class org.openimaj.web.readability.
Readability
cleanStyles(Element)
- Method in class org.openimaj.web.readability.
Readability
Remove the style attribute from element e and all of its descendants.
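The recursive removal described for cleanStyles(Element) can be sketched with the JDK DOM API as follows. This is a minimal illustration under assumptions, not the library's code: StyleCleanSketch, removeStyles, and parse are hypothetical names, and the sample markup must be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StyleCleanSketch {
    /** Removes the style attribute from e and from every element below it. */
    static void removeStyles(Element e) {
        e.removeAttribute("style"); // has no effect when the attribute is absent
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                removeStyles((Element) c); // text and comment nodes are skipped
            }
        }
    }

    /** Parses well-formed XML/XHTML and wraps checked exceptions so callers stay simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div style=\"color:red\"><p style=\"margin:0\">x</p></div>");
        removeStyles(doc.getDocumentElement());
        System.out.println(doc.getDocumentElement().hasAttribute("style")); // prints false
    }
}
```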
D
dbg(String)
- Method in class org.openimaj.web.readability.
Readability
debug
- Variable in class org.openimaj.web.readability.
Readability
divToPElementsRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
document
- Variable in class org.openimaj.web.readability.
Readability
F
findArticleDate()
- Method in class org.openimaj.web.readability.
Readability
findArticleEncoding()
- Method in class org.openimaj.web.readability.
Readability
findArticleTitle()
- Method in class org.openimaj.web.readability.
Readability
Get the article title.
findChildNodeIndex(Node, Node)
- Method in class org.openimaj.web.readability.
Readability
findChildNodesWithName(Node, String)
- Method in class org.openimaj.web.readability.
Readability
flags
- Variable in class org.openimaj.web.readability.
Readability
G
getAllLinks()
- Method in class org.openimaj.web.readability.
Readability
getArticleContentType()
- Method in class org.openimaj.web.readability.
Readability
getArticleDate()
- Method in class org.openimaj.web.readability.
Readability
getArticleDateString()
- Method in class org.openimaj.web.readability.
Readability
getArticleHTML()
- Method in class org.openimaj.web.readability.
Readability
getArticleHTML_DOM()
- Method in class org.openimaj.web.readability.
Readability
getArticleImages()
- Method in class org.openimaj.web.readability.
Readability
getArticleLinks()
- Method in class org.openimaj.web.readability.
Readability
getArticleSubheadings()
- Method in class org.openimaj.web.readability.
Readability
getArticleText()
- Method in class org.openimaj.web.readability.
Readability
getArticleTextMapping(TreeWalker, List<Readability.MappingNode>)
- Method in class org.openimaj.web.readability.
Readability
getArticleTextMapping()
- Method in class org.openimaj.web.readability.
Readability
Get the mapping between pieces of text in the DOM and their XPaths.
getArticleTitle()
- Method in class org.openimaj.web.readability.
Readability
getBody()
- Method in class org.openimaj.web.readability.
Readability
Equivalent to document.body in JavaScript.
getCharCount(Element, String)
- Method in class org.openimaj.web.readability.
Readability
Get the number of times a string s appears in the node e.
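One way to count occurrences of a string in an element's text is sketched below, using only the JDK. The names CharCountSketch, countOccurrences, and parse are hypothetical, and this is not claimed to be how getCharCount(Element, String) is written. The sketch uses an indexOf loop rather than String.split, because Java's split discards trailing empty strings and would undercount when the text ends with the search string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CharCountSketch {
    /** Counts non-overlapping occurrences of s in the text content of e. */
    static int countOccurrences(Element e, String s) {
        if (s.isEmpty()) {
            return 0; // an empty needle would otherwise loop forever
        }
        String text = e.getTextContent(); // concatenated text of e and its descendants
        int count = 0;
        for (int i = text.indexOf(s); i >= 0; i = text.indexOf(s, i + s.length())) {
            count++;
        }
        return count;
    }

    /** Parses well-formed XML/XHTML and wraps checked exceptions so callers stay simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p>a, b, <b>c,</b></p>");
        System.out.println(countOccurrences(doc.getDocumentElement(), ",")); // prints 3
    }
}
```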
getCharCount(Element)
- Method in class org.openimaj.web.readability.
Readability
getClassWeight(Element)
- Method in class org.openimaj.web.readability.
Readability
Get an element's class/id weight.
getHref()
- Method in class org.openimaj.web.readability.
Anchor
getId()
- Method in class org.openimaj.web.readability.
Readability.MappingNode
getInnerHTML(Node)
- Method in class org.openimaj.web.readability.
Readability
getInnerText(Element, boolean)
- Method in class org.openimaj.web.readability.
Readability
Get the inner text of a node in a cross-browser-compatible way.
getInnerText(Element)
- Method in class org.openimaj.web.readability.
Readability
getInnerTextSep(Node)
- Method in class org.openimaj.web.readability.
Readability
getLinkDensity(Element)
- Method in class org.openimaj.web.readability.
Readability
Get the density of links as a percentage of the content. This is the amount of text that is inside a link divided by the total text in the node.
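The ratio described for getLinkDensity(Element) (link text divided by total text) can be sketched with the JDK DOM API. The class and method names here are hypothetical, the result is returned as a fraction between 0 and 1, and the sketch assumes lower-case "a" tag names and no nested anchors (nested anchors would be counted twice). It is an illustration of the formula, not the library's implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LinkDensitySketch {
    /** Returns (characters inside a elements) / (all characters in e), or 0 for an empty element. */
    static double linkDensity(Element e) {
        int total = e.getTextContent().length();
        if (total == 0) {
            return 0.0; // avoid dividing by zero for elements without text
        }
        int linked = 0;
        NodeList links = e.getElementsByTagName("a"); // descendant anchors only
        for (int i = 0; i < links.getLength(); i++) {
            linked += links.item(i).getTextContent().length();
        }
        return (double) linked / total;
    }

    /** Parses well-formed XML/XHTML and wraps checked exceptions so callers stay simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div>abcd<a href=\"#\">efgh</a></div>");
        System.out.println(linkDensity(doc.getDocumentElement())); // 4 of 8 characters: prints 0.5
    }
}
```

A value like this would typically be compared against a cut-off such as LINK_DENSITY_THRESHOLD; the actual threshold value is not given in this index.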
getReadability(String)
- Static method in class org.openimaj.web.readability.
Readability
Convenience method to build a Readability instance from an HTML string.
getReadability(String, boolean)
- Static method in class org.openimaj.web.readability.
Readability
Convenience method to build a Readability instance from an HTML string.
getText()
- Method in class org.openimaj.web.readability.
Anchor
getText()
- Method in class org.openimaj.web.readability.
Readability.MappingNode
getTitle()
- Method in class org.openimaj.web.readability.
Readability
grabArticle()
- Method in class org.openimaj.web.readability.
Readability
Using a variety of metrics (content score, class name, element types), find the content that the user most likely wants to read.
H
hasContent()
- Method in class org.openimaj.web.readability.
Readability
I
init()
- Method in class org.openimaj.web.readability.
Readability
Runs readability.
initializeNode(Element)
- Method in class org.openimaj.web.readability.
Readability
Initialize a node with the readability object.
K
killBreaks(Element)
- Method in class org.openimaj.web.readability.
Readability
Remove extraneous break tags from a node.
killBreaksRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
L
likelySubheadCandidateRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
LINK_DENSITY_THRESHOLD
- Static variable in class org.openimaj.web.readability.
Readability
Threshold for removing elements with lots of links.
M
main(String[])
- Static method in class org.openimaj.web.readability.
Readability
Entry point for testing.
MappingNode(String, String)
- Constructor for class org.openimaj.web.readability.
Readability.MappingNode
match(String, String)
- Method in class org.openimaj.web.readability.
Readability
JavaScript-like String.match.
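The index does not show the return type of match(String, String), so the following is only a sketch of what a JavaScript-like String.match could look like in Java, assuming the non-global JavaScript behaviour: an array holding the full match followed by the capture groups, or null when nothing matches. MatchSketch and jsMatch are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSketch {
    /** Returns [full match, group 1, group 2, ...] for the first match, or null if there is none. */
    static String[] jsMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null; // mirrors JavaScript, where match() yields null on failure
        }
        String[] out = new String[m.groupCount() + 1];
        for (int g = 0; g <= m.groupCount(); g++) {
            out[g] = m.group(g); // group 0 is the whole match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(jsMatch("2011-05-17", "(\\d{4})-(\\d{2})")));
        // prints [2011-05, 2011, 05]
    }
}
```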
N
negativeRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
nodeToString(Node)
- Method in class org.openimaj.web.readability.
Readability
nodeToString(Node, boolean)
- Static method in class org.openimaj.web.readability.
Readability
normalizeRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
O
okMaybeItsACandidateRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
org.openimaj.web.readability
- package org.openimaj.web.readability
P
parseDate()
- Method in class org.openimaj.web.readability.
Readability
positiveRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
prepArticle(Element)
- Method in class org.openimaj.web.readability.
Readability
Prepare the article node for display.
prepDocument()
- Method in class org.openimaj.web.readability.
Readability
Prepare the HTML document for readability to scrape it.
R
Readability
- Class in
org.openimaj.web.readability
Class for extracting the "content" from web pages, and ignoring adverts, etc.
Readability(Document)
- Constructor for class org.openimaj.web.readability.
Readability
Construct with the given document.
Readability(Document, boolean)
- Constructor for class org.openimaj.web.readability.
Readability
Construct with the given document.
Readability(Document, boolean, boolean)
- Constructor for class org.openimaj.web.readability.
Readability
Construct with the given document.
Readability.MappingNode
- Class in
org.openimaj.web.readability
Readability.Regexps
- Class in
org.openimaj.web.readability
Regular expressions for different types of content
Regexps()
- Constructor for class org.openimaj.web.readability.
Readability.Regexps
removeChildren(Node)
- Method in class org.openimaj.web.readability.
Readability
removeComments(Node)
- Method in class org.openimaj.web.readability.
Readability
replaceBrsRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
replaceFontsRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
S
search(String, String)
- Method in class org.openimaj.web.readability.
Readability
JavaScript-like String.search.
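JavaScript's String.search returns the index of the first regular-expression match, or -1 when there is none. A minimal Java equivalent, with the hypothetical names SearchSketch and jsSearch and no claim to match the library's search(String, String), might look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchSketch {
    /** Returns the start index of the first match of regex in input, or -1 if there is none. */
    static int jsSearch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(jsSearch("Title | Site", "\\|")); // the bar sits at index 6: prints 6
        System.out.println(jsSearch("Title", "\\|"));        // prints -1
    }
}
```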
setAnchorText(String)
- Method in class org.openimaj.web.readability.
Anchor
Set the anchor text
setHref(String)
- Method in class org.openimaj.web.readability.
Anchor
Set the href
stringToNode(String)
- Method in class org.openimaj.web.readability.
Readability
T
titleSeparatorRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
toString()
- Method in class org.openimaj.web.readability.
Anchor
toString()
- Method in class org.openimaj.web.readability.
Readability.MappingNode
trimRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
U
unlikelyCandidatesRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps
V
videoRe
- Static variable in class org.openimaj.web.readability.
Readability.Regexps