A C D F G H I K L M N O P R S T U V 

A

addTitle - Variable in class org.openimaj.web.readability.Readability
 
Anchor - Class in org.openimaj.web.readability
Class to represent a simple HTML anchor tag.
Anchor(String, String) - Constructor for class org.openimaj.web.readability.Anchor
Default constructor with anchor text and an href.
article_contentType - Variable in class org.openimaj.web.readability.Readability
 
article_date - Variable in class org.openimaj.web.readability.Readability
 
article_date_string - Variable in class org.openimaj.web.readability.Readability
 
articleContent - Variable in class org.openimaj.web.readability.Readability
 
articleTitle - Variable in class org.openimaj.web.readability.Readability
 
augmentDocument(Document) - Static method in class org.openimaj.web.readability.Readability
Iterates through all the ELEMENT nodes in a document and gives them ids if they don't already have them.
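
The id-assignment pass described for augmentDocument(Document) can be illustrated with a minimal sketch that uses only the JDK's W3C DOM classes. It is not the OpenIMAJ implementation: the class name AugmentSketch, the helper names, and the "gen-N" id format are hypothetical, and the sketch does not check whether a generated id collides with an existing one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AugmentSketch {
    /** Gives every element without an id attribute a generated one; returns how many were assigned. */
    public static int assignMissingIds(Document doc) {
        // "*" matches every element in the document, in document order.
        NodeList all = doc.getElementsByTagName("*");
        int assigned = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (!el.hasAttribute("id")) {
                // Hypothetical id scheme, chosen only for illustration.
                el.setAttribute("id", "gen-" + assigned);
                assigned++;
            }
        }
        return assigned;
    }

    /** Parses a well-formed XML/XHTML string into a DOM document. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body id=\"main\"><p>one</p><p>two</p></body></html>");
        // html and both p elements lack an id; body already has one.
        System.out.println(assignMissingIds(doc)); // prints 3
    }
}
```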

C

clean(Element, String) - Method in class org.openimaj.web.readability.Readability
Clean a node of all elements of type "tag".
cleanConditionally(Element, String) - Method in class org.openimaj.web.readability.Readability
Clean an element of all tags of type "tag" if they look fishy.
cleanHeaders(Element) - Method in class org.openimaj.web.readability.Readability
Clean out spurious headers from an Element.
cleanStyles() - Method in class org.openimaj.web.readability.Readability
 
cleanStyles(Element) - Method in class org.openimaj.web.readability.Readability
Remove the style attribute from element e and all of its descendants.

D

dbg(String) - Method in class org.openimaj.web.readability.Readability
 
debug - Variable in class org.openimaj.web.readability.Readability
 
divToPElementsRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
document - Variable in class org.openimaj.web.readability.Readability
 

F

findArticleDate() - Method in class org.openimaj.web.readability.Readability
 
findArticleEncoding() - Method in class org.openimaj.web.readability.Readability
 
findArticleTitle() - Method in class org.openimaj.web.readability.Readability
Get the article title.
findChildNodeIndex(Node, Node) - Method in class org.openimaj.web.readability.Readability
 
findChildNodesWithName(Node, String) - Method in class org.openimaj.web.readability.Readability
 
flags - Variable in class org.openimaj.web.readability.Readability
 

G

getAllLinks() - Method in class org.openimaj.web.readability.Readability
 
getArticleContentType() - Method in class org.openimaj.web.readability.Readability
 
getArticleDate() - Method in class org.openimaj.web.readability.Readability
 
getArticleDateString() - Method in class org.openimaj.web.readability.Readability
 
getArticleHTML() - Method in class org.openimaj.web.readability.Readability
 
getArticleHTML_DOM() - Method in class org.openimaj.web.readability.Readability
 
getArticleImages() - Method in class org.openimaj.web.readability.Readability
 
getArticleLinks() - Method in class org.openimaj.web.readability.Readability
 
getArticleSubheadings() - Method in class org.openimaj.web.readability.Readability
 
getArticleText() - Method in class org.openimaj.web.readability.Readability
 
getArticleTextMapping(TreeWalker, List<Readability.MappingNode>) - Method in class org.openimaj.web.readability.Readability
 
getArticleTextMapping() - Method in class org.openimaj.web.readability.Readability
Get the mapping between fragments of text in the DOM and their XPaths.
getArticleTitle() - Method in class org.openimaj.web.readability.Readability
 
getBody() - Method in class org.openimaj.web.readability.Readability
Equivalent to document.body in JavaScript.
getCharCount(Element, String) - Method in class org.openimaj.web.readability.Readability
Get the number of times a string s appears in the node e.
getCharCount(Element) - Method in class org.openimaj.web.readability.Readability
 
getClassWeight(Element) - Method in class org.openimaj.web.readability.Readability
Get an element's class/id weight.
getHref() - Method in class org.openimaj.web.readability.Anchor
 
getId() - Method in class org.openimaj.web.readability.Readability.MappingNode
 
getInnerHTML(Node) - Method in class org.openimaj.web.readability.Readability
 
getInnerText(Element, boolean) - Method in class org.openimaj.web.readability.Readability
Get the inner text of a node in a cross-browser-compatible way.
getInnerText(Element) - Method in class org.openimaj.web.readability.Readability
 
getInnerTextSep(Node) - Method in class org.openimaj.web.readability.Readability
 
getLinkDensity(Element) - Method in class org.openimaj.web.readability.Readability
Get the density of links as a percentage of the content. This is the amount of text that is inside a link divided by the total text in the node.
getReadability(String) - Static method in class org.openimaj.web.readability.Readability
Convenience method to build a Readability instance from an html string.
getReadability(String, boolean) - Static method in class org.openimaj.web.readability.Readability
Convenience method to build a Readability instance from an html string.
getText() - Method in class org.openimaj.web.readability.Anchor
 
getText() - Method in class org.openimaj.web.readability.Readability.MappingNode
 
getTitle() - Method in class org.openimaj.web.readability.Readability
 
grabArticle() - Method in class org.openimaj.web.readability.Readability
Using a variety of metrics (content score, class name, element types), find the content that the user most likely wants to read.
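
The link-density measure described under getLinkDensity(Element) (text inside links divided by total text in the node) can be sketched with the JDK's W3C DOM classes alone. This is an illustration under stated assumptions, not the library's code: the class LinkDensitySketch and its helpers are hypothetical, the result is returned as a fraction between 0 and 1, tag matching is case-sensitive on lowercase "a", and nested anchors (invalid in HTML) would be counted twice.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LinkDensitySketch {
    /** Length of text inside <a> descendants divided by the length of all text under e. */
    public static double linkDensity(Element e) {
        int total = e.getTextContent().length();
        if (total == 0) {
            return 0.0; // no text at all: treat density as zero rather than divide by zero
        }
        int linked = 0;
        NodeList anchors = e.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            linked += anchors.item(i).getTextContent().length();
        }
        return (double) linked / total;
    }

    /** Parses a well-formed XML/XHTML fragment and returns its root element. */
    public static Element parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // 8 characters of text in total, 4 of them inside the link.
        Element div = parse("<div>abcd<a href=\"#\">efgh</a></div>");
        System.out.println(linkDensity(div)); // prints 0.5
    }
}
```

A value like this is what a constant such as LINK_DENSITY_THRESHOLD would plausibly be compared against; the actual threshold value is not given in this index.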

H

hasContent() - Method in class org.openimaj.web.readability.Readability
 

I

init() - Method in class org.openimaj.web.readability.Readability
Runs readability.
initializeNode(Element) - Method in class org.openimaj.web.readability.Readability
Initialize a node with the readability object.

K

killBreaks(Element) - Method in class org.openimaj.web.readability.Readability
Remove extraneous break tags from a node.
killBreaksRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 

L

likelySubheadCandidateRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
LINK_DENSITY_THRESHOLD - Static variable in class org.openimaj.web.readability.Readability
Threshold for removing elements with lots of links.

M

main(String[]) - Static method in class org.openimaj.web.readability.Readability
Testing
MappingNode(String, String) - Constructor for class org.openimaj.web.readability.Readability.MappingNode
 
match(String, String) - Method in class org.openimaj.web.readability.Readability
JavaScript-like String.match.
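
For readers unfamiliar with the JavaScript semantics that match(String, String) and search(String, String) are described as imitating, the following sketch shows one way to express them with java.util.regex. It assumes the non-global JavaScript behaviour (first match only). The class JsRegexSketch, the parameter order, and the return types are hypothetical and may differ from the Readability methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsRegexSketch {
    /** Like a non-global String.match in JavaScript: first matching substring, or null if none. */
    public static String match(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Like String.search in JavaScript: index of the first match, or -1 if none. */
    public static int search(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(match("page 42 of 99", "\\d+"));          // prints 42
        System.out.println(search("Title | Site", "\\s\\|\\s"));     // prints 5
        System.out.println(search("abc", "\\d"));                    // prints -1
    }
}
```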

N

negativeRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
nodeToString(Node) - Method in class org.openimaj.web.readability.Readability
 
nodeToString(Node, boolean) - Static method in class org.openimaj.web.readability.Readability
 
normalizeRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 

O

okMaybeItsACandidateRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
org.openimaj.web.readability - package org.openimaj.web.readability
 

P

parseDate() - Method in class org.openimaj.web.readability.Readability
 
positiveRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
prepArticle(Element) - Method in class org.openimaj.web.readability.Readability
Prepare the article node for display.
prepDocument() - Method in class org.openimaj.web.readability.Readability
Prepare the HTML document for readability to scrape it.

R

Readability - Class in org.openimaj.web.readability
Class for extracting the "content" from web-pages, and ignoring adverts, etc.
Readability(Document) - Constructor for class org.openimaj.web.readability.Readability
Construct with the given document.
Readability(Document, boolean) - Constructor for class org.openimaj.web.readability.Readability
Construct with the given document.
Readability(Document, boolean, boolean) - Constructor for class org.openimaj.web.readability.Readability
Construct with the given document.
Readability.MappingNode - Class in org.openimaj.web.readability
 
Readability.Regexps - Class in org.openimaj.web.readability
Regular expressions for different types of content.
Regexps() - Constructor for class org.openimaj.web.readability.Readability.Regexps
 
removeChildren(Node) - Method in class org.openimaj.web.readability.Readability
 
removeComments(Node) - Method in class org.openimaj.web.readability.Readability
 
replaceBrsRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
replaceFontsRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 

S

search(String, String) - Method in class org.openimaj.web.readability.Readability
JavaScript-like String.search.
setAnchorText(String) - Method in class org.openimaj.web.readability.Anchor
Set the anchor text.
setHref(String) - Method in class org.openimaj.web.readability.Anchor
Set the href.
stringToNode(String) - Method in class org.openimaj.web.readability.Readability
 

T

titleSeparatorRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 
toString() - Method in class org.openimaj.web.readability.Anchor
 
toString() - Method in class org.openimaj.web.readability.Readability.MappingNode
 
trimRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 

U

unlikelyCandidatesRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 

V

videoRe - Static variable in class org.openimaj.web.readability.Readability.Regexps
 