Reference

readability.browser.open_in_browser(html)

Open the HTML document in a web browser by first saving it to a temporary file. The temporary file is not deleted after use. This function is mainly meant for debugging.
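The behaviour described here can be sketched with the standard library alone. This is not the library's source: write_temp_html and preview_html are hypothetical helper names, and the .html suffix and UTF-8 encoding are assumptions. Separating the write from the open makes it easy to clean up the file yourself.

```python
import tempfile
import webbrowser
from pathlib import Path


def write_temp_html(html):
    """Write html to a temporary .html file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # matching the note that the file is not removed automatically.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
        return handle.name


def preview_html(html):
    """Open html in the default browser and return the temp file path."""
    path = write_temp_html(html)
    # as_uri() builds a file:// URL that is valid on every platform.
    webbrowser.open(Path(path).as_uri())
    return path
```

Because the path is returned, the caller can remove the file once the browser has loaded it, for example with os.remove(path).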

readability.encoding.fix_charset(encoding)

Overrides the encoding when the declared or detected charset is a subset of a larger charset. Created because of issues with Chinese websites.
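A minimal sketch of the idea, assuming a simple lookup table: pages that declare a narrow charset often contain characters that only the wider superset can decode. The SUPERSETS table and the widen_charset name are hypothetical and are not taken from the library.

```python
# Illustrative mapping from a narrow charset to a superset codec.
# These entries are assumptions for the sketch, not the library's table.
SUPERSETS = {
    "gb2312": "gb18030",
    "big5": "big5hkscs",
}


def widen_charset(encoding):
    """Return a superset codec for encoding if one is known."""
    name = encoding.lower()
    return SUPERSETS.get(name, name)


# A character commonly cited as missing from GB2312 but present in GB18030.
text = "镕"
data = text.encode("gb18030")

# Decoding with the widened codec round-trips the bytes.
decoded = data.decode(widen_charset("GB2312"))
```

Unknown names fall through unchanged (lower-cased), so the helper is safe to apply to any declared charset.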

class readability.readability.Document(input, positive_keywords=None, negative_keywords=None, url=None, min_text_length=25, retry_length=250, xpath=False, handle_failures='discard')

Bases: object

Class to build an etree document out of HTML.

author()

Returns the document author.

content()

Returns the document body.

get_clean_html()

An internal method that can be overridden in subclasses, for example to disable or improve the DOM-to-text conversion used by the summary() method.

short_title()

Returns the cleaned-up document title.

summary(html_partial=False, keep_all_images=False)

Given an HTML file, extracts the text of the article.

Parameters:
  • html_partial – Return only the div of the document, without wrapping it in html and body tags.

  • keep_all_images – Keep all images in summary.

Warning: This method mutates the internal DOM representation of the HTML document, so it is better to call other API methods before this one.

title()

Returns the document title.

exception readability.readability.Unparseable

Bases: ValueError