Alternatively, you can install with conda:
$ conda install -c conda-forge readability-lxml
Usage
>>> import requests
>>> from readability import Document
>>> response = requests.get('https://example.com')
>>> doc = Document(response.content)
>>> doc.title()
'Example Domain'
>>> doc.summary()
"""<html><body><div><body id="readabilityBody">\n<div>\n <h1>Example Domain</h1>\n<p>This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission.</p>\n <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</div></body></html>"""
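The same calls should also work on HTML you already have in memory, which can be handy for trying the library without a network request. The following is a minimal sketch, not part of the original README: `SAMPLE_HTML` and the `extract` helper are hypothetical names made up for illustration, and passing `html_partial=True` to `summary()` is assumed to return only the extracted fragment rather than a full HTML document.

```python
# Minimal sketch: run readability on an in-memory HTML string (no network).
try:
    from readability import Document  # provided by the readability-lxml package
except ImportError:
    # Package not installed; the sketch then only defines the sample data.
    Document = None

# Hypothetical sample page, made up for this example.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <div id="sidebar"><a href="/about">About</a> <a href="/contact">Contact</a></div>
    <div id="article">
      <h1>Sample Page</h1>
      <p>This is the first paragraph of the main article. It is long enough,
      and has enough commas, for the extractor to treat it as real content.</p>
      <p>This is the second paragraph of the main article. It adds more text,
      so the surrounding block scores higher than the navigation links.</p>
    </div>
  </body>
</html>
"""


def extract(html):
    """Return (title, main-content HTML) for an HTML string."""
    doc = Document(html)
    # html_partial=True is assumed to skip the outer <html><body> wrapper.
    return doc.title(), doc.summary(html_partial=True)


if Document is not None:
    title, fragment = extract(SAMPLE_HTML)
    print(title)
    print(fragment)
```

The exact markup returned by `summary()` depends on the library version, so treat the printed fragment as something to inspect rather than a fixed output.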
Change Log
0.8.4 Better CJK support, thanks @cdhigh
0.8.3.1 Support for Python 3.8 - 3.13
0.8.3 We can now save all images via keep_all_images=True (default is to save 1 main image), thanks @botlabsDev
0.8.2 Added article author(s) (thanks @mattblaha)
0.8.1 Fixed processing of non-ASCII HTML via regexps.
0.8 Replaced XHTML output with HTML5 output in summary() call.
0.7.1 Support for Python 3.7. Fixed a slowdown when processing documents with lots of spaces.
0.7 Improved HTML5 tags handling. Fixed stripping unwanted HTML nodes (only first matching node was removed before).
0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3 - 3.6
0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and 3.4
0.4 Added video loading and allowed more images per paragraph
0.3 Added Document.encoding, positive_keywords and negative_keywords