html5lib - HTML parsing library


MIT
Cross-platform
Python

Software Overview

html5lib is a library for parsing HTML documents in Ruby and Python. It supports HTML5 and aims for maximum compatibility with desktop browsers.

Key features include:

  • Parses valid and invalid HTML documents to a tree
  • Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup and custom simpletree output formats
  • DOM to SAX converter
  • Reports parse errors
  • Character encoding detection
  • XML mode for working with ill-formed XML, e.g. feeds
  • Filtering and serializing of trees
  • HTML+CSS sanitizer
  • Many unit tests
  • Faster than before :)
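
The first and fourth features above (parsing invalid HTML to a tree, and reporting parse errors) can be sketched roughly as follows. This is a minimal illustration that assumes a recent Python release of html5lib (the 1.x series) is installed; the sample markup and variable names are made up for the example, and the available tree builders may differ from the list above depending on the version.

```python
import html5lib

# Deliberately malformed markup: no doctype, unclosed <p> and <b> tags.
broken = "<title>Demo</title><p>Hello <b>world"

# Parse into an xml.etree.ElementTree element. namespaceHTMLElements=False
# keeps tag names plain ("p") instead of XHTML-namespaced.
root = html5lib.parse(broken, treebuilder="etree", namespaceHTMLElements=False)

# The parser fills in the missing <html>, <head> and <body> elements.
title = root.find("head/title").text
bold = root.find("body/p/b").text

# An explicit parser object records the parse errors it recovered from.
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
dom = parser.parse(broken)  # a minidom Document
error_count = len(parser.errors)

print(title, bold, error_count)
```

The parser does not raise on bad input by default; it repairs the tree the way a browser would and collects the problems in `parser.errors`.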