Don’t put html, head and body tags automatically, beautifulsoup

Question

In [35]: import bs4 as bs

In [36]: bs.BeautifulSoup('<h1>FOO</h1>', "html.parser")
Out[36]: <h1>FOO</h1>

This parses the HTML with Python’s built-in HTML parser (html.parser), which returns the fragment without wrapping it in extra tags.
Quoting the docs:

Unlike html5lib, this parser makes no attempt to create a well-formed
HTML document by adding a <body> tag. Unlike lxml, it doesn’t even
bother to add an <html> tag.
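
To confirm that behaviour in a script rather than at the prompt, a minimal sketch along these lines may help. It assumes only that bs4 is installed; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

fragment = "<h1>FOO</h1>"
soup = BeautifulSoup(fragment, "html.parser")

# html.parser leaves the fragment as-is: no <html>, <head> or <body> wrappers.
print(soup)               # <h1>FOO</h1>
print(soup.html is None)  # True
print(soup.body is None)  # True
```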

Alternatively, you could use the html5lib parser, which does wrap the fragment in <html>, <head> and <body> tags, and then select the first element inside <body>:

In [61]: soup = bs.BeautifulSoup('<h1>FOO</h1>', 'html5lib')

In [62]: soup.body.next_element
Out[62]: <h1>FOO</h1>
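
Selecting a single element only returns the first node inside <body>. If the fragment has several top-level elements, one option is to serialize everything inside <body> instead. The helper below is a sketch, not part of Beautiful Soup: body_fragment is a hypothetical name, and the default parser assumes html5lib is installed (pip install html5lib).

```python
from bs4 import BeautifulSoup


def body_fragment(markup: str, parser: str = "html5lib") -> str:
    """Return the parsed markup without any <html>/<head>/<body> wrappers.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    soup = BeautifulSoup(markup, parser)
    # html5lib and lxml add a <body>; html.parser does not, so fall back
    # to the whole document when there is no <body> to unwrap.
    container = soup.body if soup.body is not None else soup
    # decode_contents() serializes a tag's children without the tag itself.
    return container.decode_contents()


if __name__ == "__main__":
    print(body_fragment("<h1>FOO</h1><p>bar</p>", parser="html.parser"))
```

Falling back to the whole document when there is no <body> lets the same helper work with whichever parser is available.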
