Python Web Scraping - Beautiful Soup (Part 2)

Changing HTML Content using Beautiful Soup

insert()

from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
    </head>
    <body>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

h3 = soup.new_tag('h3')
h3.string = 'This is a third level heading.'
# we are inserting the h3 tag at position 0, before the <html> tag
soup.insert(0, h3)

print(soup.prettify())

Output:

<h3>
 This is a third level heading.
</h3>
<html>
 <head>
 </head>
 <body>
 </body>
</html>

As you can see, the h3 tag was inserted at position 0, so in the output it appears before the <html> tag.
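insert() can also be called on a tag rather than on the whole soup object, which places the new tag inside that tag. The sketch below reuses the same empty skeleton and inserts the heading as the first child of <body>, so it ends up inside the page instead of before <html>. The index counts every child, including whitespace-only strings, so 0 is the simplest way to ask for "first".

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
    </head>
    <body>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

h3 = soup.new_tag('h3')
h3.string = 'This is a third level heading.'

# insert the h3 as the first child of <body> instead of the document root
soup.body.insert(0, h3)

print(soup.prettify())
```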

append()

from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
    </head>
    <body>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

h3 = soup.new_tag('h3')
h3.string = 'This is a third level heading.'
# we are appending the h3 tag as the last child of the document
soup.append(h3)

print(soup.prettify())

Output:

<html>
 <head>
 </head>
 <body>
 </body>
</html>
<h3>
 This is a third level heading.
</h3>
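As the output shows, append() adds the tag as the last child of the document, so the h3 tag lands after the closing </html> tag. Calling append() on soup.body instead keeps the new element inside the page. In this sketch the extra p tag (with made-up text) is only there to show that each append() call adds after the previous one.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
    </head>
    <body>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

h3 = soup.new_tag('h3')
h3.string = 'This is a third level heading.'
p = soup.new_tag('p')
p.string = 'A paragraph that follows the heading.'  # illustrative text

# each append() adds the tag after the existing children of <body>,
# so the heading ends up before the paragraph
soup.body.append(h3)
soup.body.append(p)

print(soup.prettify())
```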

Adding new attributes to tags

from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
    </head>
    <body>
        <h1>This is heading one.</h1>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# we are adding a class attribute to h1
soup.h1['class'] = 'A'

print(soup.prettify())

Output:

<html>
 <head>
 </head>
 <body>
  <h1 class="A">
   This is heading one.
  </h1>
 </body>
</html>
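Because a tag's attributes behave like a Python dictionary, the same square-bracket syntax can also set several values, read them back, or remove them. A minimal sketch: assigning a list to class is how Beautiful Soup represents multiple classes, the id value "intro" and the class "B" are placeholders, and del removes an attribute entirely.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
    <head>
    </head>
    <body>
        <h1>This is heading one.</h1>
    </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
h1 = soup.h1

# assigning a list to "class" produces a space-separated class attribute
h1['class'] = ['A', 'B']
# any other attribute name works the same way
h1['id'] = 'intro'
print(h1)

# attributes can be read back like dictionary entries
print(h1['id'])
print(h1.get('title'))  # a missing attribute gives None instead of KeyError

# del removes an attribute from the tag
del h1['id']
print(h1)
```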