How to Extract Script and CSS Files from Web Pages in Python
Published on Aug. 22, 2023, 12:16 p.m.
To extract script and CSS files from a web page in Python, you can use the requests library to fetch the page and the BeautifulSoup library to parse its HTML content. The following example collects the script and stylesheet URLs from a web page:
import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com/page.html'
# A timeout keeps the script from hanging on an unresponsive server
response = requests.get(url, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract script and CSS URLs; src=True and href=True skip tags without a URL
    script_urls = [script['src'] for script in soup.find_all('script', src=True)]
    css_urls = [link['href'] for link in soup.find_all('link', rel='stylesheet', href=True)]

    print('Scripts:')
    print('\n'.join(script_urls))
    print('')
    print('CSS:')
    print('\n'.join(css_urls))
else:
    print(f'Request failed with status code {response.status_code}')
In this code, we first send an HTTP GET request to the web page and use the BeautifulSoup library to parse its HTML content. We then extract the URLs of all script and CSS files by calling the find_all() method for script tags that have a src attribute and for link tags whose rel attribute is stylesheet. Inline scripts have no src attribute, so they are not included in the results.
Note that script and CSS URLs are often relative, so you may need the urllib.parse.urljoin() function to turn them into absolute URLs.
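
As a minimal sketch of that last step, the snippet below resolves a few URLs against the address of the page they came from. The helper name to_absolute and the sample paths are hypothetical and chosen only for illustration; in a real script the list would come from script_urls or css_urls, and response.url may be a better base than the original url if the request was redirected.

```python
from urllib.parse import urljoin


def to_absolute(base_url, urls):
    """Resolve each URL in `urls` against `base_url` (hypothetical helper)."""
    return [urljoin(base_url, u) for u in urls]


# Assumed sample data: the page address and URLs as they might appear in its HTML
page_url = 'https://www.example.com/page.html'
found = [
    '/static/app.js',                   # root-relative path
    'css/site.css',                     # path relative to the page's directory
    '//cdn.example.com/lib.js',         # protocol-relative URL
    'https://other.example.org/x.css',  # already absolute, left unchanged
]

for absolute_url in to_absolute(page_url, found):
    print(absolute_url)
```

urljoin() leaves URLs that are already absolute untouched, so it is safe to apply to every extracted URL without checking first.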