[Python] Getting various pieces of header information

2021-02-21

These are my notes on getting various pieces of HTTP header information with Python.

Getting web information with requests

By importing Python's requests module, you can retrieve all sorts of information about a web page.

If the requests module is not installed, install it with pip first.

pip install requests

Let's fetch web information from this site.

The target is this article, which I published the other day.

[Python] Getting the elements inside meta tags (scraping)

import requests


info = requests.get('https://code-schools.com/python-meta/')
print(info)

# Output
<Response [200]>
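The 200 in that output is the HTTP status code, which requests also exposes as info.status_code. As a rough sketch of how to turn such a code into a readable label, the snippet below uses the standard library's http.HTTPStatus; the helper name describe_status is made up for illustration and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a readable label such as '200 OK' for an HTTP status code.

    describe_status is a hypothetical helper, not a requests API.
    """
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    return f"{status.value} {status.phrase}"


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```

With a real response you would pass info.status_code to the helper.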

Next, get the header information. The response object has a headers attribute (it is an attribute, not a method), so we use that.

import requests


info = requests.get('https://code-schools.com/python-meta/')
print(info.headers)

# Output
{'Date': 'Sun, 11 Nov 2018 02:18:04 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding, Accept-Encoding', 'Last-Modified': 'Wed, 31 Oct 2018 04:05:52 GMT', 'X-Mod-Pagespeed': 'Powered By mod_pagespeed', 'Cache-Control': 'max-age=0, no-cache, no-store, must-revalidate', 'Pragma': 'no-cache', 'Expires': 'Mon, 29 Oct 1923 20:30:00 GMT', 'X-Cache-Status': 'HIT', 'X-UA-Device': 'pc', 'Content-Encoding': 'gzip'}

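The headers come back as a dict-like object (requests uses a case-insensitive dictionary), so you can pull out each value by specifying its key.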
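Not every server sends every header, and indexing a missing key raises KeyError. A minimal sketch of a safer lookup using .get() with a default is shown here; it uses a plain dict holding a few values copied from the output above so it runs without network access, and the helper name header_or_default and the '(not sent)' placeholder are assumptions for illustration.

```python
# A few headers copied from the output above, held in a plain dict for illustration.
headers = {
    'Content-Type': 'text/html; charset=UTF-8',
    'Last-Modified': 'Wed, 31 Oct 2018 04:05:52 GMT',
    'Content-Encoding': 'gzip',
}


def header_or_default(headers, name, default='(not sent)'):
    """Return the header value, or a placeholder when the header is absent.

    header_or_default is a hypothetical helper, not a requests API.
    """
    return headers.get(name, default)


print(header_or_default(headers, 'Content-Type'))  # present in the dict
print(header_or_default(headers, 'ETag'))          # absent, so the default is returned
```

The same .get() call should work on info.headers, since that object behaves like a dict.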

Here we pull out Last-Modified (the time the page was last modified).

import requests


info = requests.get('https://code-schools.com/python-meta/')
print(info.headers['Last-Modified'])

# Output
Wed, 31 Oct 2018 04:05:52 GMT
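The value is still just a string. If you want to compare or reformat it, one possible approach (an assumption on my part, not something the steps above require) is to parse it with the standard library's email.utils.parsedate_to_datetime, which understands this HTTP date format. The string below is copied from the output above; the variable names are only for illustration.

```python
from email.utils import parsedate_to_datetime

# Value copied from the Last-Modified output above.
last_modified = 'Wed, 31 Oct 2018 04:05:52 GMT'

# Parse the HTTP date string into a timezone-aware datetime.
modified_at = parsedate_to_datetime(last_modified)

print(modified_at.isoformat())
print(modified_at.year, modified_at.month, modified_at.day)
```

With a live response you would pass info.headers['Last-Modified'] instead of the hard-coded string.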