Time series choropleth map with Folium

Folium is a Python library for building interactive maps. It is built on top of Leaflet.

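Here I download the census time series "Population by sex: Japan and prefectures (1920 to 2015)" from e-Stat, the portal site of official statistics of Japan, and display it on a map of Japan.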

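Read the CSV file, drop the rows that are not needed, and replace empty values with 0. Then rename the Japanese column headers to English and convert the data to numeric types. The prefecture ids are saved so that they can be matched with those in the shapefile later.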

import pandas as pd  

path = 'c01.csv'
# skipfooter is only supported by the python parser engine
df = pd.read_csv(path, skipfooter=2, encoding='sjis', engine='python')
# delete unnecessary rows
df = df[(df['都道府県コード']!='0A') & (df['都道府県コード']!='0B')]
df = df[(df['都道府県名']!='全国')]
# replace 'empty' character with 0
df.replace(to_replace=['-'], value=0, inplace=True)

# delete unnecessary cols
cols = ['都道府県コード', '都道府県名', '西暦（年）', '人口（総数）', '人口（男）', '人口（女）']
df = df[cols]
cols = ['code', 'prefecture', 'year', 'total', 'male', 'female']
df.columns = cols
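# convert to numeric types if possible; columns that cannot be parsed are kept as they are
# (errors='ignore' is deprecated in recent pandas releases)
def to_numeric_if_possible(col):
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col

df = df.apply(to_numeric_if_possible)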
# create prefecture name to id code mapping for later use
name2id = df.set_index('prefecture').to_dict()['code']
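
The cleaning steps above can be tried without downloading the file. The following is a minimal sketch on a small hand-made table. The figures, the English column names, and the reduced column set are invented for illustration and are not taken from the e-Stat file. Unlike the code above, the prefecture codes are kept as strings here.

```python
import pandas as pd

# Toy stand-in for the e-Stat CSV; all values are made up for illustration.
raw = pd.DataFrame({
    'code': ['00', '13', '47', '0A'],
    'prefecture': ['全国', '東京都', '沖縄県', '人口集中地区'],
    'year': ['1950', '1950', '1950', '1950'],
    'total': ['1000', '600', '-', '300'],
})

# Drop the nationwide total and the non-prefecture aggregate rows.
keep = ~raw['code'].isin(['0A', '0B']) & (raw['prefecture'] != '全国')
clean = raw[keep]

# Treat the '-' placeholder as zero, then convert the columns to numbers.
clean = clean.assign(
    year=pd.to_numeric(clean['year']),
    total=pd.to_numeric(clean['total'].replace('-', '0')),
)

# Map each prefecture name to its code for joining with the shapefile later.
name2id = clean.set_index('prefecture')['code'].to_dict()
```

Only the two prefecture rows survive, and the missing Okinawa count becomes 0 rather than a string.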

Create the time axis that will be passed to JavaScript.

import datetime
import time

import numpy as np

# create minimum data frame
cols = ['code', 'year', 'total']
df = df[cols]
# get number/the first year/the last year of census carried out
nn = len(df.year.unique())
start_year = df.year.min()
end_year = df.year.max()
# create array of years to be passed to JavaScript
# (a 5-year DateOffset keeps every date on 1 October; '5Y' would snap to year ends)
years = pd.date_range('%s-10-01' % start_year, periods=nn, freq=pd.DateOffset(years=5))
years = np.array(list(map(time.mktime, map(datetime.datetime.timetuple, years))))
years = years.astype('int32')
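
`time.mktime` interprets its argument in the machine's local time zone and may reject dates before 1970 on some platforms. As a possible alternative, the sketch below computes the same kind of timestamps in UTC with the standard library only. The helper name `census_epochs` is made up for this example, and the fixed five-year interval is an assumption carried over from the code above.

```python
import calendar
import datetime


def census_epochs(start_year, end_year, step=5):
    """Return UTC epoch seconds for 1 October of every `step`-th year."""
    return [
        calendar.timegm(datetime.date(year, 10, 1).timetuple())
        for year in range(start_year, end_year + 1, step)
    ]


# One timestamp per census from 1920 to 2015; pre-1970 values are negative.
epochs = census_epochs(1920, 2015)
```

Because `calendar.timegm` always works in UTC, the result does not depend on where the script is run.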

Compute each prefecture's share of the national population.

# re-format the dataframe
df = pd.pivot_table(df, index='code', columns='year', values='total')   
# compute fraction of population for each prefecture/year
for year in range(start_year, end_year+1, 5): 
  df.loc[:, year] /= df.loc[:,year].sum()/100.
# minimum/maximum percentage
vmin = df[df > 0].min().min()
vmax = df.values.max()
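
The normalisation can be checked on a toy table. This sketch uses a vectorised `DataFrame.div` instead of an explicit loop over the years, which should give the same result as long as every column is a census year. The counts are made up.

```python
import pandas as pd

# Toy long-format table: two prefectures, two census years (made-up counts).
long_df = pd.DataFrame({
    'code': [1, 1, 2, 2],
    'year': [1920, 1925, 1920, 1925],
    'total': [30.0, 50.0, 70.0, 50.0],
})

# One row per prefecture, one column per year.
wide = long_df.pivot_table(index='code', columns='year', values='total')

# Divide each column by its sum so that every year adds up to 100 %.
share = wide.div(wide.sum(axis=0), axis=1) * 100

# Colour-scale limits; zeros are skipped when looking for the minimum.
vmin = share[share > 0].min().min()
vmax = share.to_numpy().max()
```

Skipping zeros for `vmin` matters in the real data, where empty entries were replaced by 0 and would otherwise stretch the colour scale.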

Read the shapefile containing the prefecture boundaries, make the prefecture ids agree with those in the CSV data, and convert the result to GeoJSON.

# read shapefile
import geopandas as gpd

path = r'prefecture.shp'
pref = gpd.read_file(path)
# re-mapping of prefecture id in sync with the csv data
pref['id'] = pref['都道府県'].map(name2id)
pref = pref.set_index('id')
pref['id'] = pref.index
# convert to json format
cols = ['id', 'geometry']
src = pref[cols].to_json()
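
The feature ids written into the GeoJSON are what the style dictionary in the next step is keyed by, so it can be worth checking that the two sets of ids agree. The following is a rough sketch using only the standard library. The helper `unmatched_ids` and the tiny FeatureCollection are hypothetical and only stand in for `src`.

```python
import json


def unmatched_ids(geojson_text, style_ids):
    """Return the feature ids that have no entry among `style_ids`."""
    features = json.loads(geojson_text)['features']
    # Compare as strings, since ids may be stored as int or str.
    known = {str(i) for i in style_ids}
    return [f.get('id') for f in features if str(f.get('id')) not in known]


# Tiny hand-written FeatureCollection standing in for the real GeoJSON.
sample = json.dumps({
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'id': '13', 'properties': {},
         'geometry': {'type': 'Point', 'coordinates': [139.7, 35.7]}},
        {'type': 'Feature', 'id': '99', 'properties': {},
         'geometry': {'type': 'Point', 'coordinates': [0.0, 0.0]}},
    ],
})

missing = unmatched_ids(sample, style_ids=[13, 47])
```

An empty list means every feature has a matching id. A non-empty one usually points to a prefecture name that was spelled differently in the two sources.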

Convert the population shares to color codes and build the data for the choropleth map.

import branca.colormap as cm

linear = cm.LinearColormap(['green', 'yellow', 'red'], vmin=vmin, vmax=vmax).to_step(100)
linear.caption = 'Prefecture share of national population [%]'
data = {}
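# loop variable named `code` so that it does not shadow the `pref` GeoDataFrame
for code in df.index:
    data[code] = pd.DataFrame(
        {
            'opacity': np.full(nn, 0.6),
            'color': df.loc[code, :].apply(linear).tolist(),
        },
        index=years,
    )
# {feature id: {timestamp: {'opacity': ..., 'color': ...}}}
styledict = {code: v.to_dict(orient='index') for code, v in data.items()}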
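
The resulting `styledict` is a nested dictionary: feature id, then timestamp, then a style with `color` and `opacity`. The sketch below builds one by hand for a single feature to make that shape visible. The `to_hex` helper is a hypothetical stand-in for the branca colormap and simply interpolates from green to red. The shares and timestamps are arbitrary. Timestamps are written as strings here, which is what they become anyway once the dictionary is serialised to JSON.

```python
def to_hex(value, vmin, vmax):
    """Hypothetical stand-in for a colormap: green at vmin, red at vmax."""
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    red, green = round(255 * t), round(255 * (1 - t))
    return '#{:02x}{:02x}00'.format(red, green)


# Made-up shares (in %) for one feature at two arbitrary epoch timestamps.
shares = {13: {0: 0.0, 157766400: 10.0}}

styledict = {
    feature_id: {
        str(ts): {'color': to_hex(value, 0.0, 10.0), 'opacity': 0.6}
        for ts, value in series.items()
    }
    for feature_id, series in shares.items()
}
```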

Finally, create the time series choropleth map.

import folium
import folium.plugins

zoom_start = 5
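# Stamen tiles may not be available in recent folium releases;
# "CartoDB positron" is a built-in tile set used here instead
m = folium.Map(location=[35.658593, 139.745441],
               tiles="CartoDB positron",
               zoom_start=zoom_start)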
g = folium.plugins.TimeSliderChoropleth(
    src,
    styledict=styledict,
    name='data'
).add_to(m)
m.add_child(folium.plugins.Fullscreen())
folium.LayerControl().add_to(m)
m.add_child(linear)
m.save('map.html')

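The resulting map is available here. In the linked map you can choose between several tile sets, including the GSI Tiles provided by the Geospatial Information Authority of Japan. Moving the slider at the top shows how each prefecture's share of the population changes over time.

In 1920 (Taisho 9), the year of the first census, the concentration of population in Tokyo was not yet very high, but it kept increasing until the 1940 (Showa 15) census. In the 1945 (Showa 20) survey it dropped, possibly because of wartime evacuation. After the war the concentration in Tokyo can be seen rising again.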

I could not find an example of a choropleth map built from time series data with GeoViews, so I used Folium this time.