【E資格 Exam Prep】Activation Functions

2026-02-11


Sigmoid function

\begin{align*} \sigma(x) & = \dfrac{1}{1 + e^{-x}} \\ \sigma^{\prime}(x) & = \sigma(x) (1 - \sigma(x)) \end{align*}

Derivation of the derivative

\begin{align*} \sigma^{\prime}(x) & = \dfrac{d}{dx} \left( \dfrac{1}{1 + e^{-x}} \right) \\ & = - (1 + e^{-x})^{-2} \cdot (-e^{-x}) \\ & = \dfrac{e^{-x}}{(1 + e^{-x})^{2}} \\ & = \dfrac{1 + e^{-x} - 1}{(1 + e^{-x})^{2}} \\ & = \dfrac{1}{1 + e^{-x}} - \dfrac{1}{(1 + e^{-x})^{2}} \\ & = \sigma(x) - \sigma(x)^{2} \\ & = \sigma(x)(1 - \sigma(x)) \end{align*}

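It has been widely used in both the hidden and output layers of neural networks. In modern networks it mostly appears in output layers (for example, for binary classification).

It maps any input to the range $(0, 1)$, so the output can be interpreted as a probability. Its derivative is at most $\sigma^{\prime}(0) = 1/4$ and approaches $0$ as $|x|$ grows, so for inputs with a large absolute value the gradient becomes very small, which can cause the vanishing gradient problem.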
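As a quick sanity check of the derivative formula above, the following sketch (NumPy only; the grid and step size `h` are arbitrary choices) compares $\sigma(x)(1 - \sigma(x))$ with a central finite difference and prints how fast the gradient decays away from $x = 0$.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Closed-form derivative sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the closed form with a central finite difference on a grid
x = np.linspace(-6.0, 6.0, 121)
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
max_err = np.max(np.abs(numeric - sigmoid_derivative(x)))
print(f"max |numeric - closed form| = {max_err:.2e}")

# The derivative peaks at x = 0 (value 1/4) and shrinks quickly for large |x|
for v in [0.0, 2.0, 5.0, 10.0]:
    print(f"sigma'({v:>4}) = {sigmoid_derivative(v):.6f}")
```

The printed values make the saturation concrete: at $x = 10$ the local gradient is already on the order of $10^{-5}$.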

ReLU (Rectified Linear Unit) function

\begin{align*} \text{ReLU}(x) & = \max(0, x) \\ \text{ReLU}^{\prime}(x) & = \begin{cases} 0 & (x < 0) \\ 1 & (x \geq 0) \end{cases} \end{align*}

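It is widely used as the activation function of hidden layers in DNNs. It introduces nonlinearity while keeping the derivative equal to $1$ for every positive input, so gradients are not shrunk as they pass backward through active units; this mitigates the vanishing gradient problem.

In the forward pass, ReLU outputs $0$ for every negative input. In the backward pass, however, the gradient for negative inputs is also $0$, so a unit whose input stays negative receives no weight updates (the so-called "dying ReLU" problem), which can slow down learning. Strictly speaking, ReLU is not differentiable at $x = 0$; the value used there ($0$ or $1$) is a matter of convention.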
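A minimal sketch of both effects, assuming hypothetical pre-activations drawn from a normal distribution with mean $-1$ (chosen only to mimic a unit whose inputs are mostly negative): gradients are exactly $1$ where the unit is active and exactly $0$ everywhere else. The gradient at $x = 0$ is set to $0$ here, which is one of the two common conventions.

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """Gradient of ReLU; the value at exactly x = 0 is set to 0 by convention."""
    return (x > 0).astype(float)

# Hypothetical pre-activations biased toward negative values
rng = np.random.default_rng(0)
pre_activations = rng.normal(loc=-1.0, scale=1.0, size=10_000)
grads = relu_derivative(pre_activations)

# Positive inputs pass the gradient through unchanged (factor 1)
print("distinct gradient values:", np.unique(grads))
# Negative inputs give gradient 0, i.e. no weight update from those samples
print(f"fraction of zero gradients: {np.mean(grads == 0):.3f}")
```

With this setup roughly 84% of the samples contribute no gradient, which is the situation Leaky ReLU is designed to avoid.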

Leaky ReLU function

\begin{align*} \text{Leaky ReLU}(x) & = \max(\alpha x, x) \quad (\alpha \text{ is a small constant with } 0 < \alpha < 1 \text{, e.g. } \alpha = 0.01) \\ \text{Leaky ReLU}^{\prime}(x) & = \begin{cases} \alpha & (x < 0) \\ 1 & (x \geq 0) \end{cases} \end{align*}

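Because Leaky ReLU keeps a small gradient $\alpha$ even for negative inputs, no unit stops learning entirely. This further suppresses vanishing gradients and addresses ReLU's drawback, namely the slowdown in learning caused by zero gradients for negative inputs.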
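A small sketch of Leaky ReLU written directly as $\max(\alpha x, x)$, assuming the common default $\alpha = 0.01$. This compact form is only correct for $0 < \alpha < 1$; with $\alpha > 1$ the maximum would pick $\alpha x$ for positive inputs.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU written as max(alpha * x, x); valid when 0 < alpha < 1."""
    return np.maximum(alpha * x, x)

def leaky_relu_derivative(x, alpha=0.01):
    """alpha for x < 0 and 1 for x >= 0, matching the piecewise definition."""
    return np.where(x < 0, alpha, 1.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))             # negative inputs are scaled by alpha instead of clipped to 0
print(leaky_relu_derivative(x))  # the gradient is never exactly 0
```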

$\tanh$ (hyperbolic tangent) function

\begin{align*} \tanh(x) & = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \\ \tanh^{\prime}(x) & = 1 - \tanh^{2}(x) \end{align*}

Derivation of the derivative

\begin{align*} \tanh^{\prime}(x) & = \dfrac{d}{dx} \left( \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \right) \\ & = \dfrac{(e^{x} + e^{-x})(e^{x} + e^{-x}) - (e^{x} - e^{-x})(e^{x} - e^{-x})}{(e^{x} + e^{-x})^{2}} \\ & = \left( \dfrac{e^{x} + e^{-x}}{e^{x} + e^{-x}} \right)^2 - \left( \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \right)^2 \\ & = 1 - \tanh^{2}(x) \end{align*}

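It is widely used as a hidden-layer activation function in DNNs. The maximum of its derivative is $\tanh^{\prime}(0) = 1$, four times the sigmoid's maximum of $1/4$, so vanishing gradients are less likely than with the sigmoid function. Like the sigmoid, however, it still saturates for large $|x|$.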
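The comparison with the sigmoid can be checked numerically. This sketch uses the standard identity $\tanh(x) = 2\sigma(2x) - 1$ (not stated in the text above) to show that $\tanh$ is a rescaled, zero-centered sigmoid, and compares the two slopes at $x = 0$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_derivative(x):
    """1 - tanh(x)^2, from the derivation above."""
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-5.0, 5.0, 101)

# tanh is a shifted and rescaled sigmoid: tanh(x) = 2 * sigma(2x) - 1
identity_err = np.max(np.abs(np.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)))

# Maximum slopes at x = 0: tanh'(0) = 1 versus sigma'(0) = 1/4
sigmoid_slope_at_0 = sigmoid(0.0) * (1.0 - sigmoid(0.0))
print(f"tanh'(0) = {tanh_derivative(0.0)}, sigma'(0) = {sigmoid_slope_at_0}")
print(f"max |tanh(x) - (2*sigma(2x) - 1)| = {identity_err:.1e}")
```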

GELU (Gaussian Error Linear Unit) function

\begin{align*} \text{GELU}(x) & = x \Phi(x) \\ \text{GELU}^{\prime}(x) & = \Phi(x) + x \phi(x) \end{align*}

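where

$\phi(x) = \dfrac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^{2}}$ is the probability density function of the standard normal distribution
$\displaystyle \Phi(x) = \int_{-\infty}^{x} \phi(t) dt$ is the cumulative distribution function of the standard normal distribution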

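GELU multiplies the input $x$ by a gate coefficient $\Phi(x)$, a probability that depends on the input value. It can be viewed as a smooth version of ReLU, which instead gates $x$ with a hard $0$/$1$ step. Its nonlinearity is smooth because it is built on the Gaussian distribution, and it is less prone to vanishing gradients.

GELU is often explained as having a regularizing effect related to dropout, and it is commonly used in large models such as Transformers (e.g. BERT and GPT).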
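A sketch of the exact GELU and its derivative using only NumPy and `math.erf`, via $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x / \sqrt{2})\right)$ instead of SciPy. It also compares against the widely used tanh-based approximation $\tfrac{1}{2} x \left(1 + \tanh\left(\sqrt{2/\pi}\,(x + 0.044715 x^{3})\right)\right)$, which is an addition here and not part of the text above.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf from the standard library

def phi(x):
    """Standard normal PDF."""
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF: 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def gelu(x):
    """Exact GELU: x * Phi(x)."""
    return x * Phi(x)

def gelu_derivative(x):
    """Phi(x) + x * phi(x), as in the formula above."""
    return Phi(x) + x * phi(x)

def gelu_tanh_approx(x):
    """Widely used tanh-based approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-5.0, 5.0, 201)
h = 1e-5
finite_diff = (gelu(x + h) - gelu(x - h)) / (2 * h)
deriv_err = np.max(np.abs(finite_diff - gelu_derivative(x)))
approx_err = np.max(np.abs(gelu(x) - gelu_tanh_approx(x)))
print(f"derivative check error: {deriv_err:.1e}")
print(f"max |exact - tanh approximation| on [-5, 5]: {approx_err:.1e}")
print("GELU(-1), GELU(0), GELU(1):", gelu(np.array([-1.0, 0.0, 1.0])))
```

Unlike ReLU, GELU lets a small negative output through (for example, $\text{GELU}(-1) = -\Phi(-1) \approx -0.159$), which is one visible effect of the soft gate.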

Vanishing gradient problem

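The vanishing gradient problem occurs when a neural network is trained with backpropagation: as the number of layers grows, the error gradients become progressively smaller toward the input-side layers, and learning stalls. Because backpropagation multiplies together the local derivatives of every layer, it is especially likely with activation functions such as the sigmoid (whose derivative is at most $1/4$) and $\tanh$, whose derivatives become small when they saturate.

Inappropriate initial parameter values also make vanishing or exploding gradients more likely.

Typical remedies are appropriate activation functions (such as ReLU), appropriate initialization methods (such as Xavier or He initialization), and batch normalization.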
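The multiplicative effect can be illustrated with a toy model. This sketch assumes a hypothetical scalar chain $h_k = f(w\,h_{k-1})$ with $h_0 = x$ and every weight $w = 1$; it is not a real network. It backpropagates by the chain rule, so with the sigmoid each layer contributes a factor of at most $1/4$, while ReLU with a positive input contributes exactly $1$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Convention: gradient 0 at z = 0
    return 1.0 if z > 0 else 0.0

def chain_gradient(x, depth, act, act_grad, w=1.0):
    """Gradient d h_L / d x for the toy chain h_k = act(w * h_{k-1}), h_0 = x."""
    h, grad = x, 1.0
    for _ in range(depth):
        z = w * h
        grad *= act_grad(z) * w  # chain rule: multiply by this layer's local derivative
        h = act(z)
    return grad

for depth in [1, 5, 10, 20]:
    g_sig = chain_gradient(0.5, depth, sigmoid, sigmoid_grad)
    g_relu = chain_gradient(0.5, depth, relu, relu_grad)
    print(f"depth={depth:2d}  sigmoid: {g_sig:.3e} (bound 0.25**depth = {0.25 ** depth:.3e})  ReLU: {g_relu}")
```

The sigmoid gradient falls at least as fast as $0.25^{L}$ with depth $L$, while the ReLU gradient stays at $1$ along an active path.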

Temperature parameter

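A temperature parameter controls how sharply a probability distribution or an activation function responds to changes in its input.

For example, the sigmoid function with temperature is defined as follows, where $T > 0$ is the temperature parameter.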

f(x) = \dfrac{1}{1 + e^{-\frac{x}{T}}}

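$T > 1$: the curve's transition from $0$ to $1$ becomes gentler, i.e. the output changes more slowly with the input.
$T = 1$: identical to the ordinary sigmoid function.
$0 < T < 1$: the transition from $0$ to $1$ becomes steeper, i.e. the output changes more abruptly with the input.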
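A minimal sketch of the temperature sigmoid $f(x) = 1 / (1 + e^{-x/T})$. The slope at the origin is $f^{\prime}(0) = 1/(4T)$ (from the chain rule applied to $\sigma(x/T)$), which quantifies the three cases above; the sample temperatures are arbitrary.

```python
import numpy as np

def sigmoid_with_temperature(x, T=1.0):
    """f(x) = 1 / (1 + exp(-x / T)); T > 0 is the temperature."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    return 1.0 / (1.0 + np.exp(-x / T))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for T in [0.25, 1.0, 4.0]:
    # Slope at x = 0 is 1 / (4T): small T gives a steep, step-like curve; large T a gentle one
    values = np.round(sigmoid_with_temperature(x, T), 3)
    print(f"T={T}: f(x) = {values}, slope at 0 = {1 / (4 * T)}")
```

All three curves pass through $f(0) = 1/2$; only how quickly they move toward $0$ and $1$ changes with $T$.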

The code used to plot each activation function and its derivative is shown below.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# 1. Sigmoid function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# 2. ReLU function and its derivative
def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

# 3. Leaky ReLU function and its derivative
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1, alpha)

# 4. Tanh function and its derivative
def tanh_function(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x)**2

# 5. GELU function and its derivative
def gelu(x):
    return x * norm.cdf(x)

def gelu_derivative(x):
    return norm.cdf(x) + x * norm.pdf(x)


# Define the x-axis range
x_values = np.linspace(-5, 5, 400)

# Calculate y-values for each activation function and its derivative

# Sigmoid
sigmoid_y = sigmoid(x_values)
sigmoid_derivative_y = sigmoid_derivative(x_values)

# ReLU
relu_y = relu(x_values)
relu_derivative_y = relu_derivative(x_values)

# Leaky ReLU
leaky_relu_y = leaky_relu(x_values)
leaky_relu_derivative_y = leaky_relu_derivative(x_values)

# Tanh
tanh_y = tanh_function(x_values)
tanh_derivative_y = tanh_derivative(x_values)

# GELU
gelu_y = gelu(x_values)
gelu_derivative_y = gelu_derivative(x_values)


# List of activation functions and their derivatives, along with labels and data
activation_functions = [
    {
        'name': 'Sigmoid',
        'function_y': sigmoid_y,
        'derivative_y': sigmoid_derivative_y
    },
    {
        'name': 'ReLU',
        'function_y': relu_y,
        'derivative_y': relu_derivative_y
    },
    {
        'name': 'Leaky ReLU',
        'function_y': leaky_relu_y,
        'derivative_y': leaky_relu_derivative_y
    },
    {
        'name': 'Tanh',
        'function_y': tanh_y,
        'derivative_y': tanh_derivative_y
    },
    {
        'name': 'GELU',
        'function_y': gelu_y,
        'derivative_y': gelu_derivative_y
    }
]

# Iterate through each function and create a separate plot
for func_data in activation_functions:
    name = func_data['name']
    function_y = func_data['function_y']
    derivative_y = func_data['derivative_y']

    plt.figure(figsize=(10, 6)) # Create a new figure for each function
    plt.plot(x_values, function_y, label=f'{name} Function', color='blue')
    plt.plot(x_values, derivative_y, label=f'{name} Derivative', color='red', linestyle='--')
    plt.title(f'{name} Function and its Derivative')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.grid(True)
    plt.show()
```