6  Exponential and Logarithmic Functions

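Learning objectives

  • Understand the special property of \(e\): \(\frac{d}{dx}e^x = e^x\)
  • Master the derivative of the logarithm: \(\frac{d}{dx}\ln x = \frac{1}{x}\)
  • Recognize the effects of a log transformation
  • Connect to medical statistics: log-likelihood, log-odds, hazard ratio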

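6.1 Why statistics loves \(e\) and \(\ln\)

In the statistical literature, you will keep running into:

  • Log-likelihood: why take the logarithm?
  • Logistic regression: what is the log-odds?
  • Cox regression: why take the log of the hazard ratio?
  • Right-skewed data: why apply a log transformation?

The answers all trace back to the special properties of \(e\) and \(\ln\), especially how they behave under differentiation!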

6.2 The remarkable number \(e\)

6.2.1 Definition of \(e\)

The base of the natural logarithm:

\[ e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \approx 2.71828... \]

Equivalently, \(e\) can be defined as a series:

\[ e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \cdots \]
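
The two definitions can be compared numerically. The following is a minimal sketch, not part of the original text; the choices of \(n\) and of eleven series terms are arbitrary, and the variable names are illustrative only.

```r
# Limit definition: (1 + 1/n)^n for increasing n
n <- 10^(1:6)
limit_approx <- (1 + 1 / n)^n

# Series definition: partial sums of 1/k! for k = 0, ..., 10
k <- 0:10
series_approx <- cumsum(1 / factorial(k))

# Absolute error of each approximation against R's built-in exp(1)
limit_error  <- abs(limit_approx - exp(1))
series_error <- abs(series_approx - exp(1))

print(data.frame(n = n, limit_approx = limit_approx, error = limit_error))
print(data.frame(terms = k + 1, series_approx = series_approx, error = series_error))
```

Under these settings the series converges far faster: eleven terms are already closer to \(e\) than the limit formula with \(n = 10^6\).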

6.2.2 The remarkable property of \(e^x\)

Theorem

\[ \frac{d}{dx}e^x = e^x \]

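This is the most important property of \(e\): up to a constant multiple, \(e^x\) is the only function whose derivative equals itself. Every solution of \(f'(x) = f(x)\) has the form \(f(x) = Ce^x\).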

Code
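library(ggplot2)  # assumed to be loaded in a setup chunk that is not shown

x <- seq(-2, 2, by = 0.01)
y <- exp(x)

# Draw tangent lines at a few points
points_x <- c(-1, 0, 1)
slopes <- exp(points_x)  # slope = e^x

# Tangent line: y - y0 = slope * (x - x0)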
make_tangent <- function(x0, slope) {
  y0 <- exp(x0)
  function(x) y0 + slope * (x - x0)
}

tangents <- lapply(seq_along(points_x), function(i) {
  data.frame(
    x = x,
    y = make_tangent(points_x[i], slopes[i])(x),
    point = i
  )
})
tangent_df <- do.call(rbind, tangents)

ggplot() +
  # Curve of the original function
  geom_line(aes(x = x, y = y), color = "#2E86AB", linewidth = 1.5) +
  # Tangent lines
  geom_line(data = tangent_df, aes(x = x, y = y, group = point),
            color = "#E94F37", alpha = 0.5, linetype = "dashed") +
  # Points of tangency
  geom_point(aes(x = points_x, y = exp(points_x)),
             color = "#E94F37", size = 4) +
  # Annotations
  annotate("text", x = -1, y = exp(-1) + 0.8,
           label = paste0("x = -1\nslope = e⁻¹ ≈ ", round(exp(-1), 2)),
           hjust = 0.5, size = 3.5, color = "#E94F37") +
  annotate("text", x = 0, y = exp(0) + 1.2,
           label = "x = 0\nslope = e⁰ = 1",
           hjust = 0.5, size = 3.5, color = "#E94F37") +
  annotate("text", x = 1, y = exp(1) + 0.8,
           label = paste0("x = 1\nslope = e¹ ≈ ", round(exp(1), 2)),
           hjust = 0.5, size = 3.5, color = "#E94F37") +
  labs(
    title = expression("The remarkable property of" ~ e^x),
    subtitle = "At every point, the tangent slope equals the function value",
    x = "x", y = expression(e^x)
  ) +
  coord_cartesian(xlim = c(-2, 2), ylim = c(0, 8)) +
  theme_minimal(base_size = 14)
Figure 6.1: The remarkable property of e^x: the slope at every point equals the function value

6.2.3 Generalization: the derivative of \(e^{kx}\)

By the chain rule:

\[ \frac{d}{dx}e^{kx} = ke^{kx} \]

Examples

  • \(\frac{d}{dx}e^{2x} = 2e^{2x}\)
  • \(\frac{d}{dx}e^{-x} = -e^{-x}\)
  • \(\frac{d}{dt}e^{-0.5t} = -0.5e^{-0.5t}\) (drug elimination)
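
The rule can be checked against a finite-difference approximation. This is only a sketch: `num_deriv` is a hypothetical helper written for this illustration, and the step size `h` and the time grid are assumed values.

```r
# Central-difference approximation of f'(x); h is a small step (an assumed value)
num_deriv <- function(f, x, h = 1e-5) {
  (f(x + h) - f(x - h)) / (2 * h)
}

k <- -0.5                           # rate constant from the drug elimination example
conc <- function(t) exp(k * t)      # C(t) = e^{-0.5 t}
t_grid <- c(0, 1, 2, 4)

numeric_slope  <- num_deriv(conc, t_grid)
analytic_slope <- k * exp(k * t_grid)   # k * e^{kt}

print(cbind(t = t_grid, numeric_slope, analytic_slope))
```

The two columns agree to many decimal places, and at every time point the slope is \(k\) times the current value, which is what "rate of change proportional to the current amount" means.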

6.3 The logarithmic function \(\ln x\)

6.3.1 Definition of the logarithm

The natural logarithm \(\ln x\) is the inverse function of \(e^x\), defined for \(x > 0\).

\[ y = \ln x \quad \Leftrightarrow \quad e^y = x \]

Properties

  • \(\ln(e) = 1\)
  • \(\ln(1) = 0\)
  • \(\ln(ab) = \ln(a) + \ln(b)\) (multiplication becomes addition!)
  • \(\ln(a^b) = b\ln(a)\)
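
These identities are easy to confirm in R, where `log()` is the natural logarithm. A small sketch with arbitrarily chosen values `a` and `b`:

```r
a <- 12.5
b <- 0.04

# all.equal() compares up to floating-point tolerance
checks <- c(
  ln_e    = isTRUE(all.equal(log(exp(1)), 1)),
  ln_1    = isTRUE(all.equal(log(1), 0)),
  product = isTRUE(all.equal(log(a * b), log(a) + log(b))),
  power   = isTRUE(all.equal(log(a^b), b * log(a))),
  inverse = isTRUE(all.equal(exp(log(a)), a))
)
print(checks)
```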

6.3.2 The derivative of \(\ln x\)

Theorem

\[ \frac{d}{dx}\ln x = \frac{1}{x} \]
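
One way to see where this comes from, sketched here from the inverse-function relationship in 6.3.1 together with the chain rule: differentiate both sides of \(e^y = x\) with respect to \(x\), treating \(y = \ln x\) as a function of \(x\).

\[ e^{y} = x \quad \Rightarrow \quad e^{y}\,\frac{dy}{dx} = 1 \quad \Rightarrow \quad \frac{dy}{dx} = \frac{1}{e^{y}} = \frac{1}{x} \]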

Code
library(ggplot2)
library(patchwork)  # provides p1 / p2 stacking and plot_annotation()

x <- seq(0.1, 5, by = 0.01)

f <- log(x)        # ln(x)
f_prime <- 1/x     # d/dx ln(x) = 1/x

df <- data.frame(x, f, f_prime)

p1 <- ggplot(df, aes(x, f)) +
  geom_line(color = "#2E86AB", linewidth = 1.5) +
  geom_hline(yintercept = 0, color = "gray70", linetype = "dashed") +
  geom_vline(xintercept = 1, color = "#E94F37", linetype = "dotted") +
  annotate("point", x = 1, y = 0, color = "#E94F37", size = 4) +
  annotate("text", x = 1, y = -0.5, label = "ln(1) = 0",
           hjust = 0.5, color = "#E94F37") +
  labs(title = expression(f(x) == ln(x)), y = "f(x)") +
  theme_minimal(base_size = 12)

p2 <- ggplot(df, aes(x, f_prime)) +
  geom_line(color = "#E94F37", linewidth = 1.5) +
  geom_hline(yintercept = 0, color = "gray70") +
  geom_vline(xintercept = 1, color = "#E94F37", linetype = "dotted") +
  annotate("point", x = 1, y = 1, color = "#E94F37", size = 4) +
  annotate("text", x = 1, y = 1.5, label = "slope = 1",
           hjust = 0.5, color = "#E94F37") +
  labs(title = expression(f*"'"*(x) == frac(1, x)), y = "f'(x)") +
  ylim(0, 5) +
  theme_minimal(base_size = 12)

p1 / p2 +
  plot_annotation(
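    title = "The logarithm and its derivative",
    subtitle = "The slope of ln(x) decreases as x increases"
  )
Figure 6.2: ln(x) and its derivative 1/x

Observations

  • At \(x = 1\) the slope of \(\ln x\) is exactly 1; for \(0 < x < 1\) the curve is steeper still, since \(1/x \to \infty\) as \(x \to 0^+\)
  • As \(x\) increases, the slope keeps shrinking
  • \(\ln x\) keeps increasing, but ever more slowly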

6.3.3 Generalization: the derivative of \(\ln(g(x))\)

By the chain rule:

\[ \frac{d}{dx}\ln(g(x)) = \frac{1}{g(x)} \cdot g'(x) = \frac{g'(x)}{g(x)} \]

Example

\[ \frac{d}{dx}\ln(x^2 + 1) = \frac{2x}{x^2 + 1} \]
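
Base R can differentiate simple expressions symbolically with `D()`, which gives an independent check of this example. A brief sketch; the evaluation grid is arbitrary.

```r
# Symbolic derivative of ln(x^2 + 1) with respect to x
g_expr  <- quote(log(x^2 + 1))
dg_expr <- D(g_expr, "x")
print(dg_expr)

# Evaluate the symbolic result and the hand-derived formula on a grid
x <- c(-2, -0.5, 0, 1, 3)
from_D  <- eval(dg_expr, list(x = x))
by_hand <- 2 * x / (x^2 + 1)
print(cbind(x, from_D, by_hand))
```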

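6.4 The effects of a log transformation

6.4.1 Why apply a log transformation?

In medical data, many variables are right-skewed:

  • Medical costs
  • Length of hospital stay
  • Viral load
  • Income

A log transformation can:

  1. Compress the scale: squeeze a wide range into a narrow one
  2. Approach normality: make a right-skewed distribution more symmetric
  3. Turn multiplication into addition: \(\ln(ab) = \ln a + \ln b\)
  4. Stabilize the variance: make the spread more nearly constant across the range of the data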
Code
library(ggplot2)
library(patchwork)

set.seed(42)
# Simulated medical cost data (right-skewed)
cost <- rlnorm(1000, meanlog = 10, sdlog = 1)

df <- data.frame(
  cost = cost,
  log_cost = log(cost)
)

p1 <- ggplot(df, aes(cost)) +
  geom_histogram(fill = "#2E86AB", color = "white", bins = 50) +
  geom_vline(xintercept = median(cost), color = "#E94F37",
             linewidth = 1, linetype = "dashed") +
  annotate("text", x = median(cost), y = 80,
           label = paste0("median = ", scales::comma(round(median(cost)))),
           hjust = -0.1, color = "#E94F37") +
  labs(
    title = "Raw data: medical costs",
    subtitle = "Strongly right-skewed",
    x = "Cost (dollars)", y = "Count"
  ) +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal(base_size = 12)

p2 <- ggplot(df, aes(log_cost)) +
  geom_histogram(fill = "#E94F37", color = "white", bins = 50) +
  geom_vline(xintercept = median(log(cost)), color = "#2E86AB",
             linewidth = 1, linetype = "dashed") +
  annotate("text", x = median(log(cost)), y = 80,
           label = paste0("median = ", round(median(log(cost)), 2)),
           hjust = -0.1, color = "#2E86AB") +
  labs(
    title = "After the log transformation",
    subtitle = "Close to a normal distribution",
    x = "log(cost)", y = "Count"
  ) +
  theme_minimal(base_size = 12)

p1 / p2 +
  plot_annotation(
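    title = "The power of the log transformation",
    subtitle = "After a log transformation, right-skewed data become more symmetric and closer to normal"
  )
Figure 6.3: The effect of a log transformation: right-skewed data become symmetric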
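
The visual impression can be backed by numbers. The sketch below reuses the same simulated costs and computes a simple moment-based sample skewness; `skewness` is a hypothetical helper defined here, not a base R function.

```r
# Sample skewness: third central moment divided by the 1.5th power of the variance
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

set.seed(42)
cost <- rlnorm(1000, meanlog = 10, sdlog = 1)   # same simulation as above

skew_raw <- skewness(cost)
skew_log <- skewness(log(cost))

# Ratio of the largest to the smallest value, before and after the transformation
range_ratio_raw <- max(cost) / min(cost)
range_ratio_log <- max(log(cost)) / min(log(cost))

print(c(skew_raw = skew_raw, skew_log = skew_log))
print(c(range_ratio_raw = range_ratio_raw, range_ratio_log = range_ratio_log))
```

A skewness near zero after the transformation corresponds to "more symmetric", and the much smaller max/min ratio corresponds to "compressed scale".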

6.4.2 The visual effect of a log transformation

Code
library(ggplot2)
library(patchwork)

x <- seq(1, 1000, by = 1)
y <- x  # assume a linear relationship

df <- data.frame(x, y)

p1 <- ggplot(df, aes(x, y)) +
  geom_line(color = "#2E86AB", linewidth = 1.2) +
  labs(
    title = "Original scale",
    x = "X", y = "Y"
  ) +
  theme_minimal(base_size = 12)

p2 <- ggplot(df, aes(x, y)) +
  geom_line(color = "#E94F37", linewidth = 1.2) +
  scale_x_log10(labels = scales::comma) +
  scale_y_log10(labels = scales::comma) +
  labs(
    title = "Log-log scale",
    x = "X (log scale)", y = "Y (log scale)"
  ) +
  theme_minimal(base_size = 12)

p1 + p2 +
  plot_annotation(
    title = "A log scale makes wide-ranging data easier to visualize"
  )
Figure 6.4: Original scale vs. log scale

6.5 Statistical applications

6.5.1 1. Log-likelihood

In maximum likelihood estimation with independent observations, we usually take the logarithm of the likelihood:

\[ \ell(\theta) = \ln L(\theta) = \ln \prod_{i=1}^n f(x_i; \theta) = \sum_{i=1}^n \ln f(x_i; \theta) \]

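Why take the log?

  1. Multiplication becomes addition: \(\ln(ab) = \ln a + \ln b\)
  2. Numerical stability: multiplying many small numbers underflows; taking logs avoids the problem
  3. Easier differentiation: \(\frac{d}{d\theta}\ln L(\theta)\) is easier to compute than \(\frac{d}{d\theta}L(\theta)\)
  4. Monotonicity is preserved: \(\ln\) is strictly increasing, so the location of the maximum does not change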
Code
library(ggplot2)
library(patchwork)

# Suppose these are the observed data
set.seed(123)
data <- rnorm(20, mean = 5, sd = 2)

# Likelihood function (sigma fixed at 2)
likelihood <- function(mu) {
  prod(dnorm(data, mean = mu, sd = 2))
}

# Log-likelihood function
log_lik <- function(mu) {
  sum(dnorm(data, mean = mu, sd = 2, log = TRUE))
}

mu_range <- seq(2, 8, by = 0.01)
L_values <- sapply(mu_range, likelihood)
ll_values <- sapply(mu_range, log_lik)

# MLE
mu_mle <- mu_range[which.max(ll_values)]

df <- data.frame(mu = mu_range, L = L_values, ll = ll_values)

p1 <- ggplot(df, aes(mu, L)) +
  geom_line(color = "#2E86AB", linewidth = 1.2) +
  geom_vline(xintercept = mu_mle, color = "#E94F37", linetype = "dashed") +
  labs(
    title = "Likelihood L(μ)",
    subtitle = "Tiny values, prone to underflow",
    x = expression(mu), y = "L(μ)"
  ) +
  theme_minimal(base_size = 12)

p2 <- ggplot(df, aes(mu, ll)) +
  geom_line(color = "#E94F37", linewidth = 1.2) +
  geom_vline(xintercept = mu_mle, color = "#2E86AB", linetype = "dashed") +
  # annotate() draws a single point; geom_point() would redraw it once per row of df
  annotate("point", x = mu_mle, y = max(ll_values),
           color = "#E94F37", size = 4) +
  annotate("text", x = mu_mle + 0.3, y = max(ll_values),
           label = paste0("MLE = ", round(mu_mle, 2)),
           hjust = 0, color = "#E94F37") +
  labs(
    title = "Log-likelihood ℓ(μ)",
    subtitle = "Values on a manageable scale, easy to compute",
    x = expression(mu), y = "ℓ(μ)"
  ) +
  theme_minimal(base_size = 12)

p1 / p2 +
  plot_annotation(
    title = "Why use the log-likelihood?",
    subtitle = "Both peak at the same place, but the log-likelihood is easier to compute"
  )
Figure 6.5: Likelihood vs. Log-likelihood
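
With only 20 observations the raw likelihood is tiny but still representable. The underflow problem becomes concrete with a larger sample; the following sketch uses a hypothetical sample of 2000 values drawn from the same normal distribution.

```r
set.seed(123)
big_sample <- rnorm(2000, mean = 5, sd = 2)   # hypothetical larger sample

# Direct product of 2000 densities, each well below 1
lik_direct <- prod(dnorm(big_sample, mean = 5, sd = 2))

# Sum of log-densities stays on a comfortable scale
loglik <- sum(dnorm(big_sample, mean = 5, sd = 2, log = TRUE))

print(lik_direct)   # underflows to 0 in double precision
print(loglik)
```

Once the product has collapsed to 0, every value of \(\mu\) looks equally bad and the maximum cannot be located, whereas the log-likelihood remains a finite number that can be compared across values of \(\mu\).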

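6.5.2 2. Log-odds in Logistic Regression

The logistic regression model:

\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \]

The left-hand side is called the log-odds, or logit.

Why take the log?

  • The probability \(p \in (0, 1)\) has a restricted range
  • The odds \(= \frac{p}{1-p} \in (0, \infty)\) take only positive values
  • The log-odds \(= \ln\left(\frac{p}{1-p}\right) \in (-\infty, \infty)\) range over all real numbers

A linear predictor, which can take any real value, can therefore be set equal to the log-odds!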
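
The chain probability → odds → log-odds, and the way back, can be traced on a few values. A minimal sketch; the probabilities are arbitrary, and `qlogis()` and `plogis()` are the corresponding built-in functions in base R's stats package.

```r
p <- c(0.01, 0.25, 0.5, 0.75, 0.99)

odds     <- p / (1 - p)
log_odds <- log(odds)            # same values as qlogis(p)

# Inverse transformation: p = 1 / (1 + exp(-log_odds)), which is plogis() in R
p_back <- 1 / (1 + exp(-log_odds))

print(data.frame(p, odds, log_odds, p_back))
```

The inverse step is how a fitted linear predictor is turned back into a predicted probability.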

Code
library(ggplot2)
library(patchwork)

p <- seq(0.01, 0.99, by = 0.01)
odds <- p / (1 - p)
log_odds <- log(odds)

df <- data.frame(p, odds, log_odds)

p1 <- ggplot(df, aes(p, odds)) +
  geom_line(color = "#2E86AB", linewidth = 1.2) +
  geom_hline(yintercept = 1, color = "gray70", linetype = "dashed") +
  geom_vline(xintercept = 0.5, color = "#E94F37", linetype = "dotted") +
  labs(
    title = "Odds = p / (1 - p)",
    x = "Probability p", y = "Odds"
  ) +
  theme_minimal(base_size = 12)

p2 <- ggplot(df, aes(p, log_odds)) +
  geom_line(color = "#E94F37", linewidth = 1.2) +
  geom_hline(yintercept = 0, color = "gray70", linetype = "dashed") +
  geom_vline(xintercept = 0.5, color = "#E94F37", linetype = "dotted") +
  annotate("point", x = 0.5, y = 0, color = "#E94F37", size = 4) +
  labs(
    title = "Log-odds = ln(p / (1 - p))",
    x = "Probability p", y = "Log-odds"
  ) +
  theme_minimal(base_size = 12)

p1 / p2 +
  plot_annotation(
    title = "The core transformation of logistic regression",
    subtitle = "The log-odds maps probabilities in (0,1) onto the whole real line"
  )
Figure 6.6: The transformation probability → odds → log-odds

6.5.3 3. Hazard Ratio in Cox Regression

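Cox proportional hazards model:

\[ h(t|X) = h_0(t) \cdot \exp(\beta_1 X_1 + \cdots + \beta_k X_k) \]

Hazard ratio (HR) between two groups, for a single binary covariate \(X\) with coefficient \(\beta\) and all other covariates held fixed:

\[ \text{HR} = \frac{h(t|X=1)}{h(t|X=0)} = e^{\beta} \]

The baseline hazard \(h_0(t)\) cancels in the ratio, so the HR does not depend on \(t\).

Log hazard ratio:

\[ \ln(\text{HR}) = \beta \]

The regression coefficient \(\beta\) is the log hazard ratio!

  • If HR = 2, the hazard is doubled, and \(\beta = \ln(2) \approx 0.69\)
  • If HR = 0.5, the hazard is halved, and \(\beta = \ln(0.5) \approx -0.69\)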
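
A small numerical illustration of these relationships follows. It is a sketch under stated assumptions: the baseline hazard `h0` is a made-up function chosen only for illustration, and `hazard` is a hypothetical helper; neither comes from a fitted model.

```r
beta <- log(2)                        # log hazard ratio, so HR = 2
h0 <- function(t) 0.05 + 0.01 * t     # hypothetical baseline hazard, for illustration only

# Cox model with one binary covariate x
hazard <- function(t, x) h0(t) * exp(beta * x)

t_grid <- c(1, 5, 10, 20)
hr_over_time <- hazard(t_grid, x = 1) / hazard(t_grid, x = 0)
print(hr_over_time)

# Converting between the two scales
print(exp(c(-0.69, 0, 0.405, 0.69)))  # coefficients -> hazard ratios
print(log(c(0.5, 1, 1.5, 2)))         # hazard ratios -> coefficients
```

The ratio is the same at every time point even though the baseline hazard itself changes, which is the "proportional hazards" assumption in action.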

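6.6 Exercises

6.6.1 Conceptual questions

  1. Explain in your own words: why is it special that the derivative of \(e^x\) equals itself?

For most functions, the slope at a point bears no simple relation to the function's value there. \(e^x\) is remarkable: at every point, its slope is exactly equal to its value at that point. This makes \(e^x\) very convenient for describing phenomena in which the rate of change is proportional to the current state, such as population growth, radioactive decay, and declining drug concentration.

  2. Why does statistics take the logarithm of the likelihood? Give at least three reasons.

Three main reasons: (1) multiplication becomes addition: the product of many probabilities becomes a sum of logarithms, which is simpler to compute; (2) numerical stability: multiplying many small numbers causes underflow, which taking logs avoids; (3) easier differentiation: the derivative of the log-likelihood has a much simpler form than that of the raw likelihood, which makes the maximum easier to find. In addition, log is a strictly increasing function, so the location of the maximum is unchanged.

  3. In logistic regression, why use the log-odds rather than the probability directly?

The probability \(p\) is confined to \((0, 1)\), so it cannot be modeled directly with a linear model, whose values range over all real numbers. The transformation probability → odds (\(\frac{p}{1-p}\)) → log-odds (\(\ln\frac{p}{1-p}\)) maps \((0,1)\) onto \((-\infty, \infty)\), which makes a linear model applicable.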

6.6.2 Computational exercises

  1. Compute the derivatives of the following functions:
    1. \(f(x) = e^{3x}\)
    2. \(g(x) = \ln(2x + 1)\)
    3. \(h(x) = x^2 e^x\) (use the product rule)

Answers, in the same order:

  1. By the chain rule: \(f'(x) = 3e^{3x}\)

  2. By the chain rule: \(g'(x) = \frac{1}{2x+1} \cdot 2 = \frac{2}{2x+1}\)

  3. By the product rule \((uv)' = u'v + uv'\): \(h'(x) = 2x \cdot e^x + x^2 \cdot e^x = (2x + x^2)e^x = x(2 + x)e^x\)

  2. If HR = 1.5, compute the log HR (that is, \(\beta\)).

\(\beta = \ln(\text{HR}) = \ln(1.5) \approx 0.405\). This can be checked in R with log(1.5). In a Cox regression, a variable with an HR of 1.5 (a 50% increase in hazard) therefore has a regression coefficient of about 0.405.

  3. Verify: \(\frac{d}{dx}[\ln(x^2)] = \frac{2}{x}\)

Method 1 (chain rule): \(\frac{d}{dx}\ln(x^2) = \frac{1}{x^2} \cdot 2x = \frac{2x}{x^2} = \frac{2}{x}\)

Method 2 (logarithm property): for \(x > 0\), \(\ln(x^2) = 2\ln(x)\), so \(\frac{d}{dx}[2\ln(x)] = 2 \cdot \frac{1}{x} = \frac{2}{x}\)

6.6.3 R exercises

  1. Simulate a right-skewed data set and plot histograms before and after a log transformation.
set.seed(123)
data <- rlnorm(1000, meanlog = 5, sdlog = 1)

# Raw data
hist(data, breaks = 50, col = "#2E86AB",
     main = "Raw data (right-skewed)", xlab = "Value")

# After the log transformation
hist(log(data), breaks = 50, col = "#E94F37",
     main = "After log transformation (more symmetric)", xlab = "log(value)")

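Parameter notes: meanlog and sdlog are the mean and standard deviation of the data on the log scale. The degree of right skew is governed by sdlog (larger means more skewed); meanlog only rescales the values.

Code
set.seed(___)
data <- rlnorm(1000, meanlog = ___, sdlog = ___)
# Plot the histograms

  2. Plot \(f(x) = xe^{-x}\) and its derivative, and find the extremum.

First find the derivative: by the product rule, \(f'(x) = e^{-x} + x \cdot (-e^{-x}) = e^{-x}(1-x)\). Setting \(f'(x) = 0\) gives \(x = 1\), since \(e^{-x}\) is never zero.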

x <- seq(0, 5, by = 0.01)
f <- x * exp(-x)
f_prime <- exp(-x) * (1 - x)

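# Plot the function and its derivative
plot(x, f, type = "l", col = "#2E86AB", lwd = 2,
     ylim = range(f, f_prime),  # leave room for the derivative curve
     ylab = "y", main = "f(x) = x·e^(-x) and its derivative")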
lines(x, f_prime, col = "#E94F37", lwd = 2)
abline(v = 1, lty = 2, col = "gray")
abline(h = 0, lty = 2, col = "gray")
legend("topright", c("f(x)", "f'(x)"), col = c("#2E86AB", "#E94F37"), lwd = 2)

The extremum is at \(x = 1\), where \(f(1) = e^{-1} \approx 0.368\). It is a maximum, because \(f'(x)\) changes sign from positive to negative there.
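
The same answer can be found numerically, without deriving \(f'(x)\) by hand. A short sketch using base R's `optimize()`; the search interval \([0, 5]\) is an assumption that matches the plotting range above.

```r
f <- function(x) x * exp(-x)

# Search [0, 5] for the maximizer; optimize() minimizes unless maximum = TRUE
opt <- optimize(f, interval = c(0, 5), maximum = TRUE)

print(opt$maximum)     # location of the maximum, close to 1
print(opt$objective)   # maximum value, close to exp(-1)
```

This previews the optimization theme of the next chapter: the calculus solution gives the exact point, while the numerical search approximates it to within a tolerance.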

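Chapter summary

Important: core concepts

Differentiation formulas

  1. \(\frac{d}{dx}e^x = e^x\): up to a constant multiple, \(e^x\) is the only function whose derivative equals itself
  2. \(\frac{d}{dx}e^{kx} = ke^{kx}\)
  3. \(\frac{d}{dx}\ln x = \frac{1}{x}\)
  4. \(\frac{d}{dx}\ln(g(x)) = \frac{g'(x)}{g(x)}\)

Why statistics loves the log

  1. Log-likelihood: multiplication becomes addition, numerical stability, easier differentiation
  2. Logistic regression: the log-odds maps (0,1) onto all real numbers
  3. Cox regression: the log HR is the regression coefficient
  4. Data transformation: makes right-skewed data more symmetric

Next chapter: we will use derivatives to solve optimization problems, finding the points where a function is largest or smallest!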