3-3 Ensemble of Trees

1. Ensemble

  • An ensemble is a group of people who work or perform together.
  • "Ensemble" is a French word meaning overall harmony or unity ("together").
  • In music, it refers to a song or performance by two or more performers.
  • In a musical, the ensemble sings harmony, dances, and builds the atmosphere behind the lead and supporting actors.


2. Ensembles of Trees

  • Decision Tree Algorithm
  • Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model
  • in order to decrease variance (bagging) and bias (boosting)

Ensemble methods can be divided into two groups

  • Parallel ensemble methods, where the base learners are generated in parallel (e.g. Random Forest)
    • exploit independence between the base learners,
    • since the error can be reduced dramatically by averaging
  • Sequential ensemble methods, where the base learners are generated sequentially (e.g. AdaBoost)
    • exploit the dependence between the base learners
    • the overall performance can be boosted by weighting previously mislabeled examples with higher weight


3. Bagging (→ Decrease Variance)

  • An ensemble learning method based on bootstrap sampling
  • Short for "bootstrap aggregating"
  • Bagging reduces variance and helps avoid overfitting
  • It is typically applied to decision trees, as in Random Forest
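As a minimal sketch, bagging decision trees might look like this in scikit-learn (the synthetic dataset and hyperparameter values below are illustrative assumptions, not this document's setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic binary-classification data
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 50 base learners (decision trees by default), each fit on a bootstrap
# sample of the training set; their predictions are aggregated by voting.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print("test accuracy:", bag.score(X_te, y_te))
```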

Bootstrap

  • Sampling with replacement, so the probabilities never change between draws:
  • probability of drawing a blue ball on the first draw = 1/3
  • probability of drawing a blue ball on the second draw = 1/3
  • probability of drawing a blue ball on the third draw = 1/3
  • Because every draw is independent, all 9 sampled balls could turn out blue
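The ball-drawing example above can be sketched directly; the 9-ball bag with 3 blue balls (so P(blue) = 1/3 on each draw) is an illustrative assumption:

```python
import random

random.seed(0)  # for a reproducible draw

# Bag of 9 balls, 3 of them blue: P(blue) = 1/3 on every single draw
bag = ["blue"] * 3 + ["red"] * 6

# A bootstrap sample draws len(bag) balls WITH replacement: the probabilities
# never change between draws, duplicates are allowed, and with a tiny
# probability of (1/3)**9 all 9 draws could come out blue.
sample = [random.choice(bag) for _ in range(len(bag))]
print(sample)
```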

Why an ensemble outperforms an individual classifier

  • Use the probability mass function of the binomial distribution:
  • extending a single tree's binary-class error probability to N trees gives the ensemble's error rate
  • Under these assumptions, the ensemble's error probability is lower than an individual classifier's
  • However, when the individual error rate is 0.5 and the number of ensembled classifiers is even, a tied vote counts as an error
  • Therefore each individual classifier must perform better than random guessing (P(E) = 0.5)

A commonly used class of ensemble algorithms is forests of randomized trees

  • In random forests, each tree in the ensemble is built from a sample drawn with replacement (i.e. a bootstrap sample) from the training set.
  • In addition, instead of using all the features, a random subset of features is selected at each split, further randomizing the trees
  • As a result, the bias of the forest increases slightly, but because less correlated trees are averaged, its variance decreases, yielding an overall better model
  • The predictions of the individual trees are combined by majority voting


4. Boosting (→ Decrease Bias)

Boosting

  • Originates from the "hypothesis boosting" question: can a set of weak learners be combined into a strong learner?
  • Reduces bias
  • Improves performance

Compared with Bagging

  • Similarities
    • An ensemble of n weak-learner models instead of a single strong learner
    • The weak learners are combined by majority voting
  • Differences
    • Sampling for each model does not allow duplicates (no bootstrap sampling)
    • The individual models depend on each other
    • Each model is influenced by the previous one, giving higher weight to its errors

AdaBoost

Gradient Boosting

  • AdaBoost and Gradient Boosting share the same overall idea:
  • boosting weak learners (such as depth-1 decision trees) into a strong model
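Both boosters are available in scikit-learn; a minimal comparison sketch, assuming illustrative synthetic data and near-default settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Both fit weak trees sequentially: AdaBoost reweights misclassified samples
# (depth-1 stumps by default), while Gradient Boosting fits each new tree to
# the gradient of the loss (depth-3 trees by default).
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
gbt = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("AdaBoost accuracy:        ", ada.score(X_te, y_te))
print("GradientBoosting accuracy:", gbt.score(X_te, y_te))
```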

XGBoost

  • A scalable gradient-boosting library with built-in regularization and parallelized tree construction

LightGBM

  • A gradient-boosting library that grows trees leaf-wise on histogram-binned features for speed and low memory use


5. Quiz

  1. Which ensemble method is used to decrease variance?

(a) Bagging                    (b) Gradient Descent
(c) Boosting                   (d) Hyperparameter Tuning