ํŒŒ๋ผ๋ฏธํ„ฐ(Parameter)


  • Korean term: 매개변수
  • A component of the model that is learned from the data
  • A variable whose value is determined inside the model during training, not set by the user

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ(Hyper Parameter)


  • A value that is set before training starts
  • A user-specified setting, not learned from the data
  • Techniques that automate hyperparameter tuning are called AutoML
  • Common first step: train the model with the library's default values, then evaluate it with cross validation

๊ต์ฐจ ๊ฒ€์ฆ(Cross Validation)

  • ๊ณ ์ •๋œ train set๊ณผ test set์œผ๋กœ ํ‰๊ฐ€๋ฅผ ํ•˜๊ณ , ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ชจ๋ธ์„ ํŠœ๋‹ํ•˜๋‹ค๋ณด๋ฉด test set์—๋งŒ ๊ณผ์ ํ•ฉ๋˜๋Š” ๊ฒฐ๊ณผ ๋ฐœ์ƒ
  • ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ์„ ๊ต์ฐจ๊ฒ€์ฆ์„ ํ†ตํ•ด ํ•ด๊ฒฐ
  1. k-fold cross validation

    • Split the data into k folds
    • In each iteration a different fold is held out as the test set and the other k-1 folds are used for training, giving k train/test splits whose scores are averaged

  2. Leave-p-out cross validation

    • Choose p samples out of the whole data set (all distinct samples), use them for validation and train on the rest; this is repeated for every possible choice of p samples, i.e. C(n, p) times for n samples

  3. Leave-one-out cross validation

    • The special case of leave-p-out cross validation with p = 1
    • Less computationally expensive than leave-p-out cross validation (n splits instead of C(n, p))

  4. Stratified k-fold cross validation

    • Mostly used for classification problems; useful when the labels are unevenly distributed across classes, because each fold keeps roughly the same class proportions as the full data set
    • Less computationally expensive than leave-p-out cross validation
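The four splitting schemes above can be sketched in plain Python as index generators. The function names are hypothetical helpers written for this note (scikit-learn ships its own `KFold`, `LeavePOut`, `LeaveOneOut` and `StratifiedKFold` classes). The counts at the end show why leave-p-out gets expensive: it produces C(n, p) splits, versus k for k-fold and n for leave-one-out.

```python
import itertools
import math
from collections import defaultdict


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each of the k folds is the test set once."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_p_out_indices(n_samples, p):
    """Yield one split for every possible choice of p validation samples."""
    for test in itertools.combinations(range(n_samples), p):
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, list(test)


def leave_one_out_indices(n_samples):
    """Leave-one-out is simply leave-p-out with p = 1."""
    return leave_p_out_indices(n_samples, 1)


def stratified_k_fold_indices(labels, k):
    """Deal each class's samples across the k folds in turn, so every fold
    keeps roughly the class proportions of the full data set."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for members in by_class.values():
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    everything = set(range(len(labels)))
    for test in folds:
        yield sorted(everything - set(test)), sorted(test)


n = 10
print(len(list(k_fold_indices(n, 5))))                     # 5
print(len(list(leave_p_out_indices(n, 2))), math.comb(n, 2))  # 45 45
print(len(list(leave_one_out_indices(n))))                 # 10

labels = ["a"] * 8 + ["b"] * 2  # hypothetical imbalanced labels: 80% "a", 20% "b"
for train, test in stratified_k_fold_indices(labels, 2):
    print([labels[i] for i in test])  # ['a', 'a', 'a', 'a', 'b'] for both folds
```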

ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹ ๋ฐฉ๋ฒ•

#### 1. Grid Search

  • ๋ชจ๋ธ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ์— ๋„ฃ์„ ์ˆ˜ ์žˆ๋Š” ๊ฐ’๋“ค์„ ์ˆœ์ฐจ์ ์œผ๋กœ ์ž…๋ ฅํ•œ๋’ค์— ๊ฐ€์žฅ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š” ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์„ ์ฐพ๋Š” ํƒ์ƒ‰ ๋ฐฉ๋ฒ•
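import numpy as np

# Candidate values for each decision-tree hyperparameter; Grid Search tries every combination
params = {'min_impurity_decrease': np.arange(0.0001, 0.001, 0.0001),
          'max_depth': range(5, 20, 1),
          'min_samples_split': range(2, 100, 10)
          }
params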

Output:

{'min_impurity_decrease': array([0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007, 0.0008, 0.0009]), 'max_depth': range(5, 20), 'min_samples_split': range(2, 100, 10)}

  • Drawback: every combination is tried, so each candidate value you add multiplies the run time. The grid above already has 9 × 15 × 10 = 1,350 combinations, and each one is trained once per cross-validation fold.
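A sketch of running Grid Search with scikit-learn's `GridSearchCV`, assuming `DecisionTreeClassifier` (whose hyperparameters the grid above uses) and the bundled iris data as a stand-in data set. `ParameterGrid` counts the combinations in the grid from the text; the actual fit uses a smaller, hypothetical grid so it finishes quickly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.tree import DecisionTreeClassifier

# The grid from the text: Grid Search would try every combination.
params = {'min_impurity_decrease': np.arange(0.0001, 0.001, 0.0001),
          'max_depth': range(5, 20, 1),
          'min_samples_split': range(2, 100, 10)}
n_combos = len(ParameterGrid(params))  # product of the list lengths
print(n_combos)

# Hypothetical smaller grid (assumption) so the demo runs quickly: 3 x 3 x 3 = 27.
small_params = {'min_impurity_decrease': [0.0001, 0.0005, 0.001],
                'max_depth': [3, 5, 7],
                'min_samples_split': [2, 12, 22]}

X, y = load_iris(return_X_y=True)  # stand-in data set
gs = GridSearchCV(DecisionTreeClassifier(random_state=42), small_params)
gs.fit(X, y)  # default 5-fold CV: 27 x 5 = 135 fits, then a refit on all data
print(gs.best_params_)
print(gs.best_score_)
```

With the default 5-fold cross validation, the full grid from the text would mean 1,350 × 5 = 6,750 fits, which is the cost problem described in the drawback above.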

#### 2. Random Search

  • ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’์„ ๋žœ๋คํ•˜๊ฒŒ ๋„ฃ์–ด๋ณด๊ณ  ๊ทธ์ค‘ ์šฐ์ˆ˜ํ•œ ๊ฐ’์„ ๋ณด์ธ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํ™œ์šฉํ•ด ๋ชจ๋ธ์„ ์ƒ์„ฑ
  • ๋ถˆํ•„์š”ํ•œ ํƒ์ƒ‰ ํšŸ์ˆ˜๋ฅผ ์ค„์ธ๋‹ค
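    from scipy.stats import uniform, randint

    # Distributions to sample from, not fixed value lists.
    # uniform(loc, scale) covers [loc, loc + scale]; randint(low, high) excludes high.
    params = {'min_impurity_decrease': uniform(0.0001, 0.001),
              'max_depth': randint(20, 50),
              'min_samples_split': randint(2, 25),
              'min_samples_leaf': randint(1, 25),
              }
    params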

Output:

{'min_impurity_decrease': <scipy.stats._distn_infrastructure.rv_frozen at 0x7fed449f9fd0>, 'max_depth': <scipy.stats._distn_infrastructure.rv_frozen at 0x7fed44d52be0>, 'min_samples_split': <scipy.stats._distn_infrastructure.rv_frozen at 0x7fed449f9130>, 'min_samples_leaf': <scipy.stats._distn_infrastructure.rv_frozen at 0x7fed44befd90>}
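A sketch of the same tuning done at random with scikit-learn's `RandomizedSearchCV`, reusing the distributions from the text and again assuming `DecisionTreeClassifier` with iris as a stand-in data set; `n_iter=100` and `random_state=42` are arbitrary choices. The budget is set by `n_iter` rather than by the size of a grid.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Same distributions as in the text.
params = {'min_impurity_decrease': uniform(0.0001, 0.001),  # [0.0001, 0.0011]
          'max_depth': randint(20, 50),                     # 20..49
          'min_samples_split': randint(2, 25),              # 2..24
          'min_samples_leaf': randint(1, 25)}               # 1..24

# Each frozen distribution can be sampled directly with rvs().
print(params['max_depth'].rvs(5, random_state=0))

X, y = load_iris(return_X_y=True)  # stand-in data set (assumption)
# 100 sampled combinations x 5 folds = 500 fits, however fine the distributions are.
rs = RandomizedSearchCV(DecisionTreeClassifier(random_state=42), params,
                        n_iter=100, random_state=42)
rs.fit(X, y)
print(rs.best_params_)
print(rs.best_score_)
```

Because `min_impurity_decrease` is drawn from a continuous distribution, Random Search can try values that no fixed grid step would hit.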

https://colab.research.google.com/github/rickiepark/hg-mldl/blob/master/5-2.ipynb#scrollTo=dYI3HwMQbtnr