๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

A PIECE OF DATA/๐Ÿ• ๋”ฅ๋Ÿฌ๋‹

[๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ดˆ] ๋จธ์‹ ๋Ÿฌ๋‹ ๊ธฐ์ดˆโ‘ก(logistic regression, Multinomial classification, learning rate


* ๋”ฅ๋Ÿฌ๋‹ session์€ Sung Kim๋‹˜์˜ ๊ฐ•์˜ (์œ ํŠœ๋ธŒ)๋ฅผ ์š”์•ฝ/์ •๋ฆฌํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.
** Sung Kim๋‹˜์˜ ๊ฐ•์˜์™€ ์ž๋ฃŒ๋Š” ์•„๋ž˜์˜ ์ž๋ฃŒ๋ฅผ ์ฐธ๊ณ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
- Andrew Ng's ML class
- Convolutional Neural Networks for Visual Recognition
- Tensorflow


CONTENTS

  • Logistic(regression) classification
  • Logistic(regression) classification cost function & gradient descent
  • Softmax classification: Multinomial classification
  • Learning rate, data preprocessing, overfitting

05-1. Logistic (regression) classification

RECAP

๊ฐ€์„ค ์„ธ์šฐ๊ธฐ
: W(weight) X(variable)

Cost๊ฐ’ ๊ตฌํ•˜๊ธฐ
: ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’ ์ฐจ์ด ๊ตฌํ•˜๊ธฐ

Gradient descent ์ ์šฉํ•˜๊ธฐ
: cost๊ฐ’์„ ์ตœ์†Œํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ์šธ๊ธฐ ์ฐพ๊ธฐ

Binary Classification(์ด์ง„ ๋ถ„๋ฅ˜)

: ์ด์ง„ ๋ถ„๋ฅ˜๋Š” y๊ฐ’์ด ๋‘ ๊ฐ€์ง€ ๊ฐ’์œผ๋กœ๋งŒ ๊ฐ–๋Š” ๊ฒƒ์œผ๋กœ ๊ฐ’์„ 0, 1๋กœ encoding ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์„ ๋งํ•œ๋‹ค. ์˜ˆ์‹œ๋Š” Yes/No, Pass/Non-Pass ๋“ฑ์ด ์žˆ๋‹ค. ์ผ๋ฐ˜ Linear regression์€ ์ˆซ์ž์˜ ๋ฒ”์œ„๊ฐ€ ๋„“๊ธฐ ๋•Œ๋ฌธ์— ๋‘ ๊ฐ€์ง€๋กœ ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ•จ์ˆ˜์˜ ๋ฒ”์œ„๋ฅผ 0~1๋กœ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์ด ํ•„์š”ํ•˜๋‹ค. ๊ฐ’์ด 0,1๋กœ ๋‚˜๋‰˜๋Š” ๊ฒฝ์šฐ, y๊ฐ’์ด 0~1์˜ ๊ฐ’์„ ๊ฐ–๋„๋ก ํ•˜๊ธฐ ์œ„ํ•ด์„œ sigmoid function์„ ์ ์šฉํ•œ๋‹ค.

Sigmoid Function(์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜)

: ํ•จ์ˆซ๊ฐ’์˜ ๋ฒ”์œ„๊ฐ€ 0~1์˜ ๊ฐ’์„ ๊ฐ–๋„๋ก ํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. y๊ฐ’์ด ๋‚˜์˜ค๋ฉด 0.5 ๋ฏธ๋งŒ์€ 0์œผ๋กœ 0.5 ์ด์ƒ์ด๋ฉด 1๋กœ ์ƒ๊ฐํ•œ๋‹ค.


05-2. Logistic (regression) classification: cost function & gradient descent

๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๋ถ„์„(๋ถ„๋ฅ˜)์—์„œ์˜ COST ๋ฌธ์ œ

: ์„ ํ˜• ํšŒ๊ท€์—์„œ๋Š” cost๊ฐ€ 2์ฐจ ํ•จ์ˆ˜๋กœ ์ตœ์†Œ๊ฐ’์„ ์ฐพ๊ธฐ ์‰ฝ๊ฒŒ ๋˜์–ด์žˆ๋‹ค. ํ•˜์ง€๋งŒ sigmoid ํ•จ์ˆ˜๋ฅผ ์“ด ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๋ถ„์„์€ ๊ตฌ๋ถˆ๊ตฌ๋ถˆํ•œ ๊ทธ๋ž˜ํ”„๊ฐ€ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. ์ด๋•Œ, gradient decent๋ฅผ ์ ์šฉํ–ˆ์„ ๋•Œ ์„ค์ •์— ๋”ฐ๋ผ ์ตœ์†Œ๊ฐ’์ด ์•„๋‹Œ local minimum์„ ์ฐพ์•„๋‚ผ ์ˆ˜ ์žˆ๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๋ถ„์„(๋ถ„๋ฅ˜)์—์„œ์˜ Cost Fuction

: e์˜ ์ƒ๋ฐ˜๋œ ํ•จ์ˆ˜ log๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. y=1์ผ ๋•Œ, ๋งŒ์•ฝ์— H(x)๊ฐ€ 1๋กœ ์˜ˆ์ธกํ–ˆ๋‹ค๋ฉด cost function์€ 0์œผ๋กœ ์ตœ์†Œ๊ฐ€ ๋œ๋‹ค. y=1์ผ ๋•Œ, cost function์€ ๋ฌดํ•œ๋Œ€๋กœ ๊ฐ€๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค์‹œ ํ•™์Šตํ•˜๊ฒŒ ๋  ๊ฒƒ์ด๋‹ค. ์ด์ฒ˜๋Ÿผ ์ ์šฉํ•˜๋ฉด ๋˜‘๊ฐ™์ด ์ด์ฐจ ํ•จ์ˆ˜์ฒ˜๋Ÿผ ์ƒ์„ฑ๋˜๊ณ , gradient descent๋กœ ๊ธฐ์กด ๋ฌธ์ œ์ ์„ ํ•ด๊ฒฐํ•˜์—ฌ ์ตœ์†Œ๊ฐ’์„ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.


06-1. Softmax classification: Multinomial classification

Multinomial classification (multinomial logistic regression)

: A classification/prediction model used when y can take three or more categories.

๋‹คํ•ญ ์„ ํ˜• ํ•จ์ˆ˜ ์‹

: ์˜ˆ๋ฅผ ๋“ค์–ด ์•„๋ž˜์™€ ๊ฐ™์€ Y๊ฐ’์ด A, B, C๊ฐ€ ์žˆ๋‹ค๋ฉด ๋ถ„๋ฅ˜ ๋˜๋Š” ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” 3๊ฐœ์˜ ์‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด์ฒ˜๋Ÿผ 3๊ฐœ์˜ ์‹์„ ํ•˜๋‚˜์˜ MATRIX๋กœ ๋‚˜ํƒ€๋‚ด๊ฒŒ ๋˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.


06-2. Softmax classification: softmax and cost function

Multinomial์—์„œ sigmoid๋Š” ์–ธ์ œ ์‚ฌ์šฉํ•˜๋‚˜?

: multinomial classification/regression์—์„œ๋Š” sigmoid๋Š” ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  softmax๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

Softmax ํ•จ์ˆ˜์˜ ํŠน์ง•
: ์˜ˆ์ธกํ•œ y๊ฐ’์„ 0~1 ์‚ฌ์ด๋กœ ๋‚˜ํƒ€๋‚ด๊ณ , ์˜ˆ์ธกํ•œ ๋ชจ๋“  y๊ฐ’์˜ ํ•ฉ์€ 1์ด ๋˜๊ฒŒ ํ•œ๋‹ค. one-hot encoding์„ ํ†ตํ•ด ๊ทธ์ค‘ ๊ฐ€์žฅ ํฐ ๊ฐ’์„ 1๋กœ ๋งŒ๋“ค๊ณ , ๋‚˜๋จธ์ง€๋Š” 0์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ์ ์ ˆํ•œ y๊ฐ’์„ ๋ถ„๋ฅ˜/์˜ˆ์ธกํ•œ๋‹ค.


07-1. Learning rate, data preprocessing, overfitting

learning rate์ด๋ž€?

: gradient descent ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•  ๋•Œ ์ตœ์†Œ๊ฐ’์„ ์ฐพ๊ธฐ ์œ„ํ•œ step์˜ ๊ฐ„๊ฒฉ์ด๋‹ค. learning rate์ด ํด ๋•Œ๋Š” ์ตœ์†Œ๊ฐ’์„ ์ฐพ์ง€ ๋ชปํ•˜๊ณ  ๋›ฐ์–ด๋„˜์„ ์ˆ˜ ์žˆ๋Š” overshooting์˜ ๋ฌธ์ œ๊ฐ€ ์ƒ๊ธธ ์ˆ˜ ์žˆ๋‹ค. ๋ฐ˜๋ฉด learning rate์ด ์ž‘์œผ๋ฉด ํ•™์Šต์ด ์˜ค๋ž˜ ๊ฑธ๋ฆฌ๊ฑฐ๋‚˜ local minimum์—์„œ ๋ฉˆ์ถ”๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ learning rate๋ฅผ ์„ค์ •ํ•  ๋•Œ๋Š” cost function์„ ๊ด€์ฐฐํ•˜๋ฉด์„œ reasonableํ•œ ๊ฐ’์„ ์ฐพ๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋‹ค.
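The overshooting effect is easy to demonstrate on a toy cost f(w) = w², whose gradient is 2w (a made-up example, not from the lecture):

```python
def descend(w, lr, steps):
    """Minimize f(w) = w**2 (gradient 2*w) with a fixed learning rate."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

ok = descend(10.0, lr=0.1, steps=50)         # converges toward the minimum at 0
overshoot = descend(10.0, lr=1.1, steps=50)  # |w| grows every step: overshooting
```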

Preprocessing(์ „์ฒ˜๋ฆฌ)์˜ ํ•„์š”์„ฑ?

: ํšจ๊ณผ์ ์œผ๋กœ gradient descent ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด ์ „์ฒ˜๋ฆฌํ•˜๋Š” ๊ณผ์ •์ด ํ•„์š”ํ•˜๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๋ณ€์ˆ˜ x1๊ฐ’์ด 0~10์‚ฌ์ด์ด๊ณ , ๋ณ€์ˆ˜ x2๊ฐ’์ด -10000์—์„œ 10000์‚ฌ์ด์ผ ๋•Œ cost function์€ ์™œ๊ณก๋ผ์„œ ๊ทธ๋ ค์ง€๊ณ , gradient descent๋Š” ์กฐ๊ธˆ๋งŒ ์ž˜๋ชป ์›€์ง์—ฌ๋„ ์ตœ์†Œ๊ฐ’์œผ๋กœ ๊ฐ€๋Š” ๊ธธ์„ ๋ฒ—์–ด๋‚  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ normalizing ์ •๊ทœํ™” ๋“ฑ์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๊ฐ€ ํŠน์ • ๋ฒ”์œ„ ์•ˆ์— ๋“ค์–ด๊ฐ€๋„๋ก ์ „์ฒ˜๋ฆฌํ•˜๋Š” ๊ณผ์ •์ด ํ•„์š”ํ•˜๋‹ค.

Overfitting(๊ณผ์ ํ•ฉ)์ด๋ž€?

: ๊ณผ์ ํ•ฉ์€ ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์…‹์— ๊ณผํ•˜๊ฒŒ ํ•™์Šต๋˜์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต ๋ฐ์ดํ„ฐ๋Š” ์ž˜ ์˜ˆ์ธก/๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฐ˜๋ฉด ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก/๋ถ„๋ฅ˜์˜ ์„ฑ๋Šฅ์ด ๋‚ฎ์€ ๊ฒฝ์šฐ๋ฅผ ๋งํ•œ๋‹ค.

์˜ค๋ฅธ์ชฝ ๊ทธ๋ฆผ์ด overfitting

Overfitting(๊ณผ์ ํ•ฉ)์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•

- ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ๋Š˜๋ฆฌ๋Š” ๊ฒƒ
- ๋ณ€์ˆ˜(feature)์˜ ์ˆ˜๋ฅผ ์ค„์ด๊ธฐ
- ์ผ๋ฐ˜ํ™”(regularization)

Regularization(์ผ๋ฐ˜ํ™”)์ด๋ž€?

W(weight) ๊ฐ’์ด ์ปค์งˆ์ˆ˜๋ก cost function์˜ ๊ตด์ ˆ์ด ๋” ๊นŠ์–ด์ง„๋‹ค. ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(gradient descent)์„ ์ž˜ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๊ตด๊ณก์„ ํŽผ์น˜๊ธฐ ์œ„ํ•ด Weight๊ฐ’์„ ์ž‘๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜ํ™”์ด๋‹ค. ์ตœ์†Œ๊ฐ’์„ ์ฐพ์„ ์ˆ˜ ์žˆ๋„๋ก lambda๋ฅผ ์กฐ์ ˆํ•˜์—ฌ ์ผ๋ฐ˜ํ™”๋ฅผ ์ง„ํ–‰ํ•œ๋‹ค.


07-2. Learning and test data sets

๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํŒŒ์•…ํ•˜๋Š” ๋ฒ•?

: ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํŒŒ์•…ํ•  ๋•Œ๋Š” ์ด๋ฏธ ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ๋Š” ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š”๋‹ค. ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ train ๋ฐ์ดํ„ฐ์™€ test ๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜๋ˆ ๋†“๊ณ  ์ง„ํ–‰ํ•œ๋‹ค. ๋˜ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” train data, validation data, test data๋กœ ์„ธ ๊ฐ€์ง€๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒฝ์šฐ๋„ ์žˆ๋Š”๋ฐ validation ๋ฐ์ดํ„ฐ๋Š” gradient descent์—์„œ ์‚ฌ์šฉํ•˜๋Š” learning rate ๊ฐ’์ด๋‚˜ ์ผ๋ฐ˜ํ™”์— ์‚ฌ์šฉ๋˜๋Š” lambda๋ฅผ ํŠœ๋‹ํ•˜๋Š”๋ฐ ์‚ฌ์šฉํ•œ๋‹ค.

Online learning์ด๋ž€?

: ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์„ ๋•Œ ๋‚˜๋ˆ ์„œ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๋Š” ๊ฒฝ์šฐ์ด๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด 100๋งŒ ๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋‹ค๋ฉด 10๋งŒ ๊ฐœ์”ฉ ํ•™์Šต์‹œํ‚ค๋ฉด์„œ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด๊ฐ€๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. online learning์„ ํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ํ•ญ์ƒ์€ ์•„๋‹ˆ์ง€๋งŒ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋ฅผ ์ž˜ ์˜ˆ์ธก/๋ถ„๋ฅ˜ํ•˜๋Š” ์žฅ์ ์ด ์žˆ๋‹ค.