๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ


[๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ดˆ] ๋จธ์‹ ๋Ÿฌ๋‹ ๊ธฐ์ดˆโ‘ (linear regression, Cost function, Multivariable linear regression)

* ๋”ฅ๋Ÿฌ๋‹ session์€ Sung Kim๋‹˜์˜ ๊ฐ•์˜ (์œ ํŠœ๋ธŒ)๋ฅผ ์š”์•ฝ/์ •๋ฆฌํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.

** Sung Kim๋‹˜์˜ ๊ฐ•์˜์™€ ์ž๋ฃŒ๋Š” ์•„๋ž˜์˜ ์ž๋ฃŒ๋ฅผ ์ฐธ๊ณ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

- Andrew Ng's ML class

- Convolutional Neural Networks for Visual Recognition

- Tensorflow


๋ชฉํ‘œ

๊ธฐ์ดˆ์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ดํ•ด

- Linear regression, Logistic regression(classification)

- Neural networks, Convolutional Neural Network, Recurrent Neural Network


01. Machine Learning Basics

๋จธ์‹ ๋Ÿฌ๋‹์ด๋ž€?

๋จธ์‹ ๋Ÿฌ๋‹์€ ์‚ฌ๋žŒ์ด ๋ชจ๋“  ๊ทœ์น™์„ ๋งŒ๋“ค์–ด๋‚ผ ์ˆ˜ ์—†๋‹ค๋Š” ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜์˜จ ๋ฐฉ๋ฒ•์ด๋‹ค. ๋จธ์‹ ๋Ÿฌ๋‹์€ ๋ง ๊ทธ๋Œ€๋กœ ๊ธฐ๊ณ„๊ฐ€ ํ•™์Šตํ•œ๋‹ค๋Š” ๋œป์—์„œ ์ปดํ“จํ„ฐ๊ฐ€ ์ธ๊ฐ„์˜ ๋„์›€์ด ์—†์ด ์Šค์Šค๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์„ ๋งŒ๋“ค์–ด๋‚ด๋Š” ๋ถ„์•ผ์ด๋‹ค.

 

๋จธ์‹ ๋Ÿฌ๋‹์˜ ๋‘ ๊ฐ€์ง€ ๋ฐฉ์‹

Supervised learning(์ง€๋„ ํ•™์Šต)

์ •๋‹ต์ด ์žˆ๋Š” ๋ฐ์ดํ„ฐ ์…‹์œผ๋กœ ํ•™์Šตํ•˜๋Š” ๊ฒƒ

์˜ˆ์‹œ: ๊ณ ์–‘์ด, ๊ฐœ, ๋จธ๊ทธ์ปต, ๋ชจ์ž ๋“ฑ์˜ ์‚ฌ์ง„์„ ํ†ตํ•ด ํ•™์Šต์‹œํ‚จ ๋’ค ์‚ฌ์ง„์„ ๋ณด์—ฌ์ฃผ๊ณ  ์–ด๋–ค ์‚ฌ์ง„์ธ์ง€ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ

Unsupervised learning(๋น„์ง€๋„ ํ•™์Šต)

์ •๋‹ต์ด ์—†๋Š” ๋ฐ์ดํ„ฐ์…‹์ด ์žˆ๋Š” ๊ฒƒ

์˜ˆ์‹œ: ์›Œ๋“œ ํด๋ผ์šฐ๋“œ, ๊ตฌ๊ธ€ ๋‰ด์Šค ๊ทธ๋ฃนํ™”

 

Supervised learning(์ง€๋„ ํ•™์Šต)์˜ ์œ ํ˜•

Regression

์‹œ๊ฐ„์— ๋”ฐ๋ผ ์‹œํ—˜ ์ ์ˆ˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ

Binary classification

์‹œ๊ฐ„์— ๋”ฐ๋ผ Pass/non-pass๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ

Multi-label classification

์‹œ๊ฐ„์— ๋”ฐ๋ผ ์‹œํ—˜ ์ ์ˆ˜(A, B, C, D, F)๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ฒƒ


02. Linear Regression

Linear Regression ๋ฐฉ์‹

1. ๋ฐ์ดํ„ฐ์™€ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๊ณ  Linear regression ๋ชจ๋ธ์— ์ ํ•ฉํ• ์ง€ ๊ฐ€์„ค์„ ์„ธ์šด๋‹ค.

2.  H(x) ๊ฐ€์„ค์— ๋ฐ์ดํ„ฐ๋ฅผ ์ž˜ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š” W(weight), b(bias) ๊ฐ’์„ ์ฐพ๋Š”๋‹ค.

3. ๋ฐ์ดํ„ฐ๋ฅผ ์ž˜ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š” ๊ฐ’ ์ฐพ๋Š” ๊ณผ์ •์ธ Cost function์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ณผ์ •์„ ์ง„ํ–‰ํ•œ๋‹ค.

 

Cost fuction(Loss function)์ด๋ž€?

: ์˜ˆ์ธกํ•œ ๊ฐ’์ด ์‹ค์ œ ๊ฐ’๊ณผ ์–ผ๋งˆํผ ์ฐจ์ด๋ฅผ ๋ณด์ด๋Š”์ง€ ํ™•์ธํ•˜๋Š” ํ•จ์ˆ˜


03. How to minimize cost

Cost(=Loss) ํ•จ์ˆ˜์˜ ๊ทธ๋ž˜ํ”„

์ผ์ฐจ ํ•จ์ˆ˜์ธ H(xi)์˜ ์ œ๊ณฑ์„ ์ทจํ–ˆ์„ ๋•Œ costํ•จ์ˆ˜๋Š” ์•„๋ž˜์™€ ๊ฐ™์€ ์ด์ฐจ ํ•จ์ˆ˜ ํ˜•ํƒœ์˜ ๋ชจํ˜•์ด ๋œ๋‹ค.

 

๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์ด๋ž€?

ํ•จ์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ตฌํ•˜๊ณ  ๊ฒฝ์‚ฌ์˜ ์ ˆ๋Œ“๊ฐ’์ด ๋‚ฎ์€ ์ชฝ์œผ๋กœ ์ด๋™์‹œ์ผœ ๊ทน๊ฐ’์— ์ด๋ฅผ ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณต์‹œํ‚ค๋Š” ๊ฒƒ์ด๋‹ค. (์ถœ์ฒ˜: ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• wikipedia)

ํ•™์Šต์„ ํ•˜๋Š” ๊ฒƒ์€ cost๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฐ’์„ ์ฐพ์•„๊ฐ€๋Š” ๊ณผ์ •์ด๋‹ค. cost function์„ ์ตœ์†Œํ™”ํ•˜๋Š”๋ฐ ์‚ฌ์šฉํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ Grdient descent algorithm(๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•)์ด๋‹ค.

 

๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ๋ฐฉ๋ฒ•

1. H(x)์˜ Weight๊ณผ b๋ฅผ ์ฐพ์•„์„œ ์‹ค์ œ ๊ฐ’๊ณผ ์˜ˆ์ธก๊ฐ’์˜ ์ฐจ์ด์ธ cost๋ฅผ ์ฐพ๋Š”๋‹ค.

2. ํŒŒ๋ผ๋ฏธํ„ฐ(W, b)๋Š” ๋ณ€๊ฒฝํ•ด๊ฐ€๋ฉด์„œ cost๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฐ’์œผ๋กœ ์„ ์ •ํ•œ๋‹ค.


04. Multivariable Linear Regression

Linear regression์„ ์„ค๊ณ„ํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ๊ฒƒ

- Hypothesis ์‹

- Cost function

- Gradient descent algorithm

 

Multivariable linear regression(๋‹คํ•ญ ์„ ํ˜• ํšŒ๊ท€)์ด๋ž€?

ํ•˜๋‚˜์˜ ๋ณ€์ˆ˜๋กœ ํ•˜๋‚˜์˜ ๊ฐ’์„ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด ์„ ํ˜• ํšŒ๊ท€๋ผ๋ฉด ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ณ€์ˆ˜๋กœ ํ•˜๋‚˜์˜ ๊ฐ’์„ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ๊ฐ„๋‹จํ•œ ์˜ˆ์‹œ๋กœ ์ˆ˜ํ•™(X1), ์˜์–ด(X2), ๊ตญ์–ด(X3)์˜ ์ค‘๊ฐ„๊ณ ์‚ฌ ์ ์ˆ˜๋กœ ๊ธฐ๋ง๊ณ ์‚ฌ์˜ ์ ์ˆ˜(Y)๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๊ฒƒ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๋งค๊ฐœ๋ณ€์ˆ˜(W, b)๊ฐ€ ๋งŽ์•„์ง€๋Š” ๊ฒƒ์„ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ• = Matrix ์‚ฌ์šฉ

๊ฐ€๋กœ๋กœ๋Š” ํ•œ ID๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์ด๊ณ  ์„ธ๋กœ๋Š” ํ•˜๋‚˜์˜ ๋ณ€์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์ด๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด 5๋ช…์˜ ํ•™์ƒ์˜ 3๊ฐ€์ง€ ์ค‘๊ฐ„๊ณ ์‚ฌ ๊ณผ๋ชฉ ์ ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ผ ๋•Œ x11, x12, x13์€ ์˜ํฌ์˜ ์ค‘๊ฐ„๊ณ ์‚ฌ ๊ตญ์–ด, ์ˆ˜ํ•™, ์˜์–ด ์ ์ˆ˜๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์ด๋‹ค.

H(๊ฐ€์„ค) = XW