Multivariate Linear Regression and Octave

Posted by Jerry on October 31, 2016

Environment Setup Instructions

Multivariate Linear Regression

Multiple Features

Gradient Descent for Multiple Variables

Gradient Descent in Practice (Feature Scaling)

Make sure features are on a similar scale.

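Get every feature into approximately a -1 ≤ x_i ≤ 1 range. The data set (training set) is scaled because running gradient descent directly on the raw data takes far more iterations and is considerably more complicated.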

Mean normalization: replace x_i with (x_i - μ_i) / s_i, where μ_i is the average value of feature i and s_i is its range, so that features have approximately zero mean. (Do not apply this to x_0 = 1.)
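
Mean normalization can be sketched in a few lines of Octave. The data matrix below is made up for illustration (size in square feet and number of bedrooms), and the names X, mu, s and X_norm are my own, not from the course material.

```octave
% Hypothetical training data: one example per row,
% column 1 = size in square feet, column 2 = number of bedrooms.
X = [2104 3; 1600 3; 2400 4; 1416 2];

mu = mean(X, 1);                      % per-feature average (1 x n row vector)
s  = max(X, [], 1) - min(X, [], 1);   % per-feature range (max - min)
X_norm = (X - mu) ./ s;               % broadcasting applies mu and s to every row

% Add the intercept column x_0 = 1 only after scaling, so it stays untouched.
X_norm = [ones(size(X, 1), 1), X_norm];
```

Keep mu and s around: any new example has to be scaled with the same values before a prediction is made.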

Gradient Descent in Practice (Learning Rate)

  1. “Debugging”: How to make sure gradient descent is working correctly

Example automatic convergence test: declare convergence if J(θ) decreases by less than 10^-3 in one iteration. If gradient descent is not working, use a smaller α.

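If α is small enough, J(θ) decreases on every iteration, but convergence is slow. If α is too large, J(θ) may not decrease on every iteration and may fail to converge (may not converge or slow converge).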
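
To make the convergence test concrete, here is a minimal sketch of batch gradient descent for several features with that check built in. The data, the value alpha = 0.1 and the iteration cap are assumptions chosen for illustration. Only the 10^-3 threshold comes from the notes above.

```octave
% Hypothetical data generated from y = 1 + 2*x1 + 3*x2, features already in [-1, 1].
X = [ones(5, 1), [-1 -0.5; -0.5 1; 0 0; 0.5 -1; 1 0.5]];
y = X * [1; 2; 3];
m = length(y);

alpha   = 0.1;     % learning rate (assumed value, tune it for your data)
epsilon = 1e-3;    % convergence threshold
theta   = zeros(size(X, 2), 1);
J_prev  = sum((X * theta - y) .^ 2) / (2 * m);   % cost before any update

for iter = 1:10000
  % Vectorized simultaneous update of every theta_j
  theta = theta - (alpha / m) * (X' * (X * theta - y));
  J = sum((X * theta - y) .^ 2) / (2 * m);
  if J > J_prev
    printf('J increased at iteration %d: try a smaller alpha\n', iter);
    break;
  end
  if J_prev - J < epsilon
    break;         % J decreased by less than epsilon: declare convergence
  end
  J_prev = J;
end
```

Plotting J against the iteration number is the other common way to debug: the curve should go down on every iteration.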

  2. How to choose learning rate α

Features and Polynomial Regression

That is, how to fit a polynomial, like a quadratic function, or a cubic function, to your data
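
As a small illustration, a quadratic fit is still linear regression once the powers of x are treated as extra features. The data below is invented so that it follows y = 2 + 0.5x^2 exactly, and the fit uses the normal equation form that appears later in these notes, only to keep the sketch short.

```octave
% Hypothetical data following y = 2 + 0.5*x^2 exactly.
x = (1:6)';
y = 2 + 0.5 * x .^ 2;

% Polynomial features: x_1 = x, x_2 = x^2 (append x .^ 3 for a cubic fit).
X = [ones(length(x), 1), x, x .^ 2];

% Solve for theta analytically.
theta = pinv(X' * X) * X' * y;

% Ranges grow quickly with the power (here 1..6 versus 1..36),
% so feature scaling matters if gradient descent is used instead.
feature_ranges = max(X(:, 2:end), [], 1) - min(X(:, 2:end), [], 1);
```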

Computing Parameters Analytically

Normal Equation

Normal Equation: method to solve for θ analytically. θ = (X^T X)^-1 X^T y
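
In Octave the formula translates almost symbol for symbol. The sketch below uses a tiny made-up data set generated from y = 1 + 2*x1 + 3*x2, so the recovered θ is easy to check by eye.

```octave
% Hypothetical design matrix: intercept column x_0 = 1 plus two features.
X = [1 1 1;
     1 2 0;
     1 3 2;
     1 4 1];
y = X * [1; 2; 3];                 % targets from a known theta

% Normal equation: theta = (X^T X)^-1 X^T y
theta = pinv(X' * X) * X' * y;     % close to [1; 2; 3]
```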

The advantages and disadvantages of Gradient Descent and the Normal Equation, with m training examples and n features:

Gradient Descent
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation
• No need to choose α.
• Don’t need to iterate.
• Need to compute (X^T X)^-1, with time complexity O(n^3).
• Slow if n is very large (n > 10000).

Normal Equation Noninvertibility

Normal Equation: θ = (X^T X)^-1 X^T y. What if X^T X is non-invertible (singular/degenerate)?

  1. Redundant features (linearly dependent). E.g. x_1 = size in feet^2, x_2 = size in m^2.
  2. Too many features (e.g. m ≤ n). Delete some features, or use regularization.
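pinv(X'*X)*X'*y
pinv and inv
The pinv function computes the pseudoinverse ("Return the pseudoinverse of X."), so the expression above still yields a θ when X'*X is non-invertible. inv computes the ordinary inverse and needs an invertible matrix.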
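
A quick way to see the difference is to build a redundant feature on purpose. In this sketch x2 is an exact multiple of x1 (a made-up factor of 2 standing in for a unit conversion), so X'*X is singular. The data and variable names are assumptions for illustration.

```octave
x1 = [1; 2; 3; 4];
x2 = 2 * x1;                  % redundant feature: exact multiple of x1
X  = [ones(4, 1), x1, x2];
y  = 1 + 2 * x1;              % hypothetical targets

A = X' * X;
r = rank(A);                  % less than 3 columns, so A is singular

theta = pinv(A) * X' * y;     % pinv still returns a usable theta
pred  = X * theta;            % predictions reproduce y

% inv(A) is not called here: with a singular A it would typically warn
% that the matrix is singular instead of giving a usable inverse.
```

The θ returned here is only one of many that fit the data equally well, which is why deleting the redundant feature is the cleaner fix.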

Octave Tutorial

format long/short
v = 1:0.1:2
v = 1:6
ones(1,3)
C = 2*ones(2,3)
v = rand(3,3)  % random matrix
w = randn(1,1000)  % Gaussian random matrix
eye(4)  % identity matrix
help rand
size(A)  % returns the size of the matrix; size(A,1) returns the number of rows
who/whos
load princeY.dat
v = princeY(1:10)
save hello.mat v
save hello.txt v -ascii

A = [1 2;3 4;5 6]
A(3,2)
A(2,:)
A = [A,[100;101;102]]
A(:)

A*C
A.*C
abs(v)
log(v)
exp(v)
A'
max(a)
[val,ind]=max(a)
max(A)
magic(Number)
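
The commands above depend on matrix sizes, so here is a short sketch with concrete values. The matrices B and C and the vector a are made up for this example. This C is a 2 x 2 matrix, not the 2 x 3 one defined earlier.

```octave
A = [1 2; 3 4; 5 6];          % 3 x 2
B = [11 12; 13 14; 15 16];    % 3 x 2 (hypothetical)
C = [1 1; 2 2];               % 2 x 2 (hypothetical)

P  = A * C;     % matrix product: (3 x 2) * (2 x 2) gives 3 x 2
E  = A .* B;    % element-wise product: both operands must have matching sizes
At = A';        % transpose: 2 x 3

a = [1 15 2 0.5];
[val, ind] = max(a);   % largest value and its index
colmax = max(A);       % for a matrix, max works column by column
```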

sum
prod
floor
ceil
max(A,[],1)
max(max(A))
sum(sum(A.*eye(9)))
flipud(eye(9))
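
The last few commands make more sense together: they can check that magic(9) really is a magic square. This is a sketch with variable names of my own. The expected total for an n x n magic square is n*(n^2+1)/2.

```octave
A = magic(9);
n = size(A, 1);
expected = n * (n^2 + 1) / 2;                 % magic constant for n = 9

row_sums = sum(A, 2);                         % sum along each row
col_sums = sum(A, 1);                         % sum along each column
diag_sum = sum(sum(A .* eye(n)));             % eye(n) masks the main diagonal
anti_sum = sum(sum(A .* flipud(eye(n))));     % flipped identity masks the anti-diagonal

is_magic = all(row_sums == expected) && all(col_sums == expected) ...
           && diag_sum == expected && anti_sum == expected;
```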

Word List

Feature Scaling: 特征缩放
Mean normalization: 均值归一化
iteration: 迭代
converge: 使收敛
Noninvertibility: 不可逆性
round off your answer to two decimal places: 保留两位小数
unvectorized/vectorized: 未向量化的/向量化的