Preface
With the ever-increasing amounts of data in electronic form, the need for automated methods for data analysis continues to grow. The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest. Machine learning is thus closely related to the fields of statistics and data mining, but differs slightly in terms of its emphasis and terminology. This book provides a detailed introduction to the field, and includes worked examples drawn from application domains such as molecular biology, text processing, computer vision, and robotics.
Probability
Probability theory
@(Joint probabilities)
We define the probability of the joint event A and B as follows:
$$ P(A, B) = P(A \cap B) = P(A \mid B)\, P(B) $$
Given a joint distribution on two events P(A,B), we define the marginal distribution as follows:
$$ P(A) = \sum_b P(A, B = b) = \sum_b P(A \mid B = b)\, P(B = b) $$
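As a quick illustration of both rules, here is a minimal numeric sketch in Python; the joint table `P_AB` is made up purely for illustration:

```python
import numpy as np

# Hypothetical joint distribution P(A, B) over two binary events:
# rows index the values of A, columns the values of B.
P_AB = np.array([[0.1, 0.2],
                 [0.3, 0.4]])

# Marginalization (sum rule): P(A) = sum_b P(A, B = b)
P_A = P_AB.sum(axis=1)   # [0.3, 0.7]
P_B = P_AB.sum(axis=0)   # [0.4, 0.6]

# Product rule: P(A, B) = P(A | B) P(B)
P_A_given_B = P_AB / P_B              # each column divided by P(B = b)
assert np.allclose(P_A_given_B * P_B, P_AB)
```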
@(Mean and variance)
The most familiar property of a distribution is its mean, or expected value, denoted by $\mu$. For discrete rv's, it is defined as $\mathbb{E}[X] = \sum_{x \in \mathcal{X}} x\, p(x)$, and for continuous rv's, as $\mathbb{E}[X] = \int_{\mathcal{X}} x\, p(x)\, dx$.
The variance, denoted by $\sigma^2$, is a measure of the "spread" of a distribution. It is defined as follows:
$$
\begin{aligned}
\mathrm{var}[X] &= \mathbb{E}\left[(X - \mu)^2\right] = \int (x - \mu)^2 p(x)\, dx \\
&= \int x^2 p(x)\, dx + \mu^2 \int p(x)\, dx - 2\mu \int x\, p(x)\, dx = \mathbb{E}[X^2] - \mu^2
\end{aligned}
$$

from which we get the useful identity

$$ \mathbb{E}[X^2] = \mu^2 + \mathrm{var}[X] = \mu^2 + \sigma^2 $$
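A quick numeric check of this identity, using an arbitrary discrete pmf chosen only for illustration:

```python
import numpy as np

# Illustrative discrete rv: support values and their probabilities
x = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

mu = np.sum(x * p)                  # E[X]
var = np.sum((x - mu) ** 2 * p)     # E[(X - mu)^2]
EX2 = np.sum(x ** 2 * p)            # E[X^2]

assert np.isclose(EX2, mu ** 2 + var)   # E[X^2] = mu^2 + sigma^2
```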
@(Transformations of random variables)
If $x \sim p(\cdot)$ is some random variable, and $y = f(x) = Ax + b$, then

$$ \mathbb{E}[y] = \mathbb{E}[Ax + b] = A\mu + b, \quad \text{where } \mu = \mathbb{E}[x] $$

$$ \mathrm{cov}[y] = \mathrm{cov}[Ax + b] = A \Sigma A^T, \quad \text{where } \Sigma = \mathrm{cov}[x] $$
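These identities are easy to verify by Monte Carlo; in the sketch below, `mu`, `Sigma`, `A`, and `b` are arbitrary choices, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([0.5, -1.0])

# Sample x ~ N(mu, Sigma) and push it through y = Ax + b
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b

print(y.mean(axis=0), "vs", A @ mu + b)      # E[y]   = A mu + b
print(np.cov(y.T), "vs", A @ Sigma @ A.T)    # cov[y] = A Sigma A^T
```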
@(Multivariate change of variables *)
Let $f$ be a function that maps $\mathbb{R}^n$ to $\mathbb{R}^n$, and let $y = f(x)$. Then its Jacobian matrix $J$ is given by

$$ J_{x \to y} = \frac{\partial (y_1, \dots, y_n)}{\partial (x_1, \dots, x_n)} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix}. $$
$|\det J|$ measures how much a unit volume changes when we apply $f$; this is the factor that appears in the change of variables formula for densities, $p_y(y) = p_x(x)\, \left|\det J_{y \to x}\right|$.
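The determinant is easy to check numerically with finite differences. The sketch below uses the polar-to-Cartesian map, whose Jacobian determinant is known analytically to be $r$; the `jacobian` helper is written here for illustration:

```python
import numpy as np

def f(v):
    """Polar -> Cartesian map: (r, theta) -> (r cos(theta), r sin(theta))."""
    r, theta = v
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian, J[i, j] = d f_i / d x_j."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x0 = np.array([2.0, np.pi / 6])
J = jacobian(f, x0)
print(np.abs(np.linalg.det(J)))   # ~ 2.0, i.e. r, the analytic |det J|
```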
Gaussian models
The pdf for a multivariate normal (MVN) in $D$ dimensions is defined as follows:
$$ \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right] \tag{1.0.1} $$
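Equation (1.0.1) translates directly into code; the sketch below compares a hand-rolled version against `scipy.stats.multivariate_normal` on made-up inputs:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) as in equation (1.0.1)."""
    D = len(mu)
    diff = x - mu
    # Solve Sigma z = diff instead of forming Sigma^{-1} explicitly
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))   # should match
```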
The expression inside the exponent is the squared Mahalanobis distance between a data vector $x$ and the mean vector $\mu$. We can gain a better understanding of this quantity by performing an eigen-decomposition of $\Sigma$. That is, we write $\Sigma = U \Lambda U^T$, where $U$ is an orthonormal matrix of eigenvectors satisfying $U^T U = I$, and $\Lambda$ is a diagonal matrix of eigenvalues.
$$ \Sigma^{-1} = U^{-T} \Lambda^{-1} U^{-1} = U \Lambda^{-1} U^T = \sum_{i=1}^{D} \frac{1}{\lambda_i} u_i u_i^T $$

$$ (x - \mu)^T \Sigma^{-1} (x - \mu) = (x - \mu)^T \left( \sum_{i=1}^{D} \frac{1}{\lambda_i} u_i u_i^T \right) (x - \mu) = \sum_{i=1}^{D} \frac{1}{\lambda_i} (x - \mu)^T u_i u_i^T (x - \mu) = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i} $$

where $y_i = u_i^T (x - \mu)$. Recall that the equation for an ellipse in 2d is $\frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} = 1$.
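A numeric check that the eigendecomposition route gives the same squared Mahalanobis distance as solving against $\Sigma$ directly; the values of `mu`, `Sigma`, and `x` below are made up:

```python
import numpy as np

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
mu = np.array([0.0, 1.0])
x = np.array([0.5, 0.5])

# Eigendecomposition Sigma = U Lambda U^T (symmetric, so use eigh)
lam, U = np.linalg.eigh(Sigma)

diff = x - mu
maha_direct = diff @ np.linalg.solve(Sigma, diff)

# Same quantity via y_i = u_i^T (x - mu): sum_i y_i^2 / lambda_i
y = U.T @ diff
maha_eig = np.sum(y ** 2 / lam)

assert np.isclose(maha_direct, maha_eig)
```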