
Geometry and Linear Algebraic Operations

채채씨 2022. 1. 12. 21:54

 

1. Geometry of vectors

 

  • A vector is a list of numbers
  • There are two kinds of vectors: column vectors and row vectors
    • data examples → column vectors
      • Although the column vector is the default orientation, when working with a tabular dataset it is conventional to treat the individual rows of the data matrix as the data examples
    • weights used to form weighted sums → row vectors
  • geometric interpretations of vectors
    • points in space
    • directions in space

 

points in space

  • we can think of our data examples as collections of points in space
  • many tasks can be pictured as discovering how to separate two distinct clusters of points

 

directions in space

  • v = [3, 2] → the location 3 units to the right and 2 units up from the origin (take 3 steps to the right and 2 steps up)

 

 

  • visualizing vector addition: follow the directions given by one vector, then follow the directions given by the other (a short sketch in code follows)
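A minimal PyTorch sketch of these two views and of vector addition (the numbers are arbitrary examples):

import torch

v = torch.tensor([3.0, 2.0])  # a point/direction: 3 units right, 2 units up from the origin
u = torch.tensor([1.0, 4.0])

v + u  # tensor([4., 6.]): follow v, then follow u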

 

 


 

2. Dot Products and Angles

 

Dot Product

  • $\mathbf{u}^{\top} \mathbf{v}=\sum_{i} u_{i} \cdot v_{i}$
  • $\mathbf{u} \cdot \mathbf{v}=\mathbf{u}^{\top} \mathbf{v}=\mathbf{v}^{\top} \mathbf{u}$ (the dot-product notation mirrors classical multiplication, and the order of the two vectors does not matter; see the quick check below)
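A quick numeric check that torch.dot matches the elementwise-multiply-and-sum definition and is symmetric (the example values are arbitrary):

import torch

u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([4.0, 5.0, 6.0])

torch.dot(u, v)   # tensor(32.)
torch.sum(u * v)  # tensor(32.), same as the sum definition
torch.dot(v, u)   # tensor(32.), order does not matter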

Geometric interpretation: the dot product is closely related to the angle between two vectors.

 

 

  • consider two vectors v (length=r), w (length=s) → $\mathbf{v}=(r, 0)$ and $\mathbf{w}=(s \cos (\theta), s \sin (\theta))$ 
    • $\mathbf{v} \cdot \mathbf{w}=r s \cos (\theta)=\|\mathbf{v}\|\|\mathbf{w}\| \cos (\theta)$
    • $\theta=\arccos \left(\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|}\right)$

 

import torch

def angle(v, w):
  # theta = arccos( v·w / (||v|| ||w||) )
  return torch.acos(v.dot(w) / (torch.norm(v) * torch.norm(w)))

angle(torch.tensor([0, 1, 2], dtype=torch.float32), torch.tensor([2.0, 3, 4]))  # tensor(0.4190)

 

※ Orthogonality
$\theta=\pi / 2$ (a right angle) is the same thing as $\cos (\theta)=0$
Two (nonzero) vectors are orthogonal if and only if $\mathbf{v} \cdot \mathbf{w}=0$

 

"why is computing the angle useful?"
Consider an image and a duplicate image, where every pixel value is the same but 10% the brightness
   - If we calculate the distance between the original image and the darker one, the distance can be large
   - If we consider the angle, for any vector v, the angle between v and 0.1⋅v will be zero
   - Scaling vectors keeps the same direction and just changes the length

 

 


 

2.1. Cosine Similarity

 

When the angle between two vectors is used to measure their closeness, the cosine is called the cosine similarity:

 

$$
\cos (\theta)=\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|}
$$

 

  • two vectors point in the same direction → maximum value of 1
  • two vectors point in opposite directions → minimum value of −1
  • two vectors are orthogonal → a value of 0
  • if the components of high-dimensional vectors are sampled randomly with mean 0, their cosine similarity will nearly always be close to 0 (checked numerically below)
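A small PyTorch sketch of these cases, using torch.nn.functional.cosine_similarity (the example vectors and the dimension 10,000 are arbitrary choices):

import torch
import torch.nn.functional as F

v = torch.tensor([1.0, 2.0])

F.cosine_similarity(v, 3 * v, dim=0)                      # tensor(1.), same direction
F.cosine_similarity(v, -v, dim=0)                         # tensor(-1.), opposite direction
F.cosine_similarity(v, torch.tensor([-2.0, 1.0]), dim=0)  # tensor(0.), orthogonal

# high-dimensional random vectors with mean 0 are nearly orthogonal
F.cosine_similarity(torch.randn(10000), torch.randn(10000), dim=0)  # close to 0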

 


 

3. Hyperplanes

 

Given $\mathbf{w}=[2,1]^{\top}$, what are the points $\mathbf{v}$ with $\mathbf{w} \cdot \mathbf{v}=1$?

 

$$
\|\mathbf{v}\|\|\mathbf{w}\| \cos (\theta)=1 \Longleftrightarrow\|\mathbf{v}\| \cos (\theta)=\frac{1}{\|\mathbf{w}\|}=\frac{1}{\sqrt{5}}
$$

 

 

  • $\|\mathbf{v}\| \cos (\theta)$ is the length of the projection of the vector v onto the direction of w
  • writing $\mathbf{v}=[x, y]^{\top}$, the condition $\mathbf{w} \cdot \mathbf{v}=2 x+y=1$ is the line $y = 1 - 2x$

 

In two dimensions, this solution set is a line

 

In three dimensions, it is a plane (and in higher dimensions, a hyperplane)

 

Taking the hyperplane $\mathbf{v} \cdot \mathbf{w}=1$ as a decision boundary: points with $\mathbf{v} \cdot \mathbf{w}>1$ project further along $\mathbf{w}$ and are "more similar" to the vector (2, 1), while points with $\mathbf{v} \cdot \mathbf{w}<1$ project less far and are less similar to it
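A small numeric illustration of this decision rule with w = (2, 1); the three test points below are arbitrary:

import torch

w = torch.tensor([2.0, 1.0])

torch.dot(w, torch.tensor([0.0, 1.0]))  # tensor(1.): on the hyperplane (the line y = 1 - 2x)
torch.dot(w, torch.tensor([1.0, 1.0]))  # tensor(3.): > 1, on the side w points toward
torch.dot(w, torch.tensor([0.0, 0.0]))  # tensor(0.): < 1, on the opposite side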

 


 

4. Geometry of Linear Transformations

 

A geometric understanding of the linear transformations represented by matrices is also important

 

$$
\mathbf{A}=\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]
$$

 

we multiply it by the vector $\mathbf{v}=[x, y]^{\top}$:

 

$$
\begin{aligned}
\mathbf{A} \mathbf{v} &=\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]\left[\begin{array}{l}
x \\
y
\end{array}\right] \\
&=\left[\begin{array}{l}
a x+b y \\
c x+d y
\end{array}\right] \\
&=x\left[\begin{array}{l}
a \\
c
\end{array}\right]+y\left[\begin{array}{l}
b \\
d
\end{array}\right] \\
&=x\left\{\mathbf{A}\left[\begin{array}{l}
1 \\
0
\end{array}\right]\right\}+y\left\{\mathbf{A}\left[\begin{array}{l}
0 \\
1
\end{array}\right]\right\}
\end{aligned}
$$

 

For example,

 

$$
\mathbf{A}=\left[\begin{array}{cc}
1 & 2 \\
-1 & 3
\end{array}\right]
$$

 

$$
\mathbf{v}=[2,-1]^{\top}
$$

 

  • $\mathbf{v}=2 \cdot[1,0]^{\top}-1 \cdot[0,1]^{\top}$
  • the matrix A will send the vector v to $2\left(\mathbf{A}[1,0]^{\top}\right)-1\left(\mathbf{A}[0,1]^{\top}\right)=2[1,-1]^{\top}-[2,3]^{\top}=[0,-5]^{\top}$ (verified in the code below)
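The same computation in PyTorch, written both as a matrix-vector product and as a weighted sum of the columns of A:

import torch

A = torch.tensor([[1.0, 2.0], [-1.0, 3.0]])
v = torch.tensor([2.0, -1.0])

A @ v                      # tensor([ 0., -5.])
2 * A[:, 0] - 1 * A[:, 1]  # tensor([ 0., -5.]), weighted sum of the columns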

 

 

  • All a matrix can do is take the original coordinates of our space and skew, rotate, and scale them

 


 

5. Linear Dependence

 

Consider the specific matrix B:

 

$$
\mathbf{B}=\left[\begin{array}{ll}
2 & -1 \\
4 & -2
\end{array}\right]
$$

 

This compresses the entire plane down to live on the single line y = 2x

  • $\mathbf{b}_{1}=[2,4]^{\top}$
  • $\mathbf{b}_{2}=[-1,-2]^{\top}$
  • any vector $\left[a_{1}, a_{2}\right]^{\top}$ transformed by the matrix B becomes a weighted sum of the columns of the matrix, $a_{1} \mathbf{b}_{1}+a_{2} \mathbf{b}_{2}$ → a linear combination
  • $\mathbf{b}_{1}=-2 \cdot \mathbf{b}_{2}$

 

$$
a_{1} \mathbf{b}_{1}+a_{2} \mathbf{b}_{2}=-2 a_{1} \mathbf{b}_{2}+a_{2} \mathbf{b}_{2}=\left(a_{2}-2 a_{1}\right) \mathbf{b}_{2}
$$

 

$$
\mathbf{b}_{1}+2 \cdot \mathbf{b}_{2}=0
$$
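A quick numeric check of this dependence, and of the fact that B sends every input onto the line y = 2x (the input vector below is random):

import torch

B = torch.tensor([[2.0, -1.0], [4.0, -2.0]])
b1, b2 = B[:, 0], B[:, 1]

b1 + 2 * b2          # tensor([0., 0.]): the columns are linearly dependent

out = B @ torch.randn(2)
out[1] - 2 * out[0]  # 0 (up to floating-point error): the output lies on y = 2x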

 

A collection of vectors $\mathbf{v}_{1}, \ldots, \mathbf{v}_{k}$ is linearly dependent if there exist coefficients $a_{1}, \ldots, a_{k}$, not all equal to zero, such that

 

$$
\sum_{i=1}^{k} a_{i} \mathbf{v}_{\mathbf{i}}=0
$$

 

  • a linear dependence in the columns of a matrix is a witness to the fact that our matrix is compressing the space down to some lower dimension  
  • If there is no such linear dependence, we say the vectors are linearly independent (the only way to obtain $\sum_{i} a_{i} \mathbf{v}_{i}=0$ is with $a_{1}, \ldots, a_{k}$ all equal to zero). If the columns of a matrix are linearly independent, no compression occurs and the operation can be undone

 


 

6. Rank

 

  • Rank: the maximum number of linearly independent vectors; for a matrix, the column rank always equals the row rank, so we simply speak of the rank of the matrix
  • Column Rank: maximum number of linearly independent column vectors.
  • Row Rank: maximum number of linearly independent row vectors.

 

$$
\mathbf{B}=\left[\begin{array}{ll}
2 & 4 \\
-1 & -2
\end{array}\right]
$$

 

rank(B) = 1, since the two columns are linearly dependent
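PyTorch can confirm this numerically; torch.linalg.matrix_rank estimates the rank from the singular values:

import torch

torch.linalg.matrix_rank(torch.tensor([[2.0, -1.0], [4.0, -2.0]]))  # tensor(1)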

 


 

7. Invertibility

 

If we multiply by a full-rank matrix (i.e., an $n \times n$ matrix $\mathbf{A}$ with rank $n$), we should always be able to undo the multiplication. That is, there exists a matrix $\mathbf{A}^{-1}$ such that

$$
\mathbf{A}^{-1} \mathbf{A}=\mathbf{A} \mathbf{A}^{-1}=\mathbf{I}
$$

When the above equation is satisfied, $\mathbf{A}$ is called an invertible matrix and $\mathbf{A}^{-1}$ is called the inverse matrix

 

Here $\mathbf{I}$ is the identity matrix, which has ones along the diagonal and zeros elsewhere

 

For a general $2 \times 2$ matrix $\mathbf{A}$ as defined above, the inverse matrix can be calculated in closed form, as follows
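With $\mathbf{A}=\left[\begin{array}{ll}a & b \\ c & d\end{array}\right]$ as before, the standard closed form (valid whenever $a d-b c \neq 0$) is

$$
\mathbf{A}^{-1}=\frac{1}{a d-b c}\left[\begin{array}{cc}
d & -b \\
-c & a
\end{array}\right]
$$

A quick check in PyTorch, reusing the example matrix from the linear-transformation section above (torch.inverse computes the inverse numerically):

import torch

A = torch.tensor([[1.0, 2.0], [-1.0, 3.0]])
A_inv = torch.inverse(A)

A_inv      # [[0.6, -0.4], [0.2, 0.2]] (up to floating-point error)
A_inv @ A  # approximately the identity matrix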

 

 


 

8. Determinant

 

 

  • determinant of the $2 \times 2$ matrix $\mathbf{A}$ above: $\operatorname{det}(\mathbf{A})=a d-b c$
  • a matrix A is invertible if and only if the determinant is not equal to zero

 

import torch

# det([[1, -1], [2, 3]]) = 1*3 - (-1)*2 = 5
torch.det(torch.tensor([[1, -1], [2, 3]], dtype=torch.float32))  # tensor(5.)
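The rank-deficient matrix B from above has determinant zero, so it is not invertible:

import torch

torch.det(torch.tensor([[2.0, -1.0], [4.0, -2.0]]))  # tensor(0.)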

 


 

