Geometry and Linear Algebraic Operations
1. Geometry of Vectors
- A vector is a list of numbers
- There are two types of vectors: column vectors and row vectors
- data examples → column vectors
- Although the column vector is the default choice, when building a tabular dataset it is conventional to treat each data example as a row vector (a row of the matrix)
- weights used to form weighted sums → row vectors
- geometric interpretations of vectors
- points in space
- directions in space
points in space
- considering tasks as collections of points in space
- picturing such tasks as discovering how to separate two distinct clusters of points
directions in space
- v = [3, 2] → the location 3 units to the right and 2 units up from the origin (take 3 steps to the right, then 2 steps up)
- visualizing vector addition by following the directions given by one vector, and then following the directions given by the other
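A quick numeric sketch of the point/direction view (the vectors here are arbitrary examples):
```python
import torch

# v is the point 3 units to the right and 2 units up from the origin
v = torch.tensor([3.0, 2.0])
u = torch.tensor([1.0, -1.0])

# vector addition: follow the directions of v, then the directions of u
print(v + u)  # tensor([4., 1.])
```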
2. Dot Products and Angles
Dot Product
- $\mathbf{u}^{\top} \mathbf{v}=\sum_{i} u_{i} \cdot v_{i}$
- $\mathbf{u} \cdot \mathbf{v}=\mathbf{u}^{\top} \mathbf{v}=\mathbf{v}^{\top} \mathbf{u}$ (the dot notation mirrors classical multiplication)
Geometric interpretation: the dot product is closely related to the angle between two vectors
- consider two vectors v (length=r), w (length=s) → $\mathbf{v}=(r, 0)$ and $\mathbf{w}=(s \cos (\theta), s \sin (\theta))$
- $\mathbf{v} \cdot \mathbf{w}=r s \cos (\theta)=\|\mathbf{v}\|\|\mathbf{w}\| \cos (\theta)$
- $\theta=\arccos \left(\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|}\right)$
```python
import torch

def angle(v, w):
    # theta = arccos( (v . w) / (||v|| ||w||) )
    return torch.acos(v.dot(w) / (torch.norm(v) * torch.norm(w)))

angle(torch.tensor([0, 1, 2], dtype=torch.float32), torch.tensor([2.0, 3, 4]))  # tensor(0.4190)
```
※ orthogonal
$\theta=\pi / 2$ (i.e., $90^{\circ}$) is the same thing as $\cos (\theta)=0$
two vectors are orthogonal if and only if $\mathbf{v} \cdot \mathbf{w}=0$
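A small check of orthogonality with an arbitrarily chosen pair of vectors whose dot product is zero:
```python
import torch

v = torch.tensor([1.0, 2.0])
w = torch.tensor([2.0, -1.0])  # chosen so that v . w = 1*2 + 2*(-1) = 0

print(v.dot(w))  # tensor(0.) -> orthogonal
print(torch.acos(v.dot(w) / (torch.norm(v) * torch.norm(w))))  # tensor(1.5708) ~ pi/2
```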
"why is computing the angle useful?"
Consider an image and a duplicate image where every pixel value is the same but at 10% of the brightness
- If we calculate the distance between the original image and the darker one, the distance can be large
- If we consider the angle, for any vector v, the angle between v and 0.1⋅v will be zero
- Scaling vectors keeps the same direction and just changes the length
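A rough numeric illustration, assuming the darker image is modeled simply as 0.1 * v (the random tensor is a stand-in for a flattened image):
```python
import torch

v = torch.rand(1000) * 255.0  # stand-in for a flattened image
v_dark = 0.1 * v              # same image at 10% brightness

# the distance between the two vectors is large ...
print(torch.norm(v - v_dark))
# ... but the angle between them is (numerically) zero
cos = v.dot(v_dark) / (torch.norm(v) * torch.norm(v_dark))
print(torch.acos(torch.clamp(cos, -1.0, 1.0)))  # ~tensor(0.)
```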
2.1. Cosine Similarity
Cosine similarity is the term used when the angle is employed to measure the closeness of two vectors:
$$
\cos (\theta)=\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|}
$$
- two vectors point in the same direction → maximum value of 1
- two vectors point in opposite directions → minimum value of -1
- two vectors are orthogonal → a value of 0
- if the components of high-dimensional vectors are sampled randomly with mean 0, their cosine will nearly always be close to 0
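A minimal sketch of cosine similarity covering the three cases above and the random high-dimensional observation (the dimension 10000 is arbitrary):
```python
import torch

def cosine_similarity(v, w):
    return v.dot(w) / (torch.norm(v) * torch.norm(w))

v = torch.tensor([1.0, 2.0, 3.0])
print(cosine_similarity(v, 2 * v))  # ~1.0 -> same direction
print(cosine_similarity(v, -v))     # ~-1.0 -> opposite directions

# components sampled randomly with mean 0 -> cosine is usually close to 0
a, b = torch.randn(10000), torch.randn(10000)
print(cosine_similarity(a, b))
```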
3. Hyperplanes
Consider $\mathbf{w}=[2,1]^{\top}$ and ask: “what are the points $\mathbf{v}$ with $\mathbf{w} \cdot \mathbf{v}=1$?”
$$
\|\mathbf{v}\|\|\mathbf{w}\| \cos (\theta)=1 \Longleftrightarrow\|\mathbf{v}\| \cos (\theta)=\frac{1}{\|\mathbf{w}\|}=\frac{1}{\sqrt{5}}
$$
- $\|\mathbf{v}\| \cos (\theta)$ is the length of the projection of the vector v onto direction of w
- the solution set is the line $2x + y = 1$, i.e., $y = 1 - 2x$
Taking the line $\mathbf{v} \cdot \mathbf{w}=1$ as the boundary: points with $\mathbf{v} \cdot \mathbf{w}>1$ have a longer projection onto $\mathbf{w}$ and are more similar to the vector $(2,1)$, while points with $\mathbf{v} \cdot \mathbf{w}<1$ have a shorter projection and are less similar to it
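A small sketch of the hyperplane as a decision rule with $\mathbf{w}=[2,1]^{\top}$ (the sample points are arbitrary):
```python
import torch

w = torch.tensor([2.0, 1.0])

# points on, above, and below the line 2x + y = 1
points = torch.tensor([[0.0, 1.0],   # on the line:  w . v = 1
                       [1.0, 1.0],   # above:        w . v = 3 > 1
                       [0.0, 0.0]])  # below:        w . v = 0 < 1

print(points @ w)  # tensor([1., 3., 0.])
```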
4. Geometry of Linear Transformations
A geometric understanding of linear transformations represented by matrices is also important.
$$
\mathbf{A}=\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]
$$
We multiply it by $\mathbf{v}=[x, y]^{\top}$:
$$
\begin{aligned}
\mathbf{A} \mathbf{v} &=\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]\left[\begin{array}{l}
x \\
y
\end{array}\right] \\
&=\left[\begin{array}{l}
a x+b y \\
c x+d y
\end{array}\right] \\
&=x\left[\begin{array}{l}
a \\
c
\end{array}\right]+y\left[\begin{array}{l}
b \\
d
\end{array}\right] \\
&=x\left\{\mathbf{A}\left[\begin{array}{l}
1 \\
0
\end{array}\right]\right\}+y\left\{\mathbf{A}\left[\begin{array}{l}
0 \\
1
\end{array}\right]\right\}
\end{aligned}
$$
For example,
$$
\mathbf{A}=\left[\begin{array}{cc}
1 & 2 \\
-1 & 3
\end{array}\right]
$$
$$
\mathbf{v}=[2,-1]^{\top}
$$
- $\mathbf{v}=2 \cdot[1,0]^{\top}+(-1) \cdot[0,1]^{\top}$
- the matrix $\mathbf{A}$ will send the vector $\mathbf{v}$ to $2\left(\mathbf{A}[1,0]^{\top}\right)+(-1)\left(\mathbf{A}[0,1]^{\top}\right)=2[1,-1]^{\top}-[2,3]^{\top}=[0,-5]^{\top}$ (verified numerically after this list)
- All matrices can do is take the original coordinates on our space and skew, rotate, and scale them
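The example can be checked numerically (a short sketch using the matrix and vector from above):
```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [-1.0, 3.0]])
v = torch.tensor([2.0, -1.0])

print(A @ v)                  # tensor([ 0., -5.])
# same result from the column view: 2 * (A e1) + (-1) * (A e2)
print(2 * A[:, 0] - A[:, 1])  # tensor([ 0., -5.])
```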
5. Linear Dependence
Consider the specific matrix
$$
\mathbf{B}=\left[\begin{array}{ll}
2 & -1 \\
4 & -2
\end{array}\right]
$$
This compresses the entire plane down to live on the single line y = 2x
- $\mathbf{b}_{1}=[2,4]^{\top}$
- $\mathbf{b}_{2}=[-1,-2]^{\top}$
- a vector $[a_1, a_2]^{\top}$ transformed by the matrix $\mathbf{B}$ becomes a weighted sum of the columns of the matrix, $a_{1} \mathbf{b}_{1}+a_{2} \mathbf{b}_{2}$ → a linear combination
- $\mathbf{b}_{1}=-2 \cdot \mathbf{b}_{2}$
$$
a_{1} \mathbf{b}_{1}+a_{2} \mathbf{b}_{2}=-2 a_{1} \mathbf{b}_{2}+a_{2} \mathbf{b}_{2}=\left(a_{2}-2 a_{1}\right) \mathbf{b}_{2}
$$
$$
\mathbf{b}_{1}+2 \cdot \mathbf{b}_{2}=0
$$
a collection of vectors $\mathbf{v}_{1}, \ldots, \mathbf{v}_{k}$ is linearly dependent if there exist coefficients $a_{1}, \ldots, a_{k}$, not all equal to zero, such that
$$
\sum_{i=1}^{k} a_{i} \mathbf{v}_{\mathbf{i}}=0
$$
- a linear dependence in the columns of a matrix is a witness to the fact that our matrix is compressing the space down to some lower dimension
- If there is no such linear dependence (the only choice is $a_{1}=\cdots=a_{k}=0$), we say the vectors are linearly independent. If the columns of a matrix are linearly independent, no compression occurs and the operation can be undone
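A quick numeric check of the dependence $\mathbf{b}_{1}+2 \mathbf{b}_{2}=0$ and of the compression onto the line $y=2x$ (the test vectors are arbitrary):
```python
import torch

B = torch.tensor([[2.0, -1.0],
                  [4.0, -2.0]])
b1, b2 = B[:, 0], B[:, 1]

print(b1 + 2 * b2)  # tensor([0., 0.]) -> the columns are linearly dependent

# every output B v lies on the line y = 2x
for v in (torch.tensor([1.0, 0.0]), torch.tensor([3.0, -2.0])):
    out = B @ v
    print(out, bool(out[1] == 2 * out[0]))  # second coordinate is twice the first
```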
6. Rank
- Rank: maximum number of linearly independent vectors
- Column Rank: maximum number of linearly independent column vectors.
- Row Rank: maximum number of linearly independent row vectors.
$$
\mathbf{B}=\left[\begin{array}{ll}
2 & 4 \\
-1 & -2
\end{array}\right]
$$
▶ rank(B)=1
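This can be confirmed with torch.linalg.matrix_rank (a sketch using the matrix above):
```python
import torch

B = torch.tensor([[2.0, 4.0],
                  [-1.0, -2.0]])
print(torch.linalg.matrix_rank(B))  # tensor(1)
```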
7. Invertibility
Since multiplication by a full-rank matrix (i.e., some $\mathbf{A}$ that is an $n \times n$ matrix with rank $n$) never compresses the space, we should always be able to undo it. That is, we can find a matrix $\mathbf{A}^{-1}$ with
$$
\mathbf{A}^{-1} \mathbf{A}=\mathbf{A} \mathbf{A}^{-1}=\mathbf{I}
$$
When this equation is satisfied, $\mathbf{A}$ is called an invertible matrix and $\mathbf{A}^{-1}$ is called the inverse matrix
Consider the $2 \times 2$ matrix
$$
\mathbf{A}=\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]
$$
The inverse matrix is calculated as follows:
$$
\mathbf{A}^{-1}=\frac{1}{a d-b c}\left[\begin{array}{cc}
d & -b \\
-c & a
\end{array}\right]
$$
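A short sketch of computing an inverse and checking that it undoes the multiplication, reusing the matrix A from Section 4 (invertible since $ad-bc=1 \cdot 3-2 \cdot(-1)=5 \neq 0$):
```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [-1.0, 3.0]])
A_inv = torch.linalg.inv(A)

print(A_inv)      # (1/5) * [[3, -2], [1, 1]]
print(A @ A_inv)  # approximately the identity matrix
```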
8. Determinant
- determinant of the $2 \times 2$ matrix above: $\operatorname{det}(\mathbf{A})=a d-b c$
- a matrix A is invertible if and only if the determinant is not equal to zero
```python
import torch

# det([[1, -1], [2, 3]]) = 1 * 3 - (-1) * 2 = 5, so the matrix is invertible
torch.det(torch.tensor([[1, -1], [2, 3]], dtype=torch.float32))  # tensor(5.)
```