
Applied Math and Machine Learning Basics

Linear Algebra

@(Eigendecomposition)

An eigenvector of a square matrix $A$ is a nonzero vector $v$ such that multiplication by $A$ only scales it: $Av = \lambda v$.
The scalar $\lambda$ is called the eigenvalue corresponding to this eigenvector.

Suppose the matrix $A$ has $n$ linearly independent eigenvectors $\{v^{(1)}, \ldots, v^{(n)}\}$ with corresponding eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$. We concatenate the eigenvectors into a matrix with one eigenvector per column, $V = [v^{(1)}, \ldots, v^{(n)}]$, and likewise concatenate the eigenvalues into a vector $\lambda = [\lambda_1, \ldots, \lambda_n]^\top$. The eigendecomposition of $A$ can then be written as

$$A = V\operatorname{diag}(\lambda)V^{-1}$$

For a real symmetric matrix, the eigenvectors can be chosen to form an orthogonal matrix $Q$, giving $A_{\text{symmetric}} = Q\Lambda Q^\top$.
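
As a quick numerical illustration (a minimal NumPy sketch of my own, not from the original notes, using a random symmetric matrix so the decomposition is well behaved):

```python
import numpy as np

# A random symmetric matrix, so the eigenvectors can be chosen orthonormal.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T

lam, V = np.linalg.eig(A)                     # columns of V are eigenvectors
A_rec = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A, A_rec))                  # True: A = V diag(lambda) V^{-1}

# Symmetric case: orthogonal Q and real eigenvalues, A = Q Lambda Q^T.
w, Q = np.linalg.eigh(A)
print(np.allclose(A, Q @ np.diag(w) @ Q.T))   # True
```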

@(Singular Value Decomposition)[SVD]

The SVD factors a matrix $A$ into a product of three matrices: $A = UDV^\top$.
If $A$ is an $m \times n$ matrix, then $U$ is an $m \times m$ matrix, $D$ is an $m \times n$ matrix, and $V$ is an $n \times n$ matrix.

Each of these matrices has a special structure: $U$ and $V$ are defined to be orthogonal matrices, and $D$ is defined to be a diagonal matrix. Note that $D$ is not necessarily square.

The entries on the diagonal of $D$ are the singular values of $A$. The columns of $U$ are the left singular vectors, and the columns of $V$ are the right singular vectors.

The SVD of $A$ can be interpreted through eigendecompositions of matrices built from $A$: the left singular vectors of $A$ are the eigenvectors of $AA^\top$, the right singular vectors are the eigenvectors of $A^\top A$, and the nonzero singular values of $A$ are the square roots of the eigenvalues of $A^\top A$ (equivalently, of $AA^\top$).
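
This relationship is easy to check numerically; a small sketch with NumPy (my own example, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# Full SVD: U is 5x5, Vt is 3x3, s holds the 3 singular values.
U, s, Vt = np.linalg.svd(A)

# Nonzero singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)                 # ascending order
print(np.allclose(np.sort(s**2), eigvals))            # True
```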

Perhaps the most useful property of the SVD is that it lets us generalize matrix inversion to non-square matrices.

@(Moore-Penrose Pseudoinverse)

Matrix inversion is not defined for non-square matrices. Suppose we want to solve a linear system by means of a left inverse $B$ of $A$: $Ax = y \to x = By$. Depending on the structure of the problem, it may not be possible to design a unique mapping from $A$ to $B$.

If $A$ has more rows than columns, the equation may have no solution; if $A$ has more columns than rows, it may have multiple solutions.

The Moore-Penrose pseudoinverse lets us make some headway in these cases. The pseudoinverse of $A$ is defined as

$$A^+ = \lim_{\alpha \searrow 0}(A^\top A + \alpha I)^{-1}A^\top$$

Practical algorithms for computing the pseudoinverse are not based on this definition, but on the formula

$$A^+ = VD^+U^\top$$

where $U$, $D$, and $V$ come from the SVD of $A$, and the pseudoinverse $D^+$ of the diagonal matrix $D$ is obtained by taking the reciprocal of its nonzero entries and then transposing the result.

When $A$ has more columns than rows, using the pseudoinverse to solve the linear system gives one of the many possible solutions; specifically, $x = A^+y$ is the solution with minimal Euclidean norm $\|x\|_2$ among all feasible solutions.
When $A$ has more rows than columns, there may be no solution; in that case the pseudoinverse yields the $x$ for which $Ax$ is as close as possible to $y$ in Euclidean distance $\|Ax - y\|_2$.
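
A small NumPy sketch (my own illustration, not from the text) computing $A^+$ from the SVD and checking it against `np.linalg.pinv` and the least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))   # more rows than columns
y = rng.standard_normal(6)

# Pseudoinverse via the SVD: A+ = V D+ U^T, with nonzero singular values inverted.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))        # True
# x = A+ y minimizes ||Ax - y||_2, matching the least-squares solution.
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(A_pinv @ y, x_ls))                 # True
```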

@(Trace)

The trace operator returns the sum of the diagonal entries of a matrix: $\operatorname{Tr}(A) = \sum_i A_{i,i}$.
Clearly $\operatorname{Tr}(A) = \operatorname{Tr}(A^\top)$.

Some matrix operations that are otherwise hard to describe can be expressed cleanly using matrix products and the trace operator.

For example, the trace gives another way to write the Frobenius norm of a matrix:

$$\|A\|_F = \sqrt{\operatorname{Tr}(AA^\top)}$$

The trace of a square matrix formed as a product of several factors is unchanged if the last factor is moved to the front of the product:

$$\operatorname{Tr}(ABC) = \operatorname{Tr}(CAB) = \operatorname{Tr}(BCA)$$

or, more generally,

$$\operatorname{Tr}\bigg(\prod_{i=1}^n F^{(i)}\bigg) = \operatorname{Tr}\bigg(F^{(n)}\prod_{i=1}^{n-1}F^{(i)}\bigg)$$

The trace is invariant under cyclic permutation even when the permutation changes the shape of the product. For example, with $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times m}$, we have $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ even though $AB$ is $m\times m$ and $BA$ is $n\times n$.
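
These identities are easy to verify numerically; a minimal NumPy sketch (mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

# Frobenius norm expressed via the trace.
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T))))  # True

# Cyclic invariance: AB is 3x3, BA is 5x5, but the traces agree.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                      # True
```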

@(Determinant)

The determinant of a square matrix $A$, written $\det(A)$, is a function mapping matrices to real numbers. It equals the product of the eigenvalues of $A$. Its absolute value measures how much multiplication by $A$ expands or contracts space: if the determinant is 0, space is collapsed completely along at least one dimension and loses all of its volume; if the determinant is 1, the transformation preserves volume.

$$\det(A) = \prod_{i=1}^n\lambda_i$$
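
A quick NumPy check of $\det(A) = \prod_i \lambda_i$ (an illustrative sketch of my own, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

# The determinant equals the product of the (possibly complex) eigenvalues.
eigvals = np.linalg.eigvals(A)
print(np.isclose(np.linalg.det(A), np.prod(eigvals).real))  # True
```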


EARLY VISION: JUST ONE IMAGE

Linear Filters

Local Image Features

An object is separated from its background in an image by an occluding contour. Draw a path in the image that crosses such a contour. On one side, pixels lie on the object, and on the other, the background. Finding occluding contours is an important challenge, because the outline of an object—which is one cue to its shape—is formed by occluding contours.

COMPUTING THE IMAGE GRADIENT

For an image $I$, the gradient is $\nabla I = \left(\dfrac{\partial I}{\partial x}, \dfrac{\partial I}{\partial y}\right)^T$, which we could estimate by observing that $\begin{cases}\dfrac{\partial I}{\partial x}\approx I_{i+1,j} - I_{i,j} \\ \dfrac{\partial I}{\partial y}\approx I_{i,j+1} - I_{i,j}\end{cases}$

These kinds of derivative estimates are known as finite differences. Image noise tends to result in pixels not looking like their neighbors, so simple finite differences tend to give strong responses to noise. As a result, just taking one finite difference for $x$ and one for $y$ gives noisy gradient estimates. The way to deal with this problem is to smooth the image and then differentiate it.
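
One common way to implement this, smoothing with a Gaussian and then taking finite differences, is sketched below (my own example, assuming SciPy is available and using a synthetic image):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic test image: a bright square on a dark background, plus noise.
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0
image += 0.1 * rng.standard_normal(image.shape)

# Smooth first, then take forward finite differences of the smoothed image.
# Here axis 1 is x (columns) and axis 0 is y (rows).
smoothed = gaussian_filter(image, sigma=2.0)
dI_dx = smoothed[:-1, 1:] - smoothed[:-1, :-1]
dI_dy = smoothed[1:, :-1] - smoothed[:-1, :-1]

gradient_magnitude = np.sqrt(dI_dx**2 + dI_dy**2)
print(gradient_magnitude.max())
```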

REPRESENTING THE IMAGE GRADIENT

There are two important representations of the image gradient.

  • The first is to compute edges, where there are very fast changes in brightness. These are usually seen as points where the magnitude of the gradient is extremal.
  • The second is to use gradient orientations, which are largely independent of illumination intensity.

FINDING CORNERS AND BUILDING NEIGHBORHOODS

Points worth matching are corners, because a corner can be localized, which means we can tell where a corner is. This motivates the more general term interest point often used to describe a corner.

DESCRIBING NEIGHBORHOODS WITH SIFT AND HOG FEATURES

We know the center, radius, and orientation of a set of image patches, and must now represent them. Orientations should provide a good representation. They are unaffected by changes in image brightness, and different textures tend to have different orientation fields. The pattern of orientations in different parts of the patch is likely to be quite distinctive. Our representation should be robust to small errors in the center, radius, or orientation of the patch, because we are unlikely to estimate these exactly right.


Probability

Definition

Conditional probability

The conditional probability of $x$ given that $y$ takes value $y^*$ tells us the relative propensity of the random variable $x$ to take different outcomes given that the random variable $y$ is fixed to value $y^*$. This conditional probability is written as $Pr(x|y = y^*)$.
The conditional probability $Pr(x|y = y^*)$ can be recovered from the joint distribution $Pr(x, y)$.

In particular, we examine the appropriate slice $Pr(x, y = y^*)$ of the joint distribution. The values in the slice tell us about the relative probability that $x$ takes various values having observed $y = y^*$, but they do not themselves form a valid probability distribution; they cannot sum to one, as they constitute only a small part of the joint distribution, which did itself sum to one. To calculate the conditional probability distribution, we hence normalize by the total probability in the slice:

$$Pr(x|y=y^*) = \frac{Pr(x, y = y^*)}{\int Pr(x, y=y^*)\,dx} = \frac{Pr(x, y = y^*)}{Pr(y=y^*)} \tag{1.0.1}$$

$$Pr(x|y) = \frac{Pr(x,y)}{Pr(y)} \;\to\; \begin{cases}Pr(x,y) = Pr(x|y)\,Pr(y) \\ Pr(x,y) = Pr(y|x)\,Pr(x)\end{cases} \;\to\; Pr(x|y)\,Pr(y) = Pr(y|x)\,Pr(x)$$

$$\begin{aligned}Pr(\omega, x, y, z) &= Pr(\omega, x, y|z)\,Pr(z)\\&= Pr(\omega, x|y,z)\,Pr(y|z)\,Pr(z)\\&= Pr(\omega|x, y,z)\,Pr(x|y,z)\,Pr(y|z)\,Pr(z)\end{aligned}$$
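
To make the slicing-and-normalizing concrete, here is a tiny NumPy sketch (my own toy joint table, not from the book) that recovers $Pr(x|y=y^*)$ and checks Bayes' rule:

```python
import numpy as np

# Toy joint distribution Pr(x, y) over 3 values of x and 2 values of y.
joint = np.array([[0.10, 0.20],
                  [0.25, 0.05],
                  [0.15, 0.25]])
assert np.isclose(joint.sum(), 1.0)

# Take the slice Pr(x, y = y*) and normalize it to obtain Pr(x | y = y*).
y_star = 1
slice_ = joint[:, y_star]           # does not sum to one on its own
cond = slice_ / slice_.sum()        # divide by Pr(y = y*) = sum of the slice
print(cond, cond.sum())             # a valid distribution over x

# Bayes' rule check: Pr(x|y*) Pr(y*) == Pr(y*|x) Pr(x) for every x.
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)
print(np.allclose(cond * p_y[y_star], (joint[:, y_star] / p_x) * p_x))  # True
```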

Expectation

Given a function $f[\cdot]$ that returns a value for each possible value $x^*$ of the variable $x$ and a probability $Pr(x = x^*)$ that each value of $x$ occurs, we sometimes wish to calculate the expected output of the function. If we drew a very large number of samples from the probability distribution, calculated the function for each sample, and took the average of these values, the result would be the expectation.

The expected value of a function $f[\cdot]$ of a random variable $x$ is defined as

$$E[f[x]] = \sum_x f[x]\,Pr(x) \tag{1.0.2}$$

$$E[f[x]] = \int f[x]\,Pr(x)\,dx \tag{1.0.2}$$

for the discrete and continuous cases, respectively. This idea generalizes to functions $f[\cdot]$ of more than one random variable so that, for example, $E[f[x,y]] = \int\int f[x,y]\,Pr(x,y)\,dx\,dy$.

Special cases of expectation. For some functions $f[x]$, the expectation $E[f[x]]$ is given a special name. Here we use the notation $\mu_x$ to represent the mean with respect to random variable $x$ and $\mu_y$ the mean with respect to random variable $y$.

| Function $f[\cdot]$ | Expectation |
| --- | --- |
| $x$ | mean, $\mu_x$ |
| $x^k$ | $k$th moment about zero |
| $(x-\mu_x)^k$ | $k$th moment about the mean |
| $(x-\mu_x)^2$ | variance |
| $(x-\mu_x)^3$ | skew |
| $(x-\mu_x)^4$ | kurtosis |
| $(x-\mu_x)(y - \mu_y)$ | covariance of $x$ and $y$ |

There are four rules for manipulating expectations, which can be easily proved from the original definition:

  1. The expected value of a constant $k$ with respect to the random variable $x$ is just the constant itself

$$E[k] = k$$

  2. The expected value of a constant $\kappa$ times a function $f[x]$ of the random variable $x$ is $\kappa$ times the expected value of the function

$$E[\kappa f[x]] = \kappa E[f[x]]$$

  3. The expected value of the sum of two functions of a random variable $x$ is the sum of the individual expected values of the functions

$$E[f[x] + g[x]] = E[f[x]] + E[g[x]]$$

  4. The expected value of the product of two functions $f[x]$ and $g[y]$ of random variables $x$ and $y$ is equal to the product of the individual expected values if the variables $x$ and $y$ are independent

$$E[f[x]g[y]] = E[f[x]]\,E[g[y]] \hspace{2em} \text{where } x, y \text{ independent}$$

Using these rules, we can derive the relationship between the second moment about zero and the second moment about the mean (the variance):

$$\begin{aligned}E[(x-\mu)^2] &= E[x^2 - 2x\mu + \mu^2] \\&= E[x^2] - 2E[x\mu] + E[\mu^2] \\&= E[x^2] - 2\mu E[x] + E[\mu^2] \\ &= E[x^2] - 2E[x]E[x] + E[x]E[x] \\ &= E[x^2] - E[x]E[x]\end{aligned}$$
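
This identity can be sanity-checked by Monte Carlo sampling; a short sketch (mine, not from the book, using an arbitrary distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # any distribution works

mean = np.mean(x)
second_moment = np.mean(x**2)

# E[(x - mu)^2] = E[x^2] - E[x]^2
print(np.isclose(np.mean((x - mean)**2), second_moment - mean**2))  # True
# Both should be close to the true variance (scale^2 = 4 for this exponential).
print(np.mean((x - mean)**2))
```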


Notes on some frequently used mathematical formulas, theorems, and analysis methods.

Procrustes Analysis

Introduction

Take $N$ two-dimensional images of objects of the same class and annotate their contour points; this gives the training sample set

$$\Omega = \{X_1, X_2, \cdots, X_N\}$$

Because the shapes and positions of the target objects differ considerably across images, the resulting data are not affine-invariant and must be normalized. Here, Procrustes analysis is used to normalize all the shapes in the sample set; shape and position are still carried by the spatial coordinates of the sample points.

Definition

Procrustes analysis is a method for analyzing the distribution of shapes. Mathematically, it iterates between estimating a canonical shape and using least squares to find the affine transformation mapping each sample shape onto that canonical shape.

In this book, the normalization of two shapes (one being the canonical shape, the other a sample shape) proceeds as follows:

  1. Compute, for each sample point $i\ (i = 1, 2, \ldots, n)$, its mean over the $N$ images

$$(\bar x_i, \bar y_i) = \bigg(\frac{1}{N}\sum_{j=1}^N x_{ji}, \frac{1}{N}\sum_{j=1}^N y_{ji}\bigg)$$

  2. Normalize all the shapes by subtracting from each sample point its corresponding mean

$$(x_i', y_i') = (x_i - \bar x_i, y_i - \bar y_i)$$

  3. From the centered data, compute the centroid of the shape in each image; for the $i$-th image the centroid is

$$(\bar x_i, \bar y_i) = \bigg(\frac{1}{n}\sum_{j=1}^n x_{ji}, \frac{1}{n}\sum_{j=1}^n y_{ji}\bigg)$$

  4. Using the centroid and a rotation angle, align the canonical shape and the sample shape so that the Procrustes distance between the two shapes is minimized; the Procrustes distance is defined as

$$P_d^2 = \sum_{i=1}^n\big[(x_{i1} - x_{i2})^2 + (y_{i1} - y_{i2})^2\big]$$

Concretely, step 4 repeatedly iterates the following procedure:

  1. Compute the canonical shape as the mean of the normalized sample points over all images.
  2. Use least squares to find the rotation mapping each sample shape onto the canonical shape. By the definition of the Procrustes distance, this means solving

$$\min_{a,b}\sum_{i=1}^n\left\|\begin{bmatrix}a & -b\\ b & a\end{bmatrix}\begin{bmatrix}x_i \\ y_i\end{bmatrix}-\begin{bmatrix}c_x\\c_y\end{bmatrix}\right\|^2$$

where $a$ and $b$ are the rotation (and scale) parameters of the affine transformation:

$$\begin{bmatrix}a & -b\\ b & a\end{bmatrix} = \begin{bmatrix}k\cos(\theta) & -k\sin(\theta) \\ k\sin(\theta) & k\cos(\theta)\end{bmatrix}$$

Setting the partial derivatives of the objective to zero gives the desired $a$ and $b$:

$$\begin{bmatrix}a \\ b\end{bmatrix} = \frac{1}{\sum_i(x_i^2 + y_i^2)}\sum_{i=1}^n\begin{bmatrix}x_ic_x + y_ic_y \\ x_ic_y - y_ic_x\end{bmatrix}$$

  3. Apply the rotation to the sample shape to obtain a new shape aligned with the canonical shape

$$\begin{bmatrix}x' \\ y'\end{bmatrix} = \begin{bmatrix}a & -b \\ b & a\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$

  4. Repeat the steps above until a specified number of iterations is reached or the norm of the change in the canonical shape between consecutive iterations falls below a threshold (see the sketch after this list).
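
A minimal NumPy sketch of this iterative alignment (my own simplification of the steps above: shapes stored as $n\times 2$ arrays, already centered as in step 2):

```python
import numpy as np

def align_to_canonical(shape, canonical):
    """Least-squares rotation/scale (a, b) mapping `shape` onto `canonical`."""
    x, y = shape[:, 0], shape[:, 1]
    cx, cy = canonical[:, 0], canonical[:, 1]
    denom = np.sum(x**2 + y**2)
    a = np.sum(x * cx + y * cy) / denom
    b = np.sum(x * cy - y * cx) / denom
    R = np.array([[a, -b], [b, a]])
    return shape @ R.T            # apply [x'; y'] = R [x; y] to every point

def procrustes(shapes, n_iters=10, tol=1e-8):
    """shapes: array of shape (N, n, 2), already centered."""
    shapes = np.asarray(shapes, dtype=float)
    canonical = shapes.mean(axis=0)
    for _ in range(n_iters):
        shapes = np.stack([align_to_canonical(s, canonical) for s in shapes])
        new_canonical = shapes.mean(axis=0)
        converged = np.linalg.norm(new_canonical - canonical) < tol
        canonical = new_canonical
        if converged:
            break
    return canonical, shapes
```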

Vector Calculus

Vectors and Parametric Curves

Basic

Definition (Scalar, Point, Bi-point, Vector)

Scalar
A scalar $\alpha \in \mathbb{R}$ is simply a real number.
 
Point, Bi-point
A point $r \in \mathbb{R}^2$ is an ordered pair of real numbers, $r = (x, y)$ with $x \in \mathbb{R}$ and $y \in \mathbb{R}$. Here the first coordinate $x$ stipulates the location on the horizontal axis and the second coordinate $y$ stipulates the location on the vertical axis. Given two points $r$ and $r'$ in $\mathbb{R}^2$, the directed line segment with departure point $r$ and arrival point $r'$ is called the bi-point $r, r'$ and is denoted by $[r, r']$. We say that $r$ is the tail of the bi-point $[r, r']$ and that $r'$ is its head. The Euclidean length or norm of bi-point $[a, b]$ is simply the distance between $a$ and $b$, and it is denoted by $\|[a,b]\| = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$.
 
Vector
A vector $\vec a \in \mathbb{R}^2$ is a codification of movement of a bi-point: given the bi-point $[r, r']$, we associate to it the vector $\overrightarrow{rr'} = \begin{bmatrix}x' - x \\ y' - y\end{bmatrix}$, stipulating a movement of $x' - x$ units from $(x, y)$ along the horizontal axis and of $y' - y$ units from the current position along the vertical axis. The zero vector $\vec 0 = \begin{bmatrix}0 \\ 0\end{bmatrix}$ indicates no movement in either direction.


Filters-Instagram

This notebook includes both coding and written questions. Please hand in this notebook file with all the outputs and your answers to the written questions.

```python
# Setup
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
from time import time
#from skimage import io

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
%load_ext autoreload
%autoreload 2
```

Convolutions

@(Commutative Property)
Recall that the convolution of an image $f: \mathbb{R}^2\to\mathbb{R}$ and a kernel $h: \mathbb{R}^2\to\mathbb{R}$ is defined as follows:

$$\begin{aligned}(f*h)[m,n] &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} f[i,j]\cdot h[m-i,\,n-j] \\ &= (h*f)[m,n] \\ &= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h[i,j]\cdot f[m-i,\,n-j]\end{aligned} \tag{1.0.1}$$

Commutativity follows from a change of summation variables, replacing $i$ and $j$ with $m - i$ and $n - j$.
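
A quick numerical check of commutativity (my own sketch; it assumes SciPy's `convolve2d`, whose default "full" mode zero-pads both inputs):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
f = rng.standard_normal((5, 5))   # "image"
h = rng.standard_normal((3, 3))   # kernel

# Full (zero-padded) convolution is commutative: f * h == h * f.
print(np.allclose(convolve2d(f, h, mode='full'),
                  convolve2d(h, f, mode='full')))  # True
```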

@(Linear and Shift Invariance)
Let $f$ be a function $\mathbb{R}^2\to\mathbb{R}$. Consider a system $f\xrightarrow{S} g$, where $g = (f*h)$ with some kernel $h: \mathbb{R}^2\to\mathbb{R}$. Show that $S$ defined by any kernel $h$ is a Linear Shift Invariant (LSI) system. In other words, for any $h$, show that $S$ satisfies both of the following:

$$S[a\cdot f_1 + b\cdot f_2] = a\cdot S[f_1] + b\cdot S[f_2]$$

$$\text{If } f[m,n]\xrightarrow{S} g[m,n] \text{ then } f[m-m_0,\,n-n_0]\xrightarrow{S} g[m-m_0,\,n-n_0]$$


Preface

With the ever increasing amounts of data in electronic form, the need for automated methods for data analysis continues to grow. The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest. Machine learning is thus closely related to the fields of statistics and data mining, but differs slightly in terms of its emphasis and terminology. This book provides a detailed introduction to the field, and includes worked examples drawn from application domains such as molecular biology, text processing, computer vision, and robotics.

Probability

probability theory

@(Joint probabilities)

We define the probability of the joint event $A$ and $B$ as follows:
$$P(A,B) = P(A\cap B) = P(A|B)P(B)$$
Given a joint distribution on two events $P(A,B)$, we define the marginal distribution as follows:
$$P(A) = \sum_b P(A,B) = \sum_b P(A|B=b)P(B=b)$$

@(Mean and variance)

The most familiar property of a distribution is its mean, or expected value, denoted by $\mu$. For discrete rv's, it is defined as $E[X] = \sum_{x\in \mathcal{X}} x\,p(x)$, and for continuous rv's, it is defined as $E[X] = \int_{\mathcal{X}} x\,p(x)\,dx$.
The variance is a measure of the "spread" of a distribution, denoted by $\sigma^2$. It is defined as follows:
$$\begin{aligned}\mathrm{var}[X] &= E[(X-\mu)^2] = \int(x-\mu)^2 p(x)\,dx\\ &= \int x^2 p(x)\,dx + \mu^2\int p(x)\,dx - 2\mu\int x\,p(x)\,dx \\ &= E[X^2] - \mu^2 \end{aligned}$$
from which $E[X^2] = \mu^2 + \mathrm{var}[X] = \mu^2 + \sigma^2$.


Image Processing

Point Operators

@(Point Operators)[pixel|color|compositing]

@(Pixel)

Two commonly used point processes are multiplication and addition with a constant

$$g(x) = af(x) + b \tag{1.0.1}$$

The parameters $a > 0$ and $b$ are often called the gain and bias parameters; sometimes these parameters are said to control contrast and brightness.
The bias and gain parameters can also be spatially varying: $g(x) = a(x)f(x) + b(x)$.

Multiplicative gain (both global and spatially varying) is a linear operation, since it obeys the superposition principle $h(f_0 + f_1) = h(f_0) + h(f_1)$.

Another commonly used dyadic (two-input) operator is the linear blend operator

$$g(x) = (1-\alpha)f_0(x) + \alpha f_1(x) \tag{1.0.2}$$
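
As a small NumPy sketch (mine, not from the text, assuming float images with values in [0, 1]), the gain/bias operator and the linear blend look like:

```python
import numpy as np

def gain_bias(f, a=1.2, b=0.05):
    """g(x) = a * f(x) + b, clipped back to the valid range."""
    return np.clip(a * f + b, 0.0, 1.0)

def linear_blend(f0, f1, alpha):
    """g(x) = (1 - alpha) * f0(x) + alpha * f1(x)."""
    return (1.0 - alpha) * f0 + alpha * f1

# Toy example: cross-dissolve between two constant images.
f0 = np.zeros((4, 4))
f1 = np.ones((4, 4))
print(gain_bias(f1)[0, 0])               # 1.0 after clipping
print(linear_blend(f0, f1, 0.25)[0, 0])  # 0.25
```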

@(Compositing and matting)

Compositing equation: $C = (1-\alpha)B + \alpha F$.

This operator attenuates the influence of the background image $B$ by a factor $(1 - \alpha)$ and then adds in the color (and opacity) values corresponding to the foreground layer $F$.


Overview of Supervised Learning

Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors

We develop two simple but powerful prediction methods: the linear model fit by least squares and the $k$-nearest-neighbor prediction rule. The linear model makes huge assumptions about structure and yields stable but possibly inaccurate predictions. The method of $k$-nearest neighbors makes very mild structural assumptions: its predictions are often accurate but can be unstable.

Linear Models and Least Squares

Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$, we predict the output $Y$ via the model

$$\hat Y = \hat\beta_0 + \sum_{j=1}^p X_j\hat\beta_j \tag{1.0.1}$$

The term $\hat\beta_0$ is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in $X$, include $\hat\beta_0$ in the vector of coefficients $\hat\beta$, and then write the linear model in vector form as an inner product

$$\hat Y = X^T\hat\beta \tag{1.0.2}$$

where $X^T$ denotes vector or matrix transpose ($X$ being a column vector). Here we are modeling a single output, so $Y$ is a scalar; in general $Y$ can be a $K$-vector, in which case $\beta$ would be a $p\times K$ matrix of coefficients.

In the least squares approach, we pick the coefficients $\beta$ to minimize the residual sum of squares

$$RSS(\beta) = \sum_{i=1}^N(y_i - x_i^T\beta)^2$$
or, in matrix notation,
$$RSS(\beta) = (Y - X\beta)^T(Y - X\beta)$$

In general $Y = X\beta$ has no exact solution, so we minimize $RSS(\beta)$ instead; setting its derivative to zero gives the normal equations $X^T(Y - X\beta) = 0$, i.e. $X^TY = X^TX\beta$. If $X^TX$ is nonsingular, then the unique solution is given by

$$\hat\beta = (X^TX)^{-1}X^TY \tag{1.0.3}$$
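
A small NumPy sketch (my own, with synthetic data) comparing the closed form $(X^TX)^{-1}X^TY$ against `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.standard_normal((N, p))])  # include the constant 1
beta_true = np.array([0.5, 2.0, -1.0, 0.3])
y = X @ beta_true + 0.1 * rng.standard_normal(N)

# Closed-form least squares (fine here; prefer lstsq/QR for ill-conditioned X).
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```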
