This page will mainly consist of notes and examples that I hope will be useful to my students.
I'm planning to update this page so that it can serve as a general statistics reference.
Probability I
Probability Space
A Probability Space is a triple $(\Omega, \mathcal{F}, P)$ where
$\Omega$ is a sample space (a set of outcomes)
$\mathcal{F}$ is a sigma-algebra (a collection of subsets of $\Omega$)
$P$ is a probability measure (a function from $\mathcal{F}$ to $[0,1]$)
For events $E_1, E_2\in \mc{F}$, $E_1$ and $E_2$ are independent if
$$P(E_1\cap E_2)=P(E_1)P(E_2)$$
The conditional probability $P(E_1|E_2)$ is defined as
$$P(E_1|E_2)= \frac{P(E_1\cap E_2)}{P(E_2)}$$
assuming $E_2$ has non-zero probability.
If $$|\Omega|<\infty ,\quad \mathcal{F}=2^{\Omega}, \quad P(E)=\frac{|E|}{|\Omega|}$$ then $(\Omega, \mathcal{F}, P)$ is a probability space.
Let $$\Omega=\{x_1x_2\cdots x_n :x_i\in \{0,1\} \},\quad\ \mc{F}=2^{\Omega}$$
$$ P(x_1x_2\cdots x_n)=p^j(1-p)^{n-j},\quad j = |\{i:x_i=1 \}|,$$
where $p\in[0,1]$, and extend $P$ to events by additivity. Then $(\Omega, \mathcal{F}, P)$ is a probability space.
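As a quick sanity check (a sketch, assuming $n$ is small enough to enumerate $\Omega$), the weights $p^j(1-p)^{n-j}$ sum to $1$ over all $2^n$ strings, which is what makes $P$ a probability measure on $2^{\Omega}$. The helper name `bernoulli_space` is just illustrative.

```python
from itertools import product


def bernoulli_space(n: int, p: float) -> dict[tuple[int, ...], float]:
    """Map every 0/1 string of length n to p^j (1-p)^(n-j), j = number of ones."""
    space = {}
    for omega in product((0, 1), repeat=n):
        j = sum(omega)  # j = |{i : x_i = 1}|
        space[omega] = p**j * (1 - p) ** (n - j)
    return space


omega_probs = bernoulli_space(n=4, p=0.3)
total = sum(omega_probs.values())
print(len(omega_probs), round(total, 12))  # 16 outcomes, total probability 1.0
```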
Random Variables
A Random Variable is a measurable function
$$X: \Omega \to (R,\mc{R}) $$
where $(R,\mc{R})$ is a measurable space.
The probability that $X$ takes on a value in a measurable set $E\in \mc{R}$ is written as
$$
\P{E}= P(X^{-1} E)
$$
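If $X$ takes at most countably many values, then $X$ is a discrete random variable, and
$$\P{E}=\sum_{x\in E} \P{X=x},$$
that is, the distribution of $X$ is the weighted sum of point masses $\sum_{x} \P{X=x}\,\delta_x$.
If the distribution of $X$ has a density, then $X$ is a continuous random variable.
The density of $X$ is a measurable function $f_X$ such that
$$\P{X \in E}=\int_{X^{-1} E} d P=\int_E f_X(x)\ d x$$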
If $$\Omega=\{1,2,\cdots,n\}^2,\quad P(E)=\dfrac{|E|}{n^2},\quad X(\omega)=\sum_{i=1}^2 \omega_i,$$ then $X$ is a random variable. For $n\ge 2$, since $X\ge 2$ and only the values $2$ and $3$ lie in $(-\pi,\pi)$,
$$\P{X\in (-\pi,\pi)} = \frac{|\{(1,1),(1,2),(2,1) \}|}{n^2}=\dfrac{3}{n^2}$$
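For instance, with $n=6$ (two fair dice), the event $\{X\in(-\pi,\pi)\}$ can be counted by brute force. This sketch uses exact fractions and simply checks the value $3/n^2$.

```python
from fractions import Fraction
from itertools import product
import math

n = 6  # two fair six-sided dice
omega = list(product(range(1, n + 1), repeat=2))  # Omega = {1,...,n}^2


def X(w: tuple[int, int]) -> int:
    """The random variable X(omega) = omega_1 + omega_2."""
    return w[0] + w[1]


# Preimage X^{-1}((-pi, pi)) and its uniform probability |E| / n^2
event = [w for w in omega if -math.pi < X(w) < math.pi]
prob = Fraction(len(event), n**2)
print(event, prob)  # [(1, 1), (1, 2), (2, 1)] 1/12
```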
Let $$\Omega=\{x_1x_2\cdots x_n :x_i\in \{0,1\} \},\quad\ \mc{F}=2^{\Omega}$$
$$ P(x_1x_2\cdots x_n)=p^j(1-p)^{n-j},\quad j = |\{i:x_i=1 \}|$$
and $$X(\omega)=\sum_{i=1}^n \omega_i;$$ then $X$ is a random variable, known as a binomial random variable, with
$$\P{X=k}=\binom{n}{k} p^k (1-p)^{n-k}, \qquad k\in\{0,1,\cdots,n\}$$
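The closed form can be checked against the definition by summing $P(\omega)$ over all strings with exactly $k$ ones. A brute-force sketch for small $n$ (the helper names are illustrative), using `math.comb` from the Python standard library:

```python
from itertools import product
from math import comb, isclose


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Closed form: C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pmf_by_enumeration(k: int, n: int, p: float) -> float:
    """Sum P(omega) over all 0/1 strings omega with X(omega) = k."""
    return sum(
        p ** sum(w) * (1 - p) ** (n - sum(w))
        for w in product((0, 1), repeat=n)
        if sum(w) == k
    )


n, p = 5, 0.4
for k in range(n + 1):
    assert isclose(binomial_pmf(k, n, p), pmf_by_enumeration(k, n, p))
print([round(binomial_pmf(k, n, p), 4) for k in range(n + 1)])
```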
The probability density function
$$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x\in \R$$
defines the Gaussian Random Variable with mean $\mu$ and variance $\sigma^2$.
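A rough numerical check of this density (a sketch; the midpoint rule on $[\mu-10\sigma,\mu+10\sigma]$ and the values $\mu=2$, $\sigma=1.5$ are arbitrary choices): it integrates to $1$ and has mean $\mu$ and variance $\sigma^2$.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def moments(mu: float, sigma: float, steps: int = 100_000) -> tuple[float, float, float]:
    """Midpoint-rule approximations of the total mass, mean, and variance."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    h = (b - a) / steps
    xs = [a + (i + 0.5) * h for i in range(steps)]
    mass = sum(gaussian_pdf(x, mu, sigma) for x in xs) * h
    mean = sum(x * gaussian_pdf(x, mu, sigma) for x in xs) * h
    var = sum((x - mean) ** 2 * gaussian_pdf(x, mu, sigma) for x in xs) * h
    return mass, mean, var


mass, mean, var = moments(mu=2.0, sigma=1.5)
print(round(mass, 6), round(mean, 6), round(var, 6))  # approximately 1.0, 2.0, 2.25
```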
Moments
The set of all random variables
$$
\{X :\Omega \to \R \}
$$ is a vector space under pointwise addition and scalar multiplication.
We can always enlarge $\Omega$ to accommodate new random variables.
Thus $\Omega$ is often omitted when talking about random variables.
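Denote
$$
\ms{L}_2= \{ X:\Omega \to \R \mid \E[X^2]<\infty \} $$ then $\ms{L}_2$ (after identifying random variables that are equal almost surely) is a Hilbert space with the inner product
$$
\langle X,Y \rangle = \E[XY]= \iint_{\R^2} xy\ f_{X,Y}(x,y)\ d x\, d y
$$
where the last expression assumes that $(X,Y)$ has a joint density $f_{X,Y}$.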
$$
\norm{X}^2=\langle X,X \rangle
$$
This also allows us to view probability problems geometrically.
The $k$-th moment of $X$ is defined as
$$\mu_k=\E[X^k]:= \int_\Omega X(\omega)^k P(d\omega)=\int_\R x^k \P{dx}= \int_{\mathbb{R}} x^k f_X d x$$
$\mu_1$ is the mean of $X$, and will also be denoted by $\mu_X$.
The variance of $X$ is the second central moment $\E[(X-\mu_X)^2]=\mu_2-\mu_1^2$, and will also be denoted by $\sigma_X^2$.
$\sigma_X$ is the standard deviation of $X$.
If $X\sim \operatorname{Bin}(n,p)$, then $$\mb{E}(X)=np$$
$$\sigma^2(X)=npq, \qquad q=1-p$$
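Both formulas can be checked directly from the pmf $P(X=k)$, writing $q=1-p$. A short sketch with exact rational arithmetic, where `n` and `p` are arbitrary illustrative values:

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(3, 10)
q = 1 - p
pmf = {k: comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)}

mean = sum(k * pk for k, pk in pmf.items())                    # E[X]
variance = sum((k - mean) ** 2 * pk for k, pk in pmf.items())  # E[(X - E[X])^2]
print(mean, variance)  # 3 21/10
```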
If $X:\Omega\to \R^{\ge 0}$ is nonnegative, then
$$\E[X]= \int_0^\infty \P{(X\ge x)}\ d x$$
More generally, for any $n\in \N$ and any $X:\Omega\to \R$, applying this to $|X|^n$ gives
$$\E[|X|^n] = \int_0^\infty \P{(|X|^n\ge x)}\ d x$$
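To prove the first identity, write $X=\int_0^\infty 1[X>y]\ dy$ and exchange the order of integration:
$$
\begin{aligned}
\E[X]&= \int_\Omega X \ dP \\
&= \int_\Omega\left(\int_0^\infty 1[X>y] \ dy\right) \ dP\\
&= \int_0^\infty \int_\Omega 1[X>y] \ d P \ dy\\
&= \int_0^\infty \P{(X> y)}\ d y = \int_0^\infty \P{(X\ge y)}\ d y
\end{aligned}
$$
The exchange of integrals is justified by Tonelli's theorem, since the integrand is nonnegative, and the last equality holds because $\P{(X>y)}$ and $\P{(X\ge y)}$ differ for at most countably many $y$.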
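For a nonnegative integer-valued $X$, $\P{(X\ge x)}$ is constant on each interval $(k-1,k]$, so the integral becomes $\sum_{k\ge 1}\P{(X\ge k)}$. A sketch for a fair die, with exact fractions:

```python
from fractions import Fraction

# Fair six-sided die: P(X = k) = 1/6 for k = 1, ..., 6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

direct = sum(k * pk for k, pk in pmf.items())  # E[X] = sum of k P(X = k)
# Tail-sum form: integral of P(X >= x) over [0, inf) = sum over k >= 1 of P(X >= k)
tail = sum(sum(pj for j, pj in pmf.items() if j >= k) for k in range(1, 7))
print(direct, tail)  # 7/2 7/2
```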
Joint Distribution
Let $X=(x_1,x_2)$, where $x_1$ and $x_2$ are real-valued random variables.
Then the joint distribution of $X$ is characterized by its values on rectangles:
$$
\P[X]{ [a,b]\times [c,d]} = \P{x_1\in [a,b], x_2\in [c,d]}
$$
The expectation of $f(X)$ is defined as
$$
\E[f(X)]=\int_{\R^2} f(x_1,x_2) \P[X]{dx_1 dx_2}
$$
For real-valued random variables $X$ and $Y$, the covariance is defined as
$$\mathtt{Cov}(X, Y)=\mb{E}[(X-\mu_X)(Y-\mu_Y)]=\Inn{X-\mu_X}{Y-\mu_Y}$$
The correlation of $X$ and $Y$ is defined as
$$\rho(X,Y) =\frac{\mathtt{Cov}(X, Y)}{\norm{X-\mu_X}\norm{Y-\mu_Y}}=\frac{\mathtt{Cov}(X, Y)}{\sigma_X\sigma_Y} \color{blue}= \cos( \angle(X-\mu_X,Y-\mu_Y))$$
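Treating a finite data set as equally likely outcomes, the geometric reading can be checked directly: center both vectors and take the cosine of the angle between them. A sketch in plain Python, with made-up illustrative numbers (not real data):

```python
import math


def centered(v: list[float]) -> list[float]:
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

xc, yc = centered(xs), centered(ys)
n = len(xs)
cov = dot(xc, yc) / n                 # Cov(X, Y) with each point weighted 1/n
sigma_x = math.sqrt(dot(xc, xc) / n)  # ||X - mu_X||
sigma_y = math.sqrt(dot(yc, yc) / n)
rho = cov / (sigma_x * sigma_y)
cos_angle = dot(xc, yc) / (math.sqrt(dot(xc, xc)) * math.sqrt(dot(yc, yc)))
print(round(rho, 6), round(cos_angle, 6))  # 0.8 0.8
```

The $1/n$ weights cancel in the cosine, which is why the angle does not depend on how the inner product is scaled.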
Descriptive Statistics
Moments
Central tendency
Mean :
$$\mu(X) = \frac{1}{n} \sum_{i=1}^n x_i$$
Median : assume the data are sorted, $X_1\le X_2\le\cdots\le X_n$
$$\mathtt{Med}(X)= \begin{cases}X_{\lceil \frac{n}{2}\rceil} & \text { if } n \text { is odd } \\
\frac{X_{\frac{n}{2}}+X_{\frac{n}{2}+1}}{2} & \text { if } n \text { is even }\end{cases}$$
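A minimal sketch of both formulas (the 1-based indices in the text become 0-based in Python), cross-checked against the standard library's `statistics.median`:

```python
import statistics


def mean(data: list[float]) -> float:
    return sum(data) / len(data)


def median(data: list[float]) -> float:
    x = sorted(data)  # the formula assumes sorted data
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]          # X_{ceil(n/2)}
    return (x[n // 2 - 1] + x[n // 2]) / 2  # (X_{n/2} + X_{n/2+1}) / 2


odd, even = [7, 1, 3, 9, 5], [4, 1, 3, 2]
assert median(odd) == statistics.median(odd) and median(even) == statistics.median(even)
print(mean(odd), median(odd), median(even))  # 5.0 5 2.5
```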
For a standard normal random variable $Z\sim\mathscr{N}(0,1)$,
$$\begin{cases} \mathtt{P}(-1< Z < 1) \simeq 0.68\\ \mathtt{P}(-2 < Z < 2) \simeq 0.95\\ \mathtt{P}(-3 < Z < 3) \simeq 0.997 \end{cases}$$
So the probability of getting a value between $-1$ and $1$ is more than half (about two thirds), and a value outside of $(-2, 2)$ is very unlikely (about $5\%$).
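These three numbers follow from the standard normal CDF: for $Z\sim\mathscr{N}(0,1)$, $\mathtt{P}(-k<Z<k)=\operatorname{erf}(k/\sqrt{2})$, and Python's `math.erf` gives them directly.

```python
import math


def prob_within(k: float) -> float:
    """P(-k < Z < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```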
Any computation for $ \mathscr{N}(\mu,\sigma^2)$ can be converted to a computation for $ \mathscr{N}(0,1)$ by
$$Z = \frac{X-\mu}{\sigma}$$
Results can be converted back to $ \mathscr{N}(\mu,\sigma^2)$ by
$$X = \sigma Z + \mu$$
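A sketch of the round trip, assuming $X\sim\mathscr{N}(\mu,\sigma^2)$ with illustrative values $\mu=100$, $\sigma=15$: standardize the endpoints, use the standard normal CDF $\Phi(z)=\tfrac12\bigl(1+\operatorname{erf}(z/\sqrt{2})\bigr)$, and map a $Z$-quantile back with $X=\sigma Z+\mu$. The standard library's `statistics.NormalDist` supplies the $Z$-quantile.

```python
import math
from statistics import NormalDist


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 100.0, 15.0  # illustrative parameters
a, b = 85.0, 130.0

# P(a < X < b) = P((a - mu)/sigma < Z < (b - mu)/sigma)
za, zb = (a - mu) / sigma, (b - mu) / sigma
p = phi(zb) - phi(za)

# Convert a standard-normal value back: X = sigma * Z + mu
z_975 = NormalDist().inv_cdf(0.975)
x_975 = sigma * z_975 + mu

print(round(p, 4), round(x_975, 2))
```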
Covariance & Correlation
In the book, the correlation coefficient is defined by the following formula
$$\rho(X,Y)=\mb{E}\left[ \frac{X-\mu_X}{\sigma_X} \frac{Y-\mu_Y}{\sigma_Y}\right] = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_{i}-\mu_X}{\sigma_{X}}\right)\left(\frac{Y_{i}-\mu_Y}{\sigma_{Y}}\right) $$
that is, the average of the products of $X$ and $Y$ in standard units.
A different but equivalent way to introduce the correlation coefficient is to first define the covariance and then treat correlation as its normalized version, $\rho(X,Y)=\mathtt{Cov}(X,Y)/(\sigma_X\sigma_Y)$.
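A sketch checking that the two routes agree on a small made-up data set: the textbook formula averages products of standard units using the population standard deviation (dividing by $n$), and the covariance route normalizes $\mathtt{Cov}(X,Y)$ by $\sigma_X\sigma_Y$.

```python
from statistics import fmean, pstdev

xs = [2.0, 4.0, 6.0, 8.0]  # illustrative values only
ys = [1.0, 3.0, 2.0, 6.0]
n = len(xs)
mx, my = fmean(xs), fmean(ys)
sx, sy = pstdev(xs), pstdev(ys)  # population SDs (divide by n), as in the formula

# Textbook definition: average of products of standard units
rho_units = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

# Covariance first, then normalize
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
rho_cov = cov / (sx * sy)

print(round(rho_units, 6), round(rho_cov, 6))
```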
Regression
Regression Line
The regression line is the line passing through $(\mu_X,\mu_Y)$ with slope
$$\frac{\rho \cdot \sigma_Y}{\sigma_X}$$
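A sketch on made-up data showing that the line through $(\mu_X,\mu_Y)$ with slope $\rho\sigma_Y/\sigma_X$ coincides with the ordinary least-squares fit, whose slope is $\sum (x_i-\mu_X)(y_i-\mu_Y)/\sum (x_i-\mu_X)^2$.

```python
from statistics import fmean, pstdev

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative values only
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 6.3]
mx, my = fmean(xs), fmean(ys)
sx, sy = pstdev(xs), pstdev(ys)
rho = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)]) / (sx * sy)

# Regression line: through (mu_X, mu_Y) with slope rho * sigma_Y / sigma_X
slope = rho * sy / sx
intercept = my - slope * mx

# Ordinary least-squares slope, for comparison
ols_slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(round(slope, 6), round(ols_slope, 6), round(intercept, 6))
```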
Since this is a 100-level introductory class for students from all majors, we will not go into the technical details.
If interested, see the page on Linear Regression for the general case.
RMS Error for Regression
Root Mean Square Error (RMS) is a measure of the error between the predicted value and the actual value.
$$\mathtt{RMS} = \sqrt{\frac{\sum_{i=1}^n(\hat{y_i}-y_i)^2}{\mathtt{n}}}$$
In addition to the above, commonly accepted definition, the RMS error of the regression line is also given by (at least in the case of one dependent and one independent variable, with $\sigma_y$ computed by dividing by $n$)
$$\sqrt{1-\rho^{2}}\, \sigma_y$$
Among all lines, the one that makes the smallest RMS error in predicting $y$ from $x$ is the regression line.
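A sketch on made-up data checking both claims: the RMS error of the regression line equals $\sqrt{1-\rho^2}\,\sigma_y$ (population SD), and nudging the slope or intercept only increases the RMS error.

```python
import math
from statistics import fmean, pstdev

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative values only
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 6.3]
mx, my, sx, sy = fmean(xs), fmean(ys), pstdev(xs), pstdev(ys)
rho = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)]) / (sx * sy)


def rms(slope: float, intercept: float) -> float:
    """RMS error of the line y = slope * x + intercept on the data."""
    return math.sqrt(fmean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]))


b = rho * sy / sx  # regression slope
a = my - b * mx    # regression intercept
rms_reg = rms(b, a)
formula = math.sqrt(1 - rho**2) * sy

# Any other line does worse: perturb slope or intercept slightly
others = [rms(b + d, a) for d in (-0.1, 0.1)] + [rms(b, a + d) for d in (-0.1, 0.1)]
print(round(rms_reg, 6), round(formula, 6), all(r > rms_reg for r in others))
```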
Tests of Significance
Let $$x_1,\cdots,x_n$$ be drawn independently from $\ms{N}(\mu,\sigma^2)$, and let $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}x_i$ be the sample mean. Then $$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim\ms{N}(0,1) $$
and $$\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim\mathtt{T}(n-1),$$ where $S^{2}=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}-\bar{X}\right)^{2} $ is the sample variance.
The second result is especially useful, since in practice $\sigma$ is often unknown, and we have to estimate it with the sample standard deviation $S$.
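A sketch of the one-sample $t$ statistic for a hypothesized mean $\mu_0$, on made-up data. `statistics.stdev` divides by $n-1$, matching $S$ above. Turning $t$ into a $p$-value needs the $\mathtt{T}(n-1)$ CDF (for example from SciPy), which is left out here.

```python
import math
from statistics import fmean, stdev

sample = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]  # illustrative values only
mu_0 = 5.0                                        # hypothesized mean

n = len(sample)
x_bar = fmean(sample)
s = stdev(sample)  # sample SD, divides by n - 1
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1         # degrees of freedom of T(n-1)
print(round(x_bar, 4), round(s, 4), round(t_stat, 4), df)
```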
When $n>30$, the Student-T distribution is very close to the standard normal distribution. Thus in practice, if $n>30$, we use the $Z$-test regardless of whether $\sigma$ is known.
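To see this rule of thumb numerically, one can compare the Student-$t$ density with $\nu$ degrees of freedom, $f_\nu(t)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$, with the standard normal density on a grid over $[-5,5]$. A standard-library sketch (`math.lgamma` avoids overflow for large $\nu$):

```python
import math


def t_pdf(t: float, nu: int) -> float:
    """Student-t density with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def normal_pdf(z: float) -> float:
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


grid = [i / 100 for i in range(-500, 501)]  # t in [-5, 5]
gaps = {}
for nu in (3, 10, 30, 100):
    # Largest absolute gap between the two densities on the grid
    gaps[nu] = max(abs(t_pdf(t, nu) - normal_pdf(t)) for t in grid)
    print(nu, round(gaps[nu], 4))
```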
Applications to Finance
CAPM
Let $R_f$ be the return of the risk-free asset, $R_m$ the return of the market portfolio, and $R_p$ the return of a portfolio $p$.
The Capital Asset Pricing Model (CAPM) is the following linear regression model:
$$
(R_p-R_f)=\alpha_p+\beta_{p}(R_m-R_f)+\varepsilon_p
$$
where $\varepsilon_p$ is a mean-zero error term.
$\alpha_p$ is the alpha of the portfolio $p$.
$\beta_{p}$ is the beta of the portfolio $p$.
Let $R_p, \sigma_p$ be the return and volatility of a portfolio.
The beta of portfolio $p$ with respect to the market is given by
$$\beta_{p}=\frac{\mathrm{Cov}(R_p,R_m)}{\sigma_m^2}= \frac{\rho_{p,m}\sigma_p}{\sigma_m}$$
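A sketch with simulated excess returns (synthetic data generated under an assumed true $\beta=1.2$, not market data): estimate $\beta_p$ as $\mathrm{Cov}(R_p,R_m)/\sigma_m^2$ and confirm it equals $\rho_{p,m}\sigma_p/\sigma_m$.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)              # fixed seed for reproducibility
true_alpha, true_beta = 0.001, 1.2  # assumed parameters of the simulation

# Synthetic excess returns R_m - R_f, and R_p - R_f from the CAPM regression
market = [rng.gauss(0.005, 0.04) for _ in range(1000)]
portfolio = [true_alpha + true_beta * m + rng.gauss(0.0, 0.02) for m in market]

mm, mp = fmean(market), fmean(portfolio)
cov_pm = fmean([(p - mp) * (m - mm) for p, m in zip(portfolio, market)])
sigma_m, sigma_p = pstdev(market), pstdev(portfolio)

beta_cov = cov_pm / sigma_m**2         # Cov(R_p, R_m) / sigma_m^2
rho_pm = cov_pm / (sigma_p * sigma_m)
beta_rho = rho_pm * sigma_p / sigma_m  # rho_{p,m} sigma_p / sigma_m
alpha_hat = mp - beta_cov * mm         # least-squares intercept

print(round(beta_cov, 3), round(beta_rho, 3), round(alpha_hat, 4))
```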
Sharpe Ratio
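Let $R_a, \sigma_a$ be the return and volatility of an asset, and $R_f$ be the return of the risk-free asset.
Then the Sharpe ratio of the asset is defined as
$$S_a=\frac{\E[R_a-R_f]}{\sigma_a}$$
The Sharpe ratio can be viewed as a standardized measure of expected excess return: the expected return above $R_f$ per unit of volatility.
The Treynor ratio measures risk by the beta $\beta_a$ of the asset instead:
$$
\text{Treynor ratio} = \frac{\E[R_a-R_f]}{\beta_a}
$$
Replacing $R_f$ by the return $R_b$ of a benchmark gives
$$
\text{Generalized Sharpe ratio} = \frac{\E[R_a-R_b]}{\sqrt{\operatorname{Var}(R_a-R_b)}}
$$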
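A sketch computing the Sharpe and Treynor ratios from a short series of periodic returns (made-up numbers), with a constant risk-free return and an assumed beta. Using the sample standard deviation for $\sigma_a$ is one common convention, and no annualization is applied.

```python
from statistics import fmean, stdev

returns = [0.012, -0.004, 0.021, 0.008, -0.010, 0.015]  # illustrative periodic returns
r_f = 0.002                                              # constant risk-free return per period
beta_a = 1.1                                             # assumed beta of the asset

excess = [r - r_f for r in returns]
sharpe = fmean(excess) / stdev(excess)  # E[R_a - R_f] / sigma_a
treynor = fmean(excess) / beta_a        # E[R_a - R_f] / beta_a
print(round(sharpe, 4), round(treynor, 5))
```

With a constant $R_f$, subtracting it does not change the standard deviation, so `stdev(excess)` equals the volatility of the raw returns.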
Efficient Frontier
The Efficient Frontier is the collection of risk-return pairs $$
\{(\sigma_P,\E R_P) \mid\ \nexists P'\ :\ \E R_{P'} = \E R_P \wedge \sigma_{P'}<\sigma_P \}$$
Let $P$ be a risky portfolio, and $R_f$ be the return of the risk-free asset.
Let $C$ be a combination of $P$ and the risk-free asset.
The collection of risk-return pairs $$ (\sigma_C, \E(R_C) )$$
over all possible combinations $C$ gives the Capital Allocation Line (CAL).
For a given risky portfolio $P$ with Sharpe ratio $S_P$, the CAL is given by the line
$$\E(R_C)=R_f+\sigma_C S_P$$
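The line follows from a short computation: if $C$ puts weight $w\ge 0$ on $P$ and $1-w$ on the risk-free asset, then $\E(R_C)=R_f+w(\E R_P-R_f)$ and $\sigma_C=w\sigma_P$ (the risk-free return has no variance), so $w=\sigma_C/\sigma_P$. The sketch below, with hypothetical values for $R_f$, $\E R_P$, and $\sigma_P$, checks numerically that such combinations land on the line.

```python
import math

r_f = 0.03                    # hypothetical risk-free return
mean_p, sigma_p = 0.09, 0.20  # hypothetical expected return and volatility of P
sharpe_p = (mean_p - r_f) / sigma_p

points = []
for w in (0.0, 0.25, 0.5, 0.75, 1.0):  # weight on P; 1 - w goes to the risk-free asset
    mean_c = w * mean_p + (1 - w) * r_f
    sigma_c = w * sigma_p              # the risk-free asset contributes no variance
    points.append((sigma_c, mean_c))
    on_cal = math.isclose(mean_c, r_f + sigma_c * sharpe_p)
    print(w, round(sigma_c, 3), round(mean_c, 4), on_cal)
```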
Let $CAL_T$ be the CAL tangent to the efficient frontier, and let $P$ be the corresponding portfolio.
Then $P$ is the tangency portfolio, and $P$ has the highest Sharpe ratio among all risky portfolios.
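A sketch for two risky assets with hypothetical expected returns, volatilities, and correlation: a brute-force search over weights for the highest Sharpe ratio, compared with the standard closed form $w\propto\Sigma^{-1}(\mu-R_f\mathbf{1})$ for the tangency weights (short sales allowed, weights summing to $1$). The $2\times 2$ inverse is written out by hand to stay within the standard library.

```python
import math

r_f = 0.03          # hypothetical risk-free return
mu = (0.08, 0.12)   # hypothetical expected returns
sd = (0.15, 0.25)   # hypothetical volatilities
corr = 0.3          # hypothetical correlation
cov = ((sd[0] ** 2, corr * sd[0] * sd[1]),
       (corr * sd[0] * sd[1], sd[1] ** 2))


def sharpe(w1: float) -> float:
    """Sharpe ratio of the portfolio with weight w1 on asset 1 and 1 - w1 on asset 2."""
    w = (w1, 1 - w1)
    mean = w[0] * mu[0] + w[1] * mu[1]
    var = sum(w[i] * w[j] * cov[i][j] for i in range(2) for j in range(2))
    return (mean - r_f) / math.sqrt(var)


# Brute force over a grid of weights in [-1, 2]
grid = [i / 1000 for i in range(-1000, 2001)]
w_grid = max(grid, key=sharpe)

# Closed form: w proportional to Sigma^{-1} (mu - r_f), rescaled to sum to 1
e = (mu[0] - r_f, mu[1] - r_f)
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
z = ((cov[1][1] * e[0] - cov[0][1] * e[1]) / det,
     (-cov[1][0] * e[0] + cov[0][0] * e[1]) / det)
w_closed = z[0] / (z[0] + z[1])

print(round(w_grid, 3), round(w_closed, 4), round(sharpe(w_closed), 4))
```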