# Elementary Statistics, Spring 2022

This page will mainly be some notes and examples that I hope will be useful to my students.

I'm planning to update this page to serve as a general statistics reference.

## Probability I

### Probability Space

A Probability Space is a triple $(\Omega, \mathcal{F}, P)$ where
1. $\Omega$ is a sample space (a set of outcomes)
2. $\mathcal{F}$ is a sigma-algebra (a collection of subsets of $\Omega$)
3. $P$ is a probability measure (a function from $\mathcal{F}$ to $[0,1]$)

Two events $E_1, E_2\in \mc{F}$ are independent if $$P(E_1\cap E_2)=P(E_1)P(E_2)$$ The conditional probability of $E_1$ given $E_2$ is defined as $$P(E_1|E_2)= \frac{P(E_1\cap E_2)}{P(E_2)}$$ assuming $E_2$ has non-zero probability.

• If $$|\Omega|<\infty ,\quad \mathcal{F}=2^{\Omega}, \quad P(E)=\frac{|E|}{|\Omega|}$$ then $(\Omega, \mathcal{F}, P)$ is a probability space.
• Let $$\Omega=\{x_1x_2\cdots x_n :x_i\in \{0,1\} \},\quad\ \mc{F}=2^{\Omega}$$ $$P(x_1x_2\cdots x_n)=p^j(1-p)^{n-j},\quad j = |\{i:x_i=1 \}|$$ for some $p\in[0,1]$, with $P$ extended to events by summing over their outcomes. Then $(\Omega, \mathcal{F}, P)$ is a probability space.
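The second example can be checked by brute force for a small $n$. The following is only a sketch: the values `n = 3` and `p = 0.3`, and the two events "first digit is 1" and "second digit is 1", are my own choices for illustration.

```python
from itertools import product

def sequence_probability(outcome, p):
    """P(x_1 ... x_n) = p^j (1-p)^(n-j), where j counts the ones in the outcome."""
    j = sum(outcome)
    return p**j * (1 - p) ** (len(outcome) - j)

n, p = 3, 0.3  # assumed values, for illustration only
omega = list(product([0, 1], repeat=n))  # sample space: all binary strings of length n

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(sequence_probability(w, p) for w in omega if event(w))

total = prob(lambda w: True)                      # P(Omega), should be 1
p_e1 = prob(lambda w: w[0] == 1)                  # E1: first digit is 1
p_e2 = prob(lambda w: w[1] == 1)                  # E2: second digit is 1
p_both = prob(lambda w: w[0] == 1 and w[1] == 1)  # E1 and E2
p_e1_given_e2 = p_both / p_e2                     # conditional probability P(E1 | E2)

print(total, p_both, p_e1 * p_e2, p_e1_given_e2)
```

Here $P(E_1\cap E_2)$ matches $P(E_1)P(E_2)$ up to rounding, so the two events are independent, and $P(E_1|E_2)$ is just $P(E_1)$.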

### Random Variables

A Random Variable is a measurable function $$X: \Omega \to (R,\mc{R})$$ where $(R,\mc{R})$ is a measurable space.

The probability that $X$ takes on a value in a measurable set $E\in \mc{R}$ is written as $$\P{X\in E}= P(X^{-1} E)$$
• If $R$ is finite or countable, then $X$ is a discrete random variable and $$\P{X\in E}=\sum_{x\in E} \P{X=x}$$
• If $R=\R$ and the distribution of $X$ has a density, then $X$ is a continuous random variable. The density of $X$ is a measurable function $f_X$ such that $$\P{X \in E}=\int_{X^{-1} E} d P=\int_E f_X(x)\ d x$$

• If $$\Omega=\{1,2,\cdots,n\}^2,\quad\P{E}=\dfrac{|E|}{n^2}$$ $$X(\omega)=\sum_{i=1}^2 \omega_i$$ then $X$ is a random variable. $$\P{X\in (-\pi,\pi)} = \frac{|\{(1,1),(1,2),(2,1) \}|}{n^2}=\dfrac{3}{n^2}$$
• Let $$\Omega=\{x_1x_2\cdots x_n :x_i\in \{0,1\} \},\quad\ \mc{F}=2^{\Omega}$$ $$P(x_1x_2\cdots x_n)=p^j(1-p)^{n-j},\quad j = |\{i:x_i=1 \}|$$ $$X(\omega)=\sum_{i=1}^n \omega_i$$ then $X$ is a random variable, known as a binomial random variable, with $$\P{X=k}=\binom{n}{k} p^k (1-p)^{n-k}, \qquad k\in\{0,1,\cdots,n\}$$
• The probability density function $$f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x\in \R$$ defines the Gaussian Random Variable with mean $\mu$ and variance $\sigma^2$.
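To see where the binomial formula comes from, one can add up the probabilities of all binary strings with exactly $k$ ones and compare with the closed form. This is a minimal sketch; the function names and the values `n = 5`, `p = 0.4` are assumptions made for the example.

```python
from itertools import product
from math import comb

def binomial_pmf(n, k, p):
    """Closed form: C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pmf_by_enumeration(n, k, p):
    """Sum P(omega) over all outcomes omega with X(omega) = k."""
    total = 0.0
    for omega in product([0, 1], repeat=n):
        if sum(omega) == k:  # X(omega) is the number of ones
            total += p**k * (1 - p) ** (n - k)
    return total

n, p = 5, 0.4  # assumed values, for illustration only
for k in range(n + 1):
    print(k, binomial_pmf(n, k, p), pmf_by_enumeration(n, k, p))
```

The two columns agree because exactly $\binom{n}{k}$ strings have $k$ ones, and each of them has the same probability.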

### Moments

The set of all random variables $$\{X :\Omega \to \R \}$$ is a vector space under pointwise addition and scalar multiplication. We can always enlarge $\Omega$ to accommodate new random variables, so $\Omega$ is often omitted when we talk about random variables. Denote $$\ms{L}_2= \{ X:\Omega \to \R \mid \E[X^2]<\infty \}$$ then $\ms{L}_2$ is a Hilbert Space with the inner product $$\langle X,Y \rangle = \E[XY]:= \iint_{\R^2} xy\ f_{X,Y}(x,y)\ d x\, d y$$ where $f_{X,Y}$ is the joint density of $X$ and $Y$, and the norm is given by $$\norm{X}^2=\langle X,X \rangle$$ This also allows us to view probability problems geometrically.
The $k$-th moment of $X$ is defined as $$\mu_k=\E[X^k]:= \int_\Omega X(\omega)^k P(d\omega)=\int_\R x^k \P[X]{dx}= \int_{\R} x^k f_X(x)\ d x$$
• $\mu_1$ is the mean of $X$, also denoted $\mu_X$.
• The second central moment $\E[(X-\mu_X)^2]=\mu_2-\mu_1^2$ is the variance of $X$, also denoted $\sigma_X^2$. Its square root $\sigma_X$ is the standard deviation of $X$.
• If $X\sim \operatorname{Bin}(n,p)$ and $q=1-p$, then $$\E[X]=np$$ $$\sigma_X^2=npq$$
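For a non-negative random variable $X:\Omega\to \R^{\ge 0}$, $$\E[X]= \int_0^\infty \P{X\ge x}\ d x$$ More generally, for any $n\in \N$ and $X:\Omega\to \R$, $$\E[|X|^n] = \int_0^\infty \P{|X|^n\ge x}\ d x$$ To see the first identity, write $X=\int_0^\infty 1[X>y]\ dy$ and exchange the order of integration: $$\begin{aligned} \E[X]&= \int_\Omega X \ dP \\ &= \int_\Omega\left(\int_0^\infty 1[X>y] \ dy\right) \ dP\\ &= \int_0^\infty \int_\Omega 1[X>y] \ d P \ dy\\ &= \int_0^\infty \P{X> y}\ d y \end{aligned}$$ Since $\P{X>y}$ and $\P{X\ge y}$ differ for at most countably many $y$, the last integral equals $\int_0^\infty \P{X\ge y}\ d y$.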
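The binomial mean and variance, and the tail formula for the expectation, can be verified numerically. A small sketch, assuming `n = 10` and `p = 0.3`: since $X$ only takes integer values, the integral of $P(X\ge x)$ over $(0,\infty)$ reduces to the sum of $P(X\ge k)$ for $k=1,\dots,n$.

```python
from math import comb

n, p = 10, 0.3  # assumed values, for illustration only
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))              # first moment mu_1
second_moment = sum(k**2 * pk for k, pk in enumerate(pmf))  # second moment mu_2
variance = second_moment - mean**2                          # mu_2 - mu_1^2

# Tail formula: for an integer-valued X >= 0, the integral of P(X >= x)
# equals the sum of P(X >= k) over k = 1, ..., n.
tail_sum = sum(sum(pmf[k:]) for k in range(1, n + 1))

print(mean, variance, tail_sum)
```

With these values the mean and the tail sum both come out to $np=3$, and the variance to $npq=2.1$, up to rounding.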

#### Joint Distribution

Let $X=(x_1,x_2)$ where $x_1$ and $x_2$ are two real-valued random variables, then the joint distribution of $X$ is characterized by $$\P[X]{ [a,b]\times [c,d]} = \P{x_1\in [a,b], x_2\in [c,d]}$$ The expectation of $f(X)$ is defined as $$\E[f(X)]=\int_{\R^2} f(x_1,x_2) \P[X]{dx_1 dx_2}$$

For two real-valued random variables $X$ and $Y$, the covariance is defined as $$\mathtt{Cov}(X, Y)=\mb{E}[(X-\mu_X)(Y-\mu_Y)]=\Inn{X-\mu_X}{Y-\mu_Y}$$ and the correlation is defined as $$\rho(X,Y) =\frac{\mathtt{Cov}(X, Y)}{\norm{X-\mu_X}\norm{Y-\mu_Y}} \color{blue}= \cos( \angle(X-\mu_X,Y-\mu_Y))$$
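Here is a small sketch of these definitions for a discrete joint distribution. The joint probability table below is made up for illustration, and the helper `expect` is my own name. The correlation is computed as the inner product of the centered variables divided by their norms, which is the cosine of the angle between them.

```python
from math import sqrt

# Hypothetical joint pmf: joint[(x, y)] = P(X = x, Y = y)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] with respect to the joint distribution."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))   # inner product of centered variables
norm_x = sqrt(expect(lambda x, y: (x - mu_x) ** 2))  # norm of X - mu_X
norm_y = sqrt(expect(lambda x, y: (y - mu_y) ** 2))  # norm of Y - mu_Y
rho = cov / (norm_x * norm_y)                        # cosine of the angle

print(mu_x, mu_y, cov, rho)
```

For this table the covariance is $0.15$ and the correlation is $0.6$. Like any cosine, the correlation always lies in $[-1,1]$.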

## Descriptive Statistics

### Moments

#### Central tendency

Mean : $$\mu(X) = \frac{1}{n} \sum_{i=1}^n x_i$$ Median : assume the data is sorted, $x_1\le x_2\le \cdots\le x_n$, then $$\mathtt{Med}(X)= \begin{cases}x_{\frac{n+1}{2}} & \text { if } n \text { is odd } \\ \frac{x_{\frac{n}{2}}+x_{\frac{n}{2}+1}}{2} & \text { if } n \text { is even }\end{cases}$$

#### Dispersion

Variance : $$\sigma^2(X) = \frac{1}{n} \sum_{i=1}^n (x_i-\mu)^2$$ where $\mu=\mu(X)$ is the mean. Standard Deviation : $$\sigma(X) :=\sqrt{\sigma^2(X)}$$

#### Symmetry

Skewness : $$\gamma_1(X) = \frac{1}{n}\sum_{i=1}^n (\frac{x_i-\mu}{\sigma})^3$$

#### Shape

Kurtosis : $$\gamma_2(X) = \frac{1}{n}\sum_{i=1}^n (\frac{x_i-\mu}{\sigma})^4$$
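The summary statistics in this section can be computed directly from their definitions. This is a minimal sketch: the function names and the data set are made up for illustration, the variance divides by $n$ as in the formula above, and the median of an even number of points is taken as the average of the two middle values.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    if n % 2 == 1:
        return s[mid]                 # middle value for odd n
    return (s[mid - 1] + s[mid]) / 2  # average of the two middle values for even n

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)  # divides by n, not n - 1

def std(xs):
    return sqrt(variance(xs))

def standardized_moment(xs, k):
    """Average of the k-th powers of the data in standard units."""
    m, s = mean(xs), std(xs)
    return sum(((x - m) / s) ** k for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
print(mean(data), median(data), variance(data), std(data))
print(standardized_moment(data, 3))  # skewness
print(standardized_moment(data, 4))  # kurtosis
```

For this data set the mean is 5, the median is 4.5, the variance is 4 and the standard deviation is 2. The positive skewness reflects the longer tail on the right.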

### Normal Approximation

$$\begin{cases} \mathtt{P}(-1< Z < 1) \simeq 0.68\\ \mathtt{P}(-2 < Z < 2) \simeq 0.95\\ \mathtt{P}(-3 < Z < 3) \simeq 0.997 \end{cases}$$ The probability of getting a value between $-1$ and $1$ is more than half,
and it is very unlikely to be outside of $(-2, 2)$.
Any computation for $\mathscr{N}(\mu,\sigma)$, the normal distribution with mean $\mu$ and standard deviation $\sigma$, can be converted to a computation for $\mathscr{N}(0,1)$ by $$Z = \frac{X-\mu}{\sigma}$$ Results can be converted back to $\mathscr{N}(\mu,\sigma)$ by $$X = Z\sigma + \mu$$
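The three probabilities above, and the conversion to standard units, can be reproduced with `statistics.NormalDist` from the Python standard library. A short sketch; the function names and the example with mean 100 and standard deviation 15 are my own choices for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_probability(k):
    """P(-k < Z < k) for the standard normal."""
    return Z.cdf(k) - Z.cdf(-k)

def interval_probability(a, b, mu, sigma):
    """P(a < X < b) for X normal with mean mu and SD sigma, via standard units."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return Z.cdf(z_b) - Z.cdf(z_a)

for k in (1, 2, 3):
    print(k, round(central_probability(k), 4))

# Assumed example: mean 100, SD 15. The interval (85, 115) is within 1 SD of the mean.
print(round(interval_probability(85, 115, mu=100, sigma=15), 4))
```

The last line gives the same answer as `central_probability(1)`, since 85 and 115 are exactly $-1$ and $1$ in standard units.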

### Covariance & Correlation

In the book, the correlation coefficient is defined by the following formula $$\rho(X,Y)=\mb{E}\{ \frac{X-\mu_X}{\sigma_X} \frac{Y-\mu_Y}{\sigma_Y}\} = \frac{1}{n} \sum_{i=1}^{n}\left(\frac{X_{i}-\mu_X}{\sigma_{X}}\right)\left(\frac{Y_{i}-\mu_Y}{\sigma_{Y}}\right)$$ A different but equivalent way to introduce the correlation coefficient is to first define covariance and consider correlation as a normalized version of it.
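The two ways of computing the correlation coefficient can be compared on a small data set. This is only a sketch: the data and the function names are made up, and the standard deviations divide by $n$, matching the formula above.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / len(v))  # divides by n

def correlation_standard_units(xs, ys):
    """Average of the products of x and y in standard units."""
    mx, my, sx, sy = mean(xs), mean(ys), std(xs), std(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def correlation_from_covariance(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (std(xs) * std(ys))

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
print(correlation_standard_units(xs, ys), correlation_from_covariance(xs, ys))
```

Both functions return about $0.775$ for this data set.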

### Regression

#### Regression Line

The regression line is the line passing through $(\mu_X,\mu_Y)$ with slope $$\frac{\rho \cdot \sigma_Y}{\sigma_X}$$ Since this is a 100-level introductory class for students from all majors, we will not go into the technical details. If interested, see the page on Linear Regression for the general case.
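As a sketch of how the regression line is obtained from the five summary numbers, the code below computes the slope from $\rho$, $\sigma_X$, $\sigma_Y$ and then picks the intercept so that the line passes through the point of averages. The data and function names are assumptions for illustration.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """Average of the products in standard units (SDs divide by n)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def regression_line(xs, ys):
    """Return (slope, intercept) of the regression line for predicting y from x."""
    slope = correlation(xs, ys) * pstdev(ys) / pstdev(xs)
    intercept = fmean(ys) - slope * fmean(xs)  # line passes through (mean x, mean y)
    return slope, intercept

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
slope, intercept = regression_line(xs, ys)
print(slope, intercept)
```

For this data set the slope is $0.6$ and the intercept is $2.2$, so the predicted $y$ at the average $x=3$ is the average $y=4$.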

#### RMS Error for Regression

The root mean square (RMS) error measures the typical size of the error between the predicted values $\hat{y}_i$ and the actual values $y_i$. $$\mathtt{RMS} = \sqrt{\frac{\sum_{i=1}^n(\hat{y}_i-y_i)^2}{n}}$$ For the regression line with one independent and one dependent variable, the RMS error can also be computed as $$\sqrt{1-\rho^{2}}\ \sigma_Y$$ Among all lines, the regression line is the one that makes the smallest RMS error in predicting $y$ from $x$.
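The shortcut formula can be checked against the definition on a small data set. This is a sketch with made-up data; the comparison line with a slightly different slope is there only to illustrate that other lines do worse.

```python
from math import sqrt
from statistics import fmean, pstdev

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]

mx, my = fmean(xs), fmean(ys)
sx, sy = pstdev(xs), pstdev(ys)  # SDs that divide by n
rho = fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

slope = rho * sy / sx
intercept = my - slope * mx

def rms_error(a, b):
    """RMS error of the line y = a*x + b on the data."""
    return sqrt(fmean([(a * x + b - y) ** 2 for x, y in zip(xs, ys)]))

rms_regression = rms_error(slope, intercept)  # from the definition
rms_formula = sqrt(1 - rho**2) * sy           # from the shortcut formula
rms_other = rms_error(slope + 0.1, intercept) # a different line, for comparison

print(rms_regression, rms_formula, rms_other)
```

The first two numbers agree, and the third one is larger.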

## Tests of Significance

Given independent observations $$x_1,\cdots,x_n$$ drawn from $\ms{N}(\mu,\sigma^2)$ with sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^n x_i$, we have $$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim\ms{N}(0,1)$$ and $$\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim\mathtt{T}(n-1)$$ where $S^{2}=\frac{1}{n-1} \sum_{i=1}^{n}\left(x_{i}-\bar{X}\right)^{2}$ is the sample variance. The second result is especially useful, since in practice $\sigma$ is often unknown, and we have to estimate it with the sample standard deviation, $S$.

As can be seen from the demo above, when $n>30$, the Student-T distribution is very close to the Standard Normal. Thus in practice, if $n>30$ then we use the $Z$-test regardless of whether $\sigma$ is known.
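The test statistic itself is easy to compute by hand or in code. Below is a minimal sketch, with a made-up sample and a hypothesized mean `mu0`. Note that `statistics.stdev` divides by $n-1$, so it matches $S$. The normal-curve $p$-value helper is only appropriate when $n>30$; for a small sample like the one below, the statistic should be compared with the Student-T table with $n-1$ degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def t_statistic(sample, mu0):
    """(sample mean - mu0) / (S / sqrt(n)), with S the sample SD (n - 1 divisor)."""
    n = len(sample)
    return (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))

def z_two_sided_p_value(stat):
    """Two-sided p-value from the standard normal curve (use when n > 30)."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]  # made-up data, n = 8
stat = t_statistic(sample, mu0=5.0)
print(stat)  # compare with the Student-T table, 7 degrees of freedom

print(z_two_sided_p_value(1.96))  # close to 0.05
```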

## Applications to Finance

### CAPM

• Let $R_f$ be the return of the risk-free asset, and $R_m$ be the return of the market portfolio.
• $R_p$ is the return of a portfolio $p$
The Capital Asset Pricing Model (CAPM) is the following linear regression model: $$(R_p-R_f)=\alpha_p+\beta_{p}(R_m-R_f)$$
• $\alpha_p$ is the alpha of the portfolio $p$.
• $\beta_{p}$ is the beta of the portfolio $p$.
Let $\sigma_p$ and $\sigma_m$ be the volatilities of the portfolio return $R_p$ and the market return $R_m$, and let $\rho_{p,m}$ be their correlation.

The beta of portfolio $p$ with respect to the market is given by $$\beta_{p}=\frac{\mathrm{Cov}(R_p,R_m)}{\sigma_m^2}= \frac{\rho_{p,m}\sigma_p}{\sigma_m}$$
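As a rough sketch, beta and alpha can be estimated from a series of returns. The return series and the constant risk-free rate below are made up for illustration, and the alpha is the intercept of the regression of excess portfolio returns on excess market returns.

```python
from statistics import fmean, pstdev

# Hypothetical per-period returns (assumed data, for illustration only)
r_p = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]  # portfolio
r_m = [0.01, -0.02, 0.02, 0.01, -0.01, 0.03]  # market
r_f = 0.001                                   # risk-free rate, assumed constant

def covariance(a, b):
    ma, mb = fmean(a), fmean(b)
    return fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])

# Beta from the covariance formula
beta_cov = covariance(r_p, r_m) / pstdev(r_m) ** 2

# Beta from the correlation formula
rho = covariance(r_p, r_m) / (pstdev(r_p) * pstdev(r_m))
beta_rho = rho * pstdev(r_p) / pstdev(r_m)

# Alpha: the intercept of the regression line
alpha = (fmean(r_p) - r_f) - beta_cov * (fmean(r_m) - r_f)

print(beta_cov, beta_rho, alpha)
```

The two expressions for beta agree up to rounding, as the formula above says they should.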

#### Sharpe Ratio

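Let $R_a, \sigma_a$ be the return and volatility of an asset, and $R_f$ be the mean return of the risk-free asset. Then the Sharpe ratio of the asset is defined as $$S_a=\frac{\E[R_a-R_f]}{\sigma_a}$$ The Sharpe ratio can be viewed as a standardized measure of expected excess return. Two related measures are $$\text{Treynor ratio} = \frac{\E[R_a-R_f]}{\beta_a}$$ $$\text{Generalized Sharpe ratio} = \frac{\E[R_a-R_b]}{\sigma_a}$$ where $\beta_a$ is the beta of the asset, $R_b$ is the return of a benchmark asset, and in the generalized ratio $\sigma_a$ is taken to be the standard deviation of the excess return $R_a-R_b$.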

### Efficient Frontier

The Efficient Frontier is the collection of risk-return pairs $$\{(\sigma_P,\E R_P) \mid\ \nexists P'\ :\ \E R_P = \E R_{P'} \wedge \sigma_{P'}<\sigma_P \}$$
Let $P$ be a risky portfolio, and $R_f$ be the return of the risk-free asset.
Let $C$ be a combination of $P$ and the risk-free asset.

The collection of all risk-return pairs $$(\sigma_C, \E(R_C) )$$ for all possible combinations $C$ gives the Capital Allocation Line (CAL).
For a given risky portfolio $P$ with Sharpe ratio $S_P$, the CAL is given by the line $$\E(R_C)=R_f+\sigma_C S_P$$ Let $CAL_T$ be the CAL tangent to the efficient frontier.
Let $P$ be the corresponding portfolio.
Then $P$ is the Tangency portfolio, and $P$ has the highest Sharpe ratio among all risky portfolios.
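The equation of the CAL can be derived in a few lines. The sketch below assumes that a fraction $w\ge 0$ of the money is put in $P$ and the rest in the risk-free asset (the symbol $w$ is my notation), and that $R_f$ is a constant, so it contributes no volatility. $$\begin{aligned} R_C &= wR_P+(1-w)R_f \\ \E(R_C) &= R_f + w\,(\E(R_P)-R_f) \\ \sigma_C &= w\,\sigma_P \\ \E(R_C) &= R_f+\sigma_C\,\frac{\E(R_P)-R_f}{\sigma_P} = R_f+\sigma_C S_P \end{aligned}$$ The last line follows by substituting $w=\sigma_C/\sigma_P$ into the second line. So the CAL is a straight line starting at $(0,R_f)$ whose slope is the Sharpe ratio of $P$, which is why the tangent line picks out the portfolio with the highest Sharpe ratio.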

### Partial Moments

#### Upside and Downside return and volatility

For a threshold $\tau$, define $$\mu_{k}^+=\E[X|X\geq \tau] \qquad \mu_{k}^-=\E[X|X\leq \tau]$$ $$\sigma_+(X,Y) = \E [ \max(X-\mu_X,0) \max(Y-\mu_Y,0) ]$$ $$\sigma_-(X,Y) = \E [ \min(X-\mu_X,0) \min(Y-\mu_Y,0) ]$$ $$\sigma^2_+(X) = \E [ \max(X-\mu_X,0)^2 ]\qquad \sigma^2_-(X) = \E [ \min(X-\mu_X,0)^2 ]$$ $$\rho_+(X,Y) = \frac{\sigma_+(X,Y)}{\sigma_+(X)\sigma_+(Y)}\qquad \rho_-(X,Y) = \frac{\sigma_-(X,Y)}{\sigma_-(X)\sigma_-(Y)}$$ Downside mean and standard deviation are measures of "risks".

Upside mean and standard deviation are measures of "rewards".

#### Upside and Downside Beta

$$\beta_+ = \frac{\sigma_+(Y)}{\sigma_+(X)}\rho_+(X,Y)$$ $$\beta_- = \frac{\sigma_-(Y)}{\sigma_-(X)}\rho_-(X,Y)$$
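The upside and downside betas can be estimated from data by replacing each expectation with an average. This is only a sketch: the function names and the return series are made up, and $Y$ is constructed as exactly twice $X$ so that both betas should come out to 2.

```python
from math import sqrt
from statistics import fmean

def partial_cov(xs, ys, side):
    """sigma_+(X, Y) if side == "up", sigma_-(X, Y) if side == "down"."""
    mx, my = fmean(xs), fmean(ys)
    if side == "up":
        clip = lambda d: max(d, 0.0)  # keep only deviations above the mean
    else:
        clip = lambda d: min(d, 0.0)  # keep only deviations below the mean
    return fmean([clip(x - mx) * clip(y - my) for x, y in zip(xs, ys)])

def partial_beta(xs, ys, side):
    """beta = (sigma(Y) / sigma(X)) * rho(X, Y), using one-sided quantities."""
    sx = sqrt(partial_cov(xs, xs, side))  # one-sided SD of X
    sy = sqrt(partial_cov(ys, ys, side))  # one-sided SD of Y
    rho = partial_cov(xs, ys, side) / (sx * sy)
    return sy / sx * rho

xs = [-0.02, -0.01, 0.0, 0.01, 0.03]  # made-up returns for X
ys = [2 * x for x in xs]              # Y moves exactly twice as much as X

print(partial_beta(xs, ys, "up"), partial_beta(xs, ys, "down"))
```

With real data the two betas usually differ, which is the point of computing them separately.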