Elementary Statistics, Spring 2022

This page will mainly be some notes and examples that I hope will be useful to my students.

I'm planning to expand this page into a general statistics reference.

Probability I

Probability Space

A Probability Space is a triple $(\Omega, \mathcal{F}, P)$ where
  1. $\Omega$ is a sample space (a set of outcomes)
  2. $\mathcal{F}$ is a sigma-algebra (a collection of subsets of $\Omega$)
  3. $P$ is a probability measure (a function from $\mathcal{F}$ to $[0,1]$)
For events $E_1, E_2\in \mc{F}$, $E_1$ and $E_2$ are independent if $$P(E_1\cap E_2)=P(E_1)P(E_2)$$ The conditional probability $P(E_1\mid E_2)$ is defined as $$P(E_1\mid E_2)= \frac{P(E_1\cap E_2)}{P(E_2)}$$ assuming $E_2$ has non-zero probability. Equivalently, when $P(E_2)>0$, $E_1$ and $E_2$ are independent exactly when $P(E_1\mid E_2)=P(E_1)$.
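A good way to get a feel for these definitions is to enumerate a small sample space by hand. The sketch below (my own example, not from the book) takes $\Omega$ to be the 36 equally likely outcomes of rolling two fair dice and represents an event as a Python predicate on outcomes; the helper names `prob` and `cond_prob` are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes (first die, second die)
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(E) under the uniform measure, for an event given as a predicate."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(e1, e2):
    """P(E1 | E2) = P(E1 and E2) / P(E2); assumes P(E2) > 0."""
    return prob(lambda w: e1(w) and e2(w)) / prob(e2)

first_is_6 = lambda w: w[0] == 6
sum_is_7 = lambda w: w[0] + w[1] == 7
sum_is_12 = lambda w: w[0] + w[1] == 12

# {first die is 6} and {sum is 7} are independent
print(prob(lambda w: first_is_6(w) and sum_is_7(w)) == prob(first_is_6) * prob(sum_is_7))  # True
# {first die is 6} and {sum is 12} are not: conditioning changes the probability
print(cond_prob(sum_is_12, first_is_6), prob(sum_is_12))  # 1/6 1/36
```

Exact `Fraction` arithmetic is used so that the independence check is an exact equality rather than a floating-point comparison.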

Random Variables

A Random Variable is a measurable function $$X: \Omega \to (R,\mc{R}) $$ where $(R,\mc{R})$ is a measurable space.

The probability that $X$ takes on a value in a measurable set $E\in \mc{R}$ is written as $$ \P{E}= P(X^{-1} E) $$



The set of all random variables $$ \{X :\Omega \to \R \} $$ is a vector space under pointwise addition and scalar multiplication. We can always enlarge $\Omega$ to accommodate new random variables, so $\Omega$ is often omitted when talking about random variables. Denote $$ \ms{L}_2= \{ X:\Omega \to \R \mid \E[X^2]<\infty \} $$ Then $\ms{L}_2$ (identifying random variables that are equal almost surely) is a Hilbert space with the inner product $$ \langle X,Y \rangle = \E[XY]:= \iint_{\mathbb{R}^2} xy\ f_{X,Y}(x,y)\ d x\, d y $$ where $f_{X,Y}$ is the joint density of $(X,Y)$ (when it exists), and norm $$ \norm{X}^2=\langle X,X \rangle $$ This also allows us to view probability problems geometrically.
The $k$-th moment of $X$ is defined as $$\mu_k=\E[X^k]:= \int_\Omega X(\omega)^k P(d\omega)=\int_\R x^k \P{dx}= \int_{\mathbb{R}} x^k f_X(x)\, d x$$ where the last form assumes $X$ has a density $f_X$.
  • $\mu_1$ is the mean of $X$, also denoted $\mu_X$.
  • $\mu_2-\mu_1^2=\E[(X-\mu_X)^2]$ (the second central moment, not $\mu_2$ itself) is the variance of $X$, also denoted $\sigma_X^2$.
    $\sigma_X$ is the standard deviation of $X$.
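Since the second moment $\mu_2$ and the variance are easy to confuse, here is a small exact computation for a fair six-sided die (my own example). It assumes a finite distribution stored as a dictionary `pmf` mapping each value to its probability.

```python
from fractions import Fraction

# A fair six-sided die: values 1..6, each with probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def raw_moment(pmf, k):
    """k-th moment E[X^k] = sum of x^k * P(X = x)."""
    return sum(x**k * p for x, p in pmf.items())

mu1 = raw_moment(pmf, 1)                               # mean
mu2 = raw_moment(pmf, 2)                               # second (raw) moment
variance = mu2 - mu1**2                                # mu_2 - mu_1^2
central = sum((x - mu1)**2 * p for x, p in pmf.items())  # E[(X - mu)^2] directly

print(mu1, mu2, variance)   # 7/2 91/6 35/12
print(variance == central)  # True
```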
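Let $X:\Omega\to \R^{\ge 0}$ be a non-negative random variable. Then $$\E[X]= \int_0^\infty \P{(X\ge x)}\ d x$$ More generally, for any $n\in \N$ and any $X:\Omega\to \R$, $$\E[|X|^n] = \int_0^\infty \P{(|X|^n\ge x)}\ d x$$ Proof for non-negative $X$ (the general case follows by applying it to $|X|^n$), using Tonelli's theorem to swap the order of integration: $$ \begin{aligned} \E[X]&= \int_\Omega X \ dP \\ &= \int_\Omega\left(\int_0^\infty 1[X>y] \ dy\right) \ dP\\ &= \int_0^\infty \int_\Omega 1[X>y] \ d P \ dy\\ &= \int_0^\infty \P{(X> y)}\ d y \end{aligned} $$ This equals $\int_0^\infty \P{(X\ge y)}\,dy$, since $\P{(X\ge y)}$ and $\P{(X>y)}$ differ only at the at most countably many $y$ with $\P{(X=y)}>0$.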
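The tail formula $\E[X]=\int_0^\infty \P{(X\ge x)}\,dx$ can be checked numerically. This sketch (my own illustration) uses two assumed examples: a fair die, where $\P{(X\ge x)}$ is constant on each interval $(k-1,k]$ so the integral becomes $\sum_{k\ge 1}\P{(X\ge k)}$, and an exponential distribution with rate $\lambda=2$, whose tail is $e^{-\lambda x}$ and whose mean is $1/\lambda$.

```python
import math
from fractions import Fraction

# Discrete case: a fair die; for integer-valued X >= 0 the integral is a sum over k >= 1
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
tail_sum = sum(sum(p for x, p in pmf.items() if x >= k) for k in range(1, 7))
print(mean, tail_sum)  # 7/2 7/2

# Continuous case: X ~ Exponential(rate lam), so P(X >= x) = exp(-lam * x) and E[X] = 1 / lam
lam = 2.0
dx = 1e-4
# Midpoint Riemann sum of the tail probability over [0, 20]; the tail beyond 20 is negligible
integral = sum(math.exp(-lam * (i + 0.5) * dx) * dx for i in range(int(20 / dx)))
print(round(integral, 6), 1 / lam)  # 0.5 0.5
```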

Joint Distribution

Let $X=(x_1,x_2)$ where $x_1, x_2$ are two real-valued random variables. Then the joint distribution of $X$ is characterized by $$ \P[X]{ [a,b]\times [c,d]} = \P{x_1\in [a,b], x_2\in [c,d]} $$ The expectation of $f(X)$ is defined as $$ \E[f(X)]=\int_{\R^2} f(x_1,x_2) \P[X]{dx_1 dx_2} $$ For two random variables $X$ and $Y$, the covariance is defined as $$\mathtt{Cov}(X, Y)=\mb{E}[(X-\mu_X)(Y-\mu_Y)]=\Inn{X-\mu_X}{Y-\mu_Y}$$ and the correlation as $$\rho(X,Y) =\frac{\mathtt{Cov}(X, Y)}{\norm{X-\mu_X}\norm{Y-\mu_Y}} = \frac{\mathtt{Cov}(X, Y)}{\sigma_X\sigma_Y} = \cos( \angle(X-\mu_X,Y-\mu_Y))$$
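To make the geometric picture concrete, this sketch treats a small made-up data set as the whole probability space (each of the $n$ points has probability $1/n$), so $\E[UV]$ becomes an average and the correlation is literally the cosine of the angle between the centered data vectors. The helper names `center`, `inner`, and `norm` are my own.

```python
import math

def center(v):
    """Subtract the mean: X - mu_X."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def inner(u, v):
    """<U, V> = E[UV] under the uniform measure on the n points."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def norm(u):
    return math.sqrt(inner(u, u))

# Made-up paired observations, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

xc, yc = center(x), center(y)
cov = inner(xc, yc)                   # Cov(X, Y) = <X - mu_X, Y - mu_Y>
rho = cov / (norm(xc) * norm(yc))     # divide by sigma_X * sigma_Y
angle = math.degrees(math.acos(rho))  # angle between the centered vectors
print(cov, round(rho, 4), round(angle, 1))  # 1.6 0.8 36.9
```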

Descriptive Statistics


Central tendency

Mean : $$\mu(X) = \frac{1}{n} \sum_{i=1}^n x_i$$ Median : assume the data is sorted, $x_1\le x_2\le\cdots\le x_n$ $$\mathtt{Med}(X)= \begin{cases}x_{(n+1)/2} & \text { if } n \text { is odd } \\ \frac{x_{n/2}+x_{n/2+1}}{2} & \text { if } n \text { is even }\end{cases}$$
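Here is a short sketch of the mean and median written out by hand, using the usual convention (middle value for odd $n$, average of the two middle values for even $n$) and compared against Python's `statistics.median`. Note that Python lists are 0-indexed, so position $i$ in the formula is index `i - 1` in the code.

```python
import statistics

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # x_{(n+1)/2}
    return (s[n // 2 - 1] + s[n // 2]) / 2  # (x_{n/2} + x_{n/2+1}) / 2

data_odd = [7, 1, 3, 9, 5]
data_even = [7, 1, 3, 9, 5, 11]
print(mean(data_odd), median(data_odd), median(data_even))  # 5.0 5 6.0
print(median(data_even) == statistics.median(data_even))    # True
```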


Variance : $$\sigma^2(X) = \frac{1}{n} \sum_{i=1}^n (x_i-\mu)^2$$ Standard Deviation : $$\sigma(X) :=\sqrt{\sigma^2} $$
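The variance above divides by $n$ (the "population" form). As a quick sketch on a made-up data set, it matches `statistics.pvariance`, while `statistics.variance` divides by $n-1$ instead, like the sample variance $S^2$ used later for tests of significance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(data) / len(data)
var = sum((x - mu) ** 2 for x in data) / len(data)  # divide by n
sd = math.sqrt(var)

print(mu, var, sd)                        # 5.0 4.0 2.0
print(var == statistics.pvariance(data))  # True
print(statistics.variance(data))          # divides by n - 1, so slightly larger
```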


Skewness : $$\gamma_1(X) = \frac{1}{n}\sum_{i=1}^n (\frac{x_i-\mu}{\sigma})^3$$


Kurtosis : $$\gamma_2(X) = \frac{1}{n}\sum_{i=1}^n (\frac{x_i-\mu}{\sigma})^4$$
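Skewness and kurtosis are both averages of powers of $z$-scores, so one helper covers both. In this sketch (my own, with made-up data) $\sigma$ is the divide-by-$n$ standard deviation, as in the formulas above. For reference, a normal distribution has kurtosis $3$ under this definition; $\gamma_2-3$ is often called the excess kurtosis.

```python
import math

def standardized_moment(xs, k):
    """(1/n) * sum(((x - mu) / sigma) ** k), with sigma the divide-by-n SD."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return sum(((x - mu) / sigma) ** k for x in xs) / n

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

skew_sym = standardized_moment(symmetric, 3)        # close to 0: symmetric about the mean
skew_right = standardized_moment(right_skewed, 3)   # positive: long right tail
kurt_sym = standardized_moment(symmetric, 4)        # about 1.7, flatter than a normal (3)
print(skew_sym, skew_right, kurt_sym)
```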

Normal Approximation

For a standard normal random variable $Z\sim\mathscr{N}(0,1)$, $$\begin{cases} \mathtt{P}(-1< Z < 1) \simeq 0.68\\ \mathtt{P}(-2 < Z < 2) \simeq 0.95\\ \mathtt{P}(-3 < Z < 3) \simeq 0.997 \end{cases}$$ So $Z$ lands between $-1$ and $1$ more than half (about two thirds) of the time,
and lands outside $(-2, 2)$ only about 5% of the time.
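These three numbers can be reproduced with the standard library, using the identity $\mathtt{P}(-k<Z<k)=2\Phi(k)-1=\operatorname{erf}(k/\sqrt{2})$, where $\Phi$ is the standard normal CDF. A minimal sketch:

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z, via erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```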
Any computation for $ \mathscr{N}(\mu,\sigma^2)$ can be converted to a computation for $ \mathscr{N}(0,1)$ by $$Z = \frac{X-\mu}{\sigma}$$ Results can be converted back to $ \mathscr{N}(\mu,\sigma^2)$ by $$X = \mu + \sigma Z$$
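To see that standardizing gives the same answer as working with the original normal distribution directly, here is a sketch using Python's `statistics.NormalDist` (its second argument is the standard deviation $\sigma$, not the variance). The values $\mu=70$ and $\sigma=10$ are made up for illustration.

```python
from statistics import NormalDist

mu, sigma = 70.0, 10.0   # hypothetical parameters, e.g. exam scores
X = NormalDist(mu, sigma)
Z = NormalDist()         # standard normal N(0, 1)

# P(X < 85) directly, and via the z-score z = (85 - mu) / sigma = 1.5
direct = X.cdf(85)
z = (85 - mu) / sigma
via_z = Z.cdf(z)

# Converting back: the 90th percentile of X is mu + sigma * (90th percentile of Z)
x90 = mu + sigma * Z.inv_cdf(0.90)

print(round(direct, 4), round(via_z, 4), round(x90, 2))  # 0.9332 0.9332 82.82
```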

Covariance & Correlation

In the book, the correlation coefficient is defined by the following formula $$\rho(X,Y)=\mb{E}\left[ \frac{X-\mu_X}{\sigma_X} \cdot \frac{Y-\mu_Y}{\sigma_Y}\right]$$ which, for $n$ data pairs $(X_i,Y_i)$, is the average of the products of the $z$-scores: $$\rho(X,Y)= \frac{1}{n} \sum_{i=1}^{n}\left(\frac{X_{i}-\mu_X}{\sigma_{X}}\right)\left(\frac{Y_{i}-\mu_Y}{\sigma_{Y}}\right) $$ A different but equivalent way to introduce the correlation coefficient is to first define covariance and then treat correlation as its normalized version, $\rho(X,Y)=\mathtt{Cov}(X,Y)/(\sigma_X\sigma_Y)$.
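The two routes to the correlation coefficient can be compared side by side. This sketch (my own, with made-up height and weight numbers) computes $\rho$ once as the average product of $z$-scores and once as covariance divided by $\sigma_X\sigma_Y$; both use the divide-by-$n$ standard deviation, which is what makes them agree exactly.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pop_sd(v):
    """Standard deviation dividing by n."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def corr_zscores(x, y):
    """Book formula: average of the products of z-scores."""
    mx, my, sx, sy = mean(x), mean(y), pop_sd(x), pop_sd(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)

def corr_from_cov(x, y):
    """Alternative: compute the covariance first, then normalize by sigma_X * sigma_Y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pop_sd(x) * pop_sd(y))

# Made-up data: heights (inches) and weights (pounds)
height = [62, 65, 67, 70, 72]
weight = [120, 140, 150, 165, 180]
print(round(corr_zscores(height, weight), 3), round(corr_from_cov(height, weight), 3))  # 0.998 0.998
```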


Regression Line

The regression line is the line passing through $(\mu_X,\mu_Y)$ with slope $$\frac{\rho \cdot \sigma_Y}{\sigma_X}$$ that is, $\hat{y}=\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)$: each increase of one SD in $x$ raises the predicted $y$ by $\rho$ SDs. Since this is a 100-level introductory class for students from all majors, we will not go into the technical details. If interested, see the page on Linear Regression for the general case.
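A minimal sketch of the regression line built exactly as described (through the point of averages, with slope $\rho\sigma_Y/\sigma_X$), on a made-up data set. For comparison it also computes the least-squares slope from the familiar closed form $\sum(x_i-\bar x)(y_i-\bar y)/\sum(x_i-\bar x)^2$, which gives the same number.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pop_sd(v):
    """Standard deviation dividing by n."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def regression_line(x, y):
    """(slope, intercept) of the line through (mu_X, mu_Y) with slope rho * sigma_Y / sigma_X."""
    mx, my, sx, sy = mean(x), mean(y), pop_sd(x), pop_sd(y)
    rho = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)
    slope = rho * sy / sx
    return slope, my - slope * mx

# Made-up data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
slope, intercept = regression_line(x, y)

# Least-squares slope from its closed form, for comparison
mx, my = mean(x), mean(y)
ols_slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

print(round(slope, 6), round(intercept, 6), round(ols_slope, 6))  # 0.8 0.6 0.8
```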

RMS Error for Regression

Root Mean Square (RMS) error measures the typical size of the difference between the predicted values $\hat{y}_i$ and the actual values $y_i$: $$\mathtt{RMS} = \sqrt{\frac{\sum_{i=1}^n(\hat{y}_i-y_i)^2}{n}}$$ For the regression line itself (at least in the case of one independent and one dependent variable), the RMS error also equals $$\sqrt{1-\rho^{2}}\, \sigma_Y$$ Among all lines, the regression line is the one with the smallest RMS error in predicting $y$ from $x$.
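The two RMS formulas can be compared on a tiny made-up data set. This sketch uses standard deviations that divide by $n$, which is what makes $\sqrt{1-\rho^2}\,\sigma_Y$ match the direct computation exactly; it also checks one competing line, $y=x$, which should do worse.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pop_sd(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def rms_error(x, y, slope, intercept):
    """Square root of the mean squared gap between predictions and actual y values."""
    return math.sqrt(sum((slope * a + intercept - b) ** 2 for a, b in zip(x, y)) / len(x))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
mx, my, sx, sy = mean(x), mean(y), pop_sd(x), pop_sd(y)
rho = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)
slope = rho * sy / sx
intercept = my - slope * mx

print(round(rms_error(x, y, slope, intercept), 4))  # 0.8485 (direct computation)
print(round(math.sqrt(1 - rho ** 2) * sy, 4))       # 0.8485 (shortcut formula)
print(rms_error(x, y, 1.0, 0.0) > rms_error(x, y, slope, intercept))  # True: y = x does worse
```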

Tests of Significance

Given an independent sample $$x_1,\cdots,x_n$$ drawn from $\ms{N}(\mu,\sigma^2)$ with sample mean $\bar{X}$, we have $$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim\ms{N}(0,1) $$ and $$\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim\mathtt{T}(n-1)$$ where $S^{2}=\frac{1}{n-1} \sum_{i=1}^{n}\left(x_{i}-\bar{X}\right)^{2} $ is the sample variance. The second result is especially useful, since in practice $\sigma$ is often unknown and has to be estimated by the sample standard deviation $S$.

When $n>30$, the Student t distribution $\mathtt{T}(n-1)$ is very close to the standard normal distribution. Thus in practice, if $n>30$, we use the $Z$-test regardless of whether $\sigma$ is known.

Applications to Finance


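Let $R_p, \sigma_p$ be the return and volatility of a portfolio $p$, $R_m, \sigma_m$ those of the market, and $R_f$ the risk-free rate. The Capital Asset Pricing Model (CAPM) is the following linear regression model: $$ (R_p-R_f)=\alpha_p+\beta_{p}(R_m-R_f)+\varepsilon_p $$ where $\varepsilon_p$ is a zero-mean error term.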

The beta of portfolio $p$ with respect to the market is given by $$\beta_{p}=\frac{\mathrm{Cov}(R_p,R_m)}{\sigma_m^2}= \frac{\rho_{p,m}\sigma_p}{\sigma_m}$$
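Both expressions for beta can be evaluated on a return series. This sketch uses made-up monthly returns; since $R_f$ is treated as constant, subtracting it does not change covariances, so raw returns are used directly.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance dividing by n."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Hypothetical monthly returns, for illustration only
r_m = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]   # market
r_p = [0.03, -0.02, 0.04, 0.01, -0.03, 0.05]   # portfolio

beta_cov = cov(r_p, r_m) / cov(r_m, r_m)       # Cov(R_p, R_m) / sigma_m^2
sd_p, sd_m = math.sqrt(cov(r_p, r_p)), math.sqrt(cov(r_m, r_m))
rho = cov(r_p, r_m) / (sd_p * sd_m)
beta_rho = rho * sd_p / sd_m                   # rho_{p,m} * sigma_p / sigma_m
print(round(beta_cov, 4), round(beta_rho, 4))  # the two forms agree
```

A beta above 1, as here, means the portfolio tends to move more than the market.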

Sharpe Ratio

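Let $R_a, \sigma_a$ be the return and volatility of an asset, and $R_f$ be the mean return of the risk-free asset. Then the Sharpe ratio of the asset is defined as $$S_a=\frac{\E[R_a-R_f]}{\sigma_a}$$ The Sharpe ratio can be viewed as a standardized measure of expected return: the expected excess return earned per unit of volatility. Two related measures are the Treynor ratio, which divides by the beta $\beta_a$ instead of the volatility, and the generalized Sharpe ratio, which measures excess return over a benchmark with return $R_b$: $$ \text{Treynor ratio} = \frac{\E[R_a-R_f]}{\beta_a} $$ $$ \text{Generalized Sharpe ratio} = \frac{\E[R_a-R_b]}{\sigma_{a-b}} $$ where $\sigma_{a-b}$ is the standard deviation of $R_a-R_b$ (this reduces to $\sigma_a$ when $R_b=R_f$ is constant).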
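A sketch of the three ratios on made-up monthly returns. Assumptions, all mine: the risk-free rate is a constant $0.002$ per month, the benchmark series also serves as the market proxy for beta, and every standard deviation divides by $n$.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Covariance dividing by n."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def sd(v):
    return math.sqrt(cov(v, v))

# Hypothetical monthly returns, for illustration only
r_a = [0.03, -0.02, 0.04, 0.01, -0.03, 0.05]   # asset
r_b = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]   # benchmark, also used as the market proxy
r_f = 0.002                                    # assumed constant risk-free rate per month

excess = [a - r_f for a in r_a]
sharpe = mean(excess) / sd(r_a)                # constant R_f, so sd(R_a - R_f) = sigma_a

beta_a = cov(r_a, r_b) / cov(r_b, r_b)
treynor = mean(excess) / beta_a

diff = [a - b for a, b in zip(r_a, r_b)]
generalized_sharpe = mean(diff) / sd(diff)     # divides by the SD of R_a - R_b

print(round(sharpe, 3), round(treynor, 4), round(generalized_sharpe, 3))
```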

Efficient Frontier

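The Efficient Frontier is the collection of risk-return pairs of portfolios that are not dominated by any other portfolio: $$ \{(\sigma_P,\E R_P) \mid \neg\exists P' :\ \E R_{P'} \ge \E R_P \wedge \sigma_{P'}\le\sigma_P, \text{ with at least one inequality strict} \}$$ In words, no other portfolio offers at least the same expected return with lower risk, or a higher expected return with no more risk.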
Let $P$ be a risky portfolio, and $R_f$ be the return of the risk-free asset.
Let $C$ be a combination that invests a fraction $w\ge 0$ of wealth in $P$ and $1-w$ in the risk-free asset, so that $\E(R_C)=R_f+w(\E R_P-R_f)$ and $\sigma_C=w\sigma_P$.

The collection of all risk-return pairs $$ (\sigma_C, \E(R_C) )$$ over all possible combinations $C$ gives the Capital Allocation Line (CAL).
For a given risky portfolio $P$ with Sharpe ratio $S_P$, substituting $w=\sigma_C/\sigma_P$ shows that the CAL is the line $$\E(R_C)=R_f+\sigma_C S_P$$ Let $CAL_T$ be the CAL tangent to the efficient frontier.
Let $P$ be the corresponding portfolio.
Then $P$ is the tangency portfolio, and $P$ has the highest Sharpe ratio among all risky portfolios.
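Here is a small numerical sketch with two risky assets, using made-up expected returns, volatilities, a correlation of $0.2$, and $R_f=0.03$. It relies on the standard mean-variance result that tangency weights are proportional to $\Sigma^{-1}(\mu-R_f)$ (written out by hand for a $2\times 2$ covariance matrix $\Sigma$), checks that no long-only mix on a fine grid has a higher Sharpe ratio, and confirms that mixing the tangency portfolio with the risk-free asset keeps the same slope.

```python
import math

# Hypothetical inputs for two risky assets, for illustration only
mu = (0.08, 0.12)   # expected returns
sd = (0.15, 0.25)   # volatilities
rho = 0.2           # correlation between the two assets
r_f = 0.03          # risk-free rate

cov12 = rho * sd[0] * sd[1]

def portfolio_stats(w1):
    """Expected return and volatility of the portfolio with weights (w1, 1 - w1)."""
    w2 = 1 - w1
    m = w1 * mu[0] + w2 * mu[1]
    v = (w1 * sd[0]) ** 2 + (w2 * sd[1]) ** 2 + 2 * w1 * w2 * cov12
    return m, math.sqrt(v)

def sharpe(w1):
    m, s = portfolio_stats(w1)
    return (m - r_f) / s

# Tangency weights proportional to Sigma^{-1} (mu - r_f); the 1/det factor cancels on normalizing
e1, e2 = mu[0] - r_f, mu[1] - r_f
z1 = sd[1] ** 2 * e1 - cov12 * e2
z2 = sd[0] ** 2 * e2 - cov12 * e1
w_tan = z1 / (z1 + z2)

best_grid = max(sharpe(i / 1000) for i in range(1001))
print(round(w_tan, 4), round(sharpe(w_tan), 4), sharpe(w_tan) >= best_grid)  # last value: True

# Combinations C of the tangency portfolio and the risk-free asset (k > 1 means borrowing)
m_t, s_t = portfolio_stats(w_tan)
for k in (0.5, 1.0, 1.5):
    m_c, s_c = r_f + k * (m_t - r_f), k * s_t
    print(k, round((m_c - r_f) / s_c, 4))  # the same slope S_P for every k > 0
```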

Partial Moments

Upside and Downside return and volatility

For a target return $\tau$, $$ \mu^+=\E{[X|X\geq \tau]} \qquad \mu^-=\E{[X|X\leq \tau]} $$ $$\sigma_+(X,Y) = \E [ \max(X-\mu_X,0) \max(Y-\mu_Y,0) ] $$ $$\sigma_-(X,Y) = \E [ \min(X-\mu_X,0) \min(Y-\mu_Y,0) ] $$ $$\sigma^2_+(X) = \E [ \max(X-\mu_X,0)^2 ]\qquad \sigma^2_-(X) = \E [ \min(X-\mu_X,0)^2 ] $$ $$\rho_+(X,Y) = \frac{\sigma_+(X,Y)}{\sigma_+(X)\sigma_+(Y)}\qquad \rho_-(X,Y) = \frac{\sigma_-(X,Y)}{\sigma_-(X)\sigma_-(Y)} $$ Downside mean and standard deviation are measures of "risks".

Upside mean and standard deviation are measures of "rewards".
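These quantities are straightforward to estimate from data by treating each observation as equally likely. In this sketch the return series are made up, the target is $\tau=0$, and the helper names are my own; note that $\sigma^2_\pm(X)$ is just $\sigma_\pm(X,X)$.

```python
import math

def mean(v):
    return sum(v) / len(v)

def semi_cov(x, y, upside=True):
    """E[max(X-mu_X,0) max(Y-mu_Y,0)] if upside, else the same with min."""
    mx, my = mean(x), mean(y)
    clip = (lambda d: max(d, 0.0)) if upside else (lambda d: min(d, 0.0))
    return sum(clip(a - mx) * clip(b - my) for a, b in zip(x, y)) / len(x)

def semi_corr(x, y, upside=True):
    """rho_+ (or rho_-) = sigma_+(X,Y) / (sigma_+(X) sigma_+(Y))."""
    return semi_cov(x, y, upside) / math.sqrt(semi_cov(x, x, upside) * semi_cov(y, y, upside))

def conditional_mean(x, tau, upside=True):
    """E[X | X >= tau] (or E[X | X <= tau]); assumes at least one observation qualifies."""
    kept = [a for a in x if (a >= tau if upside else a <= tau)]
    return mean(kept)

# Hypothetical returns, for illustration only
market = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, -0.03]
asset = [0.01, -0.03, 0.02, 0.02, -0.04, 0.03, -0.05]

print(conditional_mean(asset, 0.0), conditional_mean(asset, 0.0, upside=False))
print(math.sqrt(semi_cov(asset, asset)), math.sqrt(semi_cov(asset, asset, upside=False)))
print(semi_corr(market, asset), semi_corr(market, asset, upside=False))
```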

Upside and Downside Beta

$$ \beta_+ = \frac{\sigma_+(Y)}{\sigma_+(X)}\rho_+(X,Y) $$ $$ \beta_- = \frac{\sigma_-(Y)}{\sigma_-(X)}\rho_-(X,Y) $$
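By analogy with $\beta_p=\rho_{p,m}\sigma_p/\sigma_m$, this sketch assumes $X$ is the market return and $Y$ the asset return. Since $\rho_\pm=\sigma_\pm(X,Y)/(\sigma_\pm(X)\sigma_\pm(Y))$, the $\sigma_\pm(Y)$ factors cancel and $\beta_\pm=\sigma_\pm(X,Y)/\sigma^2_\pm(X)$, which the code checks. The return series are made up; this hypothetical asset falls harder on down-market days, so its downside beta comes out larger.

```python
import math

def mean(v):
    return sum(v) / len(v)

def semi_cov(x, y, upside=True):
    """E[max(X-mu_X,0) max(Y-mu_Y,0)] if upside, else the same with min."""
    mx, my = mean(x), mean(y)
    clip = (lambda d: max(d, 0.0)) if upside else (lambda d: min(d, 0.0))
    return sum(clip(a - mx) * clip(b - my) for a, b in zip(x, y)) / len(x)

def semi_beta(market, asset, upside=True):
    """beta_+ (or beta_-) = sigma(Y) / sigma(X) * rho(X, Y), with X = market, Y = asset."""
    s_x = math.sqrt(semi_cov(market, market, upside))
    s_y = math.sqrt(semi_cov(asset, asset, upside))
    rho = semi_cov(market, asset, upside) / (s_x * s_y)
    return s_y / s_x * rho

# Hypothetical returns, for illustration only
market = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, -0.03]
asset = [0.01, -0.03, 0.02, 0.02, -0.04, 0.03, -0.05]

b_up = semi_beta(market, asset)
b_down = semi_beta(market, asset, upside=False)
# Cancellation check: beta_- = sigma_-(X, Y) / sigma_-^2(X)
print(abs(b_down - semi_cov(market, asset, False) / semi_cov(market, market, False)) < 1e-12)  # True
print(round(b_up, 3), round(b_down, 3))
```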