Elementary Statistics (STAT 201)

Probability Distribution

What is a probability distribution?

  • Recall the bell-shaped distribution

Random Variable

  • A random variable is a numerical measurement of the outcomes of a random phenomenon.

    • We usually use a capital letter, such as X, to denote the random variable, and a lowercase letter, such as x, to denote the specific values it takes on.

Random Variable - example

  • Tossing a coin: we could get Heads or Tails.

  • Let's give them the values $\operatorname{Heads} = 0$ and $\operatorname{Tails} = 1$, and we have a random variable.

Definition of Probability Distribution

  • The probability distribution of a random variable specifies its possible values and their probabilities.

  • A probability distribution describes the population.

Recall the definitions of statistic & parameter

  • Statistic: numerical summary of a sample.
  • Parameter: numerical summary of a population.

Definition of Probability Distribution

  • We call the numerical summaries of a probability distribution parameters, often denoted by Greek letters.

    • The expected value or (theoretical) mean, $\mu$, measures the center of the distribution.
    • The standard deviation, $\sigma$, or the variance, $\sigma^2$, measures the variability of the distribution.

Distinguish between sample mean $\bar{x}$ and expected value $\mu$

  • Sample mean $\bar{x}$ describes the center of a sample.
  • Expected value $\mu$ describes the center of a population/distribution.

Distinguish between sample standard deviation $s$ and standard deviation $\sigma$

  • Sample standard deviation $s$ describes the variability of a sample.
  • Standard deviation $\sigma$ describes the variability of a population/distribution.

Why Greek letters?

  • Because we don't know their values in reality; we need to estimate them.
  • Greek letters also make parameters more distinguishable from sample statistics.

A brief summary of previous chapters

  • In Chapters 1-3, we studied numerical summaries and visualization of a sample.
  • Now, in Chapters 5 and 6, we take a deep dive into the population.

Discrete Random Variables

  • A discrete random variable $X$ takes a collection of distinct values (such as $0,1, 2, \dots$). Its probability distribution assigns a probability $P(x)$ to each possible value $x$.

  • The probability distribution is valid if

    • For each $x$, $0 \leq P(x) \leq 1$

    • The probabilities of all the possible $x$ values sum to 1 (see the quick check below)
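
As a quick illustration (not from the original slides; the probability values are assumed), these two conditions can be checked in Python:

# A minimal sketch: verify that an assumed table of probabilities P(x)
# forms a valid discrete probability distribution.
probs = {0: 0.25, 1: 0.50, 2: 0.25}   # hypothetical values of x and P(x)

each_in_range = all(0 <= p <= 1 for p in probs.values())   # every P(x) is in [0, 1]
sums_to_one = abs(sum(probs.values()) - 1.0) < 1e-9        # probabilities sum to 1

print(each_in_range, sums_to_one)   # True True  -> valid distribution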

Expected value/Mean of Discrete Random Variables

  • $$\mu =\sum xP(x)$$

  • Remark:

    • The sum $\sum xP(x)$ is called the weighted average or weighted sum. The probabilities $P(x)$ are the weights given to each value of $x$.
    • $\mu$ is not observable, but is estimable.
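
As a small worked example of the weighted sum (a fair six-sided die, which is an assumption beyond the slides):

values = [1, 2, 3, 4, 5, 6]              # possible values of X for a fair die
mu = sum(x * (1 / 6) for x in values)    # mu = sum of x * P(x), with each P(x) = 1/6
print(mu)                                # 3.5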

For an investor...

  • Suppose you are an investor and there are two investment options for you:

    • Option 1: Whole-life investment; each year you earn \$100 for sure.
    • Option 2: Whole-life investment; each year you will either:
      • lose \$200 with probability 50%, or
      • earn \$1000 with probability 50%.

Let's calculate the mean return each year

  • Option 1: $\mu = \$100 $

  • Option 2: $ \mu = -200 \times 50\% + 1000 \times 50\% = \$400 $

  • On average, the return of Option 2 is much higher than that of Option 1. (A quick check of this calculation in code follows.)
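
A minimal check of the slide's calculation in Python:

# Mean yearly return for each option, as computed above.
option1_mu = 100                            # Option 1: $100 for sure
option2_mu = (-200) * 0.5 + 1000 * 0.5      # Option 2: weighted sum of the two outcomes
print(option1_mu, option2_mu)               # 100 400.0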

Continuous Random Variable - the definition

  • A continuous random variable has possible values that form an interval.

  • Its probability distribution is specified by a density curve, which determines the probability that the random variable falls in any particular interval of values.

Continuous Random Variable - properties

  • The probability of a specific interval is the area under the density curve over that interval.
  • Each interval has probability between 0 and 1.
  • The total area under the curve is 1.
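
To check the last property numerically, we can integrate a density curve; here the standard normal density via scipy (an assumption beyond the slides):

from scipy import integrate
from scipy.stats import norm

# Integrate the standard normal density over a wide range; the total area is essentially 1.
area, _ = integrate.quad(norm.pdf, -10, 10)
print(round(area, 6))   # 1.0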

Continuous Random Variable - properties

For a continuous random variable $X$, $P(X = x) = 0$ for any specific value $x$. Hence we only consider the following types of probabilities:

  • Left-tail: $P(X < x)$
  • Right-tail: $P(X > x)$
  • In-between: $P(a < X < b)$

Continuous Random Variable - An example

For instance, the figure below is the density curve for a normal distribution (bell-shaped) with mean $\mu = 0$ and standard deviation $\sigma = 1$. The shaded area represents the probability that $X$ is between $-2$ and $1$, which we write as $P(-2 < X < 1)$.
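
This shaded area can be computed from the normal CDF; a sketch using scipy (not part of the slides):

from scipy.stats import norm

# P(-2 < X < 1) for the standard normal = F(1) - F(-2), where F is the CDF.
prob = norm.cdf(1) - norm.cdf(-2)
print(round(prob, 4))   # approximately 0.8186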

An important instance of a discrete distribution: the Binomial Distribution

  • We call a random trial with two possible outcomes, success or failure, a Bernoulli trial. Define a parameter $p = P(\operatorname{success})$, the probability of success.

  • Consider $X$ = the number of successes. The distribution of $X$ can be represented as:

    $x$    $P(x)$
    1      $p$
    0      $1-p$

  • An example: toss a coin once and let $X$ = the number of heads.

Binomial Distribution - the assumptions

  • Consider a sequence of $n$ Bernoulli trials which satisfy the following conditions:

    • Each trial has two possible outcomes, a success or a failure.

    • Each trial has the same probability of success, which is denoted by $p$.

    • The $n$ trials are independent.

Binomial Distribution - the definition

  • Let the random variable $X$ = number of successes. $X$ follows a binomial distribution with parameters $n$ and $p$, which we write as

    $$X \sim binomial(n, p)$$

  • The tilde $\sim$ represents the word "follows". Possible values of $X$ are $x = 0, 1, 2, \dots, n$.

  • Example: toss a coin $n$ times and let $X$ = the number of heads.

Binomial Distribution - the probability calculation

  • Suppose $X \sim binomial(n, p)$. For a specific number of successes $x = 0, 1,\dots, n$, the probability of $X = x$ is given by $$ P(X = x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$$

  • Remark:

    • $n!$ is called n-factorial. $n! = 1 \times 2 \times \dots \times (n-1) \times n$.
    • We define $0!=1$
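
A quick numerical check of this formula, using 10 tosses of a fair coin and exactly 4 heads as assumed example values (scipy is used only for comparison and is not part of the slides):

from math import factorial
from scipy.stats import binom

n, p, x = 10, 0.5, 4   # assumed example: 10 fair-coin tosses, exactly 4 heads

# Apply the formula P(X = x) = n! / (x! (n-x)!) * p^x * (1-p)^(n-x) directly.
by_formula = factorial(n) / (factorial(x) * factorial(n - x)) * p**x * (1 - p)**(n - x)
by_scipy = binom.pmf(x, n, p)          # the same probability from scipy's binomial pmf

print(round(by_formula, 4), round(by_scipy, 4))   # 0.2051 0.2051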

Binomial Distribution - mean and standard deviation

  • Suppose $X \sim binomial(n, p)$; then the mean and standard deviation of $X$ are:

    $$ \mu = np $$

    $$ \sigma = \sqrt{np(1-p)} $$
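
A short check of these two formulas with assumed values $n = 10$ and $p = 0.5$:

from math import sqrt

n, p = 10, 0.5                     # assumed example values
mu = n * p                         # mean: np
sigma = sqrt(n * p * (1 - p))      # standard deviation: sqrt(np(1-p))
print(mu, round(sigma, 4))         # 5.0 1.5811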

An important continuous distribution: the Standard Normal Distribution

  • The standard normal distribution is a continuous probability distribution that is symmetric about its mean $\mu = 0$ and has standard deviation $\sigma = 1$.

  • We use the letter $Z$ exclusively to denote a random variable following the standard normal distribution, which we write as

    $$Z \sim \mathcal{N} (0, 1)$$

Standard Normal Distribution -- properties

  • The random variable $Z$ can take on any real number between $-\infty$ and $+\infty$. We call the specific values $Z$ takes on z-scores (recall z-scores in Chapter 2).

  • Because of the symmetry, it has the following properties:

    • $P(Z < 0) = P(Z > 0) = \frac{1}{2}$

    • $P(Z > z) = P(Z < -z)$

    • $P(Z < z) + P(Z < -z) = 1$
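
These symmetry properties can be verified numerically with scipy (an assumption beyond the slides); $z = 1.5$ is just an example value:

from scipy.stats import norm

z = 1.5                                   # any z-score works; 1.5 is an assumed example

print(norm.cdf(0))                        # P(Z < 0) = 0.5
print(norm.sf(z), norm.cdf(-z))           # P(Z > z) equals P(Z < -z)
print(norm.cdf(z) + norm.cdf(-z))         # P(Z < z) + P(Z < -z) = 1.0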

Standard Normal Distribution -- percentile

  • The $(100 \times p)$th percentile is the z-score $z$ with left-tail probability $P(Z < z) = p$.

  • For instance, since $P(Z < 0) = 0.5$, $z = 0$ is the 50th percentile of $\mathcal{N}(0, 1)$.
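
Percentiles can be read from the z-table or computed with the inverse CDF; a sketch using scipy's norm.ppf (not part of the slides):

from scipy.stats import norm

# The (100 * p)th percentile is the z-score with left-tail probability p.
print(norm.ppf(0.50))    # 0.0        -> the 50th percentile
print(norm.ppf(0.975))   # about 1.96 -> the 97.5th percentile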

Standard Normal Distribution -- z-table

  • How do we calculate $P(Z < z)$ for arbitrary z? How do we calculate any arbitrary percentile for the standard normal distribution?
  • → Use the z-table:

from IPython.display import IFrame
IFrame("http://users.stat.ufl.edu/~athienit/Tables/Ztable.pdf", width=1000, height=800)

Normal Distribution -- a generalization of the standard normal distribution

  • The normal distribution is a continuous probability distribution that is symmetric about its mean $\mu$ and has standard deviation $\sigma$. If $X$ follows a normal distribution, we write

    $$X \sim \mathcal{N} (\mu, \sigma)$$

  • If $\mu = 0$ and $\sigma = 1$, it boils down to the standard normal distribution.

Normal Distribution

Z-score: a bridge between the standard normal and normal distributions

  • If $$X \sim \mathcal{N} (\mu, \sigma)$$

    then $$Z = \frac{X-\mu}{\sigma} \sim \mathcal{N} (0, 1) $$
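
A minimal sketch of how standardization is used in practice, with assumed values $\mu = 100$, $\sigma = 15$, and $x = 130$ (scipy is used only to check the answer):

from scipy.stats import norm

mu, sigma, x = 100, 15, 130              # assumed example: X ~ N(100, 15), find P(X < 130)

z = (x - mu) / sigma                     # standardize: z = (x - mu) / sigma
print(z)                                 # 2.0

# Both calculations give the same left-tail probability.
print(norm.cdf(z))                       # P(Z < 2) from the standard normal
print(norm.cdf(x, loc=mu, scale=sigma))  # P(X < 130) from N(100, 15) directly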

For reference, the Wikipedia article on the normal distribution:

IFrame("https://en.wikipedia.org/wiki/Normal_distribution", width=900, height=600)

End