A random variable is a numerical measurement of the outcomes of a random phenomenon.
Example: when tossing a coin, we could get Heads or Tails.
Assigning them the values $\operatorname{Heads} = 0$ and $\operatorname{Tails} = 1$ gives us a random variable.
The probability distribution of a random variable specifies its possible values and their probabilities.
The probability distribution describes the population.
We call the numerical summaries of a probability distribution parameters, often denoted by Greek letters.
A discrete random variable $X$ takes a collection of distinct values (such as $0,1, 2, \dots$). Its probability distribution assigns a probability $P(x)$ to each possible value $x$.
The probability distribution is valid if:
- For each $x$, $0 \leq P(x) \leq 1$.
- The probabilities for all the possible $x$ values sum to 1.
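The two validity conditions can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper name `is_valid_distribution`, the tolerance, and the example distributions are all illustrative assumptions.

```python
import math

def is_valid_distribution(probabilities, tol=1e-9):
    """Return True if the probabilities satisfy both validity conditions."""
    probabilities = list(probabilities)
    # Condition 1: every probability lies between 0 and 1.
    each_in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    # Condition 2: the probabilities sum to 1 (up to floating-point error).
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one

# Hypothetical distribution: number of heads in two fair coin tosses.
two_toss_heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_valid_distribution(two_toss_heads.values()))  # True

# Not valid: the probabilities sum to 1.2.
print(is_valid_distribution([0.5, 0.7]))  # False
```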
The mean of a discrete probability distribution is
$$\mu = \sum x P(x)$$
Remark:
Suppose you are an investor choosing between two investment options:
Option 1: $\mu = \$100$
Option 2: $\mu = -200 \times 50\% + 1000 \times 50\% = \$400$
On average, the return of option 2 is much higher than that of option 1.
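The comparison above can be reproduced with the formula $\mu = \sum x P(x)$. This is a small sketch in which the function name `discrete_mean` is hypothetical. The text only gives the mean of Option 1, so modelling it as a guaranteed \$100 return is an assumption made purely for illustration.

```python
def discrete_mean(distribution):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in distribution.items())

# Option 2 from the text: lose $200 or gain $1000, each with probability 50%.
option_2 = {-200: 0.5, 1000: 0.5}
# Assumption: Option 1 is a guaranteed $100; the text only states its mean.
option_1 = {100: 1.0}

print(discrete_mean(option_1))  # 100.0
print(discrete_mean(option_2))  # 400.0
```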
A continuous random variable has possible values that form an interval.
Its probability distribution is specified by a density curve, which determines the probability that the random variable falls in any particular interval of values.
For a continuous random variable $X$, $P(X = x) = 0$ for any $x$. Hence we only consider the probability that $X$ falls in an interval of values.
For instance, the figure below is the density curve for a normal distribution (bell-shaped) with mean $\mu = 0$ and standard deviation $\sigma = 1$. The shaded area represents the probability that $X$ is between $-2$ and $1$, which we will write as $P(-2 < X < 1)$.
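The shaded area can also be computed numerically as a difference of two left-tail probabilities. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library. The variable names are illustrative.

```python
from statistics import NormalDist

# Normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# P(-2 < X < 1) = P(X < 1) - P(X < -2)
prob = standard_normal.cdf(1) - standard_normal.cdf(-2)
print(round(prob, 4))  # 0.8186
```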
We call a random trial with two possible outcomes, success or failure, a Bernoulli trial. Define a parameter $p = P(\operatorname{success})$, the probability of success.
Consider $X$ = number of successes. The distribution of $X$ can be represented as:
| $x$ | $P(x)$ |
|---|---|
| 1 | $p$ |
| 0 | $1-p$ |
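A Bernoulli trial is easy to simulate, which gives a feel for what the table means: over many trials, the proportion of successes should settle near $p$. This is only a sketch. The value $p = 0.3$, the seed, and the function name `bernoulli_trial` are arbitrary choices, not taken from the notes.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.3                 # hypothetical probability of success
outcomes = [bernoulli_trial(p, rng) for _ in range(100_000)]

# The observed proportion of successes should be close to p.
proportion = sum(outcomes) / len(outcomes)
print(proportion)
```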
Consider a sequence of $n$ Bernoulli trials which satisfy the following conditions:
- Each trial has two possible outcomes, a success or a failure.
- Each trial has the same probability of success, which is denoted by $p$.
- The $n$ trials are independent.
Let the random variable $X$ = number of successes. $X$ follows a binomial distribution with parameters $n$ and $p$, which we write as
$$X \sim \operatorname{binomial}(n, p)$$
The tilde $\sim$ is read as "follows". The possible values of $X$ are $x = 0, 1, 2, \dots, n$.
Example: toss a coin $n$ times and let $X$ = the number of heads.
Suppose $X \sim \operatorname{binomial}(n, p)$. For a specific number of successes $x = 0, 1, \dots, n$, the probability of $X = x$ is given by
$$P(X = x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$$
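The formula translates directly into code. In this sketch, `math.comb(n, x)` computes the coefficient $\frac{n!}{x!(n-x)!}$. The function name `binomial_pmf` and the coin-toss numbers are illustrative assumptions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of exactly 3 heads in 10 tosses of a fair coin.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875

# Sanity check: the probabilities over x = 0, ..., n should sum to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
print(abs(total - 1.0) < 1e-12)  # True
```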
Remark:
Suppose $X \sim \operatorname{binomial}(n, p)$. Then the mean and standard deviation of $X$ are
$$ \mu = np $$
$$ \sigma = \sqrt{np(1-p)} $$
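These shortcut formulas can be checked against a direct computation from the probability distribution. The sketch below uses $\mu = \sum x P(x)$ from earlier, together with the standard definition $\sigma^2 = \sum (x - \mu)^2 P(x)$, which these notes do not state and which is assumed here. The values of `n` and `p` are arbitrary.

```python
from math import comb, sqrt

n, p = 10, 0.5  # hypothetical parameters

# Full probability distribution of X ~ binomial(n, p).
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Mean and variance computed directly from the distribution.
mean_from_pmf = sum(x * prob for x, prob in pmf.items())
var_from_pmf = sum((x - mean_from_pmf) ** 2 * prob for x, prob in pmf.items())

# Each pair of numbers should agree up to floating-point error.
print(mean_from_pmf, n * p)
print(sqrt(var_from_pmf), sqrt(n * p * (1 - p)))
```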
The standard normal distribution is a continuous probability distribution that is symmetric about its mean $\mu = 0$ and has standard deviation $\sigma = 1$.
We use the letter $Z$ exclusively to denote a random variable following the standard normal distribution, which we write as
$$Z \sim \mathcal{N} (0, 1)$$
The random variable $Z$ can take on any real number between $-\infty$ and $+\infty$. We call the specific values $Z$ takes on the z-scores (recall z-scores in Chapter 2).
Because of the symmetry, it has the following properties:
$P(Z < 0) = P(Z > 0) = \frac{1}{2}$
$P(Z > z) = P(Z < -z)$
$P(Z < z) + P(Z < -z) = 1$
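The three symmetry properties can be verified numerically for any particular $z$. This is a sketch assuming Python 3.8 or later for `statistics.NormalDist`. The value $z = 1.5$ is an arbitrary choice.

```python
from math import isclose
from statistics import NormalDist

Z = NormalDist(0, 1)
z = 1.5  # hypothetical z-score

left_tail = Z.cdf(z)        # P(Z < z)
right_tail = 1 - Z.cdf(z)   # P(Z > z)
mirrored = Z.cdf(-z)        # P(Z < -z)

print(Z.cdf(0))                            # 0.5
print(isclose(right_tail, mirrored))       # True
print(isclose(left_tail + mirrored, 1.0))  # True
```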
The $(100 \times p)$th percentile is the z-score with left-tail probability $P(Z < z) = p$.
For instance, since $P(Z < 0) = 0.5$, $z = 0$ is the 50th percentile of $\mathcal{N}(0, 1)$.
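Percentiles can be looked up in a z-table, or computed with an inverse cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` from the Python standard library (3.8 or later). The 97.5th percentile is an extra example chosen here, not one from the notes.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)

# 50th percentile: the z-score with left-tail probability 0.5.
print(Z.inv_cdf(0.5))              # 0.0

# 97.5th percentile: the z-score with left-tail probability 0.975.
print(round(Z.inv_cdf(0.975), 2))  # 1.96

# Going the other way: the left-tail probability of z = 1.96.
print(round(Z.cdf(1.96), 4))       # 0.975
```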
from IPython.display import IFrame
IFrame("http://users.stat.ufl.edu/~athienit/Tables/Ztable.pdf", width=1000, height=800)
A normal distribution is a continuous probability distribution that is symmetric about its mean $\mu$ and has standard deviation $\sigma$. If $X$ follows a normal distribution, we write
$$X \sim \mathcal{N} (\mu, \sigma)$$
If $\mu = 0$ and $\sigma = 1$, this reduces to the standard normal distribution.
If $$X \sim \mathcal{N} (\mu, \sigma)$$
then $$Z = \frac{X-\mu}{\sigma} \sim \mathcal{N} (0, 1)$$
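In practice, standardization means a probability for $X$ can be found from the standard normal distribution by converting $x$ to a z-score first. The following sketch compares the two routes. The parameters $\mu = 100$, $\sigma = 15$ and the value $x = 130$ are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical mean and standard deviation
x = 130

# Standardize: z = (x - mu) / sigma
z = (x - mu) / sigma
print(z)  # 2.0

# P(X < x) computed two ways: via the z-score, and directly.
p_via_z = NormalDist(0, 1).cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)
print(round(p_via_z, 4), round(p_direct, 4))  # 0.9772 0.9772
```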
IFrame("https://en.wikipedia.org/wiki/Normal_distribution", width=900, height=600)