
INSTITUTO POLITÉCNICO NACIONAL – CENTRO DE INVESTIGACIÓN EN COMPUTACIÓN

Probability, Random Processes and Inference

Dr. Ponciano Jorge Escamilla Ambrosio
pescamilla@cic.ipn.mx

http://www.cic.ipn.mx/~pescamilla/

Laboratorio de Ciberseguridad, CIC

Course Content

1.4. General Random Variables

1.4.1. Continuous Random Variables and PDFs

1.4.2. Cumulative Distribution Function

1.4.3. Normal Random Variables

1.4.4. Joint PDFs of Multiple Random Variables

1.4.5. Conditioning

1.4.6. The Continuous Bayes’ Rule

1.4.7. The Strong Law of Large Numbers

General Random Variables

❑ Continuous random variables arise naturally in many settings:
➢ e.g., the velocity of a vehicle traveling along the highway.

❑ Continuous random variables can take on any real value in an interval,
➢ possibly of infinite length, such as (0, ∞) or the entire real line.

❑ In this section, the concepts and methods developed for discrete r.v.s, such as expectation, PMF, and conditioning, are introduced for their continuous counterparts.

Probability Density Function

❑ Continuous random variable. A random variable X is called continuous if there exists a nonnegative function fX, called the probability density function of X, or PDF, such that
$$P(X \in B) = \int_B f_X(x)\,dx$$
for every subset B of the real line.


❑ The probability that the value of X falls within an interval is
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx,$$
which can be interpreted as the area under the graph of the PDF.
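As a quick numerical check (an added sketch with assumed numbers, not part of the slides): for the exponential PDF f(x) = 2e^(−2x), the probability of the interval [1, 2] is just the area under f between 1 and 2.

```python
# Interval probability as area under the PDF: P(1 <= X <= 2) for f(x) = 2*exp(-2x).
from scipy.integrate import quad
import math

f = lambda x: 2.0 * math.exp(-2.0 * x)       # assumed example PDF (lambda = 2)

area, _ = quad(f, 1.0, 2.0)                  # numerical integral over [1, 2]
exact = math.exp(-2.0) - math.exp(-4.0)      # closed form of the same area
print(area, exact)                           # both ~0.1170
```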


❑ For any single value a, we have
$$P(X = a) = \int_a^a f_X(x)\,dx = 0.$$
❑ For this reason, including or excluding the endpoints of an interval has no effect on its probability:
$$P(a \le X \le b) = P(a < X < b) = P(a \le X < b) = P(a < X \le b).$$


❑ To qualify as a PDF, a function fX must:
o be nonnegative, i.e., fX(x) ≥ 0 for every x,
o have the normalisation property
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$
❑ Graphically, this means that the entire area under the graph of the PDF must be equal to 1.
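As a quick worked check (added illustration): the function f(x) = 2x for 0 < x < 1, and f(x) = 0 otherwise, qualifies as a PDF, since it is nonnegative and
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 2x\,dx = x^2\Big|_0^1 = 1.$$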



Discrete vs. continuous r.v.s.

Recall that for a discrete r.v., the CDF jumps at every point in the support and is flat everywhere else. In contrast, for a continuous r.v. the CDF increases smoothly.


❑ For a continuous r.v. X with CDF FX(x), the probability density function (PDF) of X is the derivative of the CDF, fX(x) = F′X(x). The support of X, and of its distribution, is the set of all x where fX(x) > 0.
❑ The PDF represents the "density" of probability at the point x.


❑ To get from the PDF back to the CDF we apply
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt.$$
❑ Thus, analogous to how we obtained the value of a discrete CDF at x by summing the PMF over all values less than or equal to x, here we integrate the PDF over all values up to x, so the CDF is the accumulated area under the PDF.
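The two directions of the PDF/CDF conversion can be sketched numerically (an added illustration; the PDF f(x) = 2x on (0, 1) is an assumed example):

```python
# Build the CDF by integrating the PDF, then recover the PDF by differentiating.
from scipy.integrate import quad

pdf = lambda x: 2.0 * x if 0.0 <= x <= 1.0 else 0.0   # assumed example PDF

def cdf(x):
    return quad(pdf, 0.0, x)[0]                  # F(x) = accumulated area up to x

x, h = 0.6, 1e-6
print(cdf(x))                                    # F(0.6) = 0.36 = 0.6**2
print((cdf(x + h) - cdf(x - h)) / (2 * h))       # central difference ~ f(0.6) = 1.2
```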

Probability Density Function

❑ Since we can freely convert between the PDF and the CDF using the inverse operations of integration and differentiation, both the PDF and the CDF carry complete information about the distribution of a continuous r.v.
❑ Thus the PDF completely specifies the behavior of a continuous random variable.


❑ For an interval [x, x + δ] with very small length δ, we have
$$P([x, x + \delta]) = \int_x^{x+\delta} f_X(t)\,dt \approx f_X(x)\cdot\delta,$$
so we can view fX(x) as the "probability mass per unit length" near x.


Even though a PDF is used to calculate event probabilities, fX(x) is not the probability of any particular event. In particular, it is not restricted to be less than or equal to one.


❑ An important way in which continuous r.v.s differ from discrete r.v.s is that for a continuous r.v. X, P(X = x) = 0 for all x. This is because P(X = x) is the height of a jump in the CDF at x, but the CDF of X has no jumps! Since the PMF of a continuous r.v. would just be 0 everywhere, we work with a PDF instead.


❑ The PDF is analogous to the PMF in many ways, but there is a key difference: for a PDF fX, the quantity fX(x) is not a probability, and in fact it is possible to have fX(x) > 1 for some values of x. To obtain a probability, we need to integrate the PDF.
❑ In summary:
➢ To get a desired probability, integrate the PDF over the appropriate range.

Examples of PDFs

❑ Example: the Logistic distribution has CDF
$$F(x) = \frac{e^x}{1 + e^x}, \qquad x \in \mathbb{R}.$$
❑ To get the PDF, we differentiate the CDF, which gives
$$f(x) = F'(x) = \frac{e^x}{(1 + e^x)^2}.$$
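Writing the differentiation step out (added for completeness), by the quotient rule:
$$f(x) = \frac{d}{dx}\,\frac{e^x}{1 + e^x} = \frac{e^x(1 + e^x) - e^x\cdot e^x}{(1 + e^x)^2} = \frac{e^x}{(1 + e^x)^2}.$$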


❑ Example: the Rayleigh distribution has CDF
$$F(x) = 1 - e^{-x^2/2}, \qquad x > 0.$$
❑ To get the PDF, we differentiate the CDF, which gives
$$f(x) = F'(x) = x\,e^{-x^2/2}, \qquad x > 0.$$


❑ A continuous r.v. X is said to have the Uniform distribution on the interval (a, b) if its PDF is
$$f(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b, \\ 0 & \text{otherwise}. \end{cases}$$
❑ The CDF is the accumulated area under the PDF:
$$F(x) = \begin{cases} 0 & x \le a, \\ \dfrac{x-a}{b-a} & a < x < b, \\ 1 & x \ge b. \end{cases}$$


❑ We denote this by X ∼ Unif(a, b).
❑ The Uniform distribution that we will most frequently use is the Unif(0, 1) distribution, also called the standard Uniform.
❑ The Unif(0, 1) PDF and CDF are particularly simple: f(x) = 1 and F(x) = x for 0 < x < 1.
❑ For a general Unif(a, b) distribution, the PDF is constant on (a, b), and the CDF is ramp-shaped, increasing linearly from 0 to 1 as x ranges from a to b.


For Uniform distributions, probability is proportional to length.
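This proportionality is easy to check numerically (an added sketch; Unif(0, 10) and the two intervals are assumed examples):

```python
# For X ~ Unif(0, 10), equal-length intervals inside (0, 10) get equal probability.
from scipy.stats import uniform

X = uniform(loc=0, scale=10)        # scipy parameterises Unif(loc, loc + scale)
print(X.cdf(3) - X.cdf(1))          # P(1 <= X <= 3) = 0.2
print(X.cdf(9) - X.cdf(7))          # P(7 <= X <= 9) = 0.2, same length
```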

Expected Value and Variance of a Continuous r.v.

❑ The expected value or expectation or mean of a continuous r.v. X is defined by
$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx.$$
❑ This is similar to the discrete case, except that the PMF is replaced by the PDF and summation is replaced by integration.
❑ Its mathematical properties are similar to the discrete case.


❑ If X is a continuous random variable with given PDF, then any real-valued function Y = ɡ(X) of X is also a random variable.
➢ Note that Y can be a continuous r.v., but Y can also be discrete, e.g., ɡ(x) = 1 for x > 0 and ɡ(x) = 0 otherwise.
❑ In either case, the mean of ɡ(X) satisfies the expected value rule:
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx.$$


❑ The nth moment of a continuous r.v. X is defined as E[Xⁿ], the expected value of the random variable Xⁿ.
❑ The variance of X, denoted var(X), is defined as the expected value of the random variable (X − E[X])²:
$$\mathrm{var}(X) = E\big[(X - E[X])^2\big] = \int_{-\infty}^{\infty} \big(x - E[X]\big)^2 f_X(x)\,dx.$$


❑ Example. Consider a uniform PDF over an interval [a, b]; its expectation is given by
$$E[X] = \int_a^b x\cdot\frac{1}{b-a}\,dx = \frac{1}{b-a}\cdot\frac{b^2 - a^2}{2} = \frac{a+b}{2}.$$


❑ Its variance is given by
$$\mathrm{var}(X) = \int_a^b \Big(x - \frac{a+b}{2}\Big)^2 \frac{1}{b-a}\,dx = \frac{(b-a)^2}{12}.$$
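Both formulas can be verified by numerical integration (an added sketch; a = 2, b = 8 are assumed example values):

```python
# Check E[X] = (a+b)/2 and var(X) = (b-a)^2/12 for X ~ Unif(a, b).
from scipy.integrate import quad

a, b = 2.0, 8.0
f = lambda x: 1.0 / (b - a)                      # uniform PDF on [a, b]

mean = quad(lambda x: x * f(x), a, b)[0]         # E[X] by integration
var = quad(lambda x: (x - mean) ** 2 * f(x), a, b)[0]
print(mean, (a + b) / 2)                         # 5.0, 5.0
print(var, (b - a) ** 2 / 12)                    # 3.0, 3.0
```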


❑ The exponential continuous random variable has PDF
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & \text{otherwise}, \end{cases}$$
where λ is a positive parameter characterising the PDF, with
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = 1.$$


❑ The probability that X exceeds a certain value decreases exponentially. That is, for any a ≥ 0, we have
$$P(X \ge a) = \int_a^{\infty} \lambda e^{-\lambda x}\,dx = e^{-\lambda a}.$$
❑ An exponential random variable can be a good model for the amount of time until an incident of interest takes place:
➢ a message arriving at a computer, some equipment breaking down, a light bulb burning out, etc.


❑ The mean of the exponential r.v. X is calculated using integration by parts:
$$E[X] = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}.$$


❑ The variance of the exponential r.v. X is calculated by
$$\mathrm{var}(X) = E[X^2] - \big(E[X]\big)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
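Both moments can be sanity-checked with scipy (an added sketch; note that scipy.stats.expon uses scale = 1/λ, and λ = 0.5 is an assumed example value):

```python
# Check E[X] = 1/lambda and var(X) = 1/lambda^2 for the exponential r.v.
from scipy.stats import expon

lam = 0.5
X = expon(scale=1.0 / lam)
print(X.mean(), 1.0 / lam)        # 2.0, 2.0
print(X.var(), 1.0 / lam ** 2)    # 4.0, 4.0
```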

Cumulative Distribution Functions

❑ The cumulative distribution function, CDF, of a random variable X is denoted FX and provides the probability P(X ≤ x). In particular, for every x we have
$$F_X(x) = P(X \le x) = \begin{cases} \displaystyle\sum_{k \le x} p_X(k) & \text{if } X \text{ is discrete}, \\ \displaystyle\int_{-\infty}^{x} f_X(t)\,dt & \text{if } X \text{ is continuous}. \end{cases}$$


The CDF FX(x) “accumulates” probability “up to” the value of x.


❑ Any random variable associated with a given probability model has a CDF, regardless of whether it is discrete or continuous.
➢ {X ≤ x} is always an event and therefore has a well-defined probability.


Normal Random Variables

❑ A continuous random variable X is normal or Gaussian or normally distributed if it has a PDF of the form
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2},$$
where μ and σ are two scalar parameters characterising the PDF (abbreviated N(μ, σ²) and referred to as the normal density function), with σ assumed positive.


❑ It can be verified that the normalisation property holds:
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\,dx = 1.$$
[Figure: the PDF of a normal r.v. with μ = 1, σ = 1, i.e. N(1, 1).]


❑ If X is N(μ, σ²), then E[X] = μ.
Proof: the PDF is symmetric about x = μ.
❑ If X is N(μ, σ²), then var(X) = σ².
Proof: substituting y = (x − μ)/σ reduces E[(X − μ)²] to the standard normal case, where integration by parts gives the result.


❑ Its maximum value occurs at the mean value of its argument.
❑ It is symmetrical about the mean value.
❑ The points of maximum absolute slope occur at one standard deviation above and below the mean.
❑ Its maximum value is inversely proportional to its standard deviation.
❑ The limit as the standard deviation approaches zero is a unit impulse.

Linear Function of a Normal Random Variable

❑ If X is a normal r.v. with mean μ and variance σ², and if a ≠ 0 and b are scalars, then the random variable
Y = aX + b
is also normal, with mean and variance
E[Y] = aμ + b,  var(Y) = a²σ².

Standard Normal Random Variables

❑ A normal random variable Y with zero mean and unit variance, N(0, 1), is said to be a standard normal. Its PDF and CDF are denoted by φ and Φ, respectively:
$$\varphi(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}, \qquad \Phi(y) = P(Y \le y) = \int_{-\infty}^{y} \varphi(t)\,dt.$$


❑ The PDF of a normal r.v. cannot be integrated in terms of the common elementary functions, and therefore the probabilities of X falling in various intervals are obtained from tables or by computer.
❑ Example: the Standard Normal Table.
❑ The table only provides the values of Φ(y) for y ≥ 0, because the omitted values can be calculated using the symmetry of the PDF: Φ(−y) = 1 − Φ(y).


❑ It would be overwhelming to construct tables for all μ and σ values required in applications.
➢ Standardise the r.v.
❑ Let X be a normal (Gaussian) random variable with mean μ and variance σ². We standardise X by defining a new random variable Y given by
$$Y = \frac{X - \mu}{\sigma}.$$


❑ Since Y is a linear function of X, it is normal. This means
$$E[Y] = \frac{E[X] - \mu}{\sigma} = 0, \qquad \mathrm{var}(Y) = \frac{\mathrm{var}(X)}{\sigma^2} = 1.$$
❑ Thus, Y is a standard normal random variable.
➢ This allows us to calculate the probability of any event defined in terms of X by redefining the event in terms of Y, and then using the standard normal table.



❑ Example 2: The annual snowfall at a particular geographic location is modelled as a normal random variable with mean μ = 60 inches and standard deviation σ = 20 inches. What is the probability that this year's snowfall will be at least 80 inches?


❑ Solution: with Y = (X − 60)/20,
$$P(X \ge 80) = P\Big(Y \ge \frac{80 - 60}{20}\Big) = P(Y \ge 1) = 1 - \Phi(1) = 1 - 0.8413 = 0.1587.$$
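The same table lookup can be done by computer (an added check of the solution above):

```python
# P(X >= 80) for X ~ N(60, 20^2), without standardising by hand.
from scipy.stats import norm

print(1.0 - norm.cdf(80, loc=60, scale=20))   # ~0.1587, matching 1 - Phi(1)
```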


❑ Example 3 (Height Distribution of Men). Assume that the height X, in inches, of a randomly selected man in a certain population is normally distributed with μ = 69 and σ = 2.6. Find:
1. P(X < 72),
2. P(X > 72),
3. P(X < 66),
4. P(|X − μ| < 3).


❑ The table gives Φ(z) only for z ≥ 0, and for z < 0 we need to make use of the symmetry of the normal distribution. This implies that, for any z, P(Z < −z) = P(Z > z). Thus, each of the four probabilities reduces to a lookup of Φ after standardising; the computed values are shown below.
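The four probabilities, computed directly (an added sketch using scipy in place of the table; the printed values are approximate):

```python
# Example 3 by computer: X ~ N(69, 2.6^2).
from scipy.stats import norm

X = norm(loc=69.0, scale=2.6)
print(X.cdf(72))                 # 1. P(X < 72)        ~ 0.8757
print(1 - X.cdf(72))             # 2. P(X > 72)        ~ 0.1243
print(X.cdf(66))                 # 3. P(X < 66)        ~ 0.1243
print(X.cdf(72) - X.cdf(66))     # 4. P(|X - 69| < 3)  ~ 0.7515
```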


❑ Normal r.v.s are often used in signal processing and communications engineering to model noise and unpredictable distortions of signals.



❑ Three important benchmarks for the Normal distribution are the probabilities of falling within one, two, and three standard deviations of the mean. The 68-95-99.7% rule tells us that these probabilities are what the name suggests.
❑ (68-95-99.7% rule). If X ∼ N(μ, σ²), then
$$P(|X - \mu| < \sigma) \approx 0.68, \qquad P(|X - \mu| < 2\sigma) \approx 0.95, \qquad P(|X - \mu| < 3\sigma) \approx 0.997.$$
Standardising, P(|X − μ| < cσ) = P(|Z| < c) = Φ(c) − Φ(−c).
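The three benchmark probabilities can be recovered from Φ (an added check):

```python
# 68-95-99.7% rule via the standard normal CDF: P(|Z| < c) = Phi(c) - Phi(-c).
from scipy.stats import norm

for c in (1, 2, 3):
    print(c, norm.cdf(c) - norm.cdf(-c))   # ~0.6827, ~0.9545, ~0.9973
```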


Joint PDF of Multiple Random Variables

❑ Two continuous random variables associated with the same experiment are jointly continuous and can be described in terms of a joint PDF fX,Y if fX,Y is a nonnegative function that satisfies
$$P\big((X, Y) \in B\big) = \iint_{(x,y)\in B} f_{X,Y}(x, y)\,dx\,dy$$
for every subset B of the two-dimensional plane.
❑ The notation means that the integration is carried out over the set B.


❑ In the particular case where B is a rectangle of the form B = {(x, y) | a ≤ x ≤ b, c ≤ y ≤ d}, we have
$$P(a \le X \le b,\ c \le Y \le d) = \int_c^d \int_a^b f_{X,Y}(x, y)\,dx\,dy.$$
❑ If B is the entire two-dimensional plane, then we obtain the normalisation property
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1.$$
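Both the rectangle formula and the normalisation property can be checked numerically (an added sketch; the joint PDF fX,Y(x, y) = x + y on the unit square is an assumed example):

```python
# Double integration of a joint PDF with scipy.integrate.dblquad.
from scipy.integrate import dblquad

f = lambda y, x: x + y                 # dblquad integrates over its first argument first

total = dblquad(f, 0, 1, 0, 1)[0]      # integral over the whole unit square
rect = dblquad(f, 0, 0.5, 0, 0.5)[0]   # P(0 <= X <= 0.5, 0 <= Y <= 0.5)
print(total)                           # 1.0 (normalisation)
print(rect)                            # 0.125
```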



❑ To interpret the joint PDF, we let δ be a small positive number and consider the probability of a small rectangle. Then we have
$$P(a \le X \le a+\delta,\ c \le Y \le c+\delta) = \int_c^{c+\delta}\int_a^{a+\delta} f_{X,Y}(x, y)\,dx\,dy \approx f_{X,Y}(a, c)\cdot\delta^2,$$
so we can view fX,Y(a, c) as the probability per unit area in the vicinity of (a, c).


❑ The joint PDF contains all relevant probabilistic information on the random variables X, Y, and their dependencies.
❑ Therefore, the joint PDF allows us to calculate the probability of any event that can be defined in terms of these two random variables.

Marginals

❑ Marginal PDF. For continuous r.v.s X and Y with joint PDF fX,Y, the marginal PDF of X is
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy.$$
❑ Similarly, the marginal PDF of Y is
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.$$


❑ Marginalisation works analogously with any number of variables. For example, if we have the joint PDF of X, Y, Z, W but want the joint PDF of X, W, we just have to integrate over all possible values of Y and Z:
$$f_{X,W}(x, w) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y,Z,W}(x, y, z, w)\,dy\,dz.$$
➢ Conceptually this is very easy (just integrate over the unwanted variables to get the joint PDF of the wanted variables), but computing the integral may or may not be difficult.
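A minimal numerical sketch of marginalisation (added illustration, reusing the assumed joint PDF fX,Y(x, y) = x + y on the unit square, whose marginal is fX(x) = x + 1/2):

```python
# Integrate the unwanted variable out to get a marginal PDF value.
from scipy.integrate import quad

f = lambda x, y: x + y

def marginal_x(x):
    return quad(lambda y: f(x, y), 0.0, 1.0)[0]   # integrate y over its range

print(marginal_x(0.3))                             # 0.8 = 0.3 + 0.5
```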


Joint CDFs

❑ If X and Y are two random variables associated with the same experiment, their joint CDF is defined by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y).$$
❑ The joint CDF is the joint probability of the two events {X ≤ x} and {Y ≤ y}.
❑ If X and Y are described by a joint PDF fX,Y, then
$$F_{X,Y}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(s, t)\,dt\,ds.$$


❑ Conversely, if X and Y are continuous with joint CDF FX,Y, their joint PDF is the derivative of the joint CDF with respect to x and y:
$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}}{\partial x\,\partial y}(x, y).$$


❑ Let X and Y be described by a uniform PDF on the unit square. The joint CDF is given by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y) = xy, \qquad 0 \le x, y \le 1.$$
❑ It can be verified that
$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}}{\partial x\,\partial y}(x, y) = 1$$
for all (x, y) in the unit square.

Expectation

❑ If X and Y are jointly continuous random variables and ɡ is some function, then Z = ɡ(X, Y) is also a random variable. Thus the expected value rule applies:
$$E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy.$$
❑ As an important special case, for any scalars a, b, and c, we have
$$E[aX + bY + c] = a\,E[X] + b\,E[Y] + c.$$

More than Two Random Variables

❑ The joint PDF of three random variables X, Y, and Z is defined in analogy with the case of two random variables. For example,
$$P\big((X, Y, Z) \in B\big) = \iiint_{(x,y,z)\in B} f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz$$
for any set B. We have relations such as
$$f_{X,Y}(x, y) = \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dz.$$


❑ The expected value rule takes the form
$$E[g(X, Y, Z)] = \iiint g(x, y, z)\,f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz.$$
❑ If ɡ is linear, of the form aX + bY + cZ, then
$$E[aX + bY + cZ] = a\,E[X] + b\,E[Y] + c\,E[Z].$$


Conditioning

❑ The conditional PDF of a continuous random variable X, given an event A with P(A) > 0, is defined as a nonnegative function fX|A that satisfies
$$P(X \in B \mid A) = \int_B f_{X|A}(x)\,dx$$
for any subset B of the real line.


❑ In particular, by letting B be the entire real line, we obtain the normalisation property
$$\int_{-\infty}^{\infty} f_{X|A}(x)\,dx = 1,$$
so that fX|A is a legitimate PDF.


❑ In the important special case where we condition on an event of the form {X ∈ A}, with P(X ∈ A) > 0, the definition of conditional probabilities yields
$$P(X \in B \mid X \in A) = \frac{P(X \in B,\ X \in A)}{P(X \in A)} = \frac{1}{P(X \in A)}\int_{A \cap B} f_X(x)\,dx.$$
❑ By comparing with the earlier formula, we see that
$$f_{X|\{X \in A\}}(x) = \begin{cases} \dfrac{f_X(x)}{P(X \in A)} & \text{if } x \in A, \\ 0 & \text{otherwise}. \end{cases}$$
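A small numerical sketch of this formula (added illustration; X ∼ Expo(1) and A = {X > 1} are assumed examples):

```python
# Conditioning an exponential r.v. on {X > 1}: rescale the PDF by 1/P(X > 1).
from scipy.integrate import quad
import math

f = lambda x: math.exp(-x)                        # Expo(1) PDF for x >= 0
p_A = quad(f, 1.0, math.inf)[0]                   # P(X > 1) = e^{-1}

f_cond = lambda x: f(x) / p_A if x > 1 else 0.0   # conditional PDF given {X > 1}
print(quad(f_cond, 1.0, math.inf)[0])             # 1.0: it is a legitimate PDF
```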

Joint Conditional PDF

❑ Suppose that X and Y are jointly continuous random variables with joint PDF fX,Y. If we condition on a positive probability event of the form C = {(X, Y) ∈ A}, we have
$$P\big((X, Y) \in B \mid C\big) = \frac{1}{P(C)}\iint_{(x,y)\in A \cap B} f_{X,Y}(x, y)\,dx\,dy.$$
❑ In this case, the conditional PDF of X, given this event, can be obtained from the formula
$$f_{X|C}(x) = \int_{-\infty}^{\infty} f_{X,Y|C}(x, y)\,dy.$$


❑ These two formulas provide one possible method for obtaining the conditional PDF of a random variable X when the conditioning event is not of the form {X ∈ A}, but is instead defined in terms of multiple random variables.


❑ A version of the total probability theorem involving conditional PDFs is given as follows: if the events A1, …, An form a partition of the sample space, then
$$f_X(x) = \sum_{i=1}^{n} P(A_i)\,f_{X|A_i}(x).$$
❑ Using the total probability theorem, we have
$$P(X \le x) = \sum_{i=1}^{n} P(A_i)\,P(X \le x \mid A_i).$$

89

Joint Conditional PDF

❑ Finally, the formula can be written as
$$F_X(x) = \sum_{i=1}^{n} P(A_i)\,F_{X|A_i}(x).$$
❑ We then take the derivative of both sides with respect to x and obtain the desired result.


❑ To interpret the conditional PDF, let us fix some small positive numbers δ1 and δ2, and condition on the event B = {y ≤ Y ≤ y + δ2}. We have
$$P(x \le X \le x + \delta_1 \mid y \le Y \le y + \delta_2) \approx \frac{f_{X,Y}(x, y)\,\delta_1\,\delta_2}{f_Y(y)\,\delta_2} = f_{X|Y}(x \mid y)\,\delta_1.$$


❑ Therefore, fX|Y(x|y)δ1 provides us with the probability that X belongs to a small interval [x, x + δ1], given that Y belongs to a small interval [y, y + δ2]. Since fX|Y(x|y)δ1 does not depend on δ2, we can think of the limiting case where δ2 decreases to zero and write
$$P(x \le X \le x + \delta_1 \mid Y = y) \approx f_{X|Y}(x \mid y)\,\delta_1,$$
❑ and, more generally,
$$P(X \in A \mid Y = y) = \int_A f_{X|Y}(x \mid y)\,dx.$$


❑ The conditional PDF fX|Y(x|y) can be seen as a description of the probability law of X, given that the event {Y = y} has occurred.
❑ As in the discrete case, the conditional PDF fX|Y, together with the marginal PDF fY, is sometimes used to calculate the joint PDF.
➢ This approach can also be used for modelling: instead of directly specifying the joint PDF fX,Y, it is often natural to provide a probability law for Y, in terms of a PDF fY, and then provide a conditional PDF fX|Y(x|y) for X, given any possible value y of Y.


❑ Example. The speed of a typical vehicle that drives past a police radar is modelled as an exponentially distributed random variable X with mean 50 miles per hour. The police radar's measurement Y of the vehicle's speed has an error which is modelled as a normal random variable with zero mean and standard deviation equal to one tenth of the vehicle's speed. What is the joint PDF of X and Y?


❑ Solution. We have fX(x) = (1/50)e^(−x/50), for x ≥ 0. Also, conditioned on X = x, the measurement Y has a normal PDF with mean x and variance x²/100. Therefore,
$$f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi}\,(x/10)}\,e^{-(y-x)^2/(2x^2/100)}.$$
❑ Thus, for all x ≥ 0 and all y,
$$f_{X,Y}(x, y) = f_X(x)\,f_{Y|X}(y \mid x) = \frac{1}{50}\,e^{-x/50}\,\frac{1}{\sqrt{2\pi}\,(x/10)}\,e^{-(y-x)^2/(2x^2/100)}.$$

Conditional PDF for More Than Two r.v.s.

❑ The conditional PDF can be defined for the extension to the case of more than two random variables:
$$f_{X,Y|Z}(x, y \mid z) = \frac{f_{X,Y,Z}(x, y, z)}{f_Z(z)}, \qquad f_{X|Y,Z}(x \mid y, z) = \frac{f_{X,Y,Z}(x, y, z)}{f_{Y,Z}(y, z)}.$$
❑ The analogous multiplication rule is given as
$$f_{X,Y,Z}(x, y, z) = f_{X|Y,Z}(x \mid y, z)\,f_{Y|Z}(y \mid z)\,f_Z(z).$$

Conditional Expectation

❑ For a continuous random variable X, we define the conditional expectation E[X|A] given an event A similarly to the unconditional case, except that we now need to use the conditional PDF fX|A.
❑ Let X and Y be jointly continuous random variables, and let A be an event with P(A) > 0; then the conditional expectation of X given the event A is defined by
$$E[X \mid A] = \int_{-\infty}^{\infty} x\,f_{X|A}(x)\,dx.$$


❑ The conditional expectation of X given that Y = y is defined by
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid y)\,dx.$$
❑ The expectation rule, for a function ɡ(x), takes the forms
$$E[g(X) \mid A] = \int_{-\infty}^{\infty} g(x)\,f_{X|A}(x)\,dx \quad \text{and} \quad E[g(X) \mid Y = y] = \int_{-\infty}^{\infty} g(x)\,f_{X|Y}(x \mid y)\,dx.$$


❑ Total expectation theorem: let A1, A2, …, An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then
$$E[X] = \sum_{i=1}^{n} P(A_i)\,E[X \mid A_i].$$
❑ Similarly,
$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy.$$
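A simulation sketch of the total expectation theorem (added illustration; the partition probabilities and conditional distributions are assumed examples):

```python
# X ~ Unif(0, 1) under A1 (prob 0.3) and X ~ Unif(2, 4) under A2 (prob 0.7),
# so E[X] = 0.3 * 0.5 + 0.7 * 3.0 = 2.25.
import numpy as np

rng = np.random.default_rng(0)
n = 10 ** 6
is_a1 = rng.random(n) < 0.3                      # which event of the partition occurred
x = np.where(is_a1, rng.uniform(0, 1, n), rng.uniform(2, 4, n))
print(x.mean())                                  # ~2.25
```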


❑ There are natural analogues for the case of functions of several random variables. For example,
$$E[g(X, Y) \mid Y = y] = \int_{-\infty}^{\infty} g(x, y)\,f_{X|Y}(x \mid y)\,dx,$$
❑ and
$$E[g(X, Y)] = \int_{-\infty}^{\infty} E[g(X, Y) \mid Y = y]\,f_Y(y)\,dy.$$

Independence

❑ Two continuous random variables X and Y are independent if their joint PDF is the product of the marginal PDFs:
$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y) \quad \text{for all } x, y.$$
❑ Comparing with the formula fX,Y(x, y) = fX|Y(x|y) fY(y), we see that independence is the same as the condition
$$f_{X|Y}(x \mid y) = f_X(x) \quad \text{for all } y \text{ with } f_Y(y) > 0 \text{ and all } x,$$
or, symmetrically, fY|X(y|x) = fY(y) for all x with fX(x) > 0 and all y.


❑ For the case of more than two random variables, for example, we say that three random variables X, Y, and Z are independent if
$$f_{X,Y,Z}(x, y, z) = f_X(x)\,f_Y(y)\,f_Z(z) \quad \text{for all } x, y, z.$$


❑ Example. Independent Normal Random Variables. Let X and Y be independent normal random variables with means μx, μy and variances σx², σy², respectively. Their joint PDF is of the form
$$f_{X,Y}(x, y) = \frac{1}{2\pi\,\sigma_x\sigma_y}\,\exp\!\left(-\frac{(x-\mu_x)^2}{2\sigma_x^2} - \frac{(y-\mu_y)^2}{2\sigma_y^2}\right).$$
❑ This joint PDF has the shape of a bell centred at (μx, μy), whose width in the x and y directions is proportional to σx and σy, respectively.


❑ Additional insight into the form of the PDF can be gained by considering its contours,
➢ i.e., sets of points at which the PDF takes a constant value.
❑ These contours are described by an equation of the form
$$\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} = \text{constant},$$
and are ellipses whose two axes are horizontal and vertical. If σx² = σy², then the contours are circles.


❑ If X and Y are independent, then any two events of the form {X ∈ A} and {Y ∈ B} are independent:
$$P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B).$$


❑ Independence implies that
$$E[XY] = E[X]\,E[Y].$$
❑ The property
$$P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B)$$
can be used to provide a general definition of independence between two random variables, e.g., if X is discrete and Y is continuous.


❑ Similarly to the discrete case, if X and Y are independent, then
$$E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)]$$
for any two functions ɡ and h.
❑ The variance of the sum of independent random variables is equal to the sum of their variances:
$$\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y).$$
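Both properties are easy to check by simulation (added sketch; the two distributions are assumed examples):

```python
# Independent X ~ Expo(mean 2) and Y ~ N(1, 3^2):
# E[XY] = E[X]E[Y] = 2, and var(X + Y) = var(X) + var(Y) = 4 + 9 = 13.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=10 ** 6)
y = rng.normal(loc=1.0, scale=3.0, size=10 ** 6)

print((x * y).mean(), x.mean() * y.mean())   # both ~2.0
print((x + y).var(), x.var() + y.var())      # both ~13.0
```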

The continuous Bayes' rule: Inference problem


❑ Inference problem: we have an unobserved random variable X with known PDF fX, and we obtain a measurement Y according to a conditional PDF fY|X. Given an observed value y of Y, the inference problem is to evaluate the conditional PDF fX|Y(x|y).


❑ Thus, whatever information is provided by the event {Y = y} is captured by the conditional PDF fX|Y(x|y). It thus suffices to evaluate this PDF. From the formula fX fY|X = fX,Y = fY fX|Y, it follows that
$$f_{X|Y}(x \mid y) = \frac{f_X(x)\,f_{Y|X}(y \mid x)}{f_Y(y)}.$$


❑ Based on the normalisation property
$$\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = 1,$$
an equivalent expression is
$$f_{X|Y}(x \mid y) = \frac{f_X(x)\,f_{Y|X}(y \mid x)}{\int_{-\infty}^{\infty} f_X(t)\,f_{Y|X}(y \mid t)\,dt}.$$
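A grid-based sketch of this computation (added illustration, reusing the police-radar model from the conditioning section; the observed value y = 55 is assumed):

```python
# Posterior f_{X|Y}(x | 55) for prior X ~ Expo(mean 50) and Y | X = x ~ N(x, (x/10)^2).
import numpy as np

x = np.linspace(1.0, 200.0, 4000)                   # grid over the unknown speed
dx = x[1] - x[0]
prior = (1 / 50) * np.exp(-x / 50)                  # f_X(x)
lik = np.exp(-(55 - x) ** 2 / (2 * (x / 10) ** 2)) / (np.sqrt(2 * np.pi) * (x / 10))

post = prior * lik                                  # numerator f_X(x) f_{Y|X}(y|x)
post = post / (post.sum() * dx)                     # divide by f_Y(55), via quadrature
print(x[np.argmax(post)])                           # posterior mode
```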

The Bayes' rule – discrete unknown, continuous measurement
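In standard form, with the unknown described by an event A and a continuous measurement Y, this version of the rule reads
$$P(A \mid Y = y) = \frac{P(A)\,f_{Y|A}(y)}{f_Y(y)}, \qquad f_Y(y) = P(A)\,f_{Y|A}(y) + P(A^c)\,f_{Y|A^c}(y).$$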

The Bayes' rule – continuous unknown, discrete measurement
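In standard form, with a continuous unknown X and a discrete measurement N, this version of the rule reads
$$f_{X|N}(x \mid n) = \frac{f_X(x)\,p_{N|X}(n \mid x)}{p_N(n)}, \qquad p_N(n) = \int_{-\infty}^{\infty} f_X(t)\,p_{N|X}(n \mid t)\,dt.$$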


Sums of Independent Random Variables – Convolution

❑ Let Z = X + Y, where X and Y are independent integer-valued random variables with PMFs pX and pY, respectively. Then, for any integer z,
$$p_Z(z) = P(X + Y = z) = \sum_{x} p_X(x)\,p_Y(z - x).$$
❑ The resulting PMF pZ is called the convolution of the PMFs of X and Y.
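Convolution is exactly what numpy computes for finite PMFs (an added sketch; two fair dice are the assumed example):

```python
# PMF of the sum of two fair dice via convolution.
import numpy as np

die = np.full(6, 1 / 6)           # PMF of one die on the values 1..6
p_sum = np.convolve(die, die)     # PMF of the sum on the values 2..12
print(p_sum[5])                   # P(sum = 7) = 6/36 ~ 0.1667
```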

Covariance and Correlation

❑ The covariance of two random variables X and Y, denoted by cov(X, Y), is defined as
$$\mathrm{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].$$
❑ When cov(X, Y) = 0, we say X and Y are uncorrelated.
➢ A positive or negative covariance indicates that the values of X − E[X] and Y − E[Y] obtained in a single experiment "tend" to have the same or the opposite sign, respectively.


❑ Multiplying this out and using linearity, we have an equivalent expression:
$$\mathrm{cov}(X, Y) = E[XY] - E[X]\,E[Y].$$
❑ Covariance has the following key properties:
1. Cov(X, X) = Var(X).
2. Cov(X, Y) = Cov(Y, X).
3. Cov(X, c) = 0 for any constant c.
4. Cov(aX, Y) = a Cov(X, Y) for any constant a.


5. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z).
6. Cov(X + Y, Z + W) = Cov(X, Z) + Cov(X, W) + Cov(Y, Z) + Cov(Y, W).
7. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y). For n r.v.s X1, …, Xn,
$$\mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j).$$
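Properties 1, 4, and 7 can be checked by simulation (added sketch; the printed pairs agree up to sampling noise):

```python
# Numerical check of covariance properties with numpy.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10 ** 6)
y = 0.5 * x + rng.normal(size=10 ** 6)              # deliberately correlated with x

cov = lambda a, b: np.cov(a, b)[0, 1]
print(cov(x, x), x.var())                           # property 1: Cov(X, X) = Var(X)
print(cov(3 * x, y), 3 * cov(x, y))                 # property 4
print((x + y).var(), x.var() + y.var() + 2 * cov(x, y))   # property 7
```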


❑ The correlation coefficient ρ(X, Y) of two random variables X and Y that have nonzero variances is defined as
$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}.$$
❑ It may be viewed as a normalised version of the covariance cov(X, Y).
❑ ρ ranges from −1 to 1.


❑ If ρ > 0 (or ρ < 0), then the values of X − E[X] and Y − E[Y] "tend" to have the same (or opposite, respectively) sign.
➢ The size of |ρ| provides a normalised measure of the extent to which this is true.
➢ Always assuming that X and Y have positive variances, it can be shown that ρ = 1 (or ρ = −1) if and only if there exists a positive (or negative, respectively) constant c such that
$$Y - E[Y] = c\,\big(X - E[X]\big).$$
