## Probability and Statistics

Probability and statistics are the branches of mathematics concerned with the laws governing random events, including the collection, analysis, interpretation, and display of numerical data. Probability originated in the study of gambling and insurance in the 17th century, and it is now an indispensable tool in both the social and the natural sciences. Statistics can be traced back to censuses taken thousands of years ago. As a distinct scientific discipline, however, it emerged in the early 19th century as the study of populations, economies, and moral behavior. Later in that century it became the mathematical toolkit for analyzing such numbers. For technical information on these topics, see Probability Theory and Statistics. A handful of fundamental probability and statistics concepts come up repeatedly in the machine learning and AI literature; they are summarized below.

### Random Experiment

A random experiment is a physical situation whose outcome cannot be predicted until it is observed.

### Sample Space

A sample space is the set of all possible outcomes of a random experiment.
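Here is a small sketch. It assumes the experiment "toss a fair coin twice", which is our own illustrative choice. The code enumerates the sample space directly and then runs the experiment once. The outcome cannot be known in advance, but it is always one of the enumerated elements:

```python
import itertools
import random

# Sample space of the experiment "toss a fair coin twice":
# every ordered pair of H (heads) and T (tails).
sample_space = list(itertools.product("HT", repeat=2))
print(sample_space)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# One run of the random experiment. The seed is fixed only so the
# run is reproducible; the outcome is always in the sample space.
rng = random.Random(0)
outcome = (rng.choice("HT"), rng.choice("HT"))
assert outcome in sample_space
```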

### Random Variables

A random variable is a variable whose possible values are numerical outcomes of a random experiment. There are two types of random variables.

1. A discrete random variable can take only a countable number of distinct values, such as 0, 1, 2, 3, 4, …. Discrete random variables are usually (but not necessarily) counts.
2. A continuous random variable can take any value in an interval of real numbers, so it has uncountably many possible values. Continuous random variables are usually measurements.
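The following sketch contrasts the two types. Several details are illustrative assumptions: the two-coin-toss experiment, the helper name `num_heads`, and modeling the continuous measurement as uniform on [0, 1).

```python
import itertools
import random

# A random variable assigns a number to each outcome of the experiment.
# Discrete example: X = number of heads in two coin tosses.
sample_space = list(itertools.product("HT", repeat=2))


def num_heads(outcome):
    return outcome.count("H")


x_values = sorted({num_heads(o) for o in sample_space})
print(x_values)  # [0, 1, 2]: a finite, countable set of values

# Continuous example (assumed model): Y = a measurement uniform on [0, 1).
# Any real number in the interval is possible, not just a listed few.
rng = random.Random(42)
y = rng.random()
print(0.0 <= y < 1.0)  # True
```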

### Probability

Probability is the measure of the likelihood that an event will occur in a random experiment. It is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is to occur.
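When all outcomes of an experiment are equally likely, a probability can be computed by counting the favorable outcomes. The sketch below assumes two fair six-sided dice and uses a hypothetical helper `prob`. Exact fractions make the endpoints 0 and 1 come out exactly:

```python
import itertools
from fractions import Fraction

# Two fair six-sided dice: 36 equally likely ordered outcomes.
outcomes = list(itertools.product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = (# favorable outcomes) / (# total outcomes)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


print(prob(lambda o: sum(o) == 7))   # 1/6
print(prob(lambda o: sum(o) == 13))  # 0 (impossible event)
print(prob(lambda o: sum(o) >= 2))   # 1 (certain event)
```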

#### Conditional Probability

Conditional probability is the probability of an event given that another event has already occurred (by assumption, presumption, assertion, or evidence). The probability of A given B is $P(A \mid B) = P(A \cap B) / P(B)$, defined whenever $P(B) > 0$.
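The sketch below applies $P(A \mid B) = P(A \cap B) / P(B)$, again assuming two fair dice. The helpers `prob` and `cond_prob` are hypothetical. Learning that the first die shows 6 raises the probability that the total is at least 10 from 1/6 to 1/2:

```python
import itertools
from fractions import Fraction

outcomes = list(itertools.product(range(1, 7), repeat=2))  # two fair dice


def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability 0")
    return prob(lambda o: a(o) and b(o)) / p_b


def sum_at_least_10(o):
    return sum(o) >= 10  # event A


def first_is_six(o):
    return o[0] == 6  # event B


print(prob(sum_at_least_10))                     # 1/6
print(cond_prob(sum_at_least_10, first_is_six))  # 1/2
```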

#### Independence

Two events are independent if the occurrence of one does not affect the probability of the other. In other words, observing one event tells us nothing about whether the other occurs. Formally, A and B are independent if $P(A \cap B) = P(A)\,P(B)$.
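The product rule $P(A \cap B) = P(A)\,P(B)$ gives a mechanical test for independence. This sketch assumes two fair dice, and the helper and event names are hypothetical. Events about different dice turn out to be independent. An event about the first die and an event about the total are not:

```python
import itertools
from fractions import Fraction

# Two fair six-sided dice: 36 equally likely ordered outcomes.
outcomes = list(itertools.product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(a, b):
    """A and B are independent iff P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_even(o):
    return o[0] % 2 == 0  # P = 1/2


def second_above_4(o):
    return o[1] > 4  # P = 1/3


def sum_at_least_10(o):
    return sum(o) >= 10  # P = 1/6


print(independent(first_even, second_above_4))   # True
print(independent(first_even, sum_at_least_10))  # False: 1/9 != 1/12
```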

#### Conditional Independence

Two events A and B are conditionally independent given a third event C precisely if A and B are independent in their conditional probability distribution given C, that is, $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$.
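A classic illustration, set up here as a hypothetical example: pick one of two coins at random and toss it twice. One coin is fair; the other is biased with P(heads) = 9/10. Once we know which coin was picked (event C), the two tosses are independent. Without that knowledge they are dependent, because heads on the first toss makes the biased coin more likely. The sketch builds the joint distribution exactly and checks both claims:

```python
import itertools
from fractions import Fraction

# Hypothetical setup: choose a coin uniformly at random, then toss it twice.
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

# Exact joint distribution over outcomes w = (coin, toss1, toss2), 1 = heads.
joint = {}
for coin, t1, t2 in itertools.product(p_heads, (0, 1), (0, 1)):
    p = p_heads[coin]
    joint[(coin, t1, t2)] = (Fraction(1, 2)
                             * (p if t1 else 1 - p)
                             * (p if t2 else 1 - p))


def prob(event):
    """Total probability of the outcomes w for which event(w) is true."""
    return sum((q for w, q in joint.items() if event(w)), Fraction(0))


def a(w):
    return w[1] == 1  # A: first toss is heads


def b(w):
    return w[2] == 1  # B: second toss is heads


def c(w):
    return w[0] == "fair"  # C: the fair coin was chosen


def cond(event):
    """P(event | C)."""
    return prob(lambda w: event(w) and c(w)) / prob(c)


# Without knowing the coin, A and B are dependent: P(A and B) != P(A) P(B).
print(prob(lambda w: a(w) and b(w)), prob(a) * prob(b))  # 53/100 49/100

# Given C, they are independent: P(A and B | C) == P(A | C) P(B | C).
print(cond(lambda w: a(w) and b(w)) == cond(a) * cond(b))  # True
```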

### Expectation

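The expectation of a random variable X is written as E(X). For a discrete random variable it is the probability-weighted average of its values, $E(X) = \sum_x x\,P(X = x)$; for a continuous one with density $f$, it is $E(X) = \int x\,f(x)\,dx$. If we observe N independent random values of X, the mean of those N values will be approximately equal to E(X) for large N.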
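Here is a minimal sketch using a fair six-sided die, an illustrative choice. It first computes the exact expectation $\sum_x x\,P(X = x)$. It then runs a simulation showing the sample mean settling near E(X) = 3.5. The seed and N are arbitrary:

```python
import random
from fractions import Fraction

# Fair six-sided die: X takes the values 1..6, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Exact expectation: E(X) = sum over x of x * P(X = x).
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 7/2

# Empirical check: the mean of N independent rolls approaches E(X) for large N.
rng = random.Random(0)
N = 100_000
sample_mean = sum(rng.randint(1, 6) for _ in range(N)) / N
print(sample_mean)  # close to 3.5
```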

### Variance

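The variance of a random variable X measures how widely the distribution of X is spread around its mean: $\mathrm{Var}(X) = E[(X - E(X))^2]$. A small variance means the values are concentrated near the mean; a large variance means they are spread out.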
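This sketch continues the fair-die illustration. It computes the variance exactly from its definition $E[(X - E(X))^2]$. It then checks the result against the equivalent identity $\mathrm{Var}(X) = E[X^2] - E(X)^2$. The helper `expect` is hypothetical:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die


def expect(g, pmf):
    """E[g(X)] for a discrete random variable with the given pmf."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expect(lambda x: x, pmf)
variance = expect(lambda x: (x - mean) ** 2, pmf)
print(variance)  # 35/12

# Equivalent shortcut: Var(X) = E[X^2] - E(X)^2
print(expect(lambda x: x * x, pmf) - mean ** 2 == variance)  # True
```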

### Probability Distribution

A probability distribution is a mathematical function that maps each possible outcome of a random experiment to its associated probability. Its form depends on whether the random variable X is discrete or continuous.

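1. Discrete Probability Distribution: A discrete probability function $p(x) = P(X = x)$ is a function that satisfies the following properties: $0 \le p(x) \le 1$ for every value $x$, and $\sum_x p(x) = 1$ over all possible values of X.
2. Continuous Probability Distribution: A continuous probability (density) function $f(x)$ is a function that satisfies the following properties: $f(x) \ge 0$ for every $x$, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Probabilities come from integrating the density, $P(a \le X \le b) = \int_a^b f(x)\,dx$, so $f(x)$ itself may exceed 1.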
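The sketch below checks the standard defining properties numerically. For a pmf, every value must lie in [0, 1] and the values must sum to 1. For a pdf, the function must be non-negative and integrate to 1. The fair-die pmf and the exponential density $f(x) = \lambda e^{-\lambda x}$ with $\lambda = 2$ are illustrative choices. The integral is approximated with a simple midpoint rule rather than computed exactly:

```python
import math
from fractions import Fraction

# Discrete: a valid pmf has 0 <= p(x) <= 1 for every x, and the values sum to 1.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die
assert all(0 <= p <= 1 for p in die_pmf.values())
assert sum(die_pmf.values()) == 1

# Continuous: a valid pdf has f(x) >= 0 everywhere and integrates to 1.
# Illustrative choice: exponential density with rate lam = 2.
lam = 2.0


def f(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


# Midpoint-rule approximation of the integral over [0, 20]; the tail mass
# beyond 20 is exp(-40), far below the precision printed here.
n, lo, hi = 200_000, 0.0, 20.0
h = (hi - lo) / n
total = h * sum(f(lo + (i + 0.5) * h) for i in range(n))
print(round(total, 6))  # 1.0

# A density value can exceed 1; only integrals of f are probabilities.
print(f(0.0))  # 2.0
```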

### Joint Probability Distribution

If X and Y are two random variables, the probability distribution that describes their simultaneous behavior in a random experiment, that is, $P(X = x, Y = y)$ for every pair of values $(x, y)$, is called their joint probability distribution.
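Here is a concrete sketch. It assumes a fair coin tossed twice, with X the result of the first toss and Y the total number of heads; both are illustrative choices. The joint distribution assigns a probability to every pair (x, y). Pairs that cannot occur, such as X = 1 with Y = 0, have probability 0 and are simply not stored:

```python
import itertools
from collections import defaultdict
from fractions import Fraction

# Experiment: toss a fair coin twice (1 = heads).
# X = result of the first toss, Y = total number of heads.
joint = defaultdict(Fraction)  # missing pairs default to probability 0
for t1, t2 in itertools.product((0, 1), repeat=2):
    joint[(t1, t1 + t2)] += Fraction(1, 4)  # each outcome has probability 1/4

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
# P(X=0, Y=0) = 1/4
# P(X=0, Y=1) = 1/4
# P(X=1, Y=1) = 1/4
# P(X=1, Y=2) = 1/4

assert sum(joint.values()) == 1
```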

### Conditional Probability Distribution (CPD)

If Z is a random variable that depends on other variables X and Y, then the distribution $P(Z \mid X, Y)$ is called the CPD of Z with respect to X and Y. For every possible combination of values of X and Y, it specifies a probability distribution over Z.
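A CPD over discrete variables is naturally stored as a table with one row per combination of parent values. This sketch assumes binary X, Y, and Z. The probabilities are invented purely for illustration:

```python
# Hypothetical CPD P(Z | X, Y) for binary variables; the numbers are made up.
# There is one distribution over Z for each (x, y) combination.
cpd_z = {
    # (x, y): {z: P(Z = z | X = x, Y = y)}
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.3, 1: 0.7},
    (1, 0): {0: 0.4, 1: 0.6},
    (1, 1): {0: 0.05, 1: 0.95},
}

# Every row is itself a probability distribution over Z, so it sums to 1.
for parents, dist in cpd_z.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-12, parents

print(cpd_z[(1, 0)][1])  # P(Z=1 | X=1, Y=0) -> 0.6
```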

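Two basic operations are used to work with such distributions:

1. Conditioning/Reduction: fix an observed value of a variable (for example, $Y = y$) and keep only the entries consistent with that observation (reduction); for a joint distribution, renormalizing the remaining entries so they sum to 1 gives the conditional distribution (conditioning).
2. Marginalization: remove a variable by summing over all of its values, e.g. $P(X) = \sum_y P(X, Y = y)$.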
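Both operations can be sketched on a small joint distribution. This one comes from a fair coin tossed twice, with X the first toss and Y the number of heads. The distribution and the helper names are illustrative assumptions. Conditioning on Y = 2 shows that two heads force the first toss to be heads:

```python
from fractions import Fraction

# Joint distribution P(X, Y): X = first toss (1 = heads), Y = number of heads.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4)}


def marginalize_y(joint):
    """Sum Y out: P(X = x) = sum over y of P(X = x, Y = y)."""
    marginal = {}
    for (x, _y), p in joint.items():
        marginal[x] = marginal.get(x, Fraction(0)) + p
    return marginal


def condition_on_y(joint, y_obs):
    """Reduce to entries with Y = y_obs, then renormalize: P(X | Y = y_obs)."""
    reduced = {x: p for (x, y), p in joint.items() if y == y_obs}
    z = sum(reduced.values())
    if z == 0:
        raise ValueError("observed value has probability 0")
    return {x: p / z for x, p in reduced.items()}


print(marginalize_y(joint))      # {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(condition_on_y(joint, 2))  # {1: Fraction(1, 1)}
```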

### Factor

A factor is a function or a table that takes an assignment of values to a set of random variables $\{X_1, X_2, \dots, X_n\}$ and returns a real number. The set of input random variables is called the scope of the factor.
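Here is a minimal sketch of a factor as a lookup table keyed by assignments to its scope. The `Factor` class is hypothetical and not taken from any library, and the values are made up. Unlike a probability distribution, a factor's values need not sum to 1. A CPD such as P(Z | X, Y) is one particular factor, with scope {Z, X, Y}.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """Hypothetical minimal factor: a real number for each assignment of its scope."""
    scope: tuple  # variable names, e.g. ("A", "B")
    table: dict   # {tuple of values in scope order: real number}

    def __call__(self, assignment):
        """Evaluate the factor for an assignment given as {variable: value}."""
        return self.table[tuple(assignment[v] for v in self.scope)]


# Made-up values over two binary variables; note they do not sum to 1.
phi = Factor(scope=("A", "B"),
             table={(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0})

print(phi({"A": 1, "B": 1}))          # 10.0
print(phi({"B": 0, "A": 1, "C": 0}))  # 1.0 (C is outside the scope, so ignored)
print(sum(phi.table.values()))        # 46.0
```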