The joint distribution describes the probability law of two random variables considered together. Whether the variables are discrete (described by a probability mass function) or continuous (described by a probability density function), the joint distribution is computed via the chain rule of probability.
1. General Rule
For any two variables $A$ and $B$, whether independent or dependent,
$$P(A, B) = P(A \mid B)P(B) = P(B \mid A)P(A)$$

This is the universal identity for joint probability: a joint distribution equals a marginal distribution times the corresponding conditional distribution.
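As a concrete illustration of the chain rule, consider a hypothetical example: drawing two cards from a standard 52-card deck without replacement, with $A$ = "first card is an ace" and $B$ = "second card is an ace". A minimal Python sketch using exact fractions:

```python
from fractions import Fraction

# Hypothetical example: two cards drawn without replacement from 52 cards.
# A = "first card is an ace", B = "second card is an ace".
p_A = Fraction(4, 52)          # marginal: 4 aces among 52 cards
p_B_given_A = Fraction(3, 51)  # conditional: 3 aces left among 51 cards

# Chain rule: P(A, B) = P(B | A) P(A)
p_joint = p_B_given_A * p_A
print(p_joint)  # 1/221
```

Using `Fraction` keeps the arithmetic exact, so the result is the true probability rather than a floating-point approximation.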
2. Independent Case
$A$ and $B$ are independent if
$$P(A \mid B) = P(A) \quad \text{or, equivalently,} \quad P(B \mid A) = P(B)$$

Substituting either condition into the general rule gives
$$P(A, B) = P(A)P(B)$$

This factorization is valid if and only if the two variables are independent.
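The factorization can be verified by brute-force enumeration of a finite sample space. A small sketch, using a hypothetical example of two fair dice with $A$ = "first die shows 6" and $B$ = "second die is even":

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, all 36 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))
n = Fraction(len(outcomes))

p_A = Fraction(sum(1 for a, b in outcomes if a == 6)) / n            # 1/6
p_B = Fraction(sum(1 for a, b in outcomes if b % 2 == 0)) / n        # 1/2
p_AB = Fraction(sum(1 for a, b in outcomes if a == 6 and b % 2 == 0)) / n

print(p_AB == p_A * p_B)  # True: the product rule holds for independent events
```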
3. Dependent Case
If $A$ and $B$ are dependent, then in general
$$P(A \mid B) \neq P(A)$$

so the product $P(A)P(B)$ is not valid for the joint distribution.
The joint distribution must be computed from one of the following:
$$P(A, B) = P(B \mid A)P(A)$$

or

$$P(A, B) = P(A \mid B)P(B)$$

Thus one marginal distribution and the corresponding conditional distribution are sufficient to determine the joint distribution.
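The contrast between the chain rule and the naive product can be seen by enumeration. A sketch using a hypothetical miniature deck of 4 cards (2 aces), drawn twice without replacement, with $A$ = "first draw is an ace" and $B$ = "second draw is an ace":

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical example: 4 cards, 2 aces, two draws without replacement.
deck = ["ace", "ace", "king", "queen"]
draws = list(permutations(range(4), 2))  # all ordered pairs of distinct positions
n = Fraction(len(draws))

p_A  = Fraction(sum(1 for i, j in draws if deck[i] == "ace")) / n
p_B  = Fraction(sum(1 for i, j in draws if deck[j] == "ace")) / n
p_AB = Fraction(sum(1 for i, j in draws
                    if deck[i] == "ace" and deck[j] == "ace")) / n
p_B_given_A = p_AB / p_A

print(p_AB == p_B_given_A * p_A)  # True: the chain rule always holds
print(p_AB == p_A * p_B)          # False: the naive product fails here
```

Here $P(A, B) = 1/6$ while $P(A)P(B) = 1/4$, confirming that the product of marginals is wrong for dependent variables even though the chain rule still applies.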
4. Discrete and Continuous Forms
For discrete random variables, the joint distribution is written as a PMF such as

$$P(A=a, B=b)$$

For continuous random variables, the joint distribution is written as a PDF such as

$$f_{A,B}(a,b) = f_{B \mid A}(b \mid a)f_A(a) = f_{A \mid B}(a \mid b)f_B(b)$$

The structure is identical in both settings.
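A sketch of the continuous case, under an assumed hypothetical model: $A \sim \mathrm{Uniform}(0,1)$ and, given $A=a$, $B \sim \mathrm{Uniform}(0,a)$. The chain rule then gives $f_{A,B}(a,b) = f_{B\mid A}(b\mid a)\,f_A(a) = 1/a$ on the triangle $0 < b < a < 1$, and a crude midpoint-rule double integral can check that this joint density integrates to (approximately) 1:

```python
# Hypothetical model: A ~ Uniform(0,1); given A = a, B ~ Uniform(0,a).
# Chain rule: f_{A,B}(a,b) = f_{B|A}(b|a) * f_A(a) = (1/a) * 1 for 0 < b < a < 1.
def f_joint(a, b):
    return 1.0 / a if 0.0 < b < a < 1.0 else 0.0

# Midpoint-rule approximation of the double integral over the unit square.
n = 400
h = 1.0 / n
total = sum(
    f_joint((i + 0.5) * h, (j + 0.5) * h) * h * h
    for i in range(n)
    for j in range(n)
)
print(total)  # approximately 1, up to discretization error
```

The approximation converges slowly near $a = 0$ because the density is unbounded there, but the total mass is still finite and equals 1 exactly in the limit.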