Skip to main content

IAI +011

· 10 min read

Uncertain Domain

  • Agent may need to handle uncertainty
  • Environment is partially observable
    • Fully observable: An agent knows where it is.
    • The drone never knows its exact position, only an estimated one.
    • The true user intent is not directly observable, only noisy speech data.
    • It cannot see the whole environment or detect humans perfectly.
    • The true health state (disease present or not) is hidden.
  • Environment is non-deterministic
    • AI Trading Agent
      • even if the agent takes the same action, the outcome can vary dramatically.
      • from market prices changes unpredictably due to external events (news, other traders, economic data).
      • network latency, competing traders' behavior are stochasic and independent, random fluctuations can affect the outcome.
      • buy action could lead to profit, loss, or no change depending on uncontrollable external factors.
    • AI Helpdesk Chatbot
      • even when the chatbot executes the same action, it sends queries to multiple backend systems, the result may differ each time.
      • network latency or failures, backend load, concurrent requests, and external dependencies can lead to different response times or outcomes.
      • same command doesn't guarantee the same outcome.
  • Autonomous Driving
    • Partial Observability
      • Sensors cannot detect everything, blind spots, weather effects, occluded vehicles.
    • Non-Determinism
      • Even if the car signals a turn and starts moving, other drivers might brake suddenly, pedestrians might cross unexpectedly, or traffic lights might malfunction.
      • depending on random external events.
  • Medical Diagnosis & Treatment Assistant
    • Partial Observability
      • The patient's internal health state is not directly visible.
      • tests can be inaccurate or incomplete: false positives/negatives, missing data
    • Non-Determinism
      • The same treatment can have different effects on different patients due to genetics, lifestyle, or random biological responses.
      • A drug may succeed, fail, or cause side effects.

Bayes' Theorem

P(AB)=P(BA)P(A)P(B)P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

  • P(AB)P(A|B): Posterior probability, the probability of event A given taht B has occurred.
  • P(BA)P(B|A): Likelihood, the probability of event B occurring given that A is true.
  • P(A)P(A): Prior probability, the initial probability of A being true before considering B.
  • P(B)P(B): Evidence, the total probability of B occurring.
  • examples
    • 1% of people have a certain disease (prior probability). P(Disease)=0.01P(Disease) = 0.01
    • the test for the disease is 99% accurate (likelihood).
      • P(PositiveTestDisease)=0.99P(PositiveTest|Disease) = 0.99
      • P(NegativeTestNoDisease)=0.99P(NegativeTest|NoDisease) = 0.99
      • so P(PositiveTestNoDisease)=0.01P(PositiveTest|NoDisease) = 0.01
    • P(PositiveTest)=P(PositiveTestDisease)P(Disease)+P(PositiveTestNoDisease)P(NoDisease)P(PositiveTest) = P(PositiveTest|Disease) \cdot P(Disease) + P(PositiveTest|NoDisease) \cdot P(NoDisease)
      • =0.990.01+0.010.99=0.0198= 0.99 \cdot 0.01 + 0.01 \cdot 0.99 = 0.0198
    • P(DiseasePositiveTest)=P(PositiveTestDisease)P(Disease)P(PositiveTest)P(Disease|PositiveTest) = \frac{P(PositiveTest|Disease) \cdot P(Disease)}{P(PositiveTest)}
      • =0.990.010.01980.5= \frac{0.99 \cdot 0.01}{0.0198} \approx 0.5
  • the conditional probability P(effectcause)P(effect|cause) quantifies the relationship in the causal direction from cause to effect.
  • but P(causeeffect)P(cause|effect) is often what we really want to know, describing the relationship in the diagnostic direction from effect to cause.
    • In medical dignosis, the doctor knows P(symptomsdisease)P(symptoms|disease) from medical studies, and want to derive a P(diseasesymptoms)P(disease|symptoms) for a particular patient.
    • P(diseasesymptoms)=P(symptomsdisease)P(disease)P(symptoms)P(disease|symptoms) = \frac{P(symptoms|disease) \cdot P(disease)}{P(symptoms)}

General Form of Bayes' Rule

P(YX)=αP(XY)P(Y)P(Y|X) = \alpha \cdot P(X|Y) \cdot P(Y)

  • α\alpha is the normalization constant needed to make the entries in P(YX)P(Y|X) sum to 1 for each value of X.
  • conditional independency
    • two variables X and Y are conditionally independent given a third variable Z if the value of X provides no information about Y once Z is known.
    • P(X,YZ)=P(XZ)P(YZ)P(X, Y|Z) = P(X|Z) \cdot P(Y|Z)

Constructing Bayesian Network

  • constructing a Bayesian network in which resulting joint probability distributions are a good representation of the agent's knowledge.
    1. P(x1,...,xn)=i=1nP(xiparents(xi))P(x_1, ..., x_n) = \prod_{i=1}^{n} P(x_i | \text{parents}(x_i))
    1. rewrite the entries in the joint probability distribution in terms of conditional probabilities using the product rule.
    • P(x1...xn)=P(xnxn1...x1)P(xn1...x1)P(x_1 \land ... \land x_n) = P(x_n | x_{n-1} \land ... \land x_{1}) \cdot P(x_{n-1} \land ... \land x_{1})
    1. repeat step 2 until all variables are included.
    • P(x1,...,xn)=P(xnxn1...x1)P(xn1xn2...x1)P(x2x1)P(x1)P(x_1, ..., x_n) = P(x_n | x_{n-1} \land ... \land x_{1}) \cdot P(x_{n-1} | x_{n-2} \land ... \land x_{1}) \cdots P(x_2 | x_1) \cdot P(x_1)

Knowledge Representation

  • The agent's knowledge can at best provide only a degree of belief in the relevant logical sentences.
  • A probabilistic agent may have a numerical degree of belief between 0 and 1.

Bayesian Network

  • a directed graph
  • each node repressents a random variable and edge with an arrow from X to Y represents that X is a parent of Y.
  • each node is associated with a probability distribution
  • if a node Y has parents X, the Y node has a conditional probability distribution with respect to its parents X, i.e. P(Yparents(Y))P(Y| \text{parents}(Y))
  • if a node has no any parents, it has an unconditional probability distribution.

Burglary Example

  • a new burglar alarm installed at home.
  • it is reliable at detecting a burglary, but also responds on occasion to minor earthquakes
  • you have two neighbors, John and Mary who promised to call you at work when they hear the alarm.
  • given the evidence of who has or has not called, estimate the probability of a burglary.
  • Knowledge representation as the probability of each event that can occur in this case.

P(eventj)=number of ccurrences of event jtotal number of experimentsP(\text{event}_{j}) = \frac{\text{number of ccurrences of event }_j}{\text{total number of experiments}}

  • calulate the full joint probability distribution for all relevant variables.
    • specifying probabilities for all possible events one by one is unnatural and tedious.

P(B,E,A,J,M)=i5P(Xiparents(Xi))=P(B)P(E)P(AB,E)P(JA)P(MA)P(B, E, A, J, M) = \prod_{i}^{5} P(X_i | \text{parents}(X_i)) = P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(J|A) \cdot P(M|A)

P(M,J,A,E,B)=P(MJ,A,E,B)P(J,A,E,B)P(JA,E,B)P(A,E,B)P(AE,B)P(E,B)P(EB)P(B)=P(MJ,A,E,B)P(JA,E,B)P(AE,B)P(EB)P(B)=P(MJ,A,E,B)P(JA,E,B)P(AE,B)P(E)P(B)i.e. P(EB)=P(E) (Earthquake is independent of Burglary)=P(MA)P(JA)P(AE,B)P(E)P(B)i.e. P(JA,E,B)=P(JA) (JohnCalls depends only on Alarm)i.e. P(MJ,A,E,B)=P(MA) (MaryCalls depends only on Alarm)\begin{align*} & P(M, J, A, E, B) = P(M | J, A, E, B) \cdot P(J, A, E, B) \\ & P(J | A, E, B) \cdot P(A, E, B) \\ & \quad P(A | E, B) \cdot P(E, B) \\ & \quad \quad P(E | B) \cdot P(B) \\ & = P(M|J,A,E,B) \cdot P(J|A,E,B) \cdot P(A|E,B) \cdot P(E|B) \cdot P(B) \\ & = P(M|J,A,E,B) \cdot P(J|A,E,B) \cdot P(A|E,B) \cdot P(E) \cdot P(B) \\ & \quad i.e.\space P(E|B) = P(E) \text{ (Earthquake is independent of Burglary)} \\ & = P(M|A) \cdot P(J|A) \cdot P(A|E,B) \cdot P(E) \cdot P(B) \\ & \quad i.e.\space P(J|A,E,B) = P(J|A) \text{ (JohnCalls depends only on Alarm)} \\ & \quad i.e.\space P(M|J,A,E,B) = P(M|A) \text{ (MaryCalls depends only on Alarm)} \end{align*}

P(M,J,A,E,B)=P(B)P(E)P(AB,E)P(JA)P(MA) P(M, J, A, E, B) = P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(J|A) \cdot P(M|A)

Probabilistic reasnoing

  • to explain how to use network models to reason under uncertainty.
  • Exact inference: calculate the exact posterior probabilities.
  • Approximate inference: Stochastic approximation techniques such as likelihood weighting and Markov Chain Monte Carlo
    • can give reasonable estimates of the true posterior probabilities in a network
    • can cope with much larger networks than exact inference algorithms.
  • Static worlds: Each random variable has a single fixed value.
  • Dynamic worlds: view the world as a series of snapshots, or time slices.
    • Each of snapshots contains a set of random variables, some observed and some hidden.
    • The values of these variables can change over time.
    • With the assumption of state sequences are Markov, the future is independent of the past given the present.

Inference in Bayesian Networks

  • Computing the posterior probability distribution of a set of query variables, given some observed events, called some assignment of values to a set of evidence variables.
  • XX: query variable(s)
    • is it raining?
  • EE: the set of evidence variables with observed values E1,...,EmE_{1}, ..., E_{m}
    • weather report says it is cloudy.
    • weather report says there is high humidity.
  • ee: a particular observed event.
    • E1=cloudyE_{1} = \text{cloudy}
    • E2=high humidityE_{2} = \text{high humidity}
  • YY: the non-evidence, non-query variables.
    • the other variables in the network.
  • X={X}EYX = \{X\} \cup E \cup Y
    • the complete set of variables in the network.
  • P(Xe)P(X|e): the posterior probability distribution of the query variable(s) given the evidence.
  • P(Xe)=αyP(X,e,y)P(X|e) = \alpha \sum_{y} P(X, e, y)

Exact inference in Bayesian Networks

  • One exact inference method is called by inference by enumeration.
  • Observed some event A=trueA = true and B=trueB = true, want to compute the posterior probability distribution P(CA,B)P(C|A, B).
    • Use the full conditional probability table
    • Use the Bayesian Network
  • It calculates all the required probability components in each branch in the evaluation tree, and this results in repetitive calculations.
  • To avoid this, we can use variable elimination algorithm and clustering algorithms.
  • Calculation of probability distribution for a query variable is complexity exponential.
  • Enumeration method can become intractable in large, multiple connected networks. -> use approximate inference methods.

For any Bayesian network with given nodes, X={X1,X2,...,Xn}X = \{X_1, X_2, ..., X_n\}, the joint probability distribution is given by: P(X)=P(X1X2Xn)=i=1nP(Xiparents(Xi))P(X) = P(X_1 \land X_2 \land \ldots \land X_n) = \prod_{i=1}^{n} P(X_i | \text{parents}(X_i))

Using the Bayesian network, we can compute the conditional probability.

P(Bevent)=αeaP(B,E,A,j,m) P(B | event) = \alpha \sum_{e}\sum_{a} P(B, E, A, j, m)

  • i=1,P(X1parents(X1))=P(B)i=1, P(X_1 | \text{parents}(X_1)) = P(B)
  • i=2,P(X2parents(X2))=P(E)i=2, P(X_2 | \text{parents}(X_2)) = P(E)
  • i=3,P(X3parents(X3))=P(AB,E)i=3, P(X_3 | \text{parents}(X_3)) = P(A|B,E)
  • i=4,P(X4parents(X4))=P(jA)i=4, P(X_4 | \text{parents}(X_4)) = P(j|A)
  • i=5,P(X5parents(X5))=P(mA)i=5, P(X_5 | \text{parents}(X_5)) = P(m|A)
  • P(B,E,A,j,m)=P(B)P(E)P(AB,E)P(jA)P(mA)P(B, E, A, j, m) = P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(j|A) \cdot P(m|A)
P(Bevent)=αeaP(B,E,A,j,m)=αeaP(B)P(E)P(AB,E)P(jA)P(mA)=αP(B)eP(E)aP(AB,E)P(jA)P(mA)\begin{align*} & P(B | event) = \alpha \sum_{e}\sum_{a} P(B, E, A, j, m) \\ & = \alpha \sum_{e}\sum_{a} P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(j|A) \cdot P(m|A) \\ & = \alpha \cdot P(B) \sum_{e} P(E) \sum_{a} P(A|B,E) \cdot P(j|A) \cdot P(m|A) \end{align*}

Berglary example calculation:

P(bevent)=αP(b)eP(E)aP(Ab,E)P(jA)P(mA)=αP(b)[P(e)aP(Ab,e)P(jA)P(mA)+P(¬e)aP(Ab,¬e)P(jA)P(mA)]aP(A,b,e)P(jA)P(mA)=P(a,b,e)P(ja)P(ma)+P(¬a,b,e)P(j¬a)P(m¬a)aP(Ab,¬e)P(jA)P(mA)=P(a,b,¬e)P(ja)P(ma)+P(¬a,b,¬e)P(j¬a)P(m¬a)=αP(b)[P(e)(P(ab,e)P(ja)P(ma)+P(¬ab,e)P(j¬a)P(m¬a))+P(¬e)(P(ab,¬e)P(ja)P(ma)+P(¬ab,¬e)P(j¬a)P(m¬a))]\begin{align*} & P(b | event) = \alpha P(b) \sum_{e} P(E) \sum_{a} P(A|b,E) \cdot P(j|A) \cdot P(m|A) \\ & = \alpha P(b) \bigg[ P(e) \sum_{a} P(A|b,e) \cdot P(j|A) \cdot P(m|A) + P(\neg e) \sum_{a} P(A|b,\neg e) \cdot P(j|A) \cdot P(m|A) \bigg] \\ & \quad \sum_{a} P(A,b,e) \cdot P(j|A) \cdot P(m|A) \\ & \quad = P(a,b,e) \cdot P(j|a) \cdot P(m|a) + P(\neg a,b,e) \cdot P(j|\neg a) \cdot P(m|\neg a) \\ & \quad \sum_{a} P(A|b,\neg e) \cdot P(j|A) \cdot P(m|A) \\ & \quad = P(a,b,\neg e) \cdot P(j|a) \cdot P(m|a) + P(\neg a,b,\neg e) \cdot P(j|\neg a) \cdot P(m|\neg a) \\ & = \alpha P(b) \bigg[ P(e) \big( P(a|b,e) \cdot P(j|a) \cdot P(m|a) + P(\neg a|b,e) \cdot P(j|\neg a) \cdot P(m|\neg a) \big) \\ & \quad \quad + P(\neg e) \big( P(a|b,\neg e) \cdot P(j|a) \cdot P(m|a) + P(\neg a|b,\neg e) \cdot P(j|\neg a) \cdot P(m|\neg a) \big) \bigg] \\ \end{align*}