Uncertain Domain
- Agent may need to handle uncertainty
- Environment is partially observable
- Fully observable: An agent knows where it is.
- The drone never knows its exact position, only an estimated one.
- The true user intent is not directly observable, only noisy speech data.
- It cannot see the whole environment or detect humans perfectly.
- The true health state (disease present or not) is hidden.
- Environment is non-deterministic
- AI Trading Agent
- even if the agent takes the same action, the outcome can vary dramatically.
- from market prices changes unpredictably due to external events (news, other traders, economic data).
- network latency, competing traders' behavior are stochasic and independent, random fluctuations can affect the outcome.
- buy action could lead to profit, loss, or no change depending on uncontrollable external factors.
- AI Helpdesk Chatbot
- even when the chatbot executes the same action, it sends queries to multiple backend systems, the result may differ each time.
- network latency or failures, backend load, concurrent requests, and external dependencies can lead to different response times or outcomes.
- same command doesn't guarantee the same outcome.
- Autonomous Driving
- Partial Observability
- Sensors cannot detect everything, blind spots, weather effects, occluded vehicles.
- Non-Determinism
- Even if the car signals a turn and starts moving, other drivers might brake suddenly, pedestrians might cross unexpectedly, or traffic lights might malfunction.
- depending on random external events.
- Medical Diagnosis & Treatment Assistant
- Partial Observability
- The patient's internal health state is not directly visible.
- tests can be inaccurate or incomplete: false positives/negatives, missing data
- Non-Determinism
- The same treatment can have different effects on different patients due to genetics, lifestyle, or random biological responses.
- A drug may succeed, fail, or cause side effects.
Bayes' Theorem
P(A∣B)=P(B)P(B∣A)⋅P(A)
- P(A∣B): Posterior probability, the probability of event A given taht B has occurred.
- P(B∣A): Likelihood, the probability of event B occurring given that A is true.
- P(A): Prior probability, the initial probability of A being true before considering B.
- P(B): Evidence, the total probability of B occurring.
- examples
- 1% of people have a certain disease (prior probability). P(Disease)=0.01
- the test for the disease is 99% accurate (likelihood).
- P(PositiveTest∣Disease)=0.99
- P(NegativeTest∣NoDisease)=0.99
- so P(PositiveTest∣NoDisease)=0.01
- P(PositiveTest)=P(PositiveTest∣Disease)⋅P(Disease)+P(PositiveTest∣NoDisease)⋅P(NoDisease)
- =0.99⋅0.01+0.01⋅0.99=0.0198
- P(Disease∣PositiveTest)=P(PositiveTest)P(PositiveTest∣Disease)⋅P(Disease)
- =0.01980.99⋅0.01≈0.5
- the conditional probability P(effect∣cause) quantifies the relationship in the causal direction from cause to effect.
- but P(cause∣effect) is often what we really want to know, describing the relationship in the diagnostic direction from effect to cause.
- In medical dignosis, the doctor knows P(symptoms∣disease) from medical studies, and want to derive a P(disease∣symptoms) for a particular patient.
- P(disease∣symptoms)=P(symptoms)P(symptoms∣disease)⋅P(disease)
P(Y∣X)=α⋅P(X∣Y)⋅P(Y)
- α is the normalization constant needed to make the entries in P(Y∣X) sum to 1 for each value of X.
- conditional independency
- two variables X and Y are conditionally independent given a third variable Z if the value of X provides no information about Y once Z is known.
- P(X,Y∣Z)=P(X∣Z)⋅P(Y∣Z)
Constructing Bayesian Network
- constructing a Bayesian network in which resulting joint probability distributions are a good representation of the agent's knowledge.
-
- P(x1,...,xn)=∏i=1nP(xi∣parents(xi))
-
- rewrite the entries in the joint probability distribution in terms of conditional probabilities using the product rule.
- P(x1∧...∧xn)=P(xn∣xn−1∧...∧x1)⋅P(xn−1∧...∧x1)
-
- repeat step 2 until all variables are included.
- P(x1,...,xn)=P(xn∣xn−1∧...∧x1)⋅P(xn−1∣xn−2∧...∧x1)⋯P(x2∣x1)⋅P(x1)
Knowledge Representation
- The agent's knowledge can at best provide only a degree of belief in the relevant logical sentences.
- A probabilistic agent may have a numerical degree of belief between 0 and 1.
Bayesian Network
- a directed graph
- each node repressents a random variable and edge with an arrow from X to Y represents that X is a parent of Y.
- each node is associated with a probability distribution
- if a node Y has parents X, the Y node has a conditional probability distribution with respect to its parents X, i.e. P(Y∣parents(Y))
- if a node has no any parents, it has an unconditional probability distribution.
Burglary Example
- a new burglar alarm installed at home.
- it is reliable at detecting a burglary, but also responds on occasion to minor earthquakes
- you have two neighbors, John and Mary who promised to call you at work when they hear the alarm.
- given the evidence of who has or has not called, estimate the probability of a burglary.
- Knowledge representation as the probability of each event that can occur in this case.
P(eventj)=total number of experimentsnumber of ccurrences of event j
- calulate the full joint probability distribution for all relevant variables.
- specifying probabilities for all possible events one by one is unnatural and tedious.
P(B,E,A,J,M)=∏i5P(Xi∣parents(Xi))=P(B)⋅P(E)⋅P(A∣B,E)⋅P(J∣A)⋅P(M∣A)
P(M,J,A,E,B)=P(M∣J,A,E,B)⋅P(J,A,E,B)P(J∣A,E,B)⋅P(A,E,B)P(A∣E,B)⋅P(E,B)P(E∣B)⋅P(B)=P(M∣J,A,E,B)⋅P(J∣A,E,B)⋅P(A∣E,B)⋅P(E∣B)⋅P(B)=P(M∣J,A,E,B)⋅P(J∣A,E,B)⋅P(A∣E,B)⋅P(E)⋅P(B)i.e. P(E∣B)=P(E) (Earthquake is independent of Burglary)=P(M∣A)⋅P(J∣A)⋅P(A∣E,B)⋅P(E)⋅P(B)i.e. P(J∣A,E,B)=P(J∣A) (JohnCalls depends only on Alarm)i.e. P(M∣J,A,E,B)=P(M∣A) (MaryCalls depends only on Alarm)
P(M,J,A,E,B)=P(B)⋅P(E)⋅P(A∣B,E)⋅P(J∣A)⋅P(M∣A)
Probabilistic reasnoing
- to explain how to use network models to reason under uncertainty.
- Exact inference: calculate the exact posterior probabilities.
- Approximate inference: Stochastic approximation techniques such as likelihood weighting and Markov Chain Monte Carlo
- can give reasonable estimates of the true posterior probabilities in a network
- can cope with much larger networks than exact inference algorithms.
- Static worlds: Each random variable has a single fixed value.
- Dynamic worlds: view the world as a series of snapshots, or time slices.
- Each of snapshots contains a set of random variables, some observed and some hidden.
- The values of these variables can change over time.
- With the assumption of state sequences are Markov, the future is independent of the past given the present.
Inference in Bayesian Networks
- Computing the posterior probability distribution of a set of query variables, given some observed events, called some assignment of values to a set of evidence variables.
- X: query variable(s)
- E: the set of evidence variables with observed values E1,...,Em
- weather report says it is cloudy.
- weather report says there is high humidity.
- e: a particular observed event.
- E1=cloudy
- E2=high humidity
- Y: the non-evidence, non-query variables.
- the other variables in the network.
- X={X}∪E∪Y
- the complete set of variables in the network.
- P(X∣e): the posterior probability distribution of the query variable(s) given the evidence.
- P(X∣e)=α∑yP(X,e,y)
Exact inference in Bayesian Networks
- One exact inference method is called by inference by enumeration.
- Observed some event A=true and B=true, want to compute the posterior probability distribution P(C∣A,B).
- Use the full conditional probability table
- Use the Bayesian Network
- It calculates all the required probability components in each branch in the evaluation tree, and this results in repetitive calculations.
- To avoid this, we can use variable elimination algorithm and clustering algorithms.
- Calculation of probability distribution for a query variable is complexity exponential.
- Enumeration method can become intractable in large, multiple connected networks.
-> use approximate inference methods.
For any Bayesian network with given nodes, X={X1,X2,...,Xn}, the joint probability distribution is given by:
P(X)=P(X1∧X2∧…∧Xn)=∏i=1nP(Xi∣parents(Xi))
Using the Bayesian network, we can compute the conditional probability.
P(B∣event)=α∑e∑aP(B,E,A,j,m)
- i=1,P(X1∣parents(X1))=P(B)
- i=2,P(X2∣parents(X2))=P(E)
- i=3,P(X3∣parents(X3))=P(A∣B,E)
- i=4,P(X4∣parents(X4))=P(j∣A)
- i=5,P(X5∣parents(X5))=P(m∣A)
- P(B,E,A,j,m)=P(B)⋅P(E)⋅P(A∣B,E)⋅P(j∣A)⋅P(m∣A)
P(B∣event)=αe∑a∑P(B,E,A,j,m)=αe∑a∑P(B)⋅P(E)⋅P(A∣B,E)⋅P(j∣A)⋅P(m∣A)=α⋅P(B)e∑P(E)a∑P(A∣B,E)⋅P(j∣A)⋅P(m∣A)
Berglary example calculation:
P(b∣event)=αP(b)e∑P(E)a∑P(A∣b,E)⋅P(j∣A)⋅P(m∣A)=αP(b)[P(e)a∑P(A∣b,e)⋅P(j∣A)⋅P(m∣A)+P(¬e)a∑P(A∣b,¬e)⋅P(j∣A)⋅P(m∣A)]a∑P(A,b,e)⋅P(j∣A)⋅P(m∣A)=P(a,b,e)⋅P(j∣a)⋅P(m∣a)+P(¬a,b,e)⋅P(j∣¬a)⋅P(m∣¬a)a∑P(A∣b,¬e)⋅P(j∣A)⋅P(m∣A)=P(a,b,¬e)⋅P(j∣a)⋅P(m∣a)+P(¬a,b,¬e)⋅P(j∣¬a)⋅P(m∣¬a)=αP(b)[P(e)(P(a∣b,e)⋅P(j∣a)⋅P(m∣a)+P(¬a∣b,e)⋅P(j∣¬a)⋅P(m∣¬a))+P(¬e)(P(a∣b,¬e)⋅P(j∣a)⋅P(m∣a)+P(¬a∣b,¬e)⋅P(j∣¬a)⋅P(m∣¬a))]