본문으로 건너뛰기

Full Stack JavaScript Developer | Half-time Open Sourcerer.

모든 저자 보기

Location-Aware Deep Neural Network Review

· 약 3분

Architecture

Wall Detection

Wall Detection Process

  1. Pre-process floor images to gray scale.
  2. Apply Otsu's threshold and thinned method.
  3. Enhance the contrast of wall lines using Canny edge detection.
  4. Employ Hough Transform to detect and map wall lines.

3D Visualization

Limitations

  • Building structural details are unavailable or drone operations are restricted.
  • Highly irregular floor plans or buildings constructed with unique materials not extensively represented in the training data.
  • Real-time data integration
  • Refined deep learning architectures, and validation across varied building materials and layouts
  • Enhance the framework’s scalability and practical utility.

Spider GAN

Sample

{
"nodes": [
{ "id": 0, "name": "Drone", "type": "source", "features": [1, 0, 0, 0, 0.0] },
{ "id": 1, "name": "Room1", "type": "room", "features": [0, 1, 0, 0, -70.0] },
{ "id": 2, "name": "Room2", "type": "room", "features": [0, 1, 0, 0, -75.0] },
{ "id": 3, "name": "Room3", "type": "room", "features": [0, 1, 0, 0, -80.0] },
{ "id": 4, "name": "Corridor","type": "corridor", "features": [0, 0, 1, 0, -72.0] },
{ "id": 5, "name": "Wall_R1_R2","type": "wall", "features": [0, 0, 0, 1, 0.0] },
{ "id": 6, "name": "Wall_R2_R3","type": "wall", "features": [0, 0, 0, 1, 0.0] }
],

"edges": [
{ "source": 0, "target": 1, "attenuation_db": 5.0 },
{ "source": 1, "target": 5, "attenuation_db": 8.0 },
{ "source": 5, "target": 2, "attenuation_db": 8.0 },
{ "source": 2, "target": 6, "attenuation_db": 10.0 },
{ "source": 6, "target": 3, "attenuation_db": 10.0 },
{ "source": 2, "target": 4, "attenuation_db": 3.0 }
]
}

Ref

  • Hason Rudd, D., Sanin, C., En, K. M., Gao, X., Islam, M. R., Hasan, M., Wang, X., Huo, A., & Xu, G. (2025). Location-Aware Deep Neural Network for Predicting Indoor 5G RSSI and CQI Using Drone-Based External RF Sensing. Procedia Computer Science, 270, 4765–4775. https://doi.org/10.1016/j.procs.2025.09.602

Vocabulary for AI 012

· 약 3분

Vocabulary & Expressions

Term/ExpressionDefinitionSimpler ParaphraseMeaning
pluralitythe state of being plural or multiplemultiplicity복수성, 다수
premisea statement or proposition from which another is inferred or follows as a conclusionassumption전제
taxonomythe classification of something, especially organismsclassification분류
PII scrubbedthe process of removing personally identifiable information from data setsdata anonymization개인 식별 정보 제거
Air-gapped Environmenta secure computer network that is physically isolated from unsecured networksisolated network격리된 네트워크
attenuationthe reduction of the force, effect, or value of somethingreduction감쇠
CQIChannel Quality Indicator; a measure of the quality of a communication channelchannel quality measure채널 품질 지표
IBCIn-Building Coverage; the ability of a wireless communication system to provide coverage within buildingsindoor coverage실내 커버리지
RSRPReference Signal Received Power; a measure of the power level of a reference signal in a wireless communication systemreference signal power measure기준 신호 수신 전력
RSRQReference Signal Received Quality; a measure of the quality of a reference signal in a wireless communication systemreference signal quality measure기준 신호 수신 품질
RSSIReceived Signal Strength Indicator; a measure of the power present in a received radio signalsignal strength measure수신 신호 세기 지표
encompassto surround or include comprehensivelyto include포함하다, 둘러싸다
accommodateto provide space or make adjustments for somethingto make room for수용하다, 적응하다
acquisitionthe process of getting somethingobtaining획득, 습득
compriseto consist of or be made up ofto include포함하다, 구성하다
attenuation coefficienta measure of how much a material reduces the intensity of a signal passing through itsignal reduction measure감쇠 계수
calibrationthe process of adjusting and standardizing the accuracy of a measuring instrument or systemadjustment for accuracy보정, 교정
obstructionsomething that blocks or hinders progress or movementblockage장애물
annotateto add notes or comments to a text or diagramto comment on주석을 달다
equidistantlyat equal distancesuniformly spaced등거리로
intricatevery complicated or detailedcomplex복잡한, 정교한
induceto cause something to happen or existto bring about유도하다, 초래하다
delineateto describe or mark the edge of somethingto outline윤곽을 그리다, 묘사하다
account for sthto explain or justify somethingto explain~을 설명하다
calibrateto adjust or standardize a measuring instrument or systemto adjust for accuracy보정하다, 교정하다
extrudeto force or push something outto expel밀어내다, 추출하다
affirmto state or assert positivelyto confirm확언하다, 단언하다
substantiallyto a great extent or degreesignificantly상당히, 크게
applicabilitythe quality of being relevant or suitablerelevance적용 가능성
adaptabilitythe ability to adjust to new conditionsflexibility적응성
interfereto obstruct or disruptto hinder방해하다
degradationthe process of declining in quality or conditiondeterioration저하, 악화
counterargumentan argument made to oppose another argumentopposing argument반론
walk sb through sthto slowly and carefully explain something to someone or show someone how to do somethingto explain step-by-step~에게 ~을 차근차근 설명하다
transitivitythe property of a relation where if A relates to B and B relates to C, then A relates to Crelational property추이성
exhaustiveincluding all possibilities; comprehensivecomplete철저한, 포괄적인

IAI +012

· 약 8분

Structural Knowledge

  • Structural knowledge: the relationship between concepts and objects in the world.
  • Hierarchical approach: built on classification and uses hierarchies as the structures for knowledge representation, presented in a graphical format.
    • Semantic networks: nodes represent concepts, and edges represent relationships between concepts.
    • Ontologies
      • A formal specification of concepts, relationships, and constraints within a domain that enables machines to reason automatically.
      • It defines classes (concepts), subclasses (hierarchical relations), and properties (attributes or relations) that describe how entities interact.
      • Unlike simple taxonomies, ontologies also include logical axioms and constraints that specify allowable relationships and permit inference and consistency checking through automated reasoning.
      • Classes: Main concepts or categories (e.g. Person, Animal).
      • Subclasses: Subcategories (e.g. Dog ⊂ Animal).
      • Properties: Attributes (e.g. hasPart, hasColor).
      • Relations: Connection rules (e.g. MemberOf, SubsetOf).
      • Constraints: Logical constraints (e.g. disjointness, transitivity) that enable inference.

Requirement of hierarchices

  • Inclusiveness: Dog ⊆ Mammal ⊆ Animal
  • Species/differentia: Dog = Mammal + (barks, canine traits, etc.)
  • Inheritance: Mammal: has_fur, gives_live_birthDog: inherits all Mammal properties
  • Transitivity: If Dog ⊆ Mammal and Mammal ⊆ Animal, then Dog ⊆ Animal
  • Systematic and predictable rules for association and distinction
  • Mutual exclusivity: Reptile ∩ Mammal = ∅
  • Necessary and sfficient criteria:
    • Necessary: x, Mammal(x)Vertebrate(x)ProducesMilk(x)\forall x,\ \text{Mammal}(x) \Rightarrow \text{Vertebrate}(x) \land \text{ProducesMilk}(x)
    • Sufficient: x, Vertebrate(x)ProducesMilk(x)Mammal(x)\forall x,\ \text{Vertebrate}(x) \land \text{ProducesMilk}(x) \Rightarrow \text{Mammal}(x)

Advantage of hierarchical approach

  • Inferring from incomplete evidence (if the shared criteria are not obvious or easily observable).
    • Animal → Mammal → Dog: If an entity is classified as a Mammal, we can infer properties such as having fur and giving live birth, even if the entity is not explicitly identified as a Dog.
  • Excellent representations in mature domains
    • Domains where entities and relationships are well understood and stable
    • e.g. medical diagnosis, biological taxonomy, type systems in programming languages.
  • Useful for entities that are well defined and have clear class boudnaries.
    • Good fit: HTTP status codes, chemical elements, and biological species
    • Poor fit: emotions, social roles, and cultural practices
  • Some theory or model is necessary to guide the identification
    • Provides criteria for defining entities and relationships
    • e.g. Evolutionary theory in biological taxonomy, Type theory in programming languages

Partition

A partition of a category is a set of subcategories that form a disjoint, exhaustive composition of that category.

  • Disjoint: Two or more categories are disjoint if they don't share common members.
  • Exhaustive composition: The subcategories together cover all members of the parent category, leaving no member unclassified.
  • Examples:
    • ❌ Category: Animal (Not a partition)
      • Mammal, Bird (Reptiles, fish, insects are missing, not exhaustive)
    • ✅ Category: Integer (Partition)
      • Even, Odd (Disjoint and exhaustive)

Physical composition

  • PartOf Relation: Partof(a, b) is a relation representing that one thing, 'a', is a part of another thing, 'b'.
  • BunchOf Relation: BunchOf(a) is a relation, taking a set of objects 'a', to represent a composite object made up of those parts.
  • Examples:
    • Partof(Wheel, Car): A wheel is part of a car (one-to-one relation).
    • BunchOf({Wheel1, Wheel2, Wheel3, Wheel4}): A car is a bunch of four wheels (many-to-one relation).
  • Link between PartOf and BunchOf:
    • x(xs    PartOf(x,BunchOf(s)))\forall x (x \in s \implies PartOf(x, BunchOf(s)))
    • y[(x(xs    PartOf(x,y))    PartOf(BunchOf(s),y)]\forall y \Big[\big(\forall x (x \in s \implies PartOf(x, y)\big) \implies PartOf(BunchOf(s), y)\Big]
  • Why useful?
    • Reasoning from individual parts -> group -> larger object.
    • Avoiding ambiguity between: "this thing is part of", "these things together form"
    • Without BunchOf, ontologies cannot represent: piles, colleciton, aggregates, composite physical structures.

Measurements

  • Quantitive measures (Ratio, Interval)
    • Represented as numbers with units
    • Support arithmetic and unit conversion
    • Enable numeric reasoing (e.g. 2.54cm = 1 inch)
  • Non-quantitative measures (Ordinal)
    • Cannot be meaningufully represented as numbers
    • Can still be compared using ordering relactions (<, >, =)
    • Suppor qualitative reasoning (e.g. one task is more difficult than another)

Objects

  • Stuff
    • Represents substances
    • Uncountable masses
    • Definitions include only intrinsic properties (e.g. Butter, Unsalted Butter)
    • bButterPartOf(p,b)    pButterb \in Butter \land PartOf(p, b) \implies p \in Butter
  • Things
    • Represents discrete objects
    • Countable entities
    • Definitions include extrinsic properties (e.g. PoundOfButter, StickOfButter)
      • It depends on measurement
      • It depends on contextual constraints

Time

  • Fluent: a condition whose truth value can change over time.
    • e.g. "The box is on the table", On(box,table)On(box, table), On(box,table,t)On(box, table, t)
    • a time-dependent proposition.
  • Time scale and absolute time
    • Ontology represents time along a single continuous timeline with a fixed reference point.
    • Date(0,20,21,24,1,1995)=Seconds(300000000)Date(0, 20, 21, 24, 1, 1995) = Seconds(300000000)
    • It allows arithmetic operations on time and comparisons between time points.
  • Time intertvals
    • Time can be represented as moments (instants) or intervals (durations).
    • Duration(i)=Time(End(i))Time(Start(i))Duration(i) = Time(End(i)) - Time(Start(i))
  • Partition of time
    • Partition(Moments,ExtendedIntervals,Intervals)Partition({Moments, ExtendedIntervals}, Intervals)
    • All time intervals can be partitioned into moments and extended (non-zero-length) intervals.
    • It provides a complete and precise ontology of time.

Event calculus

  • T(f,t1,t2)T(f, t_1, t_2): Fluent ff is true for all time between time t1t_1 and t2t_2.
  • Happens(e,t1,t2)Happens(e, t_1, t_2): Event ee start at time t1t_1 and ends at time t2t_2.
  • Initiate(e,f,t)Initiate(e, f, t): Event ee causes fluent ff to become true at time tt.
  • Terminates(e,f,t)Terminates(e, f, t): Event ee causes fluent ff to cease to be true at time tt.
  • Initiated(f,t1,t2)Initiated(f, t_1, t_2): Fluent ff becomes true at some point between t1t_1 and t2t_2.
  • Terminated(f,t1,t2)Terminated(f, t_1, t_2): Fluent ff ceases to be true at some point between t1t_1 and t2t_2.
  • t1<t2t_1 < t_2: Time point t1t_1 occurs before time point t2t_2.
  • Happens(PutBoxOnTable, 10, 12)
  • Initiate(PutBoxOnTable, On(Box, Table), 12)
    • Initiated(On(box, table), 10, 12)
  • T(On(Box, Table), 12, 20)
  • Terminates(RemoveBoxFromTable, On(Box, Table), 20)
    • Terminated(On(box, table), 18, 22)

Successor-state axiom

후속상태공리

  • Define how the world changes after an action occurs.
    • what changes when an action happens
    • what stays the same.
  • Without successor-state axioms, we face the frame problem:
    • after every action, we would need to explicitly list all facts that did not change.
  • Move(A,B,X)Move(A, B, X): moving block A from the top of block B to position X.
    • Preconditions:
      • On(A,B)On(A, B): A is on top of B
      • Clear(A)Clear(A): nothing is on top of A
      • Clear(X)Clear(X): nothing is on position X
    • Effects:
      • On(A,X)On(A, X): A is now on position X
      • Clear(B)Clear(B): B is now clear
      • Clear(X)Clear(X): X is no longer clear (X is now occupied by A)

Semantic networks

  • Visually represent a knowledge base.
  • Support efficient inference.
  • Allow properties of an object ot be inferred from its category membership.
  • Representing individual objects, categories of objects, and relations among objects.
  • Categories are the primary buildling blokcs of large-scale knowledge representation schemes.

Taxonomy hierarchy

  • A hierarchical structure of categories
  • Each lower category is a more specific kind of its parent
  • Organized from general → specific

University Ontology

EntityClassExample Statemnets
AliceStudentenrolledIn(CS101)enrolledIn(CS101), assessedBy(Prof.Smith)assessedBy(Prof.Smith)
Prof.SmithLecturermemberOf(Lecturer)memberOf(Lecturer), teaches(CS101)teaches(CS101)
CS101CoursebelongsTo(CSDepartment)belongsTo(CSDepartment)

Knowledge graph

  • represents information and its relationships using a graph structure.
  • Nodes: entities or concepts (e.g. people, places, things).
  • Edges: relationships between nodes (e.g. "is a", "part of", "located in").

Types of Knowledge Graphs

  • General Knowledge Graphs
  • Domain-Specific Knowledge Graphs
  • Semantic Knowledge Graphs
  • Social Knowledge Graphs
  • Temporal Knowledge Graphs
  • Special Knowledge Graphs
  • Statistical Knowledge Graphs
  • Probabilistic Knowledge Graphs
  • Textual Knowledge Graphs
  • Multi-modal Knowledge Graphs

General Knowledge Graphs

  • Comprehensive information representation
  • Entity-Relationship Structure
  • Linked Data
  • Semantic Enrichment
  • Capabilities & Use Cases
    • Querying and Analysis
    • Data Integration
    • ML and AI applications

Examples

  • DBPedia: RDF, resource descritions framework
  • Wikidata
  • YAGO
  • Google Knowledge Graph
  • Microsoft Academic Graph
  • IBM Watson Knowledge Studio

Reasoning for categories

  • Infer the presence of certain objects from perceptual input
  • Infer category membership from perceived properties
  • Use category information to make predictions
  • It enables an agent to identify objects from observed properties and to predict further characteristics using category knowledge.

Reasoning using Semantic networks

  • Semantic networks are:
    • systems designed specifically for organizing categories reasoning with categories.
    • Provide graphical representations of a knowledge base
  • Using a semanctic network, reasoning can be performed based on:
    • relationships between objects
    • category membership
    • inheritance between categories
    • properties associated with categories
    • Allows an object to inherit general knowledge from its category.
  • The inheritance algorithm:
    • Starts from the object itself
    • Follows links upwards through the category hierarchy
    • Stops as soon as it finds a value for the property
    • This suppors efficient reasoning, default values, and exception handling.

RDF

Resource Description Framework (RDF)

  • Triple Structure: subject -> predicate -> object
  • Subject: represents the resource being described or identified by a URI.
  • Predicate (or Property): desribes the relationship between the subject and the object. also represented by a URI.
  • Object: represents the value or target of the relationship. It can be a URI or a literal (a string, number, or date).

IAI +011

· 약 10분

Uncertain Domain

  • Agent may need to handle uncertainty
  • Environment is partially observable
    • Fully observable: An agent knows where it is.
    • The drone never knows its exact position, only an estimated one.
    • The true user intent is not directly observable, only noisy speech data.
    • It cannot see the whole environment or detect humans perfectly.
    • The true health state (disease present or not) is hidden.
  • Environment is non-deterministic
    • AI Trading Agent
      • even if the agent takes the same action, the outcome can vary dramatically.
      • from market prices changes unpredictably due to external events (news, other traders, economic data).
      • network latency, competing traders' behavior are stochasic and independent, random fluctuations can affect the outcome.
      • buy action could lead to profit, loss, or no change depending on uncontrollable external factors.
    • AI Helpdesk Chatbot
      • even when the chatbot executes the same action, it sends queries to multiple backend systems, the result may differ each time.
      • network latency or failures, backend load, concurrent requests, and external dependencies can lead to different response times or outcomes.
      • same command doesn't guarantee the same outcome.
  • Autonomous Driving
    • Partial Observability
      • Sensors cannot detect everything, blind spots, weather effects, occluded vehicles.
    • Non-Determinism
      • Even if the car signals a turn and starts moving, other drivers might brake suddenly, pedestrians might cross unexpectedly, or traffic lights might malfunction.
      • depending on random external events.
  • Medical Diagnosis & Treatment Assistant
    • Partial Observability
      • The patient's internal health state is not directly visible.
      • tests can be inaccurate or incomplete: false positives/negatives, missing data
    • Non-Determinism
      • The same treatment can have different effects on different patients due to genetics, lifestyle, or random biological responses.
      • A drug may succeed, fail, or cause side effects.

Bayes' Theorem

P(AB)=P(BA)P(A)P(B)P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

  • P(AB)P(A|B): Posterior probability, the probability of event A given taht B has occurred.
  • P(BA)P(B|A): Likelihood, the probability of event B occurring given that A is true.
  • P(A)P(A): Prior probability, the initial probability of A being true before considering B.
  • P(B)P(B): Evidence, the total probability of B occurring.
  • examples
    • 1% of people have a certain disease (prior probability). P(Disease)=0.01P(Disease) = 0.01
    • the test for the disease is 99% accurate (likelihood).
      • P(PositiveTestDisease)=0.99P(PositiveTest|Disease) = 0.99
      • P(NegativeTestNoDisease)=0.99P(NegativeTest|NoDisease) = 0.99
      • so P(PositiveTestNoDisease)=0.01P(PositiveTest|NoDisease) = 0.01
    • P(PositiveTest)=P(PositiveTestDisease)P(Disease)+P(PositiveTestNoDisease)P(NoDisease)P(PositiveTest) = P(PositiveTest|Disease) \cdot P(Disease) + P(PositiveTest|NoDisease) \cdot P(NoDisease)
      • =0.990.01+0.010.99=0.0198= 0.99 \cdot 0.01 + 0.01 \cdot 0.99 = 0.0198
    • P(DiseasePositiveTest)=P(PositiveTestDisease)P(Disease)P(PositiveTest)P(Disease|PositiveTest) = \frac{P(PositiveTest|Disease) \cdot P(Disease)}{P(PositiveTest)}
      • =0.990.010.01980.5= \frac{0.99 \cdot 0.01}{0.0198} \approx 0.5
  • the conditional probability P(effectcause)P(effect|cause) quantifies the relationship in the causal direction from cause to effect.
  • but P(causeeffect)P(cause|effect) is often what we really want to know, describing the relationship in the diagnostic direction from effect to cause.
    • In medical dignosis, the doctor knows P(symptomsdisease)P(symptoms|disease) from medical studies, and want to derive a P(diseasesymptoms)P(disease|symptoms) for a particular patient.
    • P(diseasesymptoms)=P(symptomsdisease)P(disease)P(symptoms)P(disease|symptoms) = \frac{P(symptoms|disease) \cdot P(disease)}{P(symptoms)}

General Form of Bayes' Rule

P(YX)=αP(XY)P(Y)P(Y|X) = \alpha \cdot P(X|Y) \cdot P(Y)

  • α\alpha is the normalization constant needed to make the entries in P(YX)P(Y|X) sum to 1 for each value of X.
  • conditional independency
    • two variables X and Y are conditionally independent given a third variable Z if the value of X provides no information about Y once Z is known.
    • P(X,YZ)=P(XZ)P(YZ)P(X, Y|Z) = P(X|Z) \cdot P(Y|Z)

Constructing Bayesian Network

  • constructing a Bayesian network in which resulting joint probability distributions are a good representation of the agent's knowledge.
    1. P(x1,...,xn)=i=1nP(xiparents(xi))P(x_1, ..., x_n) = \prod_{i=1}^{n} P(x_i | \text{parents}(x_i))
    1. rewrite the entries in the joint probability distribution in terms of conditional probabilities using the product rule.
    • P(x1...xn)=P(xnxn1...x1)P(xn1...x1)P(x_1 \land ... \land x_n) = P(x_n | x_{n-1} \land ... \land x_{1}) \cdot P(x_{n-1} \land ... \land x_{1})
    1. repeat step 2 until all variables are included.
    • P(x1,...,xn)=P(xnxn1...x1)P(xn1xn2...x1)P(x2x1)P(x1)P(x_1, ..., x_n) = P(x_n | x_{n-1} \land ... \land x_{1}) \cdot P(x_{n-1} | x_{n-2} \land ... \land x_{1}) \cdots P(x_2 | x_1) \cdot P(x_1)

Knowledge Representation

  • The agent's knowledge can at best provide only a degree of belief in the relevant logical sentences.
  • A probabilistic agent may have a numerical degree of belief between 0 and 1.

Bayesian Network

  • a directed graph
  • each node repressents a random variable and edge with an arrow from X to Y represents that X is a parent of Y.
  • each node is associated with a probability distribution
  • if a node Y has parents X, the Y node has a conditional probability distribution with respect to its parents X, i.e. P(Yparents(Y))P(Y| \text{parents}(Y))
  • if a node has no any parents, it has an unconditional probability distribution.

Burglary Example

  • a new burglar alarm installed at home.
  • it is reliable at detecting a burglary, but also responds on occasion to minor earthquakes
  • you have two neighbors, John and Mary who promised to call you at work when they hear the alarm.
  • given the evidence of who has or has not called, estimate the probability of a burglary.
  • Knowledge representation as the probability of each event that can occur in this case.

P(eventj)=number of ccurrences of event jtotal number of experimentsP(\text{event}_{j}) = \frac{\text{number of ccurrences of event }_j}{\text{total number of experiments}}

  • calulate the full joint probability distribution for all relevant variables.
    • specifying probabilities for all possible events one by one is unnatural and tedious.

P(B,E,A,J,M)=i5P(Xiparents(Xi))=P(B)P(E)P(AB,E)P(JA)P(MA)P(B, E, A, J, M) = \prod_{i}^{5} P(X_i | \text{parents}(X_i)) = P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(J|A) \cdot P(M|A)

P(M,J,A,E,B)=P(MJ,A,E,B)P(J,A,E,B)P(JA,E,B)P(A,E,B)P(AE,B)P(E,B)P(EB)P(B)=P(MJ,A,E,B)P(JA,E,B)P(AE,B)P(EB)P(B)=P(MJ,A,E,B)P(JA,E,B)P(AE,B)P(E)P(B)i.e. P(EB)=P(E) (Earthquake is independent of Burglary)=P(MA)P(JA)P(AE,B)P(E)P(B)i.e. P(JA,E,B)=P(JA) (JohnCalls depends only on Alarm)i.e. P(MJ,A,E,B)=P(MA) (MaryCalls depends only on Alarm)\begin{align*} & P(M, J, A, E, B) = P(M | J, A, E, B) \cdot P(J, A, E, B) \\ & P(J | A, E, B) \cdot P(A, E, B) \\ & \quad P(A | E, B) \cdot P(E, B) \\ & \quad \quad P(E | B) \cdot P(B) \\ & = P(M|J,A,E,B) \cdot P(J|A,E,B) \cdot P(A|E,B) \cdot P(E|B) \cdot P(B) \\ & = P(M|J,A,E,B) \cdot P(J|A,E,B) \cdot P(A|E,B) \cdot P(E) \cdot P(B) \\ & \quad i.e.\space P(E|B) = P(E) \text{ (Earthquake is independent of Burglary)} \\ & = P(M|A) \cdot P(J|A) \cdot P(A|E,B) \cdot P(E) \cdot P(B) \\ & \quad i.e.\space P(J|A,E,B) = P(J|A) \text{ (JohnCalls depends only on Alarm)} \\ & \quad i.e.\space P(M|J,A,E,B) = P(M|A) \text{ (MaryCalls depends only on Alarm)} \end{align*}

P(M,J,A,E,B)=P(B)P(E)P(AB,E)P(JA)P(MA) P(M, J, A, E, B) = P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(J|A) \cdot P(M|A)

Probabilistic reasnoing

  • to explain how to use network models to reason under uncertainty.
  • Exact inference: calculate the exact posterior probabilities.
  • Approximate inference: Stochastic approximation techniques such as likelihood weighting and Markov Chain Monte Carlo
    • can give reasonable estimates of the true posterior probabilities in a network
    • can cope with much larger networks than exact inference algorithms.
  • Static worlds: Each random variable has a single fixed value.
  • Dynamic worlds: view the world as a series of snapshots, or time slices.
    • Each of snapshots contains a set of random variables, some observed and some hidden.
    • The values of these variables can change over time.
    • With the assumption of state sequences are Markov, the future is independent of the past given the present.

Inference in Bayesian Networks

  • Computing the posterior probability distribution of a set of query variables, given some observed events, called some assignment of values to a set of evidence variables.
  • XX: query variable(s)
    • is it raining?
  • EE: the set of evidence variables with observed values E1,...,EmE_{1}, ..., E_{m}
    • weather report says it is cloudy.
    • weather report says there is high humidity.
  • ee: a particular observed event.
    • E1=cloudyE_{1} = \text{cloudy}
    • E2=high humidityE_{2} = \text{high humidity}
  • YY: the non-evidence, non-query variables.
    • the other variables in the network.
  • X={X}EYX = \{X\} \cup E \cup Y
    • the complete set of variables in the network.
  • P(Xe)P(X|e): the posterior probability distribution of the query variable(s) given the evidence.
  • P(Xe)=αyP(X,e,y)P(X|e) = \alpha \sum_{y} P(X, e, y)

Exact inference in Bayesian Networks

  • One exact inference method is called by inference by enumeration.
  • Observed some event A=trueA = true and B=trueB = true, want to compute the posterior probability distribution P(CA,B)P(C|A, B).
    • Use the full conditional probability table
    • Use the Bayesian Network
  • It calculates all the required probability components in each branch in the evaluation tree, and this results in repetitive calculations.
  • To avoid this, we can use variable elimination algorithm and clustering algorithms.
  • Calculation of probability distribution for a query variable is complexity exponential.
  • Enumeration method can become intractable in large, multiple connected networks. -> use approximate inference methods.

For any Bayesian network with given nodes, X={X1,X2,...,Xn}X = \{X_1, X_2, ..., X_n\}, the joint probability distribution is given by: P(X)=P(X1X2Xn)=i=1nP(Xiparents(Xi))P(X) = P(X_1 \land X_2 \land \ldots \land X_n) = \prod_{i=1}^{n} P(X_i | \text{parents}(X_i))

Using the Bayesian network, we can compute the conditional probability.

P(Bevent)=αeaP(B,E,A,j,m) P(B | event) = \alpha \sum_{e}\sum_{a} P(B, E, A, j, m)

  • i=1,P(X1parents(X1))=P(B)i=1, P(X_1 | \text{parents}(X_1)) = P(B)
  • i=2,P(X2parents(X2))=P(E)i=2, P(X_2 | \text{parents}(X_2)) = P(E)
  • i=3,P(X3parents(X3))=P(AB,E)i=3, P(X_3 | \text{parents}(X_3)) = P(A|B,E)
  • i=4,P(X4parents(X4))=P(jA)i=4, P(X_4 | \text{parents}(X_4)) = P(j|A)
  • i=5,P(X5parents(X5))=P(mA)i=5, P(X_5 | \text{parents}(X_5)) = P(m|A)
  • P(B,E,A,j,m)=P(B)P(E)P(AB,E)P(jA)P(mA)P(B, E, A, j, m) = P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(j|A) \cdot P(m|A)
P(Bevent)=αeaP(B,E,A,j,m)=αeaP(B)P(E)P(AB,E)P(jA)P(mA)=αP(B)eP(E)aP(AB,E)P(jA)P(mA)\begin{align*} & P(B | event) = \alpha \sum_{e}\sum_{a} P(B, E, A, j, m) \\ & = \alpha \sum_{e}\sum_{a} P(B) \cdot P(E) \cdot P(A|B,E) \cdot P(j|A) \cdot P(m|A) \\ & = \alpha \cdot P(B) \sum_{e} P(E) \sum_{a} P(A|B,E) \cdot P(j|A) \cdot P(m|A) \end{align*}

Berglary example calculation:

P(bevent)=αP(b)eP(E)aP(Ab,E)P(jA)P(mA)=αP(b)[P(e)aP(Ab,e)P(jA)P(mA)+P(¬e)aP(Ab,¬e)P(jA)P(mA)]aP(A,b,e)P(jA)P(mA)=P(a,b,e)P(ja)P(ma)+P(¬a,b,e)P(j¬a)P(m¬a)aP(Ab,¬e)P(jA)P(mA)=P(a,b,¬e)P(ja)P(ma)+P(¬a,b,¬e)P(j¬a)P(m¬a)=αP(b)[P(e)(P(ab,e)P(ja)P(ma)+P(¬ab,e)P(j¬a)P(m¬a))+P(¬e)(P(ab,¬e)P(ja)P(ma)+P(¬ab,¬e)P(j¬a)P(m¬a))]\begin{align*} & P(b | event) = \alpha P(b) \sum_{e} P(E) \sum_{a} P(A|b,E) \cdot P(j|A) \cdot P(m|A) \\ & = \alpha P(b) \bigg[ P(e) \sum_{a} P(A|b,e) \cdot P(j|A) \cdot P(m|A) + P(\neg e) \sum_{a} P(A|b,\neg e) \cdot P(j|A) \cdot P(m|A) \bigg] \\ & \quad \sum_{a} P(A,b,e) \cdot P(j|A) \cdot P(m|A) \\ & \quad = P(a,b,e) \cdot P(j|a) \cdot P(m|a) + P(\neg a,b,e) \cdot P(j|\neg a) \cdot P(m|\neg a) \\ & \quad \sum_{a} P(A|b,\neg e) \cdot P(j|A) \cdot P(m|A) \\ & \quad = P(a,b,\neg e) \cdot P(j|a) \cdot P(m|a) + P(\neg a,b,\neg e) \cdot P(j|\neg a) \cdot P(m|\neg a) \\ & = \alpha P(b) \bigg[ P(e) \big( P(a|b,e) \cdot P(j|a) \cdot P(m|a) + P(\neg a|b,e) \cdot P(j|\neg a) \cdot P(m|\neg a) \big) \\ & \quad \quad + P(\neg e) \big( P(a|b,\neg e) \cdot P(j|a) \cdot P(m|a) + P(\neg a|b,\neg e) \cdot P(j|\neg a) \cdot P(m|\neg a) \big) \bigg] \\ \end{align*}

FDA +011

· 약 4분

Bias

  • Bias quantifies how much on average the predicted values differ from the actual values.
  • High bias implies the model is under-performing.

Variance

  • Variance quantifies how sensitive the model is to small changes in the training samples.

Ensemble Methods

  • The intuition behind ensemble methods is to decrease bias and variance by using multiple machine learning algorithms.
  • Any machine learning algorithm can be used in ensemble methods.
    • decision trees, neural networks, logistic regressions, etc.
  • Base models are as diverse as possible.
  • Train each base model to predict as accurately as possible.

Sequential ensemble methods

  • Arrange weak learners in a sequence, such that weak learners learn from next learner in the sequence to create better predictive models.
  • Each model fits the residual of its predecessor.

Parallel ensemble methods

  • to use different variations of the same dataset, of the smae classifier, and aggregate the results.

Bootstraping

  • Sampling technique that creates multiple subsets of datasets from the original dataset.
  • when inferring results for a population from results found on a collection of smaller random samples of that population.

Bagging

Bootstrap aggregating

  • Generates subsets (bags) of traning data by sampling from the original training dataset with replacement.
  • To overcome the complexity of models that overfit the training data.

Boosting

  • fits multiple models sequentially
  • each model in the sequence is fitted giving more weight to the data points that were poorly handled by the previous models in the sequence.
  • This process is iterated until the error function does not change, or the maximum limit of the number of estimators is reached.

Random Forest

  • Shallow trees have lower variance and higher bias, whereas deep trees have low bias but high variance.
    • Shallow trees are chosen for sequential ensemble methods
    • Deep trees are chosen for bagging methods (or parallel ensemble methods).
  • a bagging method where deep trees are used and fitted on bootstrap samples, and then combined to produce an output with lower variance.
  • selects randomly a set of features which are then used to decide the best split at each node of the tree.
  • can be applied to both regression and classification tasks.
  • can learn binary features, categorical features and numerical features.
  1. create from the original dataset multiple bootstrap samples.
  2. at each node in the decision tree, a random set of features is considered to decide the most beneficial split.
  3. train a decision tree on each bootstrap sample.
  4. final prediction is computed by averaging the prediction from all decision trees combined.

Advantages of Random Forest

  • generally accurate, quick to train
  • can handle very large datasets
  • can estimate importance of features
  • generates an internal unbiased estimate of accuracy, which can use to know when to stop buildling
  • can handle missing data
  • can handle datasets with imbalanced classes
  • can compute proximity between data points
  • unsupervised for clustering and outlier detection

but don't handle large numbers of irrelevant attributes as well as some other methods.

Applications of Random Forest

IndustryApplicationsPurpose / Advantages
FinanceAssessing high credit-risk customers, detecting fraud, and addressing option pricing problemsPreferred over other algorithms due to its ability to minimize time spent on data management and pre-processing tasks
HealthcareGene expression classification, biomarker discovery, and sequence annotationHelps doctors estimate drug responses to specific medications
E-commerceRecommendation enginesUsed to achieve cross-selling objectives

AdaBoost

Adaptive Boosting

  • simplest boosting algorithm, usaully uses decision trees for modelling
  • multiple sequential models are created, each correcting the errors from the previous model.

GBM

Gradient Boosting Machine

  • works on both regression and classification problems.
  • usually, regression trees are used as base models.

XGBoost

eXtreme Gradient Boosting

  • another implementation of gradient boosting algorithm.
  • has proven to be a highly effective machine learning algorithm.
  • extensively used in machine learning competitions due to its speed and performance.

Combining predictions

Voting

  • Hard voting selection process uses predicted class labels for majority rule voting.
  • Soft voting uses the predicted probabilities given by each base model, and the class label with the maximum sum of its probabilities is selected.
    • 1Ni=1NPi(cx)\frac{1}{N} \sum_{i=1}^{N} P_{i}(c|x)
    • where NN is the number of base models, and Pi(cx)P_{i}(c|x) is the probability predicted by model ii for class cc given input xx.

FSD +009

· 약 5분

Exception

  • Exeptions are throwable objects.
  • Checked Exceptions are checked by the compiler and should be handled (thrown or caught).
  • Unchecked or runtime Exceptions are not flagged by the compiler. Their occurrence during execution interrupts the program.
  • Exceptions can be thrown by developers. Throwing an exception delegates handling the exception to a different class or different level of the program.
  • Thrown exception is left not handled, it will cause runtime error.

Error

  • Java Errors usually indicate problems with the JVM or system resources.
    • it is not meant to be caught or handled by applications.
    • the base class is Error.
  • Python is no separate Error class hierarchy. Errors are represented as exceptions.
    • ValueError, TypeError, MemoryError, SystemError, etc. are subclasses of Exception.

Java vs Pythone Exceiptions

FeatureJavaPython
Syntaxtry-catch-finallytry-except-else-finally
tryattempts to execute a block of codeattempts to execute a block of code
catch/exceptexecute alternative code if exception arisesexecute alternative code if exception arises
finally (optional)executes code regardless of try/catch outcomeexecutes code regardless of try/except outcome
else (optional)Xexecutes code if no exception arises
public class ExceptionExample {
public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
try {
System.out.print("Enter first number: ");
int a = scanner.nextInt();
System.out.print("Enter second number: ");
int b = scanner.nextInt();
int result = a / b; // This will raise ArithmeticException
System.out.println("Result: " + result);
} catch (ArithmeticException e) {
System.out.println("Caught an exception: " + e.getMessage());
} catch (NumberFormatException e) {
System.out.println("Caught a number format exception: " + e.getMessage());
} finally {
System.out.println("This block always executes.");
}
}
}
def div(a, b):
return a / b

a = input("Enter first number: ")
b = input("Enter second number: ")

try:
result = div(int(a), int(b)) # This will raise ZeroDivisionError
print("Result:", result)
except ZeroDivisionError as e:
print("Caught an exception:", e)
except ValueError as e:
print("Caught a value error:", e)
else:
print("No exception occurred.")
finally:
print("This block always executes.")

Throwing Exceptions

class IncorrectAgeError(Exception):
def __init__(self, age):
self.age = age
self.message = f"Age {age} is not valid. Age must be between 0 and 120."
super().__init__(self.message)

number = int(input("Enter your age: "))

try:
if number < 0 or number > 120:
raise IncorrectAgeError(number)
print(f"Your age is {number}.")
except IncorrectAgeError as e:
print(e)

File Handlers

  • "r": Opens a file for reading, requires the file to exist
  • "w": Opens a file for writing. If the file exists, it is truncated to zero length. If the file does not exist, a new text file is created.
  • "a": Opens file for open for writing. The file is created if it does not exist.
  • "+": Opens a file in updating mode (reading and writing). It can be used with reading (r+) and writing (w+) modes.
  • "t": Opens a file in text mode (default mode).
  • "b": Opens a file in binary mode. It can be used with reading (rb) and writing (wb) modes.

File Handler Methods

os.path.exists("file.txt") # Check if file exists
os.remove("file.txt") # Delete a file
os.remove("dir") # Delete an empty directory
os.chdir("path/dir") # Change current working directory
os.getcwd() # Get current working directory
os.mkdir("path/dir") # Create a new directory

file_handler = open("file.txt", "r") # Open a file in read mode
file_handler.read() # Read the entire file content
file_handler.readline() # Read a single line from the file
file_handler.close() # Close the file

file_handler = open("file.txt", "w") # Open a file in write mode
file_handler.write("Hello, World!") # Write to the file
file_handler.write("\n") # Write a newline character
file_handler.close() # Close the file


import csv
with open("file.txt", "r") as file_handler: # Using 'with' to handle file
csv_object = csv.reader(file_handler)
for row in csv_object:
print(row)
file_handler.close()

with open("file.txt", "a") as file_handler: # Append mode
csv_writer = csv.writer(file_handler)
row = ["1", "gracefullight", "100"]
csv_writer.writerow(row)
file_handler.close()

import json
with open("data.json", "r") as file_handler:
data = json.load(file_handler) # Load JSON data from file
print(data)
file_handler.close()

with open("data.json", "w") as file_handler:
json.dump(data, file_handler) # Write JSON data to file
file_handler.close()
  • InputStream: reads byte-based input from various sources like files, memory, or network connections.
  • OutputStream: writes byte-based output to various network connections.

Unit Testing

  • tests and verifies the smallest unis of an application such as functions, procedures, modules, or objects.
  • the conducted during the development phase of the SDLC.
  • Developers aim to identify defects and code bugs, which can save time and reduce costs in later testing stages.
  • software testing encompasses the fourmain testing phases, and unit testing is at the foundational level of the process.
    • Unit Testing
    • Integration Testing
    • System Testing
    • Acceptance Testing

Python unit testing

  • test cases are defined using classes that inherit from unittest.TestCase
  • assertions are used to validate results self.assertEquals()
  • the unittest.TextTestRunner class can be used to run tests
import unittest

class TestMathOperations(unittest.TestCase):

def test_addition(self):
self.assertEqual(2 + 3, 5)

def test_subtraction(self):
self.assertEqual(5 - 2, 3)

def test_multiplication(self):
self.assertEqual(4 * 3, 12)

def test_division(self):
self.assertEqual(10 / 2, 5)

IAI +010

· 약 11분

Knowledge representation

  • Intelligent agents need knowledge about the world in order to reach good decisions.
  • Declarative knowledge is represented in a form of sentences in a knowledge representation language and stored in a knowledge base.
  • Knowledge base is used by an inference engine to infer a new sentence which will be used for the agent to decide what action to take next.
  • Formal languages are defined by its grammar and semantic rules
    • grammar: defines the syntax of legal sentences
    • semantic rules: defines the meaning.
  • Common knowledge representation formalisms:
    • Propositional logic
    • First-order logic
    • Fuzzy logic
    • Semantic networks
    • Ontologies

Propositional logic

  • a declarative language in which can handle propositions that are known true, known false, or completely (unknown true or false)
  • a BNK (Backus-Naur Form) grammer of sentences in propositional logic.
    • ¬\neg = NOT
    • \land = AND
    • \lor = OR
    •     \implies = IMPLIES
    •     \iff = IF AND ONLY IF
  • Negation: a sentence using ¬\neg is called negation
  • Literal: either an atomic sentence or a negated atomic sentence
  • Conjunction: two sentences connected by \land. Eac hof them is called conjunct
  • Disjunction: two sentences connected by \lor. Each of them is called disjunct
  • Implication: two sentences connected by     \implies. P    QP \implies Q. P is called premise or antecedent and Q is the conclusion or consequent.
    • An implication is if-then statement.

Truth tables

PQ¬\negPP \land QP \lor QP     \implies QP     \iff Q
TTFTTTT
TFFFTFF
FTTFTTF
FFTFFTT

FOL, First-order logic

  • a declarative language
  • Syntax of FOL builds on that of propositional logic
  • terms to represent objects, universal quantifier, and existential quantifier
  • a model in FOL must provide the information required to determine the truth value of every atomic sentence in the language.
  • xP(x)\forall x P(x) means for all x, P(x) is true
  • xP(x)\exists x P(x) means there exists an x such that P(x) is true

Logical equivalence

logical equivalencemeaning
(αβ)(βα)(\alpha \land \beta) \equiv (\beta \land \alpha)Commutativity of \land
(αβ)(βα)(\alpha \lor \beta) \equiv (\beta \lor \alpha)Commutativity of \lor
(α(βγ))((αβ)γ)(\alpha \land (\beta \land \gamma)) \equiv ((\alpha \land \beta) \land \gamma)Associativity of \land
(α(βγ))((αβ)γ)(\alpha \lor (\beta \lor \gamma)) \equiv ((\alpha \lor \beta) \lor \gamma)Associativity of \lor
¬(¬α)α\neg(\neg \alpha) \equiv \alphaDouble negation elimination
(α    β)(¬αβ)(\alpha \implies \beta) \equiv (\neg \alpha \lor \beta)Implication elimination
(α    β)(¬β    ¬α)(\alpha \implies \beta) \equiv (\neg \beta \implies \neg \alpha)Contraposition
(α    β)((α    β)(β    α))\alpha \iff \beta) \equiv ((\alpha \implies \beta) \land (\beta \implies \alpha))Biconditional elimination
¬(αβ)(¬α¬β)\neg(\alpha \land \beta) \equiv (\neg \alpha \lor \neg \beta)De Morgan's law for \land
¬(αβ)(¬α¬β)\neg(\alpha \lor \beta) \equiv (\neg \alpha \land \neg \beta)De Morgan's law for \lor
(α(βγ))((αβ)(αγ))(\alpha \land (\beta \lor \gamma)) \equiv ((\alpha \land \beta) \lor (\alpha \land \gamma))Distributivity of \land over \lor
(α(βγ))((αβ)(αγ))(\alpha \lor (\beta \land \gamma)) \equiv ((\alpha \lor \beta) \land (\alpha \lor \gamma))Distributivity of \lor over \land

x¬P(x)¬xP(x)\forall x \neg P(x) \equiv \neg \exists x P(x)

  • For all x, not P(x) is logically equivalent to "it is not the case that there exists an x such that P(x) is true."

Reasoning

Deductive reasoning

  • a process of reasoning from one or more statements (premises) to reach a logical conclusion.
  • first premise, second premise, therefore conclusion.

Inductive reasoning

  • the process of resoning from specific observations to borader generalizations and theories.
  • also described as a method where one's experiences and observations are synthesized to cmoe up with a general truth.
  • premises and then conclusion

Inference

  • steps in reasoning
  • moves from premises to logical consequences
  • In AI Context, inference is to derive new logical sentences (as the conclusion) from existing logical sentences (as premises).
  • researchers develop automated inference systems to emulate human inference.

Inference Problem

KBγKB \models \gamma

  • KB, Knowledge Base, is a set of propositions that represent what is known about the world.
  • γ\gamma, Query sentence, is the target conclusion which needs to be confirmed based on the given KB.
  • where \models denotes the relation of logical entailment between KB and the sentence γ\gamma, reading as "KB entails γ\gamma" or "if KB is true, then γ\gamma must also be true".
  • αβ    M(α)M(β)\alpha \models \beta \iff M(\alpha) \subseteq M(\beta)
    • M(α)M(\alpha) is the set of all models that satisfy α\alpha.
    • M(β)M(\beta) is the set of all models that satisfy β\beta.
    • The statement αβ\alpha \models \beta means that in every model where α\alpha is true, β\beta is also true. In other words, if α\alpha holds, then β\beta must also hold.
  • Model Checking: enumerates all possible models and checks if the entailment holds in each model.
    • Knowledge base will be used to draw inferences.
    • Query sentence, γ\gamma, is needed to be checked whether it is entailed by the KB.
    • Symbols, a list of all symbols (or atomic propositions) used in the problem context.
    • Models, assignments of truth and false values to those identified symbols.
  • Model Checking Procedure
    1. Identify the propositional symbols involved in the KB sentences and query sentence
    2. Enumerate all possible models by assigning truth values to the identified symbols.
    3. Evaluate the KB sentence in each model and fined the models in which KB is true.
    4. Evaluate the query sentence in the models from step 3 and check if query sentence is true in these models.
    5. Conclude that the KB entails the query sentence γ\gamma if and only if the query sentence is true in all models where the KB is true.

Inference Problem Example

  • Obersvation P: "It is raining"
  • Query sentence Q: "The ground is wet"
  • Knowledge: P    QP \implies Q (If it is raining, then the ground is wet)
  • Knowledge Base KB: P(P    Q)P \land (P \implies Q)
  • Inference Problem: KBQKB \models Q from KB=T, get Q=T
  • Proof:
    1. From the 4 possible models, only Model 1 makes KB true.
    2. M(KB) = model 1
    3. Model 1 also makes Q true.
    4. M(Q) = model 1
    5. M(KB)M(Q)M(KB) \subseteq M(Q), therefore KBQKB \models Q
ModelPQP     \implies QP \land (P     \implies Q)
1TTTT
2TFFF
3FTTF
4FFTF

Inference by theorem proving

to apply rules of inference directly to the sentences in the KB to construct a proof of the desired sentence without consulting models.

  • Proof: a chain of consequences that leads to the desired goal.
  • KB will be used to draw inferences.
  • Desired sentence is needed to be checked whether it is entailed by the KB.
  • The rules of inference are the approved logical equivalences and rules.
AspectModel CheckingTheorem Proving
방식참/거짓으로 실제 계산논리 규칙을 사용해 증명
예시진리표Modus Ponens, Resolution
장점단순함복잡한 문장도 처리 가능
단점계산 많음규칙 익혀야 함

Modus Ponens Rule

α    β,αβ\frac{\alpha \implies \beta, \alpha}{\therefore \beta}

  • whenever any sentences of the form α    β\alpha \implies \beta and α\alpha are given, then the sentence β\beta can be inferred.

Add-Elimination Rule

αβα\frac{\alpha \land \beta}{\therefore \alpha}

  • from a conjunction, one of the conjuncts can be inferred.

Terms to contradiction and resolution

CNF
┌────────────────────────────────────────────────┐
(P ∨ Q)(¬Q ∨ R)
│ ┌──────────────────────┐ ┌───────────────┐ │
│ │ Clause 1 │ │ Clause 2 │ │
│ │ (P ∨ Q) │ │ (¬Q ∨ R) │ │
│ │ ┌───────┬───────┐ │ │ ┌──────┬─────┐ │
│ │ │ P │ Q │ │ │ │ ¬Q │ R │ │
│ │ └───────┴───────┘ │ │ └──────┴─────┘ │
│ └──────────────────────┘ └───────────────┘ │
└────────────────────────────────────────────────┘
  • Literal: an atomic sentence or its negation
  • Complementary literals: a literal and its negation are complementary literals
  • Clause: an expression formed from a collection of finite literals
    • In most l1l2...lnl_1 \lor l_2 \lor ... \lor l_n cases, a clause is a disjunction of finite literals.
    • written as the symbol lil_i.
  • Conjunctive Normal Form: a sentence expressed as a conjunction of clauses.
  • Satisfiability: a sentence is satisfiable if it is true in, or satisfied by, some models.

Propositional satisfiability (SAT) problem

  • to determine the satisfiability of sentences in propositional logic.
  • if there exists a model that satisfies a given logical sentence, then the sentence is satisfiable.
  • Satisfiability can be checked by enumerating the possible models until one is found that satisfies the sentence, or by resolving complementary literals until an empty clause is derived.
  • many problems in CS are really SAT problems.

Resolution rule

Unit resolution rule

(l1l2...li1lili+1...lk, m)l1...li1li+1...lk \frac{(l_1 \lor l_2 \lor ... \lor l_{i-1} \lor l_i \lor l_{i+1} \lor ... \lor l_k,\space m)}{l_1 \lor ... \lor l_{i-1} \lor l_{i+1} \lor ... \lor l_k}

  • where ll is a literal and lil_i and mm are complementary literals.
  • Unit resolution rule takes a clause which is a disjunction of literals, and a literal and produces a new clause as the resolvent.

Full resolution rule

(l1l2...li1lili+1...lk, m1m2...mj1mjmj+1...mn)(l1...li1li+1...lkm1...mj1mj+1...mn) \frac{(l_1 \lor l_2 \lor ... \lor l_{i-1} \lor l_i \lor l_{i+1} \lor ... \lor l_k,\space m_1 \lor m_2 \lor ... \lor m_{j-1} \lor m_j \lor m_{j+1} \lor ... \lor m_n)}{(l_1 \lor ... \lor l_{i-1} \lor l_{i+1} \lor ... \lor l_k \lor m_1 \lor ... \lor m_{j-1} \lor m_{j+1} \lor ... \lor m_n)}

  • where ll is a literal and lil_i and mjm_j are complementary literals.
  • Full resolution rule takes two clauses which are disjuctions of literals and produces a new clause containing all the literals of the two original clauses except for the two complementary literals.

Inference via proof by contradiction through resolution

αβ \alpha \models \beta

  • to prove that αβ\alpha \models \beta, we can show that the sentence α¬β\alpha \land \neg \beta is unsatisfiable.
  • by deriving an empty clause ()() from α¬β\alpha \land \neg \beta using the resolution rule.
  • In order to derive an empty clause, we use resolution which is a process to resolve complementary literals until to find an empty clause.
    1. R1: P (observation)
    2. R2: P     \implies Q (knowledge, raining implies ground is wet)
    3. KB=P(P    Q)KB = P \land (P \implies Q)
    4. Query sentence: Q
    5. Inference problem: KBQKB \models Q, i.e. from KB=TKB=T, get Q=TQ=T
    6. Proof:
    • Let (P(P    Q))¬Q(P \land (P \implies Q)) \land \neg Q valid
    • Convert (P(P    Q))¬Q(P \land (P \implies Q)) \land \neg Q to CNF
    • Apply implication elimination rule, one has (P(¬PQ))¬Q(P \land (\neg P \lor Q)) \land \neg Q
      • C1:PC_1: P
      • C2:¬PQC_2: \neg P \lor Q
      • C3:¬QC_3: \neg Q
    • Apply unit resolution rule to C1C_1 and C2C_2, resolve PP and ¬P\neg P, one has C4:QC_4: Q
    • Apply unit resolution rule to C3C_3 and C4C_4, resolve QQ and ¬Q\neg Q, one has C5:()C_5: ()

Inference in FOL

  • convert the first-order inference to propositional inference using the ruls for quantifiers.
    • universal instantiation (UI)
    • existential instantiation (EI)
  • do inference in propositional logic using the methods about inference in propositional logic.
  • this approach to first-order logic inference via propositionalization is complete, means any entailed sentences can be proved.
    • in most cases, this approach works
    • in some cases, it is slow and only useful when the domain is small.
  • get rid of quantifiers by instantiating them with specific constants or variables.

Universal instantiation (UI)

v αSubst({v/g},α)\frac{\forall v \space \alpha}{\text{Subst}(\{v/g\},\alpha)}

  • to convert sentences with universal quantifiers to sentences without universal quantifiers.
  • it can infer any sentence obtained by substituting a ground term for the universally quantified variable.
  • Subst({θ,α})Subst(\{\theta, \alpha\}) denotes the result of applying the subsitution θ\theta to the sentence α\alpha and gg is a ground term or a constant symbol.
  • universal instantiation can be applied many times to produce may different consequences.
  • xLoves(x,Mary)\forall x Loves(x, Mary)
    • Loves(John,Mary)Loves(John, Mary)
    • Loves(Sue,Mary)Loves(Sue, Mary)
    • Loves(Bill,Mary)Loves(Bill, Mary)
    • ...

Existential instantiation (EI)

  • to convert sentences with existential quantifiers to sentences without existential quantifiers.
  • the quantified variable can be replaced by a single new constant symbol.
  • for any sentence α\alpha, variable vv, and constant symbol kk not appear elsewhere in the knowldge base, the following rule is sound: v αSubst({v/k},α)\frac{\exists v \space \alpha}{\text{Subst}(\{v/k\},\alpha)}
  • Subst{θ,α}\text{Subst}\{\theta, \alpha\} denotes the result of applying the subsitiution θ=v/k\theta={v/k} in the sentence α\alpha and kk is ground term or a new constant symbol.
  • Existential instantiation can be applied only once, and then the existentially quantified sentence can be discarded.
  • xLoves(x,Mary)\exists x Loves(x, Mary)
    • Loves(K,Mary)Loves(K, Mary)
    • where KK is a new constant symbol not appear elsewhere in the knowledge base

CNF, Conjunctive Normal Form

StepRule NameDescriptionExample
1제거    \implies,     \iff 없애기P    QP \implies Q¬PQ\neg P \lor Q
2De Morgan's¬\lnot 분배¬(PQ)(¬P¬Q)\lnot(P \lor Q) \rightarrow (\lnot P \land \lnot Q)
3Double negation이중부정 제거¬(¬P)P\lnot(\lnot P) \rightarrow P
4Distribution\lor over \land 분배(P(QR))(P \lor (Q \land R))(PQ)(PR)(P \lor Q) \land (P \lor R)
5And-Elimination\land 분리(AB)A,B(A \land B) \rightarrow A, B 따로

Vocabulary for AI 011

· 약 5분

Vocabulary & Expressions

Term/ExpressionDefinitionSimpler ParaphraseMeaning
formalisma system of rules and symbols used to represent concepts in a precise and unambiguous waystructured system형식주의, 형식 체계
disjointnot connected or relatedseparate분리된, 연결되지 않은
caveata warning or cautionary detail to considerwarning경고, 주의 사항
instantaneoushappening immediatelyimmediate즉각적인, 순간적인
dwellera person or animal that lives in a specific placeinhabitant거주자, 주민
scantyinsufficient in quantity or qualitymeager부족한, 빈약한
synthesizeto combine different elements to form a coherent wholecombine합성하다, 종합하다
angstroma unit of length equal to one ten-billionth of a meter, used to measure very small distancesunit of length옹스트롬 (10억분의 1미터)
treatya formal agreement between two or more countriesagreement조약, 협정
enticeto attract or tempt by offering something desirableattract유혹하다, 꾀다
simplistictreating complex issues in an overly simple wayoversimplified지나치게 단순화된
Event Calculusa formalism for representing and reasoning about events and their effects over timetemporal reasoning system이벤트 연산
reifyto make something abstract more concrete or realconcretize구체화하다, 실체화하다
reificationthe process of making an abstract concept more concrete or realconcretization구체화, 실체화
ceaseto stop or bring to an endstop중단하다, 멈추다
exogenousoriginating from outside a systemexternal외부의, 외생적인
reignto hold royal office; to rule as a monarchrule통치하다, 군림하다
obscuredhidden or made less visiblehidden가려진, 숨겨진
by virtue ofbecause of; as a result ofbecause of~덕분에, ~때문에
semidecidablea property of a problem where a solution can be verified if given, but not necessarily foundpartially decidable반결정 문제
singlyone at a time; individuallyone at a time하나씩, 개별적으로
negateto nullify or make ineffectivenullify부정하다, 무효화하다
subsumptionthe process of including one concept within another, more general conceptinclusion포섭, 포함
thrustthe main idea, subject, or opinion that is discussed or written aboutmain idea요지, 핵심
tractabilitythe quality of being easy to manage or deal withmanageability계산 가능성
resolventa clause derived from two clauses containing complementary literals, used in resolution-based theorem provingderived clause해석자, 해결책
propositionalzationthe process of converting first-order logic statements into propositional logic statementsconversion to propositional logic명제화
transductionthe process of converting one form of data into anotherdata conversion변환, 전환
transductive learninga type of machine learning where the model makes predictions on specific test cases based on the training data, rather than generalizing to unseen datacase-based learning전이 학습
representationthe way in which information or knowledge is structured and organizeddepiction표현, 나타냄
dispensethe act of distributing or providing somethingdistribution분배, 제공
factorto break down a problem into smaller, more manageable partsbreakdown분해하다
inherentlyin a way that is a permanent, essential, or characteristic attributeintrinsically본질적으로, 내재적으로
precludeto prevent something from happening or to make it impossibleprevent배제하다, 막다
integralnecessary to make a whole complete; essentialessential필수적인, 완전한
eschewto deliberately avoid or abstain from somethingavoid피하다, 삼가다
logarithmicallyin a way that relates to logarithms or logarithmic functionsin terms of logarithms로그적으로
counteractto act in opposition to something to reduce its effectoppose대응하다, 반작용하다
point-wiseapplying a function or operation to each individual element separatelyindividually점별로, 개별적으로
tedioustoo long, slow, or dull; tiresome or monotonousboring지루한, 따분한
multiplicativerelating to multiplication or the process of multiplyingrelating to multiplication곱셈의, 증가하는
leftwardmoving or directed to the leftto the left왼쪽으로
recurrencethe act of returning or happening againreturn재발, 반복
sinusoida mathematical curve that describes a smooth, periodic oscillationwave-like curve사인 곡선
desideratumsomething that is desired or wanteddesired thing바람직한 것, 필요조건
contiguoussharing a common border; touchingadjacent인접한, 접촉하는
dilatedmade larger or expandedexpanded확장된, 팽창된
exhibitto show or display something publiclydisplay전시하다, 나타내다
regimea system or planned way of doing things, especially one imposed from abovesystem체제, 제도
perplexitya state of confusion or uncertaintyconfusion당혹, 혼란
reliancedependence on or trust in someone or somethingdependence의존, 신뢰

Ontology

ConceptWhat it isRole / FunctionExample
OntologyThe representation itself — a structured map of concepts and relationsDescribes the general framework of the worldThe hierarchy (Anything → PhysicalObjects → Humans, etc.)
Ontology SystemThe platform or implementation that holds and reasons about ontologiesStores, queries, and integrates knowledgeCYC, DBpedia, Google Knowledge Graph
Ontological EngineeringThe process of designing and maintaining ontologiesDefines general concepts and their logical structureDefining "PhysicalObject" → later "Robot," "Television," etc.

Semantic Network

  • Semantic networks let us represent logical relationships visually, but their meaning remains purely logical — about objects, categories, and relations.

Tractability

AspectSemantic NetworkDescription LogicFirst-Order Logic
형식성낮음 (그래픽 기반)중간 (논리 기반)높음 (수학적)
표현력제한적중간매우 높음
추론 효율성매우 높음높음 (다항시간 목표)낮음 (NP-hard~undecidable)

IAI +009

· 약 7분

Generative AI

  • Gen AI refers to a category of AI models designed to generte new content, synt99hetic data that resembles a given dataset.
  • Gen AI models create new content, including text, images, audio, and video.

Historcal context of Gen AI

  • 1980s: The development of statistical approaches to AI emerged, focusing on probabilistic models
  • 1990s: Hidden Markov Models (HMMs) became popular in speech recognition and sequence generation tasks, making a shift towards using statistical methods in generative processes.
  • 1990s-2000s: The resurgence of Neural Networks, with the introduction of deep learning techniques. However, hardware and data limitations hampered progress.
  • 2010s: The advent of deep learning algorithms, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), transformed the landscape of AI.
  • 2013: VAEs were introduced, providing a probabilistic approach to data generation, allowing for smooth latent space interpolation and structure.
  • 2014: GANs, a novel framework where two neural networks (a generator and a discriminator) are trained simultaneously, allowing for the generation of highly realistic images and other data types.
  • 2015-2020: Generative models began to find applications beyond image synthesis, including text generation, music composition, and even video generation. OpenAI's GPT-2 showcased the potential of transformer-based models for generating coherent text.
  • 2020s: The introduction of larger and more capable models like OpenAI's GPT-3 and subsequent iterations revolutionized natural language processing.
    • DALL-E and Stable Diffusion pushed the boundaries of image generation, allowing users to create images from textual descriptions. This sparked creative exploration and practical applications in marketing, art, and design.
  • Present: The integration of multiple data modalities (text, image, audio) led to the development of models like CLIP and GPT-4, which can understand and generate content across different formats, enhancing the versatility of generative AI.

Variational Autoencoders, VAEs

  • VAEs are from the probabilistic approach to build a generative AI model
  • VAE learns a latent distribution instead of a fixed latent presentation, allowing for the generation of new samples.
  • Latent space
    • sample from a Gaussian distribution
    • μ\mu and σ\sigma are learned during training
    • zz is a new sample from the latent space
  • qϕ(zx)q_\phi(z|x) is the encoder that maps input data xx to a distribution over the latent space zz
  • pθ(xz)p_\theta(x|z) is the decoder that maps a latent variable zz back to the data space xx
  • During training, the model optimizes the reconstruction loss and a regularization term (KL divergence) to ensure the learned latent distribution is close to a prior distribution (usually a standard normal distribution).
  • Latent Representation
    • Dimensionality Reduction: latent space is typically lower-dimensional than the input space
    • Smoothness: In a well-structured latent space, similar inputs will be represented by nearby points.
    • Generative Capabilities: Once trained, can sample from the latent space to generate new data that resembles the training set.
    • Regularization: The VAE incorporates a regularization term in its loss function (KL divergence), which encourages the lerned latent distribution to be close to a standard normal distribution.
  • Decoder
    • The decoder takes the sampled latent representation zz and reconstructs the original input data.
    • The goal is to make the reconstructed data as close as possible to the original input.

VAE Examples

Face Generation Example

  • Trains a VAE on face photos.
  • Latent space learns meaningful features (e.g. hair color, glasses, smile)
  • By sampling and interpolating in latent space, can generate new faces or smoothly morph one face into another.

Anomaly Detection Example

  • Train a VAE on normal sensor data.
  • When fed unusual data, reconstruction error will be high.
  • Use this for fault detection in machines, fraud detction, etc.

Autoencoder

  • Encoder
    • Input Layer: takes in the original data
    • Hidden Layer: These layers progrssively reduce the dimensionality of the input through operations like linear transformations and non-linear activations. (e.g., ReLU-Rectified Linear Unit)
    • Ouput Layer: Produces the final latent represntation. Sometimes Sigmoid or Tanh are used depending on the nature of the input data.
    • Learns to capture meaningful patterns in the data by minimizaing reconstruction loss, which mesures the difference between the original input and the reconstructed output produced by the decoder.
  • Decoder
    • typically mirrors the structure of the encoder but in reverse.
    • During trainng, Mean squared Error (MSE) for continuous data or Binary Cross-Entropy (BCE) for binary data are commonly used to quantify the reconstruction loss.

Image Reconstuction Example

  • input: 28x28 pixel
  • encoder: compresses it to just 16 numbers (latent vector)
  • decoder: expands those 16 numbers back into a 28x28 image
  • result: the reconstructed digit looks similar to the original but not in the training set.

Denoising Images

  • input: a noisy photo of a cat
  • encoder: learns to ignore the noise and compress meaningful features.
  • decoder: rebuilds the image without the noise.
  • output: a clearer cat image.

Loss function

  • Reconstruction loss: Ensure output similar to input
  • KL divergence loss: Push the learned distribution to be close to a standard normal distribution. (can sample new data)

Generative Adversarial Network, GAN

  • Generator: Learns to generate fake (generated) data that resembles data distribution.
  • Discriminator: Learns to distinguish between real data and data generated by the Generator.
  • Steps
    1. Generator creates fake data samples fro mrandom input (noise).
    2. Discriminator evaludates these samples together with real data samples.
    3. Discriminator outputs probabilities indicatcing whether each sample is real or fake.
    4. Based on the prediction of the Discriminator, the Generator and Discriminator are updated using specific loss functions.
  • Applications
    • Labels to Street Scenes
    • Labels to Facade
    • BW to Color
    • Aerial to Map
    • Day to Nihght
    • Edges to Photo
    • Text-to-Image Synthesis

Autoregressive Models

  • a class of generative models where the current value of a time series is expressed as a linear function of its own past values plus some noise.
  • foundational in generative AI and widely used in generative AI, particularly for tasks like text generation, speech synthesis.
  • Examples
    • GPT series: state-of-the-art autoregressive language models used for text generation.
    • WaveNet: an autoregressive model for generating high-quality audio by predicting each sample conditioned on previous samples.

Training Autoregressive Models

  • Pretraining
    • the model sees sequences of tokens and learns to guess the next token.
    • objective: minimize cross-entropy loss between its guess and the true next token.
    • this teaches grammar, facts, reasoning patterns, and style from raw text.
  • Supervised fine-tuning (optional)
    • smaller curated datasets (questions to answer, instructions to respond) teach it to follow directions.
  • Reinforcement learning from Human Feedback (optional)
    • Humans rank multiple model outputs; a reward model is trained on those rankings; GPT is then optimized to produce higher-reward responses (safer, more helpful, less toxic)
  • Text to Tokens via a tokenizer (e.g. BPE, Byte Pair Encoding)
  • Each token becomes a vector (Embedding)
  • Positional encodings inject order information.
  • Transformer layer
    • self-attention: each token looks at all previous token (causal mask) and decides which ones matter, computing weighted combinations.
    • Multiple heads let it attend to different patterns (syntax, long-range links, etc.).
    • feed-forward network: a nonlinear MLP refines each token's representation.
    • residual connections & layer norm stabilize tranining.

Interfence

  • a prompt (the context)
  • GPT computes probabiliites for the next token.
  • A decoding stragety samples a token
    • Decoding knobs
      • Greedy: take the top token
      • Top-K
      • Nucleus (Top-p) sampling (limit to likely options)
      • Temperature scales randomness, controls how random the next token choice is
        • lower = more determistic, higher = more creative
  • append the token and repeat autoregressive generation.

FDA +010

· 약 4분

SVM

a supervised learning framework for finding a boundary between data points belonging to different classes.

Convex Hull

Convex Hull

  • the lines surrounding the outermost points of each class.
  • since the classes are linearly separable, convex hulls do not intersect.

Margins

  • the distance between data points of different classes seprarated by a hyperplane.
  • multiple possible hyperplanes can separate the classes, but the optimal hyperplane maximizes the margin.
  • Maximum Margin Hyperplane
    • a separation line/plane orthogonal to the shortest line connecting the convex hull.
    • the line that is farthest apart from each convex hull.
    • it separates the data points with the widest margin.

Kernels

  • kernel is used to represent kernel functions
    • which are used to convert low-dimensional space to high-dimensional space
    • by applying a function on low-dimensional data points to come up with higher dimensions
      • which can be used to linearly separate the data points of classes using a hyperplane.
  • the function of a kernel is to take data as input and transform it into the required form.
    • different algorithms use different types of kernel functions.
    • linear, non-linear, polynomial, radial basis function (RBF), and sigmoid.

Kernel trick

  • Basic idea of SVM and Kernel trick is to find the plane which can separate, classify or split the data with maximum margin as possible.
  • The distance from a point (x0,y0)(x_0, y_0) to a line Ax+By+C=0Ax + By + C = 0 is given by the formula: Distance=Ax0+By0+CA2+B2\text{Distance} = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}
  • The distance between H0H_0, H1H_1 is then: wx+b/w=1/w|w \cdot x + b| / ||w|| = 1/||w||.
    • The total distance between H1H_1 and H2H_2 is 2/w2/||w||.
    • H1:wx+b=1H_1: w \cdot x + b = 1
    • H2:wx+b=1H_2: w \cdot x + b = -1.
    Distance=(+1)(1)w=2w \text{Distance} = \frac{|(+1) - (-1)|}{||w||} = \frac{2}{||w||}
  • to maximiza the margin, we need to minimize w||w||
  • this can be solved through Lagrangian formula or Lagrangian multipliers. L=12w2i=1nαi[yi(wxi+b)1] L = \frac{1}{2} ||w||^2 - \sum_{i=1}^{n} \alpha_i [y_i (w \cdot x_i + b) - 1]
  • Setting the gradient of LL Lw=0w=i=1nαiyixi \frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n} \alpha_i y_i x_i Lb=0i=1nαiyi=0 \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} \alpha_i y_i = 0
  • The dual form of the optimization problem is: Maximize W(α)=i=1nαi12i=1nj=1nαiαjyiyj(xixj) \text{Maximize } W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
  • Capital letters such as A, X, Y denote matrices;
  • Greek letters such as Φ\Phi, κ\kappa denote functions;
  • lower-case bold letters such as aa, bb denote vectors;
  • script letters such as A\mathcal{A}, B\mathcal{B}, V\mathcal{V}, E\mathcal{E} denote sets or spaces;
  • a||a|| denotes the L2L^2 norm of vector;
  • xF||x||_F denotes the Frobenius norm of matrix, and is given by xF=(i,jxij2)1/2||x||_F = \big(\sum_{i,j} |x_{ij}|^2\big)^{1/2};

Linear Kernel

K(xi,xj)=xixjK(x_i, x_j) = x_i \cdot x_j

Gaussian Kernel / RBF Kernel

K(xi,xj)=exp(xixj22σ2)K(x_i, x_j) = \exp\Big(-\frac{||x_i - x_j||^2}{2\sigma^2}\Big)

Polynomial Kernel

K(xi,xj)=(xixj+1)hK(x_i, x_j) = (x_i \cdot x_j + 1)^h

Sequential Minimal Optimization (SMO)

  • an algorithm for solving the quadratic programming (QP) problem that arises during the training of support vector machines (SVMs).

Advantages

  • Overfitting is unlikely
    • SVM uses the maximum margin hyperplane, which is relatively stable.
    • the maximum margin hyperplane is only sensitive to the changes in the support vectors.
    • the variance in the data may have a relatively low effect on the performance
  • Computational complexity
    • since every time we need to classify a new sample we need to examine that sample with all the support vectors.
    • Using the kernel trick can help to alleviate this by calculating the dot product before doing the nonlinear mapping.

Disadvantages

  • not perform very well when the data set has more noise
  • where the number of features for each data point exceeds the number of training data samples, the SVM will underperform.

Applications