FSD +004

· One min read

match

match term:
    case pattern-1:
        action-1
    case pattern-2:
        action-2
    case pattern-3:
        action-3
    # the underscore _ case executes the default code
    case _:
        action-default

Repetition Statements

  • The count-controlled repetition: a fixed number of times.
  • The sentinel-controlled repetition: a designated value that ends the loop.
  • The infinite repetition: continues until externally stopped.
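The three kinds of repetition can be sketched as follows. The data (`readings`) and the sentinel value `-1` are assumptions chosen for this example, and the infinite loop is shown only as a comment so that the snippet terminates.

```python
# Count-controlled: the body runs a fixed number of times (3 here).
squares = []
for i in range(3):
    squares.append(i * i)

# Sentinel-controlled: a designated value marks the end of the data.
readings = [4, 8, 15, -1, 16]  # hypothetical input stream
SENTINEL = -1
total = 0
stream = iter(readings)
value = next(stream)
while value != SENTINEL:
    total += value
    value = next(stream)

# Infinite: a `while True:` loop with no exit condition runs until the
# program is stopped externally (e.g. Ctrl+C), so it is only outlined here:
#     while True:
#         handle_next_event()   # hypothetical function

print(squares)  # [0, 1, 4]
print(total)    # 27
```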

The For Loop

for <value> in <range of values>:
    <code>

sum = 0

# range(1, 20) yields 1, 2, ..., 19 (the stop value 20 is excluded)
# adds the values from 1 to 19 to sum
for e in range(1, 20):
    sum += e

print(f"The sum is: {sum}")

Loop-And-A-Half

n = 5
sum = 0

# n never changes, so the break in the middle of the body is what ends the loop
while n < 10:
    sum += n

    if sum > 100:
        break
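A loop-and-a-half puts the exit test in the middle of the body: the first half fetches the next item, the test decides whether to stop, and the second half processes the item. The sketch below assumes a hypothetical list of values standing in for user input, and treats any negative number as the stop signal.

```python
def sum_until_negative(values):
    """Add values until a negative one appears (the exit test is mid-loop)."""
    stream = iter(values)
    total = 0
    while True:
        value = next(stream, -1)  # first half: get the next item (-1 if exhausted)
        if value < 0:             # exit test in the middle of the body
            break
        total += value            # second half: process the item
    return total


print(sum_until_negative([5, 7, 3, -1, 10]))  # 15
```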

Zotero Initial Setup

· One min read

Sync Settings

Settings - Sync - Data Syncing

  • Log in and enable automatic sync

Download the Browser Extension

Citation Settings

Settings - Cite - APA 7th

  • Check that the style is registered

Export - Item Format - APA 7th

  • Set the format

Install the MS Word Plugin

Settings - Cite - Word Processors

  • In the Microsoft Word section, click Install/Reinstall Microsoft Word Add-in
  • Restart Word

Install Plugins

Tools - Plugins

  • Drag and drop the downloaded plugin

VSCode Plugin

  • Not found yet

Trustworthiness in Vision-Language Models Review

· 6 min read

Overview

  • Trustworthiness concerns whether a VLM exposes private data, produces harmful outputs, or is vulnerable to attacks.
  • SOTA models: LLaVA, Flamingo, GPT-4

Privacy

Privacy Issues

  • Risk escalates significantly with relevant images, as optimizing in the pixel domain is easier than in text
  • VLMs can unintentionally memorize sensitive data, leading to leaks even when an attacker has no knowledge of the model’s specifics
  • Overfitting may also cause retention of sensitive attributes during inference
  • Gradient-based and backdoor attacks further jeopardize VLM privacy with open-source data

Privacy Mitigation Methods

  • New metrics have been created to assess a model’s ability to reproduce training instances and facilitate cross-model comparisons
  • models utilizing multiple modalities provide better privacy
  • safety modules can be integrated to boost resilience against violations
  • adversarial training can enhance privacy but risks reducing accuracy
  • New architecture: differentially private CLIP model

Privacy Future Research Directions

  • Cryptography-based Privacy Preservation
    • Secure multi-party computation (SMPC): divides secret information into shares among multiple parties, ensuring that individual shares reveal nothing unless combined
    • Homomorphic encryption (HE): allows computations on encrypted data without decryption, and has also been utilized for privacy preservation in transformers
  • Federated Learning
    • enhances privacy in vision-language models (VLMs) by localizing model training, which protects training data from leakage.
    • challenges such as communication overhead among devices and statistical heterogeneity from diverse data distributions
  • Data Manipulation and Fine-tuning
    • Data pseudonymization: substitutes sensitive information with synthetic alternatives.
    • Data sanitization: removes duplicates to reduce memorization and privacy risks.
    • Knowledge sanitization fine-tuning: fine-tunes the model to provide safe responses when leakage risks arise.
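To make the secret-sharing idea behind SMPC concrete, here is a minimal sketch of additive secret sharing. It is one simple scheme chosen for illustration, not necessarily the one used in the reviewed work; the modulus `PRIME` and the function names are assumptions.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares (an arbitrary choice for this sketch)


def split_secret(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recover the secret by combining all shares."""
    return sum(shares) % PRIME


# Each party holds one share of a and one share of b and adds them locally,
# so the sum a + b is computed without any party seeing a or b.
shares_a = split_secret(42, n_parties=3)
shares_b = split_secret(100, n_parties=3)
shares_sum = [(x + y) % PRIME for x, y in zip(shares_a, shares_b)]

print(reconstruct(shares_a))    # 42
print(reconstruct(shares_sum))  # 142
```

Any subset of fewer than all shares is uniformly random modulo `PRIME`, which is why an individual share reveals nothing about the secret.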

Fairness and Bias

Fairness and Bias Issues

  • Bias from training data
    • disproportionately features men and lighter-skinned individuals
    • outdated vocabulary and imbalanced representation
    • clinical models may favor certain patient groups based on gender, language, etc.
  • Bias from Model
    • Gender biases
    • misclassification of race-related elements and biased outputs

Fairness and Bias Mitigation Methods

  • New Datasets and Benchmarks
    • Harvard-FairVLMed, PATA, and BOLD enhance evaluations but often lack the scale of established benchmarks.
    • create synthetic datasets to improve fairness assessments
      • gender-balanced dataset generated with DALL-E-3 and another consisting of gender-swapped images
      • counterfactual image-text pairs that highlight biases in datasets like COCO Captions
    • new metrics
      • gender polarity
      • bias distance in embeddings
    • human evaluation
  • De-biasing
    • adjust model instructions and architectures for improved fairness
    • detecting biased prompts in pre-trained models
    • Post-hoc Bias Mitigation (PBM) effectively reduces bias in image retrieval
    • Re-sampling underperforming clusters can enhance fairness
    • Modification of facial features also mitigates biases
    • self-debiasing reduces biased text generation, especially when paired with other methods

Fairness Future Research Directions

  • Optimized De-biasing
    • Additive residual learning: for fairer image representations.
    • Calibration loss: retain semantically similar embeddings.
    • Counterfactual inference framework: help models learn correct responses through cause and effect.
    • Adversarial classifiers: classifiers that predict image attributes from visual-textual similarities can be combined with instruction tuning to reduce bias.
  • Disentangled Representation Learning (DRL): simplifies complex data by breaking it into independent feature groups, improving model predictions.
    • Traditional DRL
      • Variational autoencoders (VAEs) for feature encoding based on impact
      • Generative adversarial networks (GANs) for separation.
    • Attention in text encoders can be adjusted for fairer outputs.
    • challenges: varying definitions of "disentanglement", ensuring fairness.
  • Human-in-the-Loop (HITL): integrates human intervention into model training to improve precision and fairness
    • active learning
    • reinforcement learning with human feedback
    • explainable AI
    • challenges: human bias, cost, and ethical and legal issues

Robustness

Robustness Issues

  • Out-of-Distribution (OOD) Robustness
    • ChatGPT excels in adversarial tasks but struggles with OOD robustness and informal medical responses
    • MLLMs often fail to generalize beyond training domains due to mapping issues
    • vision-language models face difficulties with open-domain concepts, especially when overfitting during fine-tuning
    • Large pre-trained image classifiers show initial robustness, which diminishes over time
    • Current visual question answering (VQA) models are limited to specific benchmarks, hindering generalization to OOD datasets
    • fine-tuning may impair model calibration in OOD contexts.
  • Adversarial Attack Robustness
    • Studies indicate that open-sourced VLMs show performance gaps in red teaming tasks, highlighting the need for improved safety and security.
    • misalignment between language and vision modalities creates a "modality gap", complicating adversarial vulnerability.

Robustness Mitigation Methods

  • Improving Out-of-Distribution Robustness
    • enhance OOD detection and generalization. A simple maximum logit detector has been shown to outperform complex methods for anomaly segmentation
    • In-context learning (ICL) can also improve multimodal generalization
    • A fine-tuned CLIP excels in unsupervised OOD detection
    • The OGEN method synthesizes OOD features
    • Maximum Concept Matching aligns visual and textual features, and anchor-based finetuning improves robustness under domain shifts
  • Defense Against Adversarial Attacks
    • VILLA is a two-stage framework for adversarial training of VLMs, featuring task-agnostic adversarial pre-training and task-specific finetuning
      • conducts adversarial training in the embedding space rather than on raw image pixels and text tokens, improving the model’s resilience against adversarial examples
      • SOTA performance across various tasks
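The maximum logit detector mentioned above can be sketched in a few lines: the anomaly score is the negative of the largest logit, so inputs for which no class receives a confident logit are flagged. The logits and the threshold below are made-up values for illustration only.

```python
def max_logit_score(logits):
    """Anomaly score: the negative of the largest logit (higher = more anomalous)."""
    return -max(logits)


def is_ood(logits, threshold):
    """Flag an input as OOD when its anomaly score exceeds `threshold`."""
    return max_logit_score(logits) > threshold


in_dist = [9.1, 0.3, -1.2]  # one class clearly dominates
unknown = [0.4, 0.2, 0.1]   # no class receives a large logit
THRESHOLD = -2.0            # hypothetical; tuned on validation data in practice

print(is_ood(in_dist, THRESHOLD))  # False
print(is_ood(unknown, THRESHOLD))  # True
```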

Robustness Future Research Directions

  • Data Augmentation
    • MixGen: a data augmentation method that generates new image-text pairs by interpolating images and concatenating text to preserve semantics.
    • creating synthetic images involves extracting text prompts via an image captioning model for use in text-to-image diffusion, then mixing these with real datasets.
    • bimodal augmentation (BiAug): decouples objects and attributes to synthesize vision-language examples and hard negatives, using LLMs and an object detector to generate detailed descriptions and inpaint corresponding images.
  • Improved Cross-Modal Alignment
    • Sharing learnable parameters
    • Applying bidirectional constraints
    • Adjusting cross-modal projections
  • challenges: addressing the modality gap, which impacts robustness to OOD data and adversarial examples
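The MixGen idea described above (interpolate two images, concatenate their texts) can be sketched as follows. Images are represented as flat lists of pixel intensities to keep the example dependency-free, and the mixing ratio `lam=0.5` is an assumption for this sketch.

```python
def mixgen(image_a, image_b, text_a, text_b, lam=0.5):
    """Blend two images pixel-wise and concatenate their captions."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    mixed_image = [lam * a + (1 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_text = f"{text_a} {text_b}"
    return mixed_image, mixed_text


img, txt = mixgen([0.0, 1.0, 0.5], [1.0, 1.0, 0.0], "a dog on grass", "a red ball")
print(img)  # [0.5, 1.0, 0.25]
print(txt)  # a dog on grass a red ball
```

Concatenating the captions rather than interpolating text embeddings is what keeps both descriptions intact in the new pair.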

Safety

Safety Issues

  • Toxicity
    • LAION-400M: contains problematic content, including explicit materials and harmful stereotypes
    • Advanced models like GeminiProVision and GPT-4V show inherent biases
    • Assigning personas to ChatGPT can increase toxicity and reinforce harmful stereotypes
  • Jailbreaking Risk
    • Perturbation-based attacks can be performed effectively, while FigStep converts harmful content into images with an 82.5% attack success rate across multiple VLMs
    • replaces captions with malicious prompts, enabling jailbreaks.

Safety Mitigation Methods

  • Safety Fine-Tuning
    • VLGuard
    • fine-tuned on synthetic data, reducing sensitivity to NSFW inputs and enhancing performance in cross-modal tasks
  • Other approach
    • Reinforce-Detoxify: uses reinforcement learning to mitigate toxicity and bias in transformer models
    • Although simple mitigations improve automatic scores, they risk over-filtering marginalized texts and create discrepancies between automatic and human judgments

Safety Future Research Directions

  • Context Awareness
    • integrating Chain-of-Thought for improved reasoning can enhance CAER tasks with Large VLMs.
    • Dual-Aligned Prompt Tuning: combines explicit context from pre-trained LLMs with implicit modeling to create more context-aware prompts
    • Visual In-Context Learning: optimizes image retrieval and summarization to enhance task-specific interactions.
  • Automated Red Teaming (ART)
    • RTVLM: a dataset that benchmarks VLMs across faithfulness, privacy, safety, and fairness
    • Arondight: automates multi-modal jailbreak attacks using reinforcement learning and uncovers significant security vulnerabilities
    • GPT-4 and GPT-4V are more robust against jailbreaks than open-source models
    • limited transferability of visual jailbreak methods compared to textual ones
    • connects unsafe outputs to prompts, improving the detection of vulnerabilities in text-to-image models

Ref

  • Vu, K., & Lai, P. (2025). Trustworthiness in Vision-Language Models. In J. Kertesz, B. Li, T. Supnithi, & A. Takhom, Computational Data and Social Networks Singapore.

Vision-Language Models for Vision Tasks Review

· 16 min read

Overview

Most visual recognition studies rely heavily on crowd-labelled data for training DNNs

  • Background development of visual recognition paradigms
  • Foundations of VLMs and their architectures
  • Datasets in VLM pre-training and evaluations
  • Review and categorization of existing pre-training methods
  • Benchmarking, analysis, and discussion
  • Research challenges & potential research directions
  • Training hard
    • New learning paradigm
  • Vision-Language Model Pre-training and Zero-shot Prediction
    • Increasing attention
  • VLMs with transfer learning
    • Prompt tuning
    • Visual adaptation
  • VLMs with knowledge distillation
    • distill knowledge from VLMs to downstream tasks

The development of visual recognition paradigms

  • Traditional ML: Hand-crafted features for prediction.
  • Deep Learning: Deep networks (e.g., ResNet) with large-scale labeled data.
  • Supervised Pre-training + Fine-tuning: Learned representations transferred to downstream tasks.
  • Unsupervised / Self-supervised Pre-training + Fine-tuning: Objectives like masked modeling and contrastive learning to learn representations.
  • Vision-Language Models & Zero-shot: Leverage large-scale web data, enabling zero-shot prediction without task-specific fine-tuning.
    • Collecting large-scale informative image-text data
    • Designing high-capacity models for effective learning from big data.
    • Designing new pre-training objectives for learning effective VLMs.

Illustration of development of VLMs for visual recognition

  • CLIP: uses an image-text contrastive objective, learning by pulling paired images and texts close and pushing others far away in the embedding space.
    • Enables effective use of web data and allows zero-shot predictions without task-specific fine-tuning.

VLM Overview

[Figure: VLM overview]

  • Given Image-text pairs.
  • Employs a text encoder and an image encoder to extract image and text features.
  • Learns the vision-language correlation with certain pre-training objectives.
  • GAP: Global Average Pooling, a technique used to reduce the spatial dimensions of feature maps while retaining important information.
  • ViT: Vision Transformer: Transformers for image recognition at scale.
  • CNN Based: VGG, ResNet, EfficientNet
    • ResNet: Adopts skip connections between convolutional blocks which mitigates gradient vanishing and explosion and enables DNN training.
    • ResNet-D: Replace global average pooling with transformer multi-head attention.
  • Transformer Based: ViT
    • Adding a normalization layer before the transformer encoder.

VLM pre-training Objectives

Contrastive Objectives

  • Pros
    • Enforce positive pairs to have similar embeddings in contrast to negative pairs.
    • Encourages VLMs to learn discriminative vision and language features, where more discriminative features lead to more confident and accurate zero-shot predictions.
  • Cons
    • Jointly optimizing positive and negative pairs is complicated and challenging.
    • Involves a heuristic temperature hyper-parameter for controlling the feature discriminability.

Image Contrastive Learning

  • Forces a query image to be close to its positive keys (its data augmentations) and far away from its negative keys (other images)
  • Learns discriminative features in the image modality, and often serves as an auxiliary objective for fully exploiting the potential of image data.

Image-Text Contrastive Learning

  • Pulling the embeddings of paired images and texts close while pushing others away.
  • Minimizing a symmetrical image-text infoNCE loss
  • Learn vision-language correlation by contrasting image-text pairs.
    • CLIP: uses a symmetrical image-text infoNCE loss.
    • ALIGN: scales up VLM pre-training with large-scale (but noisy) image-text pairs and noise-robust contrastive learning.
    • DeCLIP: uses nearest-neighbor supervision to exploit the information from similar pairs, enabling effective pre-training on limited data.
    • OTTER: uses optimal transport to pseudo-pair images and texts, reducing the required training data.
    • ZeroVL: exploits limited data resources via debiased data sampling and data augmentation with coin-flipping mixup.
    • FILIP: introduces region-word alignment into contrastive learning, enabling the model to learn fine-grained vision-language correspondence.
    • Pyramid-CLIP: constructs multiple semantic levels and performs both cross-level and peer-level contrastive learning for effective VLM pre-training.
    • LA-CLIP, ALIP: employ LLMs to augment synthetic captions for given images, while RA-CLIP retrieves relevant image-text pairs for image-text pair augmentation.
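The symmetrical image-text infoNCE loss can be sketched in plain Python as below. This is an illustrative toy, not CLIP's actual implementation: the embeddings are hand-written 2-D vectors, and the temperature value 0.07 is an assumption for this sketch.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Average of the image-to-text and text-to-image infoNCE losses."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(images, [[1.0, 0.1], [0.1, 1.0]])
shuffled = symmetric_info_nce(images, [[0.1, 1.0], [1.0, 0.1]])
print(aligned < shuffled)  # True
```

The loss is low when each image is most similar to its own text (the diagonal of the similarity matrix) and high when the pairing is wrong, which is the pull-together, push-apart behavior described above.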

[Figure: CLIP]

Image-Text-Label Contrastive Learning

  • Introduces supervised contrastive learning into image-text contrastive learning.
  • Learn discriminative and task-specific features by exploiting both supervised labels and unsupervised image-text pairs.
    • UniCL: pre-training allows learning both discriminative and task-specific (image classification) features simultaneously with around 900M image-text pairs.

[Figure: Image-Text-Label Contrastive Learning]

Generative Objectives

  • Encouraging VLMs to learn rich vision, language and vision-language contexts for better zero-shot predictions.
  • Generally adopted as additional objectives above other VLM pre-training objectives for learning rich context information.

Masked Image Modelling

  • Learns cross-patch correlation and image context information by masking and reconstructing images.
    • MAE, BEiT: certain patches in an image are masked and the encoder is trained to reconstruct them conditioned on unmasked patches.

[Figure: Masked Image Modelling]

Masked Language Modelling

  • A widely adopted pre-training objective in NLP.
  • Randomly masks a certain percentage of input tokens (15% in BERT) and predicts them.
  • Learns by masking a fraction of tokens in each input text and training networks to predict the masked tokens.
    • FLAVA: masks out 15% of text tokens and reconstructs them from the remaining tokens to model cross-word correlation.
    • FIBER: adopts masked language modelling as one of the VLM pre-training objectives to extract better language features.
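The masking step can be sketched as follows. This is a simplified illustration: every selected token is replaced by a `[MASK]` placeholder, and the function name, the fixed seed, and the toy tokens are assumptions for this example.

```python
import random


def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Replace a random fraction of tokens with `mask_token`.

    Returns the masked sequence and a dict {position: original token}
    holding the targets a model would be trained to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets


tokens = [f"tok{i}" for i in range(20)]
masked, targets = mask_tokens(tokens)
print(len(targets))  # 3
```

The training loss would then be computed only at the positions stored in `targets`.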

[Figure: Masked Language Modelling]

Masked Cross-Modal Modelling

  • Integrates masked image modelling and masked language modelling.
  • Given an image-text pair, it randomly masks a subset of image patches and a subset of text tokens and then learns to reconstruct them.
  • Learn by masking a certain percentage of image patches and text tokens and training VLMs to reconstruct them based on the embeddings of unmasked image patches and text tokens.
    • FLAVA: masks 40% of image patches and 15% of text tokens, and employs an MLP to predict the masked patches and tokens, capturing rich vision-language correspondence information.

Image-to-Text Generation

  • Generate descriptive texts for a given image for capturing fine-grained vision-language correlation by training VLMs to predict tokenized texts.
    • CoCa, NLIP, PaLI: train VLMs with the standard encoder-decoder architecture and image captioning objectives.

[Figure: Image-to-caption generation]

Alignment Objectives

Align image–text pairs in the embedding space.

  • pros
    • simple, easy to optimize
    • can be easily extended to model fine-grained vision-language correlation
  • cons
    • little correlation information within vision or language modality.
  • adopted as auxiliary losses to other VLM pre-training objectives for enhancing modelling the correlation across vision and language modalities.

Image-Text Matching

  • Models the overall correlation between an entire image and an entire sentence (global correlation).
  • Image-text matching models global image-text correlation by directly aligning paired images and texts.
    • FLAVA: matches the given image with its paired text via a classifier and a binary classification loss.
    • FIBER: mines hard negatives with pair-wise similarities for better alignment between image and text.

Region-Word Matching

  • Captures fine-grained correlations between image regions and specific words (local correlation).
  • models local fine-grained vision-language correlation by aligning paired image regions and word tokens.
  • benefiting zero-shot dense predictions in object detection and semantic segmentation.
    • GLIP, FIBER, DetCLIP: replace object classification logits by region-word alignment scores.
      • the dot-product similarity between regional visual features and token-wise features.
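The region-word alignment score described above is a dot product between each regional visual feature and each token feature. A minimal sketch with made-up 2-D features (real models use learned, higher-dimensional embeddings):

```python
def region_word_scores(region_feats, word_feats):
    """scores[i][j] = dot product of region i's feature and word j's feature."""
    return [[sum(r * w for r, w in zip(region, word)) for word in word_feats]
            for region in region_feats]


regions = [[1.0, 0.0], [0.0, 2.0]]            # 2 regions, feature dim 2
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 word tokens
scores = region_word_scores(regions, words)
print(scores)  # [[1.0, 0.0, 1.0], [0.0, 2.0, 2.0]]
```

In a detector, a row of this matrix takes the place of the fixed-vocabulary classification logits for that region.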

[Figure: Region-Word Matching, GLIP]

VLM Pre-Training Frameworks

[Figure: VLM pre-training frameworks]

Evaluation

Zero-shot Prediction

  • Image Classification: classifies images into pre-defined categories by comparing image and text embeddings, often using "prompt engineering" to build task-related prompts.
  • Semantic Segmentation: by comparing the embeddings of the given image pixels and texts.
  • Object Detection: localize and classify objects in images with the object locating ability learned from auxiliary datasets.
  • Image-Text Retrieval
    • Text-to-image retrieval that retrieves images based on texts
    • Image-to-text retrieval that retrieves texts based on images.
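Zero-shot classification reduces to a nearest-neighbor search between one image embedding and one text embedding per class. In the sketch below the embeddings are hand-written stand-ins for the outputs of a pre-trained image encoder and text encoder, and the prompt template "a photo of a {name}" is an illustrative choice.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, class_names, text_embs):
    """Return the class whose prompt embedding is most similar to the image."""
    sims = [cosine(image_emb, t) for t in text_embs]
    return class_names[sims.index(max(sims))]


class_names = ["cat", "dog", "car"]
prompts = [f"a photo of a {name}" for name in class_names]
# Stand-ins for text_encoder(prompt) and image_encoder(image); both hypothetical.
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.2, 0.8, 0.1]

print(zero_shot_classify(image_emb, class_names, text_embs))  # dog
```

Text-to-image and image-to-text retrieval use the same similarity, ranking a gallery of candidates instead of a set of class prompts.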

Linear Probing

  • freezes the pre-trained VLM
  • trains a linear classifier to classify the VLM-encoded embeddings to assess the VLM representations.

Datasets

  • For Pre-training VLMs
    • CLIP, 2021, 400M, English
    • ALIGN, 2021, 1.8B, English
    • FILIP, 2021, 300M, English
    • WebLi, 2022, 12B, 129 Languages
  • For VLM Evaluation
    • Image Classification
      • PASCAL VOC 2007 Classification, 11-point mAP
      • Oxford-IIIT PETS, Mean Per Class
      • EuroSAT, Accuracy
      • Hateful Memes, ROC AUC
      • Country211, Accuracy
    • Image-Text Retrieval
      • Flickr30k, Recall
      • COCO Caption, Recall
    • Action Recognition
      • UCF101, Accuracy
      • Kinetics700, Mean(top1, top5)
      • RareAct, mWAP, mSAP
    • Object Detection
      • COCO 2017 Detection, box mAP
      • LVIS, box mAP
      • ODinW, box mAP
    • Semantic Segmentation
      • Cityscapes, Mean IoU
      • ADE20K, Mean IoU

VLM Transfer learning

VLM transfer adapts VLMs to fit downstream tasks via prompt tuning, feature adapters, and other methods, bridging two gaps:

  • image and text distributions gap: downstream dataset may have task-specific image styles and text formats
  • training objectives gap: VLMs are generally trained with task-agnostic objectives, while downstream tasks often involve task-specific objectives. (coarse or fine-grained classification, region or pixel-level recognition)

Transfer via Prompt Tuning

Inspired by the "prompt learning" in NLP

  • pros
    • simple, easy-to-implement
    • requires little extra network layer or complex network modifications
    • adapting VLMs in a black-box manner, which has clear advantages in transferring VLMs that involve concerns in intellectual property.
  • cons
    • low flexibility, since prompting follows the manifold (latent space) of the original VLMs.

Transfer with Text Prompt Tuning

  • Exploring more effective and efficient learnable text prompts with several labelled downstream samples for each class.
    • supervised and few-shot supervised
      • CoOp: Exploring context optimization to learn context words for a single class name with learnable word vectors.
      • CoCoOp: Exploring conditional context optimization that generates a specific prompt for each image.
      • SubPT: designs subspace prompt tuning to improve the generalization of learned prompts.
      • LASP: regularizes learnable prompts with hand-engineered prompts.
      • VPT: models text prompts with instance-specific distribution with better generalization on downstream tasks.
      • KgCoOp: enhances the generalization of unseen class by mitigating the forgetting of textual knowledge.
      • SoftCPT: fine-tunes VLMs on multiple few-shot tasks simultaneously for benefiting from multi-task learning.
      • PLOT: employs optimal transport to learn multiple prompts to describe the diverse characteristics of a category.
      • DualCoOp, TaI-DP: transfer VLMs to multi-label classification tasks.
        • DualCoOp: adopts both positive and negative prompts for multi-label classification
        • TaI-DP: double-grained prompt tuning for capturing both coarse-grained and fine-grained embeddings.
      • DenseCLIP: explores language-guided fine-tuning that employs visual features to tune text prompts for dense prediction.
      • ProTeCt: improves the consistency of model predictions for hierarchical classification task.
    • unsupervised
      • UPL: optimizes learnable prompts with self-training on selected pseudo-labeled samples.
      • TPT: explores test-time prompt tuning to learn adaptive prompts from a single downstream sample.

[Figure: Text Prompt Tuning]

  • V is learnable word vectors that are optimized by minimizing the classification loss with the downstream samples.

Transfer with Visual Prompt Tuning

  • Transfers VLMs by modulating the input of image encoder.
    • VP: adopts learnable image perturbations $v$ to modify the input image $x^I$ by $x^I + v$, aiming to adjust $v$ to minimize a recognition loss.
    • RePrompt: integrates retrieval mechanisms into visual prompt tuning, allowing leveraging the knowledge from downstream tasks.
  • enables pixel-level adaptation to downstream tasks, benefiting them greatly especially for dense prediction tasks.

[Figure: Visual Prompt Tuning]

Transfer with Text-Visual Prompt Tuning

  • modulate the text and image inputs simultaneously, benefiting from joint prompt optimization on multiple modalities.
    • UPT: unifies prompt tuning to jointly optimize text and image prompts, demonstrating the complementary nature of the two prompt tuning tasks.
    • MVLPT: explores multi-task vision-language prompt tuning to incorporate cross-task knowledge into text and image prompt tuning.
    • MAPLE: conducts multi-modal prompt tuning by aligning visual prompts with their corresponding language prompts, enabling a mutual promotion between text prompts and image prompts.
    • CAVPT: introduces a cross attention between class-aware visual prompts and text prompts, encouraging the visual prompts to concentrate more on visual concepts.

Transfer via Feature Adaptation

  • adapt image or text features with an additional light-weight feature adapter
    • CLIP-Adapter: inserts several trainable linear layers after CLIP's language and image encoders and optimizes them while keeping the CLIP architecture and parameters frozen.
    • Tip-Adapter: a training-free adapter that directly employs the embeddings of few-shot labelled images as the adapter weights.
    • SVL-Adapter: a self-supervised adapter which employs an additional encoder for self-supervised learning on input images.
  • flexible and effective as its architecture and the insertion manner allow tailoring flexibly for different and complex downstream tasks.
  • requires modifying the network architecture and thus cannot handle VLMs that raise intellectual property concerns.

Other Transfer Methods

  • Direct fine-tuning, architecture modification, cross attention
    • Wise-FT: combines the weights of a fine-tuned VLM and the original VLM for learning new information from downstream tasks.
    • MaskCLIP: extracts dense image features by modifying the architecture of the CLIP image encoder.
    • VT-CLIP: introduces visual-guided attention to semantically correlate text features with downstream images, leading to a better transfer performance.
    • CALIP: introduces parameter-free attention for effective interaction and communication between visual and text features.
    • TaskRes: directly tunes text-based classifier to exploit the old knowledge in the pre-trained VLM.
    • CuPL, VCD: employ large language models like GPT-3 to augment text prompts for learning rich discriminative text information.
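Combining the weights of a fine-tuned model and the original model, as in the Wise-FT entry above, amounts to a per-parameter linear interpolation. The sketch below uses plain dictionaries of lists in place of real model state, and the parameter names and `alpha` value are hypothetical.

```python
def interpolate_weights(zero_shot, fine_tuned, alpha=0.5):
    """Return (1 - alpha) * zero_shot + alpha * fine_tuned for every parameter."""
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameters")
    return {
        name: [(1 - alpha) * z + alpha * f
               for z, f in zip(zero_shot[name], fine_tuned[name])]
        for name in zero_shot
    }


zero_shot = {"proj.weight": [0.0, 2.0], "proj.bias": [1.0]}   # hypothetical parameters
fine_tuned = {"proj.weight": [1.0, 4.0], "proj.bias": [3.0]}
merged = interpolate_weights(zero_shot, fine_tuned, alpha=0.25)
print(merged)  # {'proj.weight': [0.25, 2.5], 'proj.bias': [1.5]}
```

With `alpha=0` the result is the original model and with `alpha=1` it is the fine-tuned one, so `alpha` trades off retained pre-trained knowledge against downstream-task knowledge.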

[Figure: Feature Adaptation]

VLM Knowledge Distillation

  • distils general and robust VLM knowledge to task-specific models without the restriction of VLM architecture, benefiting task-specific designs while tackling various dense prediction tasks.
  • most VLM knowledge distillation methods focus on transferring image-level knowledge to region- or pixel-level tasks such as object detection and semantic segmentation.

Knowledge Distillation for Object Detection

  • To distill VLM knowledge to enlarge the detector vocabulary
  • To better align image-level and object-level representations
    • ViLD: distills VLM knowledge to a two-stage detector whose embedding space is enforced to be consistent with that of CLIP image encoder.
    • HierKD: hierarchical global-local knowledge distillation.
    • RKD: region-based knowledge distillation for better aligning region-level and image-level embeddings.
    • ZSD-YOLO: self-labeling data augmentation for exploiting CLIP for better object detection.
    • OADP: preserves proposal features while transferring contextual knowledge.
    • BARON: uses neighborhood sampling to distill a bag of regions instead of individual regions.
    • RO-ViT: distills information from VLMs for open-vocabulary detection.
  • VLM distillation via prompt learning
    • DetPro: a detection prompt technique for learning continuous prompt representations for open-vocabulary object detection.
    • PromptDet: regional prompt learning for aligning word embeddings with regional image embeddings.
    • PB-OVD: trains object detectors with VLM-predicted pseudo bounding boxes.
    • XPM: a robust cross-modal pseudo-labeling strategy that employs VLM-generated pseudo masks for open-vocabulary instance segmentation.
    • P3OVD: prompt-driven self-training that refines the VLM-generated pseudo labels with fine-grained prompt tuning.

Knowledge Distillation for Semantic Segmentation

  • Leverage VLMs to enlarge the vocabulary of segmentation models, aim to segment pixels described by arbitrary texts. (i.e., any categories of pixels beyond base classes)
  • Tackling the mismatch between image-level and pixel-level representations.
    • CLIPSeg: a lightweight transformer decoder to extend CLIP for semantic segmentation.
    • LSeg: maximizes the correlation between CLIP text embeddings and pixel-wise image embedding encoded by segmentation models.
    • ZegCLIP: employs CLIP to generate semantic masks and introduces a relationship descriptor to mitigate overfitting on base classes.
    • MaskCLIP+, SSIW: distill knowledge with VLM-predicted pixel-level pseudo labels.
    • FreeSeg: generates mask proposals first and then performs zero-shot classification for them.

Knowledge distillation for weakly-supervised semantic segmentation

  • Leverage both VLMs and weak supervision (e.g., image-level labels) for semantic segmentation.
  • CLIP-ES: employs CLIP to refine the class activation map by designing a softmax function and a class-aware attention-based affinity module for mitigating the category confusion issue.
  • CLIMS: employs CLIP knowledge to generate high-quality class activation maps for better weakly-supervised semantic segmentation.

Performance

  • VLM performance is largely attributed to three factors: big data, big models, and task-agnostic learning.
  • Limitations
    • When data/model size keeps increasing, the performance saturates and further scaling up won’t improve performance
    • Adopting large-scale data in VLM pre-training necessitates extensive computation resources
    • Adopting large models introduces excessive computation and memory overheads in both training and inference
  • Transfer Learning
    • can mitigate the domain gaps by learning from task-specific data, being labelled or unlabelled.
    • Supervised > few-shot supervised ≈ unsupervised transfer (unsupervised transfer is less prone to overfitting but remains challenging)
  • Knowledge Distillation
    • brings clear performance improvement on detection and segmentation tasks
    • introduces general and robust VLM knowledge while benefiting from task-specific designs
  • The development of VLM pre-training for dense visual recognition tasks (region- or pixel-level detection and segmentation) lags far behind.
  • Benchmarking requires certain norms in terms of training data, networks, and downstream tasks.
    • VLM transfer: most studies release their code and do not require intensive computation resources, easing reproduction and benchmarking.
    • VLM pre-training: studied with different data and networks, making benchmarking very challenging; several studies also use non-public training data or require intensive computation resources.
    • VLM knowledge distillation: adopt different task-specific backbones, which complicates benchmarking.

Challenges

  • VLM pre-training
    • Fine-grained vision-language correlation modelling: can better recognize patches and pixels beyond images, greatly benefiting dense prediction tasks
    • Unification of vision and language learning: enables efficient communications across data modalities which can benefit both training effectiveness and training efficiency.
    • Pre-training VLMs with multiple languages: pre-training on a single language could introduce bias in terms of cultures and regions and hinder VLM applications in other language areas.
    • Data-efficient VLMs: instead of merely learning from each image-text pair, more useful information could be learned with the supervision among image-text pairs.
    • Pre-training VLMs with LLMs: employ LLMs to augment the texts in the raw image-text pairs, which provides richer language knowledge and helps better learn vision-language correlation.
  • VLM Transfer Learning
    • Unsupervised VLM transfer: much lower risk of overfitting than few-shot supervised transfer.
    • VLM transfer with visual prompt/adapter: Existing studies focus on text prompt learning. Visual prompt learning or a visual adapter is complementary to text prompting and can enable pixel-level adaptation in various dense prediction tasks.
    • Test-time VLM transfer: Existing studies conduct transfer by fine-tuning VLMs on each downstream task (i.e., prompt learning), leading to repetitive efforts while facing many downstream tasks. Adapting prompts on the fly during inference can circumvent the repetitive training in existing VLM transfer.
    • VLM transfer with LLMs: Different from prompt engineering and prompt learning, exploit LLMs to generate text prompts that better describe downstream tasks. This approach is automatic and requires little labelled data.
  • VLM knowledge distillation
    • Knowledge distillation from multiple VLMs: harvest their synergistic effect by coordinating knowledge distillation from multiple VLMs.
    • Knowledge distillation for other visual recognition tasks: leverage the knowledge distilled from VLMs to improve performance on other visual recognition tasks. (instance segmentation, panoptic segmentation, person reidentification)

Ref

  • Zhang, J., Huang, J., Jin, S., & Lu, S. (2024). Vision-Language Models for Vision Tasks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5625–5644. https://doi.org/10.1109/TPAMI.2024.3369699

IAI +003

· 5 min read

Local Search Problem

To find the state that gives the optimal/best value of the evaluation function

  • It can be seen as an optimization problem.
  • a computational problem that finds the best solution (a state) that satisfies the given constraints
  • evaluation function === objective function
  • Only cares about the optimal solution/best state without considering the paths to reach the best state (the optimal solution)
  • Not systematic

Feasible region & solution

  • Feasible region: the set of all possible or candidate solutions which are the solutions that satisfies the problem's constraints
  • Feasible solution: a solution in the feasible region

Search Problem vs Local Search Problem

Path-based vs State-based

| Aspects | Search Problem | Local Search Problem |
| --- | --- | --- |
| State | All possible states - state-space landscape | Range of decision variables and constraints |
| Goal | Goal state & goal test | Evaluation function & objective function |
| Evaluation | Measure closeness to goal - distance/fitness | Minimize cost or maximize fitness |
| Transition/Successor | Transition function | Successor function |

Discrete & Continuous Optimization

  • Discrete optimization: optimization problems where the solution space is discrete (e.g., 8 queens problem)
  • Continuous optimization: optimization problems where the solution space is continuous (e.g., real numbers, any value within a range)
  • All possible states: state-space landscape
  • Transition function: To find neighbor or successor state
  • Goal state
  • Objective function: A way to measure how close to the goal state
  • Start state

Search state-space

  • Global Maximum: A state that maximizes the objective function over the entire state space
  • Local Maximum: A state that maximizes the objective function within a small area around it.
  • Plateau: A state such that the objective function is constant in an area around it.
    • Shoulder: A plateau that has uphill edge.
    • Flat: A plateau whose edges go downhill.

Advantages

  • use little memory
  • can often find reasonably good solution in large or infinite search spaces
  • useful for solving pure optimization problems
  • don't need to know the path to the solution.

Hill climbing

keeps track of one current state and on each iteration moves to the neighboring state with highest value.

  • $f = \max(-\mathrm{cost}(X))$
  • Steps
    • Evaluate the initial state
    • If it is equal to the goal state, return. Otherwise, continue.
    • Find a neighboring state
    • Evaluate this state. If it is closer to the goal state than before, replace the initial state with this state.
    • Repeat steps 2-4 until it reaches a goal state (local or global maximum) or runs out of time.
  • No search tree, No backtracking, Don't look ahead beyond the current state.
    • get stuck due to local maxima, plateaus, or ridges.
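
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `hill_climb`, `neighbors`, and `value`, and the toy objective, are assumptions made for the example.

```python
def hill_climb(start, neighbors, value, max_steps=1000):
    """Steepest-ascent hill climbing: keep one current state and move to the
    best neighbor until no neighbor is strictly better."""
    current = start
    for _ in range(max_steps):
        candidates = neighbors(current)
        if not candidates:
            break
        best = max(candidates, key=value)
        if value(best) <= value(current):
            break  # local maximum, global maximum, or plateau: stop here
        current = best
    return current


# Hypothetical toy problem: maximize -(x - 3)^2 over the integers.
def value(x):
    return -(x - 3) ** 2


def neighbors(x):
    return [x - 1, x + 1]


print(hill_climb(start=-10, neighbors=neighbors, value=value))
```

Because the toy objective has a single peak, the search ends at the global maximum; on a landscape with several peaks the same loop would stop at whichever local maximum it reaches first.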

Variations of HC

  • Simple HC: greedy local search which expands the current state and moves on to the best neighbor.
  • Stochastic HC: choose randomly among the neighbors going uphill.
  • First-choice HC: generate random successor until one is better. Good for states with high numbers of successors.
  • Random restart: conducts a series of hill climbing searches from random initial states until a goal state is found.

Simulated Annealing

based upon the annealing process to model the search process for finding an optimal solution to an optimisation problem

  • annealing schedule, temperature, energy
  • finds the minimal value of the objective function (energy function)
  • starts with a high temperature and then gradually reduces the temperature
  • $P = e^{-\Delta E / kT}$
    • $\Delta E$: how bad the new state is compared to the old state
    • $T$: temperature, which decreases over time
    • $k$: a scaling factor
  • Swap condition: $\Delta E \le 0$ or $e^{-\Delta E / kT} > \text{random}$, where random is a uniform random number in $[0, 1)$
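
The acceptance rule can be sketched as follows. The geometric cooling schedule, the parameter defaults, and the toy energy function are assumptions chosen for illustration; the notes do not specify a particular schedule.

```python
import math
import random


def accept(delta_e, temperature, k=1.0, rng=random):
    """Swap condition: always accept an improvement, otherwise accept
    with probability exp(-delta_e / (k * T))."""
    if delta_e <= 0:
        return True
    return math.exp(-delta_e / (k * temperature)) > rng.random()


def simulated_annealing(start, neighbor, energy,
                        t0=10.0, cooling=0.95, steps=500, seed=0):
    """Minimize energy(); t0, cooling and steps are illustrative defaults."""
    rng = random.Random(seed)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta_e = energy(candidate) - energy(current)
        if accept(delta_e, t, rng=rng):
            current = candidate
            if energy(current) < energy(best):
                best = current
        t = max(t * cooling, 1e-9)  # geometric cooling, clamped above zero
    return best


# Hypothetical toy problem: minimize (x - 3)^2 with +/-1 moves.
def energy(x):
    return (x - 3) ** 2


def neighbor(x, rng):
    return x + rng.choice([-1, 1])


best = simulated_annealing(start=20, neighbor=neighbor, energy=energy)
print(best, energy(best))
```

At high temperature many uphill moves are accepted, which lets the search escape local minima; as $T$ falls the rule behaves more and more like plain descent.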

Evolutionary algorithms

  • Local beam search
  • Stochastic beam search
  • Genetic algorithms

Characteristics

  • size of the population
  • representation of each individual
  • mixing number
  • selection process for selecting the individuals who will become the parents of the next generation
  • recombination procedure
  • mutation rate
  • makeup of the next generation

Genetic algorithm

It uses operators, such as reproduction, crossover and mutation, inspired by natural evolutionary principles.

  • State: is represented by an individual in a population. Traditional representation is a chromosome
  • Objective function: is used to evaluate the fitness of an individual (= fitness function, 적합도 함수)
  • Successor function: consists of three operators: reproduction, crossover, and mutation
  • Solution: is found through evolution from one generation to another generation

Genetic Algorithm

Roulette Wheel Selection

  • Compute total fitness of all individuals.
    • Example: A=30, B=20, C=40, D=10 → Total = 100.
  • Calculate probability of each individual being selected
    • Formula: $P(i) = \frac{\text{fitness}(i)}{\text{total\_fitness}}$
      • A = 30/100 = 0.30
      • B = 20/100 = 0.20
      • C = 40/100 = 0.40
      • D = 10/100 = 0.10
  • Convert to cumulative probabilities
    • P4 = 0.10
    • P4 + P3 = 0.50
    • P4 + P3 + P2 = 0.90
    • P4 + P3 + P2 + P1 = 1.00
  • Generate a random number between 0 and 1.
  • Select an individual based on the random number and cumulative probabilities.

Roulette Wheel Selection

  • ⚫ random = 0.07 → falls in P4 [0, 0.10)
  • 🔺 random = 0.37 → falls in P3 [0.10, 0.50)
  • ⬟ random = 0.82 → falls in P2 [0.50, 0.90)
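
A minimal sketch of the selection procedure, using the fitness values A=30, B=20, C=40, D=10 from the example. Laying the intervals out in the order A, B, C, D is an assumption of this sketch (any fixed order works), and `roulette_select` is a hypothetical helper name.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(fitness, r):
    """Return the individual whose cumulative interval contains r in [0, 1)."""
    names = list(fitness)
    cumulative = list(accumulate(fitness[n] for n in names))  # [30, 50, 90, 100]
    total = cumulative[-1]
    idx = bisect_right(cumulative, r * total)
    return names[min(idx, len(names) - 1)]


fitness = {"A": 30, "B": 20, "C": 40, "D": 10}

# Fixed random numbers make the intervals visible:
# A covers [0, 0.30), B [0.30, 0.50), C [0.50, 0.90), D [0.90, 1.00).
for r in (0.07, 0.37, 0.82, 0.95):
    print(r, roulette_select(fitness, r))

# In practice r comes from a random number generator.
rng = random.Random(42)
parents = [roulette_select(fitness, rng.random()) for _ in range(4)]
print(parents)
```

Individuals with higher fitness own wider intervals, so they are selected more often, but low-fitness individuals still have a non-zero chance.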

Applications of GA

  • Parameter tuning: optimize the parameters in NN
  • Planning: economic dispatch, train timetabling
  • Design & Control problems: robotic control, adaptive control systems
  • Successful use of GA requires careful engineering of the representation

FDA +003

· 8 min read

CRISP-DM

CRISP-DM (Cross-Industry Standard Process for Data Mining)

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation

Business understanding

  • Determine business objectives
  • Assess situation
  • Determine data mining goals
  • Produce project plan

Data understanding

  • Collect initial data
  • Describe data
  • Explore data
  • Verify data quality

Data preparation

  • Select data
  • Clean data
  • Construct data
  • Integrate data
  • Format data

Modeling

  • Select modeling technique
  • Generate test design
  • Build model
  • Assess model

Evaluation

  • Evaluate results
  • Review process
  • Determine next steps

Deployment

  • Plan deployment
  • Plan monitoring & maintenance
  • Produce final report
  • Review project

Instance & Attributes

  • Instance: the terms associated with specific objects. Instances are described by a set of values for the features.
  • Attributes: the collection of features of the object that are maintained in a dataset.
  • Object: a collection of features about which measurements can be taken.
    • Car: fuel consumption, cylinders, horsepower...

Qualitative & Quantitative data

  • Qualitative data: less structured, non-statistical, measured using other descriptors and identifiers
    • white, heavy, wild...
  • Quantitative data: statistical, measured using hard numbers.
    • 130cm, 400kg, 4 legs...

Discrete & Continuous (Quantitative) data

  • Discrete data: fixed, round numbers, countable
    • number of legs, count of aeroplane departures, number of times a person commutes for a job in a week
  • Continuous data: measured values that can take any value within a range
    • weight, solar irradiation, temperature of a room

Summary

| Qualitative | Quantitative (discrete) | Quantitative (continuous) |
| --- | --- | --- |
| Title | Duration | Rating |
| Production Country | Release Year | |
| Director | | |
| Genres | | |
| Description | | |

Categorizing attributes

| Item | Nominal (categorical) | Ordinal | Interval | Ratio |
| --- | --- | --- | --- | --- |
| Definition | Values act only as labels or names. No order. | Values have an order. The spacing between them is not defined. | Order plus a fixed, equal unit (interval). No absolute zero. | Interval properties plus an absolute zero. Both differences and ratios are meaningful. |
| Examples | Hair color {blonde, brown, ginger}; postal codes; industry codes, research field codes; blood type; license number | Height: tall > average > short; weight: light < average < heavy; star ratings; T-shirt sizes | Height (cm), weight (kg) (as given in the source material); 12-hour clock time (comparing differences); time intervals (5 to 10 minutes); waist size; time | Age (years); income (thousand dollars); temperature in Kelvin; monetary amounts, counts, mass, length, electric current; body weight; medicine dosage |
| Allowed comparisons | =, ≠ | =, ≠, <, > | =, ≠, <, >, +, − | =, ≠, <, >, +, −, ×, ÷ |
| Operations / analysis | Mode; entropy (measure of uncertainty); contingency table (cross-tabulation); correlation (chi-squared test of independence); chi-squared test | Median; percentiles; rank correlation (Spearman); run tests (Mann–Whitney U, Wilcoxon); sign tests | Mean; standard deviation; Pearson correlation; t-test; F-test (ANOVA) | Geometric mean; harmonic mean; percent variation (CV) |
| Notes | Statistical mean and standard deviation are meaningless. | Ranks can be compared, but intervals and magnitudes cannot. Median and rank-based statistics are suitable. | Intervals are equal, so + and − are allowed. No absolute zero, so ratios cannot be interpreted. | Absolute zero, so all operations are allowed. Ratios and multiplication can be interpreted. |
| Variable characteristics | Named variables | Named & Ordered variables | Named & Ordered & Distance between variables | Named & Ordered & Distance between variables & Makes sense to multiply/divide |
| Analysis method | Frequency | Frequency; median and percentiles | Frequency; median and percentiles; add or subtract; mean, standard deviation, standard error of the mean | Frequency; median and percentiles; add or subtract; mean, standard deviation, standard error of the mean; ratio |
| Data type | Qualitative | Qualitative | Quantitative | Quantitative |

| Attribute Type | Description | Examples | Operations |
| --- | --- | --- | --- |
| Nominal | The values of a nominal attribute are just different names, i.e. nominal attributes provide only enough information to distinguish one object from another. (=, ≠) | post codes, employee ID numbers, eye colour, sex: { male, female } | mode, entropy, contingency, correlation, chi squared test |
| Ordinal | The values of an ordinal attribute provide enough information to order objects. (<, >) | hardness of minerals, { good, better, best }, grades, street numbers | median, percentiles, rank correlation, run tests, sign tests |
| Interval | For interval attributes, the differences between values are meaningful, i.e. a unit of measurement exists. (+, −) | calendar dates, temperature in Celsius or Fahrenheit | mean, standard deviation, Pearson's correlation, t and F tests |
| Ratio | For ratio variables both differences and ratios are meaningful. (×, ÷) | temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current | geometric mean, harmonic mean, percent variation |
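
To make the link between attribute type and permitted statistic concrete, here is a small sketch using Python's standard `statistics` module. The sample values are made up for illustration and are not taken from any dataset in these notes.

```python
import statistics

colour = ["red", "white", "white"]   # nominal: only = and != make sense
rating = [1, 2, 2, 3, 5]             # ordinal: ranks can be ordered
celsius = [20.0, 22.0, 24.0]         # interval: differences are meaningful
mass_kg = [1.0, 4.0, 16.0]           # ratio: ratios are meaningful too

summary = {
    "nominal_mode": statistics.mode(colour),
    "ordinal_median": statistics.median(rating),
    "interval_mean": statistics.mean(celsius),
    "ratio_geometric_mean": statistics.geometric_mean(mass_kg),
}

for name, result in summary.items():
    print(name, result)
```

Each statistic is valid for its own level and every level above it: a median can be taken of ratio data, but a geometric mean of Celsius temperatures has no sensible interpretation.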

Structured & Unstructured Data

  • Structured Data: which has an associated fixed data structure.
    • Relational table
    • Manageable
  • Unstructured Data: which is expressed in natural language and no specific structure and domain types are defined.
    • Documents and sounds.
  • Semi-structured Data: the format is not fixed and has some degree of flexibility.
    • XML, JSON
    • emails, text data, image, video and sound, zipped files, web pages.

Curse of dimensionality

The explosive nature of increasing data dimensions and its resulting exponential increase in computational efforts required for its processing and/or analysis.

  • Characteristics of structured data
    • Dimensionality: Datasets with higher numbers of attributes have more dimensions, challenging to work with high dimensional data.
    • Sparsity: A dataset is termed sparse, or said to have the property of sparsity, when it contains zero values for most of its attributes.
    • Resolution: The patterns depend on the scale or level of resolution.
  • Real life data is usually in a lower dimensional manifold
    • many dimensions can be either ignored or the dimensionality can be reduced.
  • Local smoothness: small changes in input values give small changes in output values.
    • Local interpolation to make predictions.
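
A rough way to see the exponential growth is to count grid cells. The sketch below assumes each attribute is split into 10 bins and that one million records are available; both numbers are arbitrary choices for illustration.

```python
def grid_cells(bins_per_axis, dimensions):
    """Number of cells when every axis is split into bins_per_axis intervals."""
    return bins_per_axis ** dimensions


def expected_points_per_cell(n_points, bins_per_axis, dimensions):
    """Average number of records per cell if records were spread uniformly."""
    return n_points / grid_cells(bins_per_axis, dimensions)


for d in (1, 2, 3, 6, 10):
    cells = grid_cells(10, d)
    density = expected_points_per_cell(1_000_000, 10, d)
    print(f"dimensions={d:>2}  cells={cells:>14,}  points per cell={density:g}")
```

With six attributes there is on average one record per cell, and with ten attributes almost every cell is empty, which is why dimensionality reduction and the low-dimensional manifold assumption matter in practice.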

Datasets

  • Record Data
    • Data Matrix
    • Document data: a special type of data matrix where the attributes are of the same type and are asymmetric.
    • Transaction data: a special type of record data. Each record involves a set of items. Most often, the attributes are binary, indicating whether or not an item was purchased.
  • Graph data
    • World Wide Web, molecular structures (Simplified Molecular-Input Line-Entry System, SMILES)
  • Ordered data: sequence data, this is a sequence of individual entities, such as a sequence of words or letters.
    • Spatial data
    • Temporal data
    • Sequential data

Data collection

Quality

  • Missing values: The data was not collected (e.g. age), or some attributes may not be applicable in all cases (e.g. annual income for children).
  • Empty values: Unlike missing values, an empty value is the one that has no actual value, whereas a missing value has an actual value but it is missing somehow.
  • Noise: The modification of actual values.
  • Outlier: A single or very low frequency occurrence of a value of an attribute that is far from the bulk of attribute values.
  • Duplicate data: The same data is recorded multiple times.
  • Inconsistent formats: When the same set of data appears in multiple tables from different inputs.

Data auditing

  • attributes
  • measured values
  • comments
  • attribute type
  • operations we can do
  • data type (knime/py)
  • missing value
  • any comments about qualities

| attributes | measured values | comments | attribute type | operations we can do | data type (knime/python) | missing value | any comments about qualities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| fixed acidity | [3.8, 15.9] | continuous number | ratio | all arithmetic | float | N/A | |
| volatile acidity | [0.08, 1.58] | continuous number | ratio | all arithmetic | float | N/A | |
| citric acid | [0, 1.66] | continuous number | ratio | all arithmetic | float | N/A | |
| residual sugar | [0.6, 65.8] | continuous number | ratio | all arithmetic | float | N/A | |
| chlorides | [0.009, 0.611] | continuous number | ratio | all arithmetic | float | N/A | |
| free sulfur dioxide | [1, 289] | continuous number | ratio | all arithmetic | int | N/A | |
| total sulfur dioxide | [6, 440] | continuous number | ratio | all arithmetic | int | N/A | |
| density | [0.98711, 1.03898] | continuous number | ratio | all arithmetic | float | N/A | |
| pH | [2.72, 4.01] | continuous number | interval | order, arithmetic | float | N/A | |
| sulphates | [0.22, 2] | continuous number | ratio | all arithmetic | float | N/A | |
| alcohol | [8, 14.9] | continuous number | ratio | all arithmetic | float | N/A | |
| quality | [extremely dissatisfied, extremely satisfied, moderately dissatisfied, moderately satisfied, neutral, slightly dissatisfied, slightly satisfied] | distributed | ordinal | order, counting | str | N/A | |
| color | [white, red] | distributed | nominal | counting | str | N/A | |

Vocabulary for AI +004

· 4 min read

Vocabulary & Expressions

| Term/Expression | Definition | Simpler Paraphrase | Meaning (Korean) |
| --- | --- | --- | --- |
| relax | to make a rule or control less severe | to make less strict or severe | 완화하다, 느슨하게 하다 |
| reconstruct | to build or form again | to rebuild | 재구성하다 |
| reside | to live in a place; to exist or be present | to live; to be located | 위치하다, 존재하다 |
| lay out | to arrange or plan something in a clear and organized way | to arrange | 배치하다, 설계하다 |
| resemble | to look like or be similar to someone or something | to look like | 닮다, 유사하다 |
| amnesia | a condition in which a person is unable to remember things | memory loss | 기억상실증 |
| vicinity | the area near or surrounding a particular place | nearby area | 인근, 근처 |
| schematically | in a way that represents the main features or relationships of something in a simple and clear form | in a simplified way | 도식적으로 |
| superimpose | to place or lay something over something else | to overlay | 겹쳐 놓다, 중첩하다 |
| plateaus | a state of little or no change following a period of activity or progress | a period of stability | 정체기, 안정기, 고원 |
| wander | to move around without a fixed course, aim, or goal | to roam | 방황하다, 헤매다 |
| consecutive | following continuously; in unbroken or logical sequence | sequential | 연속적인 |
| converge | to come together from different directions | to meet | 수렴하다, 모이다 |
| adage | a saying or proverb expressing a common truth | a wise saying | 격언, 속담 |
| porcupine | a large rodent with sharp quills on its back | a spiny animal | 호저 |
| stumble | to trip or lose balance while walking or running | to trip | 비틀거리다, 넘어지다 |
| metallurgy | the science and technology of metals | metal science | 금속공학 |
| crystalline | having the structure and form of a crystal | crystal-like | 결정질의 |
| crevice | a narrow opening or fissure | a crack | 틈, 균열 |
| bumpy | having an uneven or jolting surface | uneven | 울퉁불퉁한 |
| dislodge | to remove or force out from a position | to remove | 제거하다, 떼어내다 |
| exponentially | in a way that increases rapidly and significantly | rapidly | 기하급수적으로 |
| halt | to stop or pause something | to stop | 중단하다, 멈추다 |
| unfruitful | not producing good results | unproductive | 결실이 없는 |
| analogous | similar in some way | comparable | 유사한 |
| proportional | corresponding in size or amount to something else | relative | 비례하는 |
| retained | kept or continued to have | kept | 유지된 |
| in accordance with | following or obeying a rule, law, or wish | according to | ~에 따라, ~에 일치하여 |
| constitute | to be a part of something | to form | 구성하다 |
| permute | to change the order or arrangement of something | to rearrange | 순열하다, 배열을 바꾸다 |
| chromosome | a thread-like structure of nucleic acids and protein found in the nucleus of most living cells | genetic structure | 염색체 |
| auxiliary | providing supplementary or additional help and support | supplementary | 보조의 |
| discriminative | able to distinguish or differentiate | distinguishing | 구별 가능한 |
| exploit | to make full use of and benefit from something | to utilize | 활용하다 |
| perturbation | a small change or variation | a disturbance | 교란 |
| modulate | to adjust or alter the intensity or frequency of something | to adjust | 조절하다, 변조하다 |
| retrieval | the process of getting stored information from a computer | search | 검색 |
| leverage | to use something to maximum advantage | to utilize | 활용하다 |
| discrepancy | a difference or inconsistency | difference | 불일치 |
| heterogeneity | the quality or state of being diverse in character or content | diversity | 이질성 |
| pseudonymization | the process of replacing private identifiers with fake identifiers or pseudonyms | anonymization | 가명화 |
| denote | to be a sign of something | to signify | 나타내다, 의미하다 |

FSD +003

· 4 min read

Terminology

  • Software: A set of statements written in a programming language to perform tasks
  • Statement: A single instruction in a program that performs an action when executed.
  • Snippet: A block of statements.
  • Software Development: The process of creating a software program.
  • OOP: Program composed of interconnected objects at runtime.
  • Expression: A code component of a statement that can be evaluated to produce a value.
  • Assign: The process of storing the result (a value) of one or more expressions.
  • Value: A data item (literal or computed) that is stored in a variable.
  • Compiler: A special program that translates a programming language's source code into machine code.
    • Compilers complete the conversion process all at once after changes are made to the code and before the code is executed
  • Interpreter: A computer program that directly executes code without requiring it to be previously compiled into machine language.
    • Interpreters complete the conversion process one step at a time while the code is being executed.

Software development

  • Software development process is an iterative approach.
  • java
    • javac Welcome.java: Compiles the Java source file Welcome.java into a binary class file (Welcome.class).
    • java Welcome: Executes the Java program Welcome.
  • python
    • python welcome.py: Executes the Python script welcome.py.

OOP

  • Object: An object is a thing, tangible and intangible. An object has fields that contain the data and methods to access and modify the data.
  • Class: A class is an abstract definition of objects. A class is a template or blueprint that defines what data and methods are included in objects.
  • Method: A block of code grouped together to perform an operation. A method has a name, parameters, and a return type.
  • Field: A field is a data attribute of an object. A field value is exposed using object methods.
  • Organizing code into classes improves modularity, reusability, extendability, and scalability.
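
The four terms can be seen together in a short Python sketch. The `Car` class, its field names, and its methods are hypothetical examples, not part of the course material.

```python
class Car:
    """Class: the blueprint that defines which data and methods objects have."""

    def __init__(self, model, fuel_litres=0.0):
        # Fields: the data attributes held by each object.
        self.model = model
        self._fuel_litres = fuel_litres

    def refuel(self, litres):
        """Method that modifies a field."""
        if litres < 0:
            raise ValueError("litres must be non-negative")
        self._fuel_litres += litres

    def fuel_litres(self):
        """Method that exposes a field value."""
        return self._fuel_litres

    def __str__(self):
        return f"Car(model={self.model}, fuel={self._fuel_litres:.1f} L)"


# Object: one concrete instance created from the class.
car = Car("Mini")
car.refuel(12.5)
print(car)
```

Keeping the fuel level behind `refuel` and `fuel_litres` means the validation rule lives in one place, which is the modularity benefit the list above refers to.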

Java vs Python

| Identifier type | Java | Python |
| --- | --- | --- |
| Class | Use CamelCase for multi-word classes | Use CapWords (CamelCase) for multi-word classes |
| Function | use verbs or verb phrases | use lowercase_with_underscores |
| Procedure | use verbs or verb phrases | use lowercase_with_underscores |
| Variable | camelCase | lowercase_with_underscores |
| Constant | All uppercase words separated by underscores | All uppercase words separated by underscores |
| Package | Lowercase words separated by dots | Lowercase words separated by underscores |

  • Java uses the toString() method to return an object's information.
  • Python can refer to attributes directly or use the __str__() method to return an object's information.

Data types

| Data Type | Size | Default value | Description |
| --- | --- | --- | --- |
| byte | 1 byte | 0 | 8-bit signed integer |
| short | 2 bytes | 0 | 16-bit signed integer |
| int | 4 bytes | 0 | 32-bit signed integer |
| long | 8 bytes | 0 | 64-bit signed integer |
| float | 4 bytes | 0.0f | 32-bit floating point |
| double | 8 bytes | 0.0d | 64-bit floating point |
| boolean | 1 bit | false | true or false |
| char | 2 bytes | '\u0000' | 16-bit Unicode character |

Non-Primitive Data Types

  • Non-primitive: Arrays, Classes, Interfaces, and Strings.
  • Non-primitive data types are by default set to null in Java, None in Python.

Variables

  • Static: enables the variable to be used without creating an object of its defining class.
  • Final: makes the variable unchangeable.

Operators

| Operator Category | Java | Python |
| --- | --- | --- |
| Unary | `expr++ expr-- ++expr --expr +expr -expr` | `+expr -expr` |
| Arithmetic | `* / % + -` | `* / % + -` |
| Relational | `< > <= >= == !=` | `< > <= >= == !=` |
| Logical | `! && \|\|` | `not and or` |
| Ternary | `(expr1) ? <expr2> : <expr3>` | `<expr2> if (expr1) else <expr3>` |
| Assignment | `= += -= *= /= %=` | `= += -= *= /= %= **=` |
| Identity/Membership | | `is`, `is not`, `in`, `not in` |

  • Java: boolean q = (5 % 2 != 2) ? true : false
  • Python: q = True if (5 % 2 != 2) else False

Standard Input

// Java
import java.util.Scanner;

public class Inputs {
    static Scanner in = new Scanner(System.in);

    public static void main(String[] args) {
        System.out.print("x = ");
        int x = in.nextInt();
        // Math.pow returns a double, so the result prints as e.g. 9.0
        System.out.println("x squared = " + Math.pow(x, 2));
    }
}

# Python
x = int(input("x = "))

print("x squared = ", pow(x, 2))

String

String (java)

Immutable

  • String s1 = "Hello";: initialize using literal syntax
  • String s2 = new String("Hello");: initialize using a constructor
s1 == s2 // false: == compares references, and s2 is a separate object
s1.equals(s2) // true: equals() compares the character contents

String Format (Python)

| Symbol | Meaning | Example code | Output |
| --- | --- | --- | --- |
| < | Left align | `f'[{42:<5}]'` | `[42   ]` |
| > | Right align | `f'[{42:>5}]'` | `[   42]` |
| ^ | Center align | `f'[{42:^5}]'` | `[ 42  ]` |
| < with fill char | Left align with custom fill | `f'[{42:-<5}]'` | `[42---]` |
| > with fill char | Right align with custom fill | `f'[{42:->5}]'` | `[---42]` |
| ^ with fill char | Center align with custom fill | `f'[{42:-^5}]'` | `[-42--]` |
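
The alignment symbols are most useful when printing columns. The following sketch combines them in one row formatter; `format_row` and the sample rows are illustrative names and data.

```python
def format_row(name, qty, price):
    """Left-align the name, right-align the quantity, center the price."""
    return f"|{name:<8}|{qty:>4}|{price:^8.2f}|"


rows = [("apple", 3, 0.5), ("kiwi", 12, 0.25)]
for row in rows:
    print(format_row(*row))

# Fill characters go in front of the alignment symbol.
print(f"[{42:-^5}]")
```

When the padding cannot be split evenly, center alignment puts the extra fill character on the right.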

Array

Array (java)

int[] x = {2, 4, -1, 11, 3};

  • Declaration: int[] x
  • Instantiation: x = new int[5];
  • Initialization: x[0] = 2; x[1] = 4; x[2] = -1;

IAI +002

· 10 min read

Environment

  • All possible states and information about how the states are related.
  • The costs from one state to each of its adjacent states are also given.

Agent

  • A simulated intelligence that knows which state it is in.
  • If it takes an action at a given state, it knows the next state and the corresponding cost.

Characteristics of the environment

  • Fully Observable: The agent always knows the current state of the environment at each point in time.
  • Deterministic: The next state of the environment is completely determined by the current state and the action taken by the agent.
  • Static: The environment is unchanged.
  • Discrete: A limited number of distinct, clearly defined actions.
  • Single agent: An agent operating by itself in an environment.

Search problem

Finding a path from a starting point to a goal point in a space.

  • The initial state
  • State space: The environment or area where the search takes place
  • A set of actions: The possible actions that the agent can take in each state.
    • ACTIONS(s)
  • A transition model:
    • takes in a state and an action.
    • returns the successor state, which is any state reachable from doing action a in state s.
    • RESULT(s, a)
  • A goal state:
    • The target location or position that needs to be reached.
    • represented by a goal test function
  • A path cost function:
    • The cost associated with a particular path taken through the state space.
    • c(s1, a, s2)
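
The components listed above can be bundled into one small object. This is a sketch under assumptions: the class name `RouteProblem`, the convention that an action is simply the name of the next city, and the road distances are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteProblem:
    initial: str   # the initial state
    goal: str      # the goal state
    roads: dict    # roads[s][s2] = step cost of moving from s to s2

    def actions(self, s):
        """ACTIONS(s): here an action is 'go to a neighboring city'."""
        return sorted(self.roads.get(s, {}))

    def result(self, s, a):
        """RESULT(s, a): the transition model; moving to a lands in a."""
        return a

    def is_goal(self, s):
        """Goal test function."""
        return s == self.goal

    def step_cost(self, s1, a, s2):
        """c(s1, a, s2): cost of one step."""
        return self.roads[s1][s2]

    def path_cost(self, path):
        """Sum of the step costs along a list of states."""
        return sum(self.step_cost(s1, s2, s2) for s1, s2 in zip(path, path[1:]))


# Illustrative distances for a fragment of a road map.
roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99},
    "Timisoara": {"Arad": 118, "Lugoj": 111},
    "Zerind": {"Arad": 75},
}
problem = RouteProblem(initial="Arad", goal="Lugoj", roads=roads)

print(problem.actions("Arad"))
print(problem.path_cost(["Arad", "Timisoara", "Lugoj"]))
```

A search algorithm only needs these five operations, so the same BFS, DFS, or UCS code can be reused on any problem that provides them.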

Frontier

  • A set of nodes that are under consideration to be expanded.
  • The set of leaf nodes of the search tree that are available for expansion at any given step.
  • A search algorithm determines how to choose a node in the Frontier to grow the search spanning tree.

Search Algorithm

Explored Set

  • The frontier in graph search separates the search-space graph into two regions, the explored region and the unexplored region, so that every path from the initial state to an unexplored state has to pass through a state in the frontier.

Performance measures

  • Completeness
  • Cost Optimality
  • Time complexity
  • Space complexity

BFS

Queue

BFS Tree

from collections import deque

def bfs_tree(start, goal_test, successors):
    """
    start: initial state
    goal_test(s): goal test function -> bool
    successors(s): returns the list of states reachable from state s

    Returns: the path to the goal (list), or None
    (Tree search: no explored set, no duplicate check)
    """
    if goal_test(start):
        return [start]

    # node = (state, parent_index)
    nodes = [(start, None)]
    frontier = deque([0])  # the queue stores indices into nodes

    while frontier:
        parent_idx = frontier.popleft()
        parent_state, _ = nodes[parent_idx]

        for nxt in successors(parent_state):
            nodes.append((nxt, parent_idx))
            child_idx = len(nodes) - 1

            if goal_test(nxt):
                # reconstruct the path by following parent indices
                path, i = [], child_idx
                while i is not None:
                    path.append(nodes[i][0])
                    i = nodes[i][1]
                return list(reversed(path))

            frontier.append(child_idx)

    return None

from collections import deque

def bfs_graph(start, goal_test, successors):
    """
    start: initial state (e.g. 'Arad')
    goal_test(s): True if s is the goal
    successors(s): returns the list of next states reachable from state s
    Returns: the path start -> ... -> goal as a list, or None if there is none
    """
    if goal_test(start):
        return [start]

    # All nodes are stored for path reconstruction: nodes[i] = (state, parent_index)
    nodes = [(start, None)]
    frontier = deque([0])       # FIFO queue of indices into nodes
    frontier_states = {start}   # states currently in the frontier (avoids duplicates)
    explored = set()            # states that have already been expanded (closed)

    while frontier:
        node_idx = frontier.popleft()
        state = nodes[node_idx][0]
        frontier_states.discard(state)
        explored.add(state)

        for nxt in successors(state):
            if (nxt not in explored) and (nxt not in frontier_states):
                # store the child node
                nodes.append((nxt, node_idx))
                child_idx = len(nodes) - 1

                if goal_test(nxt):
                    # reconstruct the path by following parent indices
                    path = []
                    i = child_idx
                    while i is not None:
                        path.append(nodes[i][0])
                        i = nodes[i][1]
                    return list(reversed(path))

                # insert into the frontier
                frontier.append(child_idx)
                frontier_states.add(nxt)

    return None

graph = {
    "Arad": ["Sibiu", "Timisoara", "Zerind"],
    "Sibiu": ["Arad", "Fagaras"],
    "Timisoara": ["Arad", "Lugoj"],
    "Zerind": ["Arad"],
    "Fagaras": [],
    "Lugoj": []
}

path = bfs_graph(
    start="Arad",
    goal_test=lambda s: s == "Lugoj",
    successors=lambda s: graph.get(s, [])
)
print(path)

  • Has the shallowest path to every node on the frontier
  • memory-intensive as it stores all nodes.

DFS

Stack

def depth_first_search(initial_state, goal_test, actions):
    """
    initial_state: initial state
    goal_test(s): True if state s is the goal
    actions(s): returns the list of next states reachable from state s
    Returns: the path start -> goal (list), or None
    """

    # All nodes are stored: nodes[i] = (state, parent_index)
    nodes = [(initial_state, None)]

    # frontier: LIFO stack (only node indices are stored)
    frontier = [0]

    # states currently on the stack (prevents duplicate pushes)
    stacked_states = {initial_state}

    # explored: states that have already been expanded (children generated)
    explored = set()

    # If the initial state is the goal, return immediately
    if goal_test(initial_state):
        return [initial_state]

    # DFS loop
    while True:
        # An empty frontier means failure
        if not frontier:
            return None

        # Pop the node on top of the stack
        node_idx = frontier.pop()
        state, parent_idx = nodes[node_idx]

        # The state leaves the stack and is marked explored before expansion,
        # so a self-loop cannot push it again
        stacked_states.discard(state)
        explored.add(state)

        # Check every child state reachable from the current state
        for child_state in actions(state):
            # Only handle children that are in neither explored nor the frontier
            if (child_state not in explored) and (child_state not in stacked_states):
                # Store the new node (its parent is the current node)
                nodes.append((child_state, node_idx))
                child_idx = len(nodes) - 1

                # If it is the goal, reconstruct and return the path
                if goal_test(child_state):
                    path, i = [], child_idx
                    while i is not None:
                        path.append(nodes[i][0])
                        i = nodes[i][1]
                    return list(reversed(path))

                # Otherwise push it onto the stack
                frontier.append(child_idx)
                stacked_states.add(child_state)

  • Low memory usage
  • Can get stuck in deep or infinite branches (Not cost-optimal)

UCS

Priority Queue

  • lowest path cost f(n) = g(n)
  • Best-first search with the evaluation function
  • Uniform-cost search is complete and cost optimal
  • Dijkstra's algorithm finds the shortest path from the root node to every other node in a graph with non-negative edge weights.
  • A special case of Dijkstra's algorithm in which the
import heapq

def uniform_cost_search(initial_state, goal_test, actions, step_cost):
    """
    initial_state: start state
    goal_test(s): returns True if state s is a goal
    actions(s): list of successor states reachable from s
    step_cost(s, s_next): cost of moving s -> s_next (assumed positive)

    Returns: the start → goal path (list), or None
    """

    # Store every node: nodes[i] = (state, parent_idx, path_cost)
    nodes = [(initial_state, None, 0.0)]

    # frontier ← min-heap ordered by PATH-COST (entries: (cost, node_idx))
    frontier = [(0.0, 0)]
    heapq.heapify(frontier)

    # Cheapest known cost of each state on the frontier (membership/cost checks)
    frontier_costs = {initial_state: 0.0}

    # explored ← set of states that have already been expanded
    explored = set()

    # If the start is already a goal, return immediately
    if goal_test(initial_state):
        return [initial_state]

    # loop do
    while frontier:
        # node ← POP(frontier) /* lowest-cost node */
        cost, node_idx = heapq.heappop(frontier)
        state, parent_idx, path_cost = nodes[node_idx]

        # Skip stale heap entries: the state was already expanded, or a cheaper entry replaced this one
        if state in explored or cost != frontier_costs.get(state):
            continue

        # goal test (as in the pseudocode: checked right after the pop)
        if goal_test(state):
            # SOLUTION(node) → rebuild the path by following parent indices
            path = []
            i = node_idx
            while i is not None:
                path.append(nodes[i][0])
                i = nodes[i][1]
            return list(reversed(path))

        # add node.STATE to explored
        explored.add(state)
        # The state has left the frontier
        frontier_costs.pop(state, None)

        # for each action in ACTIONS(node.STATE) do
        for nxt in actions(state):
            new_cost = path_cost + step_cost(state, nxt)

            # child.STATE not in explored or frontier ?
            in_explored = (nxt in explored)
            in_frontier = (nxt in frontier_costs)

            # (1) In neither explored nor frontier: insert as a new node
            if not in_explored and not in_frontier:
                nodes.append((nxt, node_idx, new_cost))
                child_idx = len(nodes) - 1
                heapq.heappush(frontier, (new_cost, child_idx))
                frontier_costs[nxt] = new_cost

            # (2) Already on the frontier but a cheaper path was found: "replace" it
            elif in_frontier and new_cost < frontier_costs[nxt]:
                nodes.append((nxt, node_idx, new_cost))
                child_idx = len(nodes) - 1
                heapq.heappush(frontier, (new_cost, child_idx))
                # Record the new cheapest cost → the old heap entry is skipped when popped (cost mismatch)
                frontier_costs[nxt] = new_cost

    # if EMPTY?(frontier) then failure
    return None
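
For comparison with the goal-directed search above, the following is a minimal sketch of Dijkstra's algorithm computing the cheapest cost from a source to every reachable node, as the bullet on Dijkstra's algorithm describes. The function name `dijkstra_all`, the dict-of-dicts graph format, and the toy graph are assumptions made for this illustration.

```python
import heapq

def dijkstra_all(graph, source):
    """Return {node: cheapest cost from source} for every reachable node.

    graph: dict mapping node -> {neighbor: non-negative edge weight}
    (this adjacency format is an assumption made for the example).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a cheaper path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical toy graph
toy_graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
distances = dijkstra_all(toy_graph, "A")
print(distances)  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```

Unlike the uniform-cost search above, this loop has no goal test: it keeps going until the heap is empty, so every reachable node ends up with its final cost.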
  • Greedy best-first search evaluates nodes with f(n) = h(n)
  • h(n) = h_{SLD}(n), where SLD stands for straight-line distance
  • It expands the node with the lowest h(n) value at each step
from heapq import heappush, heappop

def gbfs_path(G, start, goal, heuristic):
    """
    Greedy Best-First Search (GBFS)
    G: adjacency-list dict, G[u] = list/iterable of neighbors
    heuristic(x, goal): estimated distance h(x)
    Returns: the start -> ... -> goal path (list), or None
    """

    # Priority-queue entries: (h(state), state, path)
    pq = []
    heappush(pq, (heuristic(start, goal), start, [start]))

    visited = set()        # nodes already popped and expanded (prevents revisits)
    in_frontier = {start}  # nodes currently in the queue (prevents duplicate pushes)

    while pq:
        # Pop the node with the smallest heuristic value
        _, vertex, path = heappop(pq)
        in_frontier.discard(vertex)

        # Skip nodes that were already expanded
        if vertex in visited:
            continue
        visited.add(vertex)

        # Return the path once the goal is reached
        if vertex == goal:
            return path

        # Push neighbors; the heap orders them by heuristic value
        for neighbor in G.get(vertex, []):
            if neighbor in visited or neighbor in in_frontier:
                continue
            heappush(pq, (heuristic(neighbor, goal), neighbor, path + [neighbor]))
            in_frontier.add(neighbor)

    return None
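  • A* search evaluates nodes with f(n) = g(n) + h(n), where g(n) is the path cost so far and h(n) is the estimated cost to the goal.
  • It is the most common informed search algorithm.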
  • The tree-search version of A* is optimal if h(n) is an admissible heuristic.
  • The graph-search version is optimal if h(n) is consistent.
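import heapq
import itertools

import networkx as nx

def astar_path(G, start, goal):
    """
    Find a path from start to goal using A* Search.
    G: NetworkX Graph whose edges carry a 'weight' attribute
    start: start node
    goal: goal node
    Expects heuristic() and the cities lookup to be defined by the caller
    (they are not defined in this snippet).
    """
    tie = itertools.count()  # tie-breaker so equal f values never compare nodes or paths
    # Push the start node together with its path list, f = 0
    pq = [(0, next(tie), start, [start])]
    visited = set()

    while pq:
        _, _, vertex, path = heapq.heappop(pq)

        # Skip nodes that were already visited
        if vertex in visited:
            continue
        visited.add(vertex)

        # Return the path once the goal is reached
        if vertex == goal:
            return path

        # Explore adjacent nodes
        for neighbor in G[vertex]:
            if neighbor in visited:
                continue
            # g(n) = actual cost of the path so far
            g_cost = nx.path_weight(G, path + [neighbor], 'weight')
            # h(n) = heuristic (estimated cost to the goal)
            h_cost = heuristic(cities[neighbor], cities[goal])
            f_cost = g_cost + h_cost

            heapq.heappush(pq, (f_cost, next(tie), neighbor, path + [neighbor]))

    return None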

Admissibility

  • h(n) \leq h^*(n), where h^*(n) is the true cost of the cheapest path from n to the goal
  • An admissible heuristic never overestimates the cost to reach the goal.
  • The straight-line distance between a node and the goal node is an admissible heuristic because it is never longer than the actual path distance between them.
  • With an admissible heuristic, the tree-search version of A* is cost-optimal.
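
The admissibility condition can be checked numerically on a small map. The sketch below computes the true cost h*(n) for every node with one Dijkstra run from the goal, then tests h(n) ≤ h*(n). The coordinates, road lengths, and the helper names `true_costs_to_goal` and `is_admissible` are invented for this illustration.

```python
import heapq
import math

def true_costs_to_goal(graph, goal):
    """Compute h*(n), the cheapest cost from every node n to the goal.

    graph: undirected, dict node -> {neighbor: edge cost}. Because it is
    undirected, one Dijkstra run from the goal yields every h*(n).
    """
    h_star = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > h_star[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < h_star.get(v, math.inf):
                h_star[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return h_star

def is_admissible(h, h_star):
    """True if h(n) <= h*(n) for every node (small tolerance for float error)."""
    return all(h[n] <= h_star[n] + 1e-9 for n in h_star)

# Hypothetical map: coordinates and road lengths are made up for the example.
# Every road is at least as long as the straight line between its endpoints.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "G": (6, 4)}
roads = {
    "A": {"B": 3, "C": 6},
    "B": {"A": 3, "C": 4},
    "C": {"A": 6, "B": 4, "G": 3},
    "G": {"C": 3},
}

h_sld = {n: math.dist(coords[n], coords["G"]) for n in coords}
h_star = true_costs_to_goal(roads, "G")
inflated = {n: 2 * v for n, v in h_sld.items()}  # doubles every estimate

print(is_admissible(h_sld, h_star))     # True
print(is_admissible(inflated, h_star))  # False
```

Doubling the straight-line estimate breaks admissibility here because node C then claims a remaining cost of 6 while the real road to G costs 3.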

Consistency

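  • h(n) \leq c(n, a, n') + h(n')
  • h(n) is consistent if, for every node n and every successor n' reached by action a, the estimate at n is no greater than the step cost c(n, a, n') plus the estimate at n' (a form of the triangle inequality).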

Admissible vs Consistent

  • Consistent ⇒ Admissible (every consistent heuristic is admissible)
  • Admissible ⇏ Consistent (the converse does not hold)
  • The tree search version of A* is optimal if h(n) is admissible
  • The graph search version of A* is optimal if h(n) is consistent
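
A small counterexample makes the one-way implication concrete. The sketch below uses a hypothetical three-node chain S → A → G with unit step costs and two hand-written heuristic tables; the helper names and all values are assumptions for illustration only.

```python
def is_admissible(h, h_star):
    """True if h(n) <= h*(n) for every node."""
    return all(h[n] <= h_star[n] for n in h_star)

def is_consistent(h, edges):
    """True if h(n) <= c(n, n') + h(n') on every edge (n, n', cost)."""
    return all(h[n] <= cost + h[n2] for n, n2, cost in edges)

# Hypothetical chain S -> A -> G with unit step costs
edges = [("S", "A", 1), ("A", "G", 1)]
h_star = {"S": 2, "A": 1, "G": 0}  # true remaining costs, read off the chain

h_good = {"S": 2, "A": 1, "G": 0}  # satisfies both properties
h_bad = {"S": 2, "A": 0, "G": 0}   # never overestimates, but drops by 2 across a cost-1 edge

print(is_admissible(h_good, h_star), is_consistent(h_good, edges))  # True True
print(is_admissible(h_bad, h_star), is_consistent(h_bad, edges))    # True False
```

In `h_bad`, h(S) = 2 exceeds c(S, A) + h(A) = 1 + 0, so the heuristic is admissible but not consistent.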

Summary

| Measure / Criteria | BFS | DFS | Uniform Cost | A* |
| --- | --- | --- | --- | --- |
| Complete? | Yes | No | Yes | Yes |
| Time complexity | O(b^d) | O(b^m) | O\left(b^{1 + \lfloor C^* / \epsilon \rfloor}\right) | O(b^d) |
| Space complexity | O(b^d) | O(bm) | O\left(b^{1 + \lfloor C^* / \epsilon \rfloor}\right) | O(b^d) |
| Cost optimal? | Yes | No | Yes | Yes |
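  • b is the branching factor, d is the depth of the shallowest goal, m is the maximum depth of the search tree, and C^* is the cost of the optimal solution.
  • \epsilon is the smallest positive cost of any single step (edge) in the search problem.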
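
To get a feel for these bounds, the sketch below plugs hypothetical numbers into the three expressions from the summary: b^d, bm, and b^(1 + ⌊C*/ε⌋), with b the branching factor, d the goal depth, m the maximum depth, C* the optimal solution cost, and ε the smallest step cost. The function names and the chosen values are assumptions for illustration.

```python
import math

def bfs_bound(b, d):
    """O(b^d): order of nodes generated by BFS down to depth d."""
    return b ** d

def dfs_space_bound(b, m):
    """O(bm): order of nodes DFS keeps in memory at maximum depth m."""
    return b * m

def ucs_bound(b, c_star, eps):
    """O(b^(1 + floor(C*/eps))): bound for uniform-cost search."""
    return b ** (1 + math.floor(c_star / eps))

# Hypothetical values, chosen only to make the arithmetic concrete
b, d, m = 10, 5, 5
c_star, eps = 10, 2

print(bfs_bound(b, d))            # 100000
print(dfs_space_bound(b, m))      # 50
print(ucs_bound(b, c_star, eps))  # 1000000
```

The DFS memory bound grows linearly in the depth while the other bounds grow exponentially, which is the main contrast the table encodes.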

FDA +002

· 8 min read

Business Intelligence

KDD

Knowledge discovery in databases (KDD) refers to the comprehensive process of finding knowledge in data. The process consists of the following steps:

  • Learning from the application domain
  • Creating a target dataset
  • Data cleansing/pre-processing
  • Data reduction/projection
  • Choosing the function of data mining
  • Choosing the data mining algorithm
  • Data mining
  • Interpretation
  • Using discovered knowledge
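
As a rough illustration, several of the steps above can be sketched as a chain of small functions. Everything here is hypothetical: the records are made up, the function names are invented, and simple item counting stands in for a real data mining algorithm.

```python
from collections import Counter

# Made-up raw records standing in for a source database (illustrative only)
raw_records = [
    {"customer": "c1", "region": "north", "basket": ["milk", "bread"]},
    {"customer": "c2", "region": "south", "basket": ["milk"]},
    {"customer": "c3", "region": None, "basket": ["bread", "eggs"]},
    {"customer": "c4", "region": "north", "basket": None},
    {"customer": "c5", "region": "north", "basket": ["milk", "bread", "eggs"]},
]

def create_target_dataset(records):
    """Creating a target dataset: keep only records relevant to the question."""
    return [r for r in records if r["region"] == "north"]

def clean(records):
    """Data cleansing: drop records with a missing or empty basket."""
    return [r for r in records if r["basket"]]

def project(records):
    """Data reduction/projection: keep only the attribute to be mined."""
    return [r["basket"] for r in records]

def mine(baskets):
    """Data mining: here, simple item-frequency counting."""
    return Counter(item for basket in baskets for item in basket)

def interpret(counts, n_baskets):
    """Interpretation: share of baskets that contain each item."""
    return {item: count / n_baskets for item, count in counts.items()}

target = create_target_dataset(raw_records)
baskets = project(clean(target))
support = interpret(mine(baskets), len(baskets))
print(support)  # {'milk': 1.0, 'bread': 1.0, 'eggs': 0.5}
```

The first and last steps (learning the domain and using the discovered knowledge) are human activities and are not represented in the code.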

CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology provides a structured approach to planning a data mining project. As it is a cross-industry standard, it is widely used by practitioners who need a repeatable approach for data mining projects and can be used in a variety of machine learning projects.

  • Business understanding: Set up a business problem and understand what you want to accomplish from a business perspective.
  • Data understanding: Identify, collect and review the required data.
  • Data preparation: Prepare your data for modeling.
  • Modeling: Analyze possible approaches and develop the model.
  • Evaluation: Evaluate results against business needs.
  • Deployment: Deploy the model.

Statistical Analysis

Data analytics has borrowed from statistical analysis, which involves collecting data, counting, probabilities, and hypothesis testing.

The two main approaches that are relevant to data analytics are:

Descriptive Statistics

  • Purpose: Analyze past events using historical data
  • Data Source: Stored data from previous activities
  • Application: Helps companies make informed decisions based on statistical analysis of historical patterns
  • Focus: "What happened?" - Understanding past performance and trends

Predictive Statistics

  • Purpose: Predict future events based on currently available data
  • Data Source: Present and historical data combined with analytical models
  • Application: Provides statements or predictions about events that have not yet occurred
  • Focus: "What will happen?" - Forecasting future outcomes and behaviors
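
The difference between the two approaches can be shown on one small series. The sketch below first summarizes made-up monthly sales (descriptive), then fits a least-squares trend line and extrapolates one month ahead (predictive). The data, the `fit_line` helper, and the choice of a linear trend are assumptions for illustration; real forecasts usually need a more careful model.

```python
import statistics

# Made-up monthly sales figures (perfectly linear, for easy checking)
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive: summarize what already happened
mean_sales = statistics.mean(sales)
stdev_sales = statistics.stdev(sales)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Predictive: extend the historical trend to a month not yet observed
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

print(f"mean={mean_sales}, stdev={stdev_sales:.2f}")  # mean=125, stdev=18.71
print(f"forecast for month 7: {forecast_month_7}")    # forecast for month 7: 160.0
```

The mean and standard deviation answer "What happened?", while the extrapolated value answers "What will happen?" under the assumption that the trend continues.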

Data analytics results

Data analytics results should be presented so that they are understandable to humans, easy to use, and computationally accurate.

The effectiveness of different data analytics methods can be evaluated across two dimensions:

  • X-axis: Computer accuracy (how accurate the method is)
  • Y-axis: Human understandability (how easily humans can interpret the results)

Ethical Principles

Despite the proliferation of ethical principles, some are especially significant for data analytics and AI solutions and must be implemented as mandatory ethical principles.

The five core mandatory ethical principles for AI and data analytics are:

1. Transparency

The need to describe, inspect and reproduce the mechanisms through which AI systems make decisions and learn to adapt to their environment.

Key Components (EU AI HLEG):

  • Traceability: Understanding the data flow and decision path
  • Explainability: Providing reasonable explanations for AI outputs
  • Communication: Clear information sharing with stakeholders

Stakeholder Requirements:

  • Users: Understanding what the system is doing and why
  • Creators: Validation and certification of AI systems
  • Operators: Understanding processes and input data
  • Investigators: Accident investigation capabilities
  • Regulators: Investigation and compliance support
  • Legal System: Evidence and decision-making support
  • Public: Building confidence in technology

2. Fairness

A complex, multi-faceted concept ensuring AI systems do not discriminate against individuals or groups.

Types of Fairness:

  • Process Fairness: Ethical methods regardless of outcome
  • Outcome Fairness: Ensuring algorithmic outputs don't perpetuate bias

Ethical Perspectives:

  • Equity: Discretion and fairness in applying justice
  • Social Justice: Equality and solidarity in society
  • Distributive Justice: Appropriate distribution of benefits
  • Procedural Justice: Fair allocation procedures
  • Interactional Justice: Appropriate interpersonal treatment

EU AI HLEG Components:

  • Avoidance of bias
  • Accessibility and universal design
  • Stakeholder participation
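
One simple way to probe outcome fairness is to compare the rate of positive outcomes across groups. The sketch below does this for made-up decisions; the group labels, the data, and the `positive_rates` helper are hypothetical, and the rate gap (often called a demographic parity difference, a name not used in these notes) is only one of many possible fairness measures.

```python
from collections import defaultdict

def positive_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        positives[group] += int(approved)
    return {g: positives[g] / totals[g] for g in totals}

# Made-up model decisions for two hypothetical groups
decisions = [
    ("a", True), ("a", True), ("a", False), ("a", True),
    ("b", True), ("b", False), ("b", False), ("b", False),
]

rates = positive_rates(decisions)
gap = max(rates.values()) - min(rates.values())

print(rates)  # {'a': 0.75, 'b': 0.25}
print(gap)    # 0.5
```

A large gap does not prove discrimination on its own, but it flags an outcome difference that deserves a closer look at the data and the process that produced it.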

3. Accountability

Clear acknowledgement and assumption of responsibility for AI actions, decisions, and impacts.

Three Types of AI Accountability:

  1. System-Level: AI's ability to explain and justify decisions
  2. Individual/Group: Determining who is responsible for AI impacts
  3. Sociotechnical: Broader system accountability for development and deployment

EU AI HLEG Components:

  • Auditability: Systems can be examined and verified
  • Impact Reporting: Minimizing and documenting negative effects
  • Trade-off Documentation: Recording decision rationales
  • Redress Ability: Mechanisms for addressing harm

4. Privacy

The right to control how personal data is collected, stored, modified, used, and exchanged.

Seven Types of Privacy (Finn et al.):

  1. Privacy of the Person: Body functions and characteristics (biometrics, genetics)
  2. Privacy of Behaviour: Sensitive activities (political, religious, sexual preferences)
  3. Privacy of Communication: Private communications protection
  4. Privacy of Data and Image: Control over personal data and images
  5. Privacy of Thoughts and Feelings: Mental privacy rights
  6. Privacy of Location and Space: Movement without tracking
  7. Privacy of Association: Freedom to associate without monitoring

Key Considerations:

  • GDPR Compliance: EU data protection regulations
  • Data Minimization: Using only necessary data
  • Consent Management: Clear user permissions
  • Data Security: Protecting against breaches

EU AI HLEG Components:

  • Respect for privacy and data protection
  • Quality and integrity of data
  • Access to data

5. Community Benefit

AI should deliver clear community or government benefits and maximize social value.

Core Requirements:

  • Public Good: Solutions must serve broader community interests
  • Benefit Maximization: Optimizing positive social impact
  • Alternative Consideration: Evaluating AI against other analysis tools
  • Default Principle: Should be standard for all AI solutions

KNIME

KNIME