Eunkwang Shin

Owner

Full Stack JavaScript Developer | Half-time Open Sourcerer.

OOP vs Procedural Programming

OOP

A programming paradigm built around objects, which bundle data with the code that manipulates it.
The idea is to model real-world entities and their interactions.
Data (fields) is enclosed in objects rather than exposed globally.
Advantages / disadvantages:
Program components/tasks are easily divided across the development team / Requires more planning and design preparation
Easier to manage and maintain dependencies between objects / OOP programs tend to be larger and more complex
Objects export the interface and hide the implementation and data / Tends to use more memory and CPU
Code is highly reusable and easy to scale and distribute / Changes in one class can affect others, which can complicate development

Procedural Programming

A paradigm that structures the program around procedures (also called functions or subroutines) and the calls between them.
Statements execute sequentially unless control flow directs otherwise.
Global data is exposed to all the functions.
Advantages / disadvantages:
Easier to compile and interpret / Difficult to scale or extend
Straightforward and simpler to code / Dependencies between elements are unclear and not well structured
Lower memory requirements / Data is less secure because it is exposed across the whole program
Easy to track the program flow / Hard to divide the work among programmers in a team

Classes

A class is a template/blueprint used to create objects

Java	Python
class-based OOP language (often described as pure OOP, although it has primitive types)	supports OOP alongside other styles
code must be written in classes	classes are optional
an executable class must have `main()`	scripts run without defining a class
encapsulation can be enforced by declaring fields `private`	attributes are public by default
visibility is managed through access modifiers	no access modifiers (a leading "_" marks an attribute as private by convention, but it is still accessible)

class <class-name>(<superclass>):
    <variable-name> = <value>  # class fields (data members)

    def __init__(self, <parameters>):  # class constructor (object builder)
        <code>

    def <method-name>(self, <parameters>):  # methods
        <code>

Classes Py

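Keywords and names	Functions and statements
`class`	`__init__()`
`self`: conventional name (not a reserved keyword) for the first method parameter; refers to the instance and its properties	`del`: statement (not a function) that deletes a reference to an object
`pass`: keyword used as a placeholder where a statement is required but no code is needed	`__str__()`: returns the string representation of an instance
`cls`: conventional name (not a reserved keyword) for the first parameter of a class method; refers to the class and its properties	`super()`: used to call a parent-class method from a child class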

Accessors: methods (with no parameters other than `self`) in a Python class that provide access to the data attributes of an object.
- Also known as getter methods. In the convention used here they are named with the verb get followed by the field name, which starts with an uppercase letter (e.g., `getName`); PEP 8 style would write `get_name`.
Mutators: methods (with a value parameter in addition to `self`) in a Python class that let the developer modify the values of object attributes.
- Also known as setter methods. In the same convention they are named with the verb set followed by the field name, which starts with an uppercase letter (e.g., `setName`).

def get<Variable>(self):
    return self.<field>

def set<Variable>(self, value):
    self.<field> = value
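
As a concrete illustration of the accessor/mutator templates, here is a minimal sketch. The `Account` class, its fields, and the validation rule are hypothetical, and the method names follow PEP 8 snake_case rather than the `get<Variable>` style shown above. The `@property` at the end is the attribute-style alternative that Python code often prefers over explicit getters.

```python
class Account:
    """Toy class used only to illustrate accessors and mutators."""

    def __init__(self, owner, balance=0):
        # A leading underscore marks the attribute as private by convention only.
        self._owner = owner
        self._balance = balance

    # Accessor (getter): takes no parameter other than self.
    def get_balance(self):
        return self._balance

    # Mutator (setter): takes the new value and can validate it first.
    def set_balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

    # Property: read access with attribute syntax, backed by a method.
    @property
    def owner(self):
        return self._owner


account = Account("Ada", 100)
account.set_balance(150)
```

Because the setter is the only path that changes `_balance`, the validation rule lives in one place; direct access to `account._balance` is still possible, which is why the underscore is a convention rather than enforcement.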

Classes Java

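import java.util.Objects;

// Customer is assumed to be defined elsewhere in the same package.
public class Bank {
  private Customer customer;
  private String branch;

  // No-argument constructor: creates a default customer; branch stays null.
  public Bank() {
    customer = new Customer();
  }

  // Overloaded constructor: reuses Bank() through this(), then sets the branch.
  public Bank(String name) {
    this();
    this.branch = name;
  }

  // Two banks match when their branch names are equal; Objects.equals is null-safe.
  public boolean find(Bank bank) {
    return Objects.equals(this.branch, bank.branch);
  }
}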

Packages

Packages Java

used to group related classes
like folders containing files (classes)
either Java defined or user-defined
used to write maintainable and portable code and to avoid class name conflicts.

Modules Py

used to group related functions and classes together
normal Python scripts that are imported into other scripts
either Python defined or user-defined
used to write maintainable and portable code and to improve reusability

π0

Problem & Motivation

Achieving real-world generality in robot learning is blocked by data scarcity, generalization, and robustness limits.
Human intelligence most outpaces machines in versatility—solving diverse, physically situated tasks under constraints, language commands, and perturbations.
In NLP/CV, foundation models pre-trained on diverse multi-task data, then fine-tuned (aligned) on curated datasets, outperform narrow specialists; the same paradigm is hypothesized for robotics.

Core Proposal

A novel flow-matching architecture built on a pre-trained Vision-Language Model (VLM) to inherit Internet-scale semantics.
Further training adds robot actions, turning the model into a Vision-Language-Action (VLA) policy.
Use cross-embodiment training to combine data from many robot types (single/dual-arm, mobile), despite differing configuration/action spaces.
Employ action chunking + flow matching (diffusion variant) to model complex, continuous, high-frequency actions.
Introduce an Action Expert (separate weights for action/state tokens), akin to a Mixture-of-Experts, augmenting the standard VLM.

Training Recipe (Pre- vs Post-Training)

Pre-training on highly diverse data builds broad, general physical abilities.
Post-training on curated, task-specific data instills fluent, efficient strategies.
Rationale: high-quality-only training lacks recovery behaviors; low-quality-only training lacks efficiency/robustness; combining both yields desired behavior.

Data & Backbone

~10,000 hours of demonstrations + the OXE dataset; data spans 7 robot configurations and 68 tasks.
VLM backbone initialized from PaliGemma (3B); add ~300M parameters for the action expert (total ~3.3B).
Pre-training mixture: weighted combination of internal datasets + full OXE; n^0.43 weighting to down-weight overrepresented task-robot pairs.
Unify interfaces: zero-pad the state q_t and action a_t vectors to the largest robot dimension (18); mask missing image slots; late-fusion encoders map images/states to the same token space as language.
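
The weighting and padding rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset names and sample counts are made up, and `mixture_weights` and `zero_pad` are hypothetical helper names, not functions from the paper's code.

```python
def mixture_weights(sample_counts, exponent=0.43):
    """Weight each task-robot pair by n**exponent, then normalize to sum to 1."""
    scaled = {name: n ** exponent for name, n in sample_counts.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


def zero_pad(vector, target_dim=18):
    """Right-pad a state or action vector with zeros up to target_dim."""
    if len(vector) > target_dim:
        raise ValueError("vector is longer than the target dimension")
    return list(vector) + [0.0] * (target_dim - len(vector))


# Hypothetical sample counts for two task-robot pairs.
counts = {"pair_large": 1_000_000, "pair_small": 10_000}
weights = mixture_weights(counts)

# Hypothetical 7-dimensional action from a single-arm robot.
padded = zero_pad([0.1, -0.2, 0.3, 0.0, 0.5, 0.1, 1.0])
```

With an exponent below 1, the large pair still gets the larger weight, but its share is smaller than its raw proportion of the data, which is the down-weighting effect described above.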

Modeling Details

Conditional flow matching models the continuous distribution over action chunks.
Train with a diffusion-style loss on individual sequence elements (instead of cross-entropy), with separate weights for diffusion-related tokens.
Flow path uses a linear-Gaussian schedule; sample noisy actions with ε∼N(0, I); predict denoising vector field; Euler integration from τ=0→1 at inference.
Efficient inference by caching K/V for the observation prefix; action tokens recomputed per integration step.
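
To make the sampling procedure concrete, here is a toy sketch of Euler integration along a linear path from noise (τ=0) to an action chunk (τ=1). Several things are assumptions: the path is taken as `tau * action + (1 - tau) * noise`, the velocity is the derivative of that path (the paper's sign convention may differ), the number of steps is arbitrary, and `oracle_velocity` is a hypothetical stand-in for the learned network that cheats by knowing the target.

```python
import random


def noisy_action(action, noise, tau):
    """Point on the linear path: tau * action + (1 - tau) * noise."""
    return [tau * a + (1.0 - tau) * e for a, e in zip(action, noise)]


def oracle_velocity(x, tau, target):
    """Stand-in for the learned vector field: exact velocity toward one known target."""
    return [(t - xi) / (1.0 - tau) for xi, t in zip(x, target)]


def euler_sample(velocity_fn, noise, steps=10):
    """Integrate from tau=0 to tau=1 with fixed-size Euler steps."""
    x = list(noise)
    delta = 1.0 / steps
    for k in range(steps):
        tau = k * delta
        v = velocity_fn(x, tau)
        x = [xi + delta * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
target_chunk = [0.5, -1.0, 0.25]  # hypothetical flattened action chunk
noise = [rng.gauss(0.0, 1.0) for _ in target_chunk]
sampled = euler_sample(lambda x, tau: oracle_velocity(x, tau, target_chunk), noise)
```

In a real policy the velocity function would be the action expert conditioned on the cached observation prefix, so only this loop is repeated at each integration step.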

High-Level Language Policy

Because the policy consumes language, a high-level VLM can decompose tasks (e.g., bussing) into intermediate language subgoals (SayCan-style planning), improving performance on complex, temporally extended tasks.

Evaluation Setup & Baselines

Three evaluation settings: out-of-box (direct prompting), fine-tuning on downstream tasks, and with a high-level VLM providing intermediate commands.
Compare against OpenVLA (7B, autoregressive discretization; no action chunks/high-frequency control) and Octo (93M; diffusion), trained on the same mixture.
Include a compute-parity π0 (160k steps vs 700k) and a π0-small variant (no VLM init).

Key Results

Out-of-box: π0 outperforms all baselines; even compute-parity π0 beats OpenVLA/Octo; π0-small still surpasses them—highlighting the benefits of expressive architectures + diffusion/flow matching + VLM pre-training.
Language following: π0 clearly exceeds π0-small across conditions:
- π0-flat: only overall task command.
- π0-human: human-provided intermediate steps.
- π0-HL: high-level VLM-provided steps (fully autonomous).
- Better language-following accuracy directly translates into stronger autonomous performance with high-level guidance.
New dexterous tasks (e.g., bowls stacking, towel folding, microwave, drawer items, paper towel replacement):
- Fine-tuned π0 generally outperforms OpenVLA, Octo, and small-data methods ACT / Diffusion Policy.
- Pre-training helps most when tasks resemble pre-training data; pretrained π0 often beats from-scratch by up to 2×.
Complex multi-stage tasks (laundry folding, table bussing, box building, to-go box, eggs):
- π0 solves many tasks; full pre-training + fine-tuning performs best.
- Gains from pre-training are especially large on harder tasks; absolute performance varies with task difficulty and pre-training coverage.

Takeaways & Limitations

π0 mirrors LLM training: pre-train for knowledge, post-train for alignment (instruction-following and execution).
Limitations/open questions:
- Optimal composition/weighting of pre-training data remains unclear.
- Not all tasks work reliably; difficult to predict how much/what kind of data is needed for near-perfect performance.
- Uncertain positive transfer across very diverse tasks/robots and to distinct domains (e.g., driving, navigation, legged locomotion).

Ref

Black, K., Brown, N., Driess, D., Esmail, A., Equi, M., Finn, C., Fusai, N., Groom, L., Hausman, K., Ichter, B., Jakubczak, S., Jones, T., Ke, L., Levine, S., Li‑Bell, A., Mothukuri, M., Nair, S., Pertsch, K., Shi, L. X., … Zhilinsky, U. (2025, June 21). π₀: A vision‑language‑action flow model for general robot control. Robotics: Science and Systems (RSS), Los Angeles, CA, United States. https://roboticsconference.org/program/papers/10/

VIMA

Unified Multimodal Prompts: Reformulates diverse robot tasks (language, images, video) into a single sequence modeling problem.
Object-Centric Tokenization: Uses object-level tokens (Mask R-CNN + ViT) instead of raw pixels, improving data efficiency and semantic generalization.
Cross-Attention Conditioning: Conditions the policy on prompts via cross-attention, maintaining strong zero-shot performance even with small models or novel tasks.

Motivation

Robot task specification comes in many forms: one-shot demonstrations, language instructions, and visual goals.
Traditionally, each task required distinct architectures and pipelines, leading to siloed systems with poor generalization.

VIMA Architecture

Key Contributions

Multimodal Prompting
- A novel formulation that unifies diverse robot manipulation tasks into a sequence modeling problem.
- Prompts are defined as interleaved sequences of text and images, enabling flexibility across task formats.
VIMA-BENCH
- A large-scale benchmark with 17 tasks across six categories (object manipulation, goal reaching, novel concept grounding, video imitation, constraint satisfaction, visual reasoning).
- Provides 650K expert trajectories and a four-level evaluation protocol for systematic generalization.
VIMA Agent
- A transformer-based visuomotor agent with encoder-decoder architecture and object-centric design.
- Encodes prompts with a pre-trained T5 model, parses images into object tokens via Mask R-CNN + ViT, and decodes actions autoregressively using cross-attention.

Design Insights

Object-Centric Representation: Passing variable-length object token sequences directly to the controller is more effective than pixel-based tokenization.
Cross-Attention Conditioning: Stronger prompt focus and efficiency compared to simple concatenation (e.g., GPT-style).
Robustness: Minimal degradation under distractors or corrupted prompts, aided by T5 backbone and object augmentation.
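
The cross-attention conditioning described above can be illustrated with a bare-bones single-head attention in plain Python. This is a sketch, not VIMA's implementation: the learned query/key/value projections are omitted, and the token values are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over the prompt tokens."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        outputs.append(
            [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical encoded prompt tokens (used as both keys and values here).
prompt_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical trajectory-history tokens acting as queries.
history_tokens = [[2.0, 0.0], [0.0, 2.0]]
attended, attn = cross_attention(history_tokens, prompt_tokens, prompt_tokens)
```

The prompt stays on the key/value side and is never concatenated into the query sequence, which is the structural difference from GPT-style concatenation.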

Results

Performance:
- Outperforms baselines (VIMA-Gato, VIMA-Flamingo, VIMA-GPT) by up to 2.9× success rate in hardest zero-shot generalization.
- With 10× less training data, still 2.7× better than best competitor.
Scaling:
- Sample-efficient: with just 1% of data, matches baselines trained with 10× more.
- Generalization holds across L1–L4 evaluation, with smaller regression than alternatives.

Conclusion

VIMA demonstrates that multimodal prompting is a powerful unifying framework for robot learning.
It achieves strong scalability, data efficiency, and generalization, establishing a solid starting point for future generalist robot agents.

Ref

Jiang, Y., Gupta, A., Zhang, Z., Wang, G., Dou, Y., Chen, Y., Fei-Fei, L., Anandkumar, A., Zhu, Y., & Fan, L. (2023). VIMA: Robot Manipulation with Multimodal Prompts. Proceedings of the 40th International Conference on Machine Learning, Proceedings of Machine Learning Research. https://proceedings.mlr.press/v202/jiang23b.html

RoboFlamingo Review

August 31, 2025 · about 2 min read

RoboFlamingo

RoboFlamingo decouples vision-language understanding and control, using OpenFlamingo for perception and a lightweight policy head for sequential decision-making.
Unlike prior VLM-based approaches, it requires only small-scale imitation fine-tuning on language-conditioned manipulation data, without large-scale co-fine-tuning.
This design enables data-efficient, zero-shot generalizable, and deployable robot manipulation policies on modest compute resources.

Key Idea

Proposes RoboFlamingo, a simple framework to adapt existing VLMs for robotic manipulation with lightweight fine-tuning.
Built on OpenFlamingo, decoupling vision-language understanding from decision-making.
Pre-trained VLM handles language and visual comprehension, while a dedicated policy head models sequential history.
Fine-tuned only on language-conditioned manipulation datasets using imitation learning.

Advantages

Requires only a small amount of demonstrations to adapt to downstream manipulation tasks.
Provides open-loop control capability → deployable on low-performance platforms.
Can be trained/evaluated on a single GPU server, making it a cost-effective and accessible solution.

Benchmarks

Evaluated on CALVIN benchmark (34 tasks, 1000 instruction chains).
RoboFlamingo achieves 2× performance improvements over previous state-of-the-art methods.

Performance

Imitation Learning: Outperforms all baselines across all metrics.
Zero-shot Generalization:
- Vision: Stronger generalization in ABC→D setting.
- Language: Robust to GPT-4 generated synonymous instructions.
Ablation Studies:
- Ignoring history (MLP w/o hist) gives worst results.
- LSTM and GPT-based policy heads perform best (LSTM chosen as default).
- VL pre-training is crucial for downstream manipulation.
- Larger VLMs show better data efficiency.
- Instruction fine-tuning improves both seen and unseen tasks.

Flexibility of Deployment

Supports open-loop control by predicting entire action sequences with a single inference → reduces latency and test-time compute.
Direct open-loop use without retraining can degrade performance; mitigated with jump-step demonstrations.

Conclusion

Demonstrates that pre-trained VLMs enable data efficiency and strong zero-shot generalization in robotic manipulation.
RoboFlamingo is presented as an intuitive, efficient, and open solution, with high potential when combined with large-scale real robot data.

Ref

Li, X., Liu, M., Zhang, H., Yu, C., Xu, J., Wu, H., Cheang, C., Jing, Y., Zhang, W., & Liu, H. (2024). Vision-language foundation models as effective robot imitators. International Conference on Learning Representations (ICLR 2024), Vienna, Austria.

OpenVLA Review

August 29, 2025 · about 3 min read

OpenVLA

OpenVLA is a 7B open-source VLA model built on Llama2 + DINOv2 + SigLIP, trained on 970k demos, achieving stronger generalization and robustness than closed RT-2-X (55B) and outperforming Diffusion Policy.
It introduces efficient adaptation via LoRA (1.4% params, 8× compute reduction) and 4-bit quantization (half memory, same accuracy), enabling fine-tuning and inference on consumer GPUs.
Limitations remain (single-image input, <90% reliability, limited throughput), but OpenVLA provides the first open, scalable framework for generalist robot policies.

OpenVLA Architecture

Motivation

Training robot policies from scratch struggles with robustness and generalization.
Fine-tuning vision-language-action (VLA) models offers reusable, generalizable visuomotor policies.
Barriers: prior VLAs are closed-source, lack best practices for adaptation, and need server-class hardware.

Model & Training

OpenVLA: 7B parameters, open-source.
Built on Llama 2 with fused DINOv2 + SigLIP vision encoders.
Trained on 970k robot demonstrations from Open-X Embodiment dataset.
Represents robot actions as tokens (discretized into 256 bins, replacing unused Llama tokens).
Standard next-token prediction objective.
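
The action-as-token idea can be sketched as uniform binning of one action dimension. The source only states that actions are discretized into 256 bins; the value range (`low`, `high`), the clipping, and the function names below are assumptions for illustration.

```python
def action_to_token(value, low, high, n_bins=256):
    """Map a continuous action value to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, low), high)
    width = (high - low) / n_bins
    index = int((clipped - low) / width)
    return min(index, n_bins - 1)  # the upper edge falls into the last bin


def token_to_action(index, low, high, n_bins=256):
    """Map a bin index back to the center of its bin."""
    width = (high - low) / n_bins
    return low + (index + 0.5) * width


# Hypothetical normalized action range.
LOW, HIGH = -1.0, 1.0
token = action_to_token(0.1, LOW, HIGH)
recovered = token_to_action(token, LOW, HIGH)
```

Decoding returns the bin center, so the round-trip error is at most half a bin width; the bin indices are what would be mapped onto the reused vocabulary tokens.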

Architecture & Approach

End-to-end fine-tuning of VLM to generate robot actions as tokens.
Differs from modular methods (e.g., Octo) that stitch separate encoders/decoders.
Vision features are obtained by encoding the same input image with both SigLIP and DINOv2, then channel-wise concatenated and passed through an MLP projector. This preserves SigLIP’s semantic alignment with language and DINOv2's spatial reasoning, giving the VLM richer multimodal context for manipulation tasks.
Uses Prismatic VLM backbone with multi-resolution features (spatial reasoning + semantics).

Performance

Outperforms closed RT-2-X (55B) by +16.5% task success with 7× fewer parameters.
Beats Diffusion Policy (from-scratch imitation learning) by +20.4% on multi-task language-grounded settings.
Demonstrates robust behaviors (distractor resistance, error recovery).

Efficiency

Introduces parameter-efficient fine-tuning:
- LoRA updates only 1.4% of parameters yet matches full fine-tuning.
- Can fine-tune on a single A100 GPU in ~10–15 hours (8× compute reduction).
Quantization:
- 4-bit inference matches bfloat16 accuracy while halving memory footprint.
- Runs at 3Hz on consumer GPUs (e.g., RTX A5000, 24GB).
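
For intuition about why LoRA trains so few parameters, here is a minimal sketch of a low-rank update on a single weight matrix. The matrix sizes, the rank, and the helper names are hypothetical; the 1.4% figure above refers to the whole model and is not reproduced by this toy calculation.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, b, a, scale=1.0):
    """Frozen weight W plus the trainable low-rank update scale * (B @ A)."""
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_fraction(d_out, d_in, rank):
    """Share of parameters trained when only B (d_out x r) and A (r x d_in) are updated."""
    return rank * (d_out + d_in) / (d_out * d_in)


w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # frozen 2x3 weight
b_zero = [[0.0], [0.0]]                 # B initialized to zero (rank 1)
a = [[0.1, 0.2, 0.3]]                   # A, shape 1x3
unchanged = lora_effective_weight(w, b_zero, a)
updated = lora_effective_weight(w, [[1.0], [0.0]], a)
```

With B initialized to zero the adapted layer starts out identical to the frozen one, and training only moves the small B and A matrices.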

Evaluations

Tested across 29 tasks and multiple robots (WidowX, Google robot, Franka).
Strong generalization on:
- Visual (unseen backgrounds/distractors).
- Motion (new object positions/orientations).
- Physical (new object shapes/sizes).
- Semantic (unseen tasks, instructions).
First generalist open-source VLA achieving ≥50% success rate across all tested tasks.

Design Insights

Fine-tuning the vision encoder (vs. freezing) crucial for robotic control.
Higher image resolution (384px vs. 224px) adds 3× compute without performance gains.
Training required 27 epochs, far more than typical VLM runs, to surpass 95% action token accuracy.

Limitations & Future Work

Supports only single-image observations (no proprioception, no history).
Inference throughput (~6Hz on RTX 4090) insufficient for high-frequency control (e.g., ALOHA at 50Hz).
Success rates remain below 90% in challenging tasks.
Open questions:
- Impact of base VLM size on performance.
- Benefits of co-training with Internet-scale data.
- Best visual features for VLAs.

Contributions

First open-source generalist VLA with strong performance.
Scalable end-to-end training pipeline (action-as-token).
Demonstrates LoRA + quantization for consumer-grade GPU adaptation.
Provides code, checkpoints, and data curation recipes to support future research.

Ref

Kim, M. J., Pertsch, K., Karamcheti, S., Xiao, T., Balakrishna, A., Nair, S., Rafailov, R., Foster, E. P., Sanketi, P. R., Vuong, Q., Kollar, T., Burchfiel, B., Tedrake, R., Sadigh, D., Levine, S., Liang, P., & Finn, C. (2025). OpenVLA: An Open-Source Vision-Language-Action Model. Proceedings of The 8th Conference on Robot Learning, Proceedings of Machine Learning Research. https://proceedings.mlr.press/v270/kim25c.html

Data Visualization Decision Tree

August 28, 2025 · about 3 min read

Color Legend

Icon	Category	Description	Example
🟡	Distribution	To show a distribution	Histogram, Density plot
⚫	Correlation	To show a correlation	Scatterplot, Correlogram
🟢	Ranking	To show a ranking	Bar chart, Lollipop chart
🔴	Part of a whole	To show a part of a whole	Pie chart, Treemap
🔵	Evolution	To show change over time	Line chart, Area chart
🟣	Maps	To show spatial information on a map	Choropleth map, Bubble Map
🟤	Flow	To show flows (flow diagrams, movement paths, etc.)	Flow map, Sankey-like

Categoric

One Variable
- ⚫ Waffle
- 🟢 Bar Plot
- 🟢 Lollipop
- 🟢 Word Cloud
- 🔴 Circular Packing
- 🔴 Doughnut
- 🔴 Pie
- 🔴 Treemap
Two or More Variables
- Two Independent Lists
  - 🔴 Venn Diagram
- Nested
  - 🟢 Bar Plot
  - 🔴 Circular Packing
  - 🔴 Dendrogram
  - 🔴 Sunburst
  - 🔴 Treemap
- Subgroup
  - ⚫ Grouped Scatter
  - ⚫ Heatmap
  - 🟢 Lollipop
  - 🟢 Parallel Plot
  - 🟢 Spider Plot
  - 🔴 Grouped Bar Plot
  - 🟤 Sankey Diagram
- Adjacency
  - 🟤 Arc
  - 🟤 Chord
  - 🟤 Network
  - 🟤 Sankey
  - ⚫ Heatmap

Relational

Network
- ⚫ Heatmap
- 🟢 Hive
- 🟤 Arc
- 🟤 Chord
- 🟤 Network
- 🟤 Sankey
Nested
- No Value
  - 🔴 Circular Packing
  - 🔴 Dendrogram
  - 🔴 Sunburst
  - 🔴 Treemap
  - 🟤 Sankey
- Value for Leaf
  - 🔴 Circular Packing
  - 🔴 Dendrogram
  - 🔴 Sunburst
  - 🔴 Treemap
  - 🟤 Sankey
- Value for Edges
  - 🔴 Dendrogram
  - 🟤 Chord
  - 🟤 Sankey
- Value for Connection
  - Edge Bundling

Map

🟣 Bubble Map
🟣 Choropleth
🟣 Connected Map
🟣 Map
🟣 Map Hexbin

Time Series

One Series
- 🟡 Box Plot
- 🟡 Violin
- 🟡 Ridge Line
- 🔵 Area
- 🔵 Line Plot
- 🟢 Bar Plot
- 🟢 Lollipop
Several Series
- 🟡 Box Plot
- 🟡 Violin
- 🟡 Ridge Line
- ⚫ Heatmap
- 🔵 Line Plot
- 🔵 Stacked Area
- 🔵 Stream Graph

Categoric and Numeric

One Numeric + One Categoric
- One Observation per Group
  - ⚫ Waffle
  - 🟢 Bar Plot
  - 🟢 Lollipop
  - 🟢 Word Cloud
  - 🔴 Circular Packing
  - 🔴 Doughnut
  - 🔴 Pie
  - 🔴 Treemap
- Several Observations per Group
  - 🟡 Box Plot
  - 🟡 Violin
  - 🟡 Ridge Line
  - 🟡 Density
  - 🟡 Histogram
One Category, Several Numeric
- No Order
  - 🟡 Box Plot
  - 🟡 Violin
  - ⚫ Grouped Scatter
  - ⚫ 2D Density
  - ⚫ PCA
  - ⚫ Correlogram
- A Numeric is Ordered
  - ⚫ Connected Scatter
  - 🔵 Area
  - 🔵 Line Plot
  - 🔵 Stacked Area
  - 🔵 Stream Graph
- One Value Per Group
  - ⚫ Grouped Scatter
  - ⚫ Heatmap
  - 🟢 Lollipop
  - 🟢 Parallel Plot
  - 🟢 Spider Plot
  - 🔴 Grouped Bar Plot
  - 🟤 Sankey Diagram
Several Categories, One Numeric
- Subgroup
  - One Observation per Group
    - ⚫ Grouped Scatter
    - ⚫ Heatmap
    - 🟢 Lollipop
    - 🟢 Parallel Plot
    - 🟢 Spider Plot
    - 🔴 Grouped Bar Plot
    - 🟤 Sankey Diagram
  - Several Observations per Group
    - 🟡 Box Plot
    - 🟡 Violin
- Nested
  - One Observation per Group
    - 🟢 Bar Plot
    - 🔴 Circular Packing
    - 🔴 Dendrogram
    - 🔴 Sunburst
    - 🔴 Treemap
  - Several Observations per Group
    - 🟡 Box Plot
    - 🟡 Violin
- Adjacency
  - ⚫ Heatmap
  - 🟤 Arc
  - 🟤 Chord
  - 🟤 Network
  - 🟤 Sankey

Numeric

One Numeric Variable
- 🟡 Density
- 🟡 Histogram
Two Numeric Variables
- Not Ordered
  - Few Points
    - 🟡 Box Plot
    - 🟡 Histogram
    - ⚫ Scatter Plot
  - Many Points
    - 🟡 Density
    - 🟡 Violin
    - ⚫ 2D Density
    - 🔵 Marginal Distribution
- Ordered
  - ⚫ Connected Scatter
  - 🔵 Area Plot
  - 🔵 Line Plot
Three Numeric Variables
- Not Ordered
  - 🟡 Box Plot
  - 🟡 Violin
  - ⚫ Bubble Plot
  - ⚫ 3D Scatter or Surface
- Ordered
  - 🔵 Area
  - 🔵 Line Plot
  - 🔵 Stacked Area
  - 🔵 Stream Graph
Several Numeric Variables
- Ordered
  - 🔵 Area
  - 🔵 Line Plot
  - 🔵 Stacked Area
  - 🔵 Stream Graph
- Not Ordered
  - 🟡 Box Plot
  - 🟡 Ridge Line
  - 🟡 Violin
  - ⚫ Correlogram
  - ⚫ Heatmap
  - ⚫ PCA
  - 🔴 Dendrogram

Ref

from Data to Viz

Vocabulary for AI +006

August 28, 2025 · about 2 min read

Vocabulary & Expressions

Term/Expression	Definition	Simpler Paraphrase	Meaning
prevalence	The state of being widespread or common	Commonness	being in vogue, being widespread
instantiation	The act of creating a specific instance of something	Creation of a specific example	creation of a concrete value
triviality	The quality of being trivial or unimportant	Unimportance	triviality, insignificance
intermediary	A person or thing that acts as a link between two others	Middleman	go-between, medium
dreaded	Regarded with great fear or apprehension	Feared	frightening, worrying
i.i.d.	Independent and identically distributed	Same distribution, no dependence	independent and identical distribution
a posteriori	Relating to knowledge gained through experience or empirical evidence	Based on observation	empirical, based on observation
posterior	Relating to the back or rear	Back	at the back, rear
resemblance	The state of resembling or being alike	Similarity	similarity, likeness
stipulate	To demand or specify a requirement	Specify	to prescribe, to state explicitly
rectify	To correct or make right	Correct	to amend, to set right
schematic	Relating to a diagram or representation	Diagrammatic	schematic, of a diagram
proposition	A statement or assertion that expresses a judgment or opinion	Proposal	proposal; proposition (logical statement)
cavity	A cavity is a hollow place in a tooth caused by decay	Tooth decay	decayed tooth
tautological	Relating to or involving tautology (the saying of the same thing twice in different words)	Redundant	tautological, repetitive
retrospectively	Looking back on or dealing with past events or situations	Looking back	in retrospect, looking back on the past
perturbations	Disturbances or deviations from a normal state	Disturbances	disturbance, variation
deformable	Capable of being changed in shape or form	Changeable	able to be deformed
consolidation	The process of combining multiple elements into a single, more effective whole	Integration	unification
oscillation	Fluctuation or variation in a state or condition	Fluctuation	vibration, variation
homogeneous	Of the same kind; alike	Uniform	of the same nature, uniform
nonstationary	Not stationary; changing over time	Changing	non-stationary, varying over time
whereupon	Immediately after which	After which	after that, next
magnitude	The great size or extent of something	Size	size, scale
maneuver	A movement or series of moves requiring skill and care	Move	operation, movement
Developing ML Systems

August 28, 2025 · about 5 min read

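Problem formulation

The first step is to figure out what problem you want to solve.
1. “What problem do I want to solve for my users?” → Define it concretely, not vaguely.
2. “Which part of that problem can be solved with machine learning?” → e.g., learning a function that maps photos to labels.
To make this concrete, you need to specify a loss function for the ML component.
Breaking the problem down, some parts can be handled with traditional software engineering, and only some may need ML.
Learning types lie on a continuum that spans supervised, unsupervised, reinforcement, and semisupervised learning.
- Semisupervised learning: uses a small set of labeled examples to extract more information from unlabeled data.
- Weakly supervised learning: uses inaccurate or noisy labels.
Conclusion: noise and a lack of labels create a continuum between supervised and unsupervised learning.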

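Data collection & management

Data can be created directly, crowdsourced, or collected from user behavior.
When data is scarce, use transfer learning.
Consider privacy review and consent, fairness, federated learning, and so on.
Data provenance: track each data item's definition, value range, producer, whether it is still being produced, and the history of definition changes → a reliable pipeline matters more than the algorithm.
Always ask: “Is this data appropriate for my problem? Does it sufficiently capture both the inputs and the outputs?”
Use a learning curve to check whether more data would help or learning has plateaued.
Be defensive: handle input errors, missing values, adversarial users, inconsistent spelling, and the like.
Data augmentation (rotation, translation, added noise, etc.) improves model robustness.
Mitigate imbalanced data with undersampling, oversampling, SMOTE/ADASYN, boosting, and similar techniques.
Reduce the influence of outliers with, e.g., a log transform; tree models are relatively robust to them.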

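Feature engineering

Quantization: force continuous values into bins.
One-hot encoding: convert a categorical attribute into multiple Boolean attributes.
Add new features based on domain knowledge (e.g., date → whether it is a weekend or holiday).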
“At the end of the day, some ML projects succeed and some fail… the most important factor is the features used.” (Pedro Domingos)
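
The three feature transformations listed in this section can be sketched with the standard library alone. The bin edges, the color vocabulary, and the function names are illustrative assumptions, and the holiday part of the date feature is left out because it would need a calendar of holidays.

```python
import bisect
import datetime


def quantize(value, edges):
    """Return the index of the bin that value falls into, given sorted bin edges."""
    return bisect.bisect_right(edges, value)


def one_hot(category, vocabulary):
    """Encode a categorical value as one Boolean (0/1) attribute per category."""
    return [1 if category == item else 0 for item in vocabulary]


def is_weekend(date):
    """Domain-knowledge feature: Saturday (5) and Sunday (6) count as weekend."""
    return date.weekday() >= 5


AGE_EDGES = [18, 35, 65]           # hypothetical bin edges
COLORS = ["red", "green", "blue"]  # hypothetical category vocabulary

age_bin = quantize(25, AGE_EDGES)
color_vec = one_hot("green", COLORS)
weekend_flag = is_weekend(datetime.date(2025, 8, 30))
```

A value that is not in the vocabulary encodes as all zeros here; whether to raise an error or add an explicit "unknown" slot instead is a design choice the pipeline has to make.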

Exploratory data analysis (EDA) & visualization

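Goal: understanding the data, not prediction or validation.
Use histograms and scatter plots to check distributions, missing values, errors, and outliers.
Clustering → visualize prototypes and detect outliers (“a cat vs. a cat dressed as a lion”).
Use dimensionality reduction (e.g., t-SNE) to visualize high-dimensional data in 2D/3D.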

Model selection & training

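Once the data is in order, build the model.
Random forests → many categorical features, some of which may be irrelevant.
Nonparametric methods → lots of data, little prior knowledge, and a wish to avoid worrying about feature selection.
Logistic regression → data is linearly separable (possibly after feature engineering).
SVM → the dataset is small and high-dimensional.
Deep neural nets → pattern recognition (images, speech).
Tune hyperparameters through a combination of experience and search.
Overusing the validation data risks overfitting to the validation set → multiple validation sets are needed.
Performance evaluation: ROC curve, AUC, confusion matrix.
What matters is making the idea–experiment–validation loop fast.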

Trust, interpretability, explainability

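Good metric scores alone do not establish trust → regulators, the press, and users also want trustworthiness.
Accountability: when errors occur, there must be a responsible party and an appeals process.
Interpretability: understanding the model's internals directly (trees, linear regression).
- Key question: “If I change x, how will the output change?”
Explainability: a black-box model plus a separate module that explains it (e.g., LIME).
Simple explanations can create false confidence → testing and real-world performance build more trust.
Analogy: “an experimental aircraft that comes only with an explanation of why it is safe vs. an aircraft that has completed 100 safe flights.”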

Operation, monitoring, maintenance

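In operation, the long-tail input problem appears → unexpected inputs keep arriving → real-time monitoring and human raters are needed.
Nonstationarity: the world and user behavior change → trade-off between fresh data and a stable model.
Freshness requirements differ: some problems need a new model every day or every hour, while others can keep the same model for months.
Deployment automation → small changes are approved automatically, large changes are reviewed.
Online vs. offline model: incrementally updating the existing model vs. retraining from scratch each time.
The data itself can change (spam email → spam text messages, voice, video, etc.).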

Checklist

Tests for Features and Data

Feature expectations are captured in a schema.
All features are beneficial.
No feature’s cost is too much.
Features adhere to meta-level requirements.
The data pipeline has appropriate privacy controls.
New features can be added quickly.
All input feature code is tested.

Tests for Model Development

Every model specification undergoes a code review.
Every model is checked in to a repository.
Offline proxy metrics correlate with actual metrics.
All hyperparameters have been tuned.
The impact of model staleness is known.
A simpler model is not better.
Model quality is sufficient on all important data slices.
The model has been tested for considerations of inclusion.

Tests for Machine Learning Infrastructure

Training is reproducible.
Model specification code is unit tested.
The full ML pipeline is integration tested.
Model quality is validated before attempting to serve it.
The model allows debugging by observing the step-by-step computation of training or inference on a single example.
Models are tested via a canary process before they enter production serving environments.
Models can be quickly and safely rolled back to a previous serving version.

Monitoring Tests for Machine Learning

Dependency changes result in notification.
Data invariants hold in training and serving inputs.
Training and serving features compute the same values.
Models are not too stale.
The model is numerically stable.
The model has not experienced regressions in training speed, serving latency, throughput, or RAM usage.
The model has not experienced a regression in prediction quality on served data.

Ref

Breck, E., Cai, S., Nielsen, E., Salib, M., & Sculley, D. (2016). What’s your ML test score? A rubric for ML production systems. NIPS Workshop on Reliable Machine Learning in the Wild.

Nonparametric Models

August 28, 2025 · about 2 min read

Nearest-neighbor Models

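For a query point $x_q$, find the $k$ nearest neighbors and use them for classification or regression.
- Classification: majority vote
- Regression: mean, median, or local linear regression
Distance metric: Minkowski distance
- $L_p(x_j, x_q) = \left( \sum_i |x_{j,i} - x_{q,i}|^p \right)^{1/p}$
- $p=2$ → Euclidean distance
- $p=1$ → Manhattan distance
- Boolean attributes → Hamming distance
- Accounting for covariance → Mahalanobis distance
Curse of dimensionality:
- Average neighborhood volume: $\ell^n = k/N \;\;\Rightarrow\;\; \ell = (k/N)^{1/n}$, where $\ell$ is the side length of the hypercube expected to contain the $k$ neighbors when $N$ points are uniformly distributed in an $n$-dimensional unit hypercube.
- As $n$ grows, $\ell$ grows, so the neighbors end up “far away”.
- In high-dimensional spaces, most points lie near the boundary (in the outer shell).
- Low dimensions: interpolation is possible
- High dimensions: more extrapolation, which makes generalization hard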
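
A brute-force sketch of k-nearest-neighbor classification with the Minkowski distance, plus the neighborhood side length $\ell = (k/N)^{1/n}$ from the curse-of-dimensionality argument. The example points, labels, and function names are made up for illustration.

```python
from collections import Counter


def minkowski(u, v, p=2):
    """L_p distance; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def knn_classify(query, examples, k=3, p=2):
    """Majority vote among the k examples nearest to the query."""
    nearest = sorted(examples, key=lambda ex: minkowski(ex[0], query, p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def neighborhood_side(k, n_points, n_dims):
    """Side length l = (k / N) ** (1 / n) of the cube expected to hold k neighbors."""
    return (k / n_points) ** (1.0 / n_dims)


# Hypothetical labeled points in two dimensions.
examples = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
]
label = knn_classify((0.05, 0.05), examples, k=3)

side_2d = neighborhood_side(10, 1_000_000, 2)
side_17d = neighborhood_side(10, 1_000_000, 17)
```

With ten neighbors among a million points, the neighborhood covers well under 1% of each axis in 2 dimensions but more than half of each axis in 17 dimensions, which is why "nearest" neighbors stop being local. The sort makes each query O(N log N); k-d trees are one way to avoid scanning every example.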

k-d trees

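A binary tree built by splitting the data one dimension at a time.
At each node, the data is split left/right according to whether $x_i \le m$, where $m$ is the median along the chosen dimension.
Search: descend toward the branch on the query point's side to find candidates; if the query is close to the splitting boundary, the subtree on the other side must be checked as well.
Efficiency condition: there must be many more examples than dimensions, preferably at least $2^n$.
Practical range:
- Up to about 10 dimensions with thousands of examples
- Up to about 20 dimensions with millions of examples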

Support Vector Machines (SVM)

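Finds the maximum margin separator.
Goal: minimize expected generalization loss rather than empirical loss
Decision boundary: $\{x : w \cdot x + b = 0\}$
Training is formulated as a quadratic programming (QP) optimization problem.
- Dual form:
  $\arg\max_\alpha \sum_j \alpha_j - \tfrac{1}{2} \sum_{j,k} \alpha_j \alpha_k y_j y_k (x_j \cdot x_k)$
- Constraints: $\alpha_j \ge 0,\; \sum_j \alpha_j y_j = 0$
At the optimum, most $\alpha_j = 0$; only the points closest to the separator (the support vectors) have $\alpha_j > 0$.
Prediction function (using $w = \sum_j \alpha_j y_j x_j$, so the sign of $b$ matches the decision boundary above):
$h(x) = \text{sign}\Big(\sum_j \alpha_j y_j (x \cdot x_j) + b \Big)$
Advantages:
- Efficient, since only the support vectors need to be kept
- Nonparametric flexibility + parametric stability (resistance to overfitting)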

The Kernel Trick

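Kernel trick: instead of computing the high-dimensional feature space $F(x)$ explicitly, replace only the dot products with a kernel function.
- $K(x,z) = F(x)\cdot F(z)$
Common kernel functions:
- Polynomial kernel: $K(x,z) = (1 + x \cdot z)^d$
- Gaussian kernel (RBF): $K(x,z) = e^{-\gamma \|x-z\|^2}$
Soft margin classifier: allows some misclassification, and penalizes each misclassified point in proportion to the distance needed to move it to the correct side.
The kernel method also applies to any other algorithm that depends only on dot products.
Mercer's theorem: any “reasonable” (positive-definite) kernel function corresponds to a dot product in some feature space.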
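
A small numerical check of the identity $K(x,z) = F(x)\cdot F(z)$ for the degree-2 polynomial kernel on 2-D inputs. The explicit feature map below is the standard one for this case; the sample vectors and function names are arbitrary choices for illustration.

```python
import math


def poly_kernel(x, z, degree=2):
    """Polynomial kernel K(x, z) = (1 + x . z) ** degree."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** degree


def feature_map_2d(x):
    """Explicit 6-dimensional feature map F for the degree-2 kernel on 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared)


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(feature_map_2d(x), feature_map_2d(z)))
implicit = poly_kernel(x, z)
```

Both routes give the same number, but the kernel needs one 2-D dot product while the explicit route builds two 6-D vectors; the gap widens quickly with the degree and the input dimension, and for the RBF kernel the feature space is infinite-dimensional, so only the kernel route is available.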

Logistic regression

August 28, 2025 · about 3 min read

Univariate Linear Regression

With a single input $x$, the hypothesis is $h(x) = w_1x + w_0$
Loss function: squared error
Find the optimal $(w_0, w_1)$ by gradient descent
- $w_0 \leftarrow w_0 + \alpha (y - h(x))$
- $w_1 \leftarrow w_1 + \alpha (y - h(x)) \cdot x$
The loss function is convex → a global minimum is guaranteed (no local minima)
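
The two update rules can be run directly on a tiny dataset. This is a minimal sketch: the data points (generated from y = 2x + 1 without noise), the learning rate, and the number of epochs are arbitrary choices for illustration.

```python
def fit_line(points, alpha=0.05, epochs=2000):
    """Fit h(x) = w1 * x + w0 with per-example gradient descent updates."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in points:
            error = y - (w1 * x + w0)
            w0 += alpha * error        # w0 <- w0 + alpha * (y - h(x))
            w1 += alpha * error * x    # w1 <- w1 + alpha * (y - h(x)) * x
    return w0, w1


# Hypothetical noise-free data lying on the line y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w0, w1 = fit_line(points)
```

Because the squared-error loss is convex, the starting point (0, 0) does not matter here; only the learning rate has to be small enough for the updates not to overshoot.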

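Batch vs Stochastic Gradient Descent (SGD)

Batch gradient descent (Batch GD): uses all the data for each update → accurate but slow, inefficient on large datasets
SGD (Stochastic GD): updates from a single randomly chosen example (or a small minibatch) → fast and efficient
Minibatch: can balance speed and stability
A decreasing schedule for the learning rate $\alpha$ → guarantees convergence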

Multivariable Linear Regression

With an $n$-dimensional input, the hypothesis is $h(x) = w \cdot x = \sum_i w_i x_i$
Normal equation: $w^* = (X^TX)^{-1}X^Ty$
$(X^TX)^{-1}X^T$ is the pseudoinverse of $X$
In high dimensions the risk of overfitting is high, so regularization is needed
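
The normal equation can be sketched in plain Python by forming $X^TX$ and $X^Ty$ and solving the resulting linear system with Gauss-Jordan elimination instead of inverting the matrix explicitly. The function name, the leading column of ones for the intercept, and the toy data are assumptions for the example.

```python
def normal_equation(X, y):
    """Solve (X^T X) w = X^T y by Gauss-Jordan elimination with partial pivoting."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    M = [A[i] + [b[i]] for i in range(n)]  # augmented matrix [X^T X | X^T y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the normal equation has no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Toy data (assumed): y = 1 + 2*x1 + 3*x2, with a constant 1 as the first input.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
print(normal_equation(X, y))
```

Unlike gradient descent, this gives the minimizer in one step, but it requires $X^TX$ to be invertible.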

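Regularization

Cost function: $Cost(h) = Loss(h) + \lambda \cdot Complexity(h)$
Complexity function: $Complexity(h_w) = \sum_i |w_i|^q$
$q = 1$ → L1 regularization (sparse model, many $w_i = 0$)
$q = 2$ → L2 regularization (penalizes the sum of squared weights)
L1 → not rotationally invariant (suitable when the axes are meaningful)
L2 → rotationally invariant (suitable when the axes are arbitrary)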
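
The sketch below computes the complexity term for $q = 1$ and $q = 2$ and rotates a two-dimensional weight vector by 45 degrees to show the rotational-invariance difference. The function names, the sample weights, and the rotation angle are assumptions for the example.

```python
import math


def complexity(weights, q):
    """Complexity(h_w) = sum_i |w_i|^q  (q = 1 for L1, q = 2 for L2)."""
    return sum(abs(w) ** q for w in weights)


def cost(loss, weights, lam, q):
    """Cost(h) = Loss(h) + lambda * Complexity(h)."""
    return loss + lam * complexity(weights, q)


def rotate_2d(weights, angle):
    """Rotate a two-dimensional weight vector by `angle` radians."""
    w1, w2 = weights
    c, s = math.cos(angle), math.sin(angle)
    return (c * w1 - s * w2, s * w1 + c * w2)


w = (1.0, 0.0)
w_rot = rotate_2d(w, math.pi / 4)
print("L1:", complexity(w, 1), complexity(w_rot, 1))  # changes under rotation
print("L2:", complexity(w, 2), complexity(w_rot, 2))  # unchanged up to rounding
```

The L1 penalty is smallest when the weight vector lies along an axis, which is why it favors sparse solutions.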

Perceptron Learning Rule

Linear function + hard threshold → linear classifier
Weight update: $w_i \leftarrow w_i + \alpha (y - h(x)) \cdot x_i$
Linearly separable data → converges to a perfect separator
Non-separable data → no convergence guarantee; a decaying $\alpha$ schedule is needed
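
A minimal sketch of the perceptron rule on a linearly separable toy problem (logical AND) follows. The dummy input $x_0 = 1$ for the bias, the threshold convention $h(x) = 1$ when $w \cdot x \ge 0$, the data, and the function names are assumptions for the example.

```python
def predict(w, x):
    """Hard threshold on the linear function w . x (x[0] is a dummy input fixed at 1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


def train_perceptron(examples, alpha=1.0, max_epochs=100):
    """Apply w_i <- w_i + alpha * (y - h(x)) * x_i until an epoch has no mistakes."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            error = y - predict(w, x)  # 0 when correct, +1 or -1 when wrong
            if error != 0:
                mistakes += 1
                w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            break
    return w


# Logical AND with a leading bias input of 1 (assumed toy data).
AND_EXAMPLES = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
weights = train_perceptron(AND_EXAMPLES)
print(weights, [predict(weights, x) for x, _ in AND_EXAMPLES])
```

On data that is not linearly separable (for example XOR), the loop would simply run until `max_epochs` without reaching a mistake-free epoch.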

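Logistic Regression

Problems with the hard threshold
- Discontinuous and non-differentiable → unstable learning
- Always commits to a definite 0 or 1, even for examples very close to the boundary
Solution: the logistic function $g(z) = \frac{1}{1 + e^{-z}}$
Hypothesis: $h_w(x) = g(w \cdot x) = \frac{1}{1 + e^{-w \cdot x}}$
Output $\in (0,1)$ → can be read as a probability; forms a soft boundary
0.5 at the center of the boundary, approaching 0 or 1 farther away from it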

Derivative Property of the Logistic Function

Logistic function: $g(z) = \frac{1}{1+e^{-z}}$
Derivative: $g'(z) = \frac{e^{-z}}{(1+e^{-z})^2}$
$1 - g(z) = \frac{e^{-z}}{1+e^{-z}}$
Therefore $g(z)(1-g(z)) = \frac{e^{-z}}{(1+e^{-z})^2}$
Conclusion: $g'(z) = g(z)(1-g(z))$
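
The identity $g'(z) = g(z)(1-g(z))$ can be checked numerically against a central finite difference. The step size and the sample points are assumptions chosen for the example.

```python
import math


def g(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def g_prime(z):
    """Closed-form derivative g(z) * (1 - g(z))."""
    s = g(z)
    return s * (1.0 - s)


def numeric_derivative(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


for z in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(z, g_prime(z), numeric_derivative(g, z))
```

The derivative peaks at $z = 0$, where $g'(0) = 0.25$, and shrinks toward 0 as $|z|$ grows.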

Deriving the Logistic Regression Weight Update

Loss function: $Loss(w) = (y - h_w(x))^2$
$\frac{\partial}{\partial w_i} Loss(w) = \frac{\partial}{\partial w_i}(y - h_w(x))^2$
$= 2(y - h_w(x)) \cdot \frac{\partial}{\partial w_i}(y - h_w(x))$
$= -2(y - h_w(x)) \cdot \frac{\partial}{\partial w_i} h_w(x)$
Since $h_w(x) = g(w \cdot x)$, $\frac{\partial}{\partial w_i} h_w(x) = g'(w \cdot x) \cdot x_i$
$g'(w \cdot x) = h_w(x)(1-h_w(x))$
Result: $\frac{\partial}{\partial w_i} Loss(w) = -2(y - h_w(x)) \cdot h_w(x)(1-h_w(x)) \cdot x_i$
Gradient descent update:
$w_i \leftarrow w_i - \alpha \cdot \frac{\partial}{\partial w_i} Loss(w)$
Therefore, absorbing the constant factor 2 into $\alpha$:
$w_i \leftarrow w_i + \alpha (y - h_w(x)) \cdot h_w(x)(1-h_w(x)) \cdot x_i$
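
The final update rule can be sketched as a per-example training loop. The one-dimensional toy data, the dummy input $x_0 = 1$ for the bias, the learning rate, the epoch count, and the function names are assumptions for the example.

```python
import math


def g(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def h(w, x):
    """Hypothesis h_w(x) = g(w . x)."""
    return g(sum(wi * xi for wi, xi in zip(w, x)))


def train_logistic(examples, alpha=0.5, epochs=2000):
    """Apply w_i <- w_i + alpha * (y - h) * h * (1 - h) * x_i to each example in turn."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = h(w, x)
            step = alpha * (y - p) * p * (1.0 - p)
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w


# Toy data (assumed): inputs (1, x) with label 0 for x in {0, 1} and 1 for x in {2, 3}.
EXAMPLES = [((1.0, 0.0), 0), ((1.0, 1.0), 0), ((1.0, 2.0), 1), ((1.0, 3.0), 1)]
weights = train_logistic(EXAMPLES)
print([round(h(weights, x), 3) for x, _ in EXAMPLES])
```

The outputs are probabilities rather than hard 0/1 decisions; thresholding them at 0.5 gives the class prediction.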

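Conclusion

Progression: linear regression → gradient descent → multivariable extension → regularization → perceptron → logistic regression
L1 vs L2 regularization
- L1: sparse models (axes meaningful)
- L2: rotationally invariant (axes arbitrary)
Perceptron: works perfectly only when the data are linearly separable
Logistic regression: provides a soft boundary → probabilistic predictions + robust on real-world data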
