FDA +002

August 6, 2025 · about 8 min read

Eunkwang Shin

Owner

Business Intelligence

KDD

Knowledge discovery in databases (KDD) is the end-to-end process of extracting useful knowledge from data. It consists of the following steps:

Learning from the application domain
Creating a target dataset
Data cleansing/pre-processing
Data reduction/projection
Choosing the function of data mining
Choosing the data mining algorithm
Data mining
Interpretation
Using discovered knowledge
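
As a rough illustration, the steps above can be sketched as a chain of small functions. This is only a toy sketch: the records, field names, and the choice of a per-region average as the "data mining" step are all assumptions made for the example, not part of KDD itself. Step numbers in the comments count the list above from the top.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"customer": "a", "region": "north", "spend": 120.0},
    {"customer": "b", "region": "south", "spend": None},
    {"customer": "c", "region": "north", "spend": 80.0},
    {"customer": "d", "region": "south", "spend": 200.0},
]

def create_target_dataset(records):
    # Step 2: keep only the fields relevant to the question being asked.
    return [{"region": r["region"], "spend": r["spend"]} for r in records]

def clean(records):
    # Step 3: drop rows whose spend value is missing.
    return [r for r in records if r["spend"] is not None]

def reduce_by_region(records):
    # Step 4: project the data onto region -> list of spend values.
    grouped = {}
    for r in records:
        grouped.setdefault(r["region"], []).append(r["spend"])
    return grouped

def mine(grouped):
    # Steps 5-7: the chosen function is summarization and the chosen
    # "algorithm" is simply a per-group mean (an assumption for this sketch).
    return {region: mean(values) for region, values in grouped.items()}

def interpret(pattern):
    # Step 8: turn the mined pattern into a human-readable statement.
    top = max(pattern, key=pattern.get)
    return f"Highest average spend: {top} ({pattern[top]:.1f})"

pattern = mine(reduce_by_region(clean(create_target_dataset(raw_records))))
summary = interpret(pattern)
print(summary)
```

Steps 1 (learning the domain) and 9 (using the knowledge) happen outside the code: they decide which question is worth asking and what is done with the answer.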

CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology provides a structured approach to planning a data mining project. Because it is a cross-industry standard, it is widely used by practitioners who need a repeatable approach, and it also applies to a variety of machine learning projects.

Business understanding: Define the business problem and what you want to accomplish from a business perspective.
Data understanding: Identify, collect and review the required data.
Data preparation: Prepare your data for modeling.
Modeling: Analyze possible approaches and develop the model.
Evaluation: Evaluate results against business needs.
Deployment: Deploy the model.

Statistical Analysis

Data analytics borrows from statistical analysis, which covers data collection, counting, probability, and hypothesis testing.

The two approaches most relevant to data analytics are:

Descriptive Statistics

Purpose: Analyze past events using historical data
Data Source: Stored data from previous activities
Application: Helps companies make informed decisions based on statistical analysis of historical patterns
Focus: "What happened?" - Understanding past performance and trends
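
A minimal sketch of descriptive statistics, using only Python's standard `statistics` module. The `monthly_sales` figures are made up for illustration; the point is that every number reported describes data that already exists.

```python
from statistics import mean, median, stdev

# Hypothetical historical data: units sold in five past months.
monthly_sales = [100, 120, 90, 150, 140]

# Summarize what already happened; nothing here predicts the future.
summary = {
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "stdev": round(stdev(monthly_sales), 2),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```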

Predictive Statistics

Purpose: Predict future events based on currently available data
Data Source: Present and historical data combined with analytical models
Application: Provides statements or predictions about events that have not yet occurred
Focus: "What will happen?" - Forecasting future outcomes and behaviors
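
A minimal sketch of the predictive idea, assuming a simple straight-line trend fitted by ordinary least squares. The data and the choice of a linear model are assumptions for illustration; real forecasting usually needs a more careful model and validation.

```python
from statistics import mean

# Hypothetical history: month index and units sold.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(months, sales)

# Statement about an event that has not yet occurred: sales in month 6.
forecast_month_6 = intercept + slope * 6
print(f"slope={slope}, intercept={intercept}, forecast={forecast_month_6}")
```

The fitted line is the "analytical model"; the forecast combines it with the historical data to say something about a month that has not been observed.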

Data analytics results

Data analytics results should be presented so that humans can understand and use them easily, while the underlying method remains accurate when run on a computer.

The effectiveness of different data analytics methods can be evaluated across two dimensions:

X-axis: Computer accuracy (how accurate the method is)
Y-axis: Human understandability (how easily humans can interpret the results)

Ethical Principles

Although many sets of ethical principles have been proposed, a few are especially significant for data analytics and AI solutions and should be treated as mandatory.

The five core mandatory ethical principles for AI and data analytics are:

1. Transparency

The need to describe, inspect and reproduce the mechanisms through which AI systems make decisions and learn to adapt to their environment.

Key Components (EU AI HLEG):

Traceability: Understanding the data flow and decision path
Explainability: Providing reasonable explanations for AI outputs
Communication: Clear information sharing with stakeholders
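
One way to picture traceability and explainability is a decision function that returns its decision path along with the result. The sketch below is a toy rule-based scorer; the rules, thresholds, and field names (`income`, `missed_payments`) are hypothetical and not taken from the EU AI HLEG guidelines.

```python
def score_applicant(applicant):
    """Return a decision together with the trace of rules that produced it."""
    trace = []
    score = 0

    # Rule 1 (hypothetical threshold).
    if applicant["income"] >= 50000:
        score += 2
        trace.append("income >= 50000: +2")
    else:
        trace.append("income < 50000: +0")

    # Rule 2 (hypothetical penalty).
    if applicant["missed_payments"] == 0:
        score += 1
        trace.append("missed_payments == 0: +1")
    else:
        score -= 1
        trace.append("missed_payments > 0: -1")

    decision = "approve" if score >= 2 else "manual review"
    return {"decision": decision, "score": score, "trace": trace}

result = score_applicant({"income": 60000, "missed_payments": 1})
print(result["decision"])
for step in result["trace"]:
    print(" -", step)
```

Because the trace is stored with the decision, the same record can serve both as an explanation to the person affected and as evidence for a later review.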

Stakeholder Requirements:

Users: Understanding what the system is doing and why
Creators: Validation and certification of AI systems
Operators: Understanding processes and input data
Investigators: Accident investigation capabilities
Regulators: Investigation and compliance support
Legal System: Evidence and decision-making support
Public: Building confidence in technology

2. Fairness

A complex, multi-faceted concept ensuring AI systems do not discriminate against individuals or groups.

Types of Fairness:

Process Fairness: Ethical methods regardless of outcome
Outcome Fairness: Ensuring algorithmic outputs don't perpetuate bias
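
Outcome fairness can be checked with simple measurements. The sketch below computes the demographic parity difference, which is one of several possible fairness metrics and is chosen here only as an example. The groups and decisions are hypothetical.

```python
from collections import defaultdict

# Hypothetical model outputs: (group label, was the request approved?).
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

def approval_rates(decisions):
    """Share of positive outcomes per group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}

rates = approval_rates(decisions)

# Demographic parity difference: gap between the best- and worst-treated group.
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

A gap of zero means every group receives positive outcomes at the same rate. A large gap does not prove discrimination by itself, but it flags an outcome that deserves a closer look.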

Ethical Perspectives:

Equity: Discretion and fairness in applying justice
Social Justice: Equality and solidarity in society
Distributive Justice: Appropriate distribution of benefits
Procedural Justice: Fair allocation procedures
Interactional Justice: Appropriate interpersonal treatment

EU AI HLEG Components:

Avoidance of bias
Accessibility and universal design
Stakeholder participation

3. Accountability

Clear acknowledgement and assumption of responsibility for AI actions, decisions, and impacts.

Three Types of AI Accountability:

System-Level: AI's ability to explain and justify decisions
Individual/Group: Determining who is responsible for AI impacts
Sociotechnical: Broader system accountability for development and deployment

EU AI HLEG Components:

Auditability: Systems can be examined and verified
Impact Reporting: Minimizing and documenting negative effects
Trade-off Documentation: Recording decision rationales
Redress: Mechanisms for addressing harm

4. Privacy

The right to control how personal data is collected, stored, modified, used, and exchanged.

Seven Types of Privacy (Finn et al.):

Privacy of the Person: Body functions and characteristics (biometrics, genetics)
Privacy of Behaviour: Sensitive activities (political, religious, sexual preferences)
Privacy of Communication: Private communications protection
Privacy of Data and Image: Control over personal data and images
Privacy of Thoughts and Feelings: Mental privacy rights
Privacy of Location and Space: Movement without tracking
Privacy of Association: Freedom to associate without monitoring

Key Considerations:

GDPR Compliance: EU data protection regulations
Data Minimization: Using only necessary data
Consent Management: Clear user permissions
Data Security: Protecting against breaches
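
Data minimization can be sketched as an allow-list applied before any analysis, combined with a pseudonymous identifier in place of direct identifiers. The field names, the allow-list, and the salt below are hypothetical. Note that a salted hash is pseudonymization, not anonymization, so the output should still be handled as personal data.

```python
import hashlib

# Hypothetical allow-list: the only attributes the analysis actually needs.
ALLOWED_FIELDS = {"age_band", "region"}

def pseudonymize(user_id, salt):
    """Replace a direct identifier with a salted SHA-256 digest (truncated)."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:16]

def minimize(record, salt):
    """Keep only allow-listed fields and swap the user id for a pseudonym."""
    reduced = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    reduced["pid"] = pseudonymize(record["user_id"], salt)
    return reduced

record = {
    "user_id": "u-1001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age_band": "30-39",
    "region": "north",
}

safe = minimize(record, salt="demo-salt")
print(sorted(safe))
```

In practice the salt would be kept secret and stored separately from the data; it is hard-coded here only to keep the example self-contained.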

EU AI HLEG Components:

Respect for privacy and data protection
Quality and integrity of data
Access to data

5. Community Benefit

AI should deliver clear community or government benefits and maximize social value.

Core Requirements:

Public Good: Solutions must serve broader community interests
Benefit Maximization: Optimizing positive social impact
Alternative Consideration: Evaluating AI against other analysis tools
Default Principle: Should be standard for all AI solutions
