
FDA +002


Business Intelligence

KDD

Knowledge discovery in databases (KDD) refers to the comprehensive process of finding knowledge in data. It proceeds through the following steps:

  • Learning the application domain
  • Creating a target dataset
  • Data cleansing/pre-processing
  • Data reduction/projection
  • Choosing the function of data mining
  • Choosing the data mining algorithm
  • Data mining
  • Interpretation
  • Using discovered knowledge
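
The middle steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the records, the field names (`customer`, `age`, `basket`), and the choice of a simple item-frequency count as the "data mining" step are all assumptions made for the example, not part of KDD itself.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"customer": "a", "age": 34, "basket": ["milk", "bread"]},
    {"customer": "b", "age": None, "basket": ["milk", "eggs"]},
    {"customer": "c", "age": 51, "basket": ["milk", "bread", "eggs"]},
    {"customer": "d", "age": 29, "basket": []},
]


def create_target_dataset(records):
    """Creating a target dataset: keep only records with a non-empty basket."""
    return [r for r in records if r["basket"]]


def clean(records):
    """Data cleansing: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def project(records):
    """Data reduction/projection: keep only the attribute being mined."""
    return [r["basket"] for r in records]


def mine(baskets):
    """Data mining: count in how many baskets each item appears."""
    return Counter(item for basket in baskets for item in set(basket))


def interpret(counts, total):
    """Interpretation: express counts as support (share of baskets)."""
    return {item: n / total for item, n in counts.items()}


baskets = project(clean(create_target_dataset(raw_records)))
support = interpret(mine(baskets), len(baskets))
print(support)
```

The first step (learning the application domain) and the last (using discovered knowledge) are human activities and have no counterpart in the code.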

CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology provides a structured approach to planning a data mining project. Because it is not tied to any one industry, practitioners use it widely as a repeatable approach for data mining projects, and it applies to many machine learning projects as well.

  • Business understanding: Define the business problem and what you want to accomplish from a business perspective.
  • Data understanding: Identify, collect and review the required data.
  • Data preparation: Prepare your data for modeling.
  • Modeling: Analyze possible approaches and develop the model.
  • Evaluation: Evaluate results against business needs.
  • Deployment: Deploy the model.

Statistical Analysis

Data analytics borrows from statistical analysis, which involves collecting data, counting, computing probabilities, and testing hypotheses.

Two approaches are most relevant to data analytics:

Descriptive Statistics

  • Purpose: Analyze past events using historical data
  • Data Source: Stored data from previous activities
  • Application: Helps companies make informed decisions based on statistical analysis of historical patterns
  • Focus: "What happened?" - Understanding past performance and trends
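
As a minimal sketch of the "What happened?" question, the snippet below summarizes a series of historical values with Python's standard `statistics` module. The `monthly_sales` figures are made up for illustration.

```python
import statistics

# Hypothetical historical data: units sold in six past months.
monthly_sales = [120, 135, 128, 150, 142, 160]

# Descriptive statistics only summarize what has already been observed.
summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```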

Predictive Statistics

  • Purpose: Predict future events based on currently available data
  • Data Source: Present and historical data combined with analytical models
  • Application: Provides statements or predictions about events that have not yet occurred
  • Focus: "What will happen?" - Forecasting future outcomes and behaviors
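
As a minimal sketch of the "What will happen?" question, the snippet below fits a straight line to past values by ordinary least squares and extrapolates one step ahead. The linear trend is an assumption chosen for simplicity, not a recommended forecasting model, and the `months` and `sales` values are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: month index and units sold.
months = [1, 2, 3, 4, 5, 6]
sales = [120, 135, 128, 150, 142, 160]

slope, intercept = fit_line(months, sales)

# Prediction for a month that has not yet occurred.
forecast = slope * 7 + intercept
print(f"trend: {slope:.2f} units/month, forecast for month 7: {forecast:.1f}")
```

The same historical data feeds both approaches; the difference is that the fitted model is used to make a statement about an unobserved month.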

Data analytics results

Data analytics results need to be presented in a form that humans can understand and use easily, while the underlying method remains computationally accurate.

The effectiveness of different data analytics methods can be evaluated across two dimensions:

  • X-axis: Computer accuracy (how accurate the method is)
  • Y-axis: Human understandability (how easily humans can interpret the results)

Ethical Principles

Although many ethical principles have been proposed, a few are especially significant for data analytics and AI solutions and must be treated as mandatory.

The five core mandatory ethical principles for AI and data analytics are:

1. Transparency

The need to describe, inspect and reproduce the mechanisms through which AI systems make decisions and learn to adapt to their environment.

Key Components (EU High-Level Expert Group on AI, AI HLEG):

  • Traceability: Understanding the data flow and decision path
  • Explainability: Providing reasonable explanations for AI outputs
  • Communication: Clear information sharing with stakeholders

Stakeholder Requirements:

  • Users: Understanding what the system is doing and why
  • Creators: Validation and certification of AI systems
  • Operators: Understanding processes and input data
  • Investigators: Accident investigation capabilities
  • Regulators: Investigation and compliance support
  • Legal System: Evidence and decision-making support
  • Public: Building confidence in technology

2. Fairness

A complex, multi-faceted concept ensuring AI systems do not discriminate against individuals or groups.

Types of Fairness:

  • Process Fairness: Ethical methods regardless of outcome
  • Outcome Fairness: Ensuring algorithmic outputs don't perpetuate bias
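
One way to make outcome fairness measurable is to compare how often each group receives a favorable decision. The sketch below computes per-group selection rates and their gap, a metric often called demographic parity difference. Choosing this metric is an assumption for the example; it is only one of several competing fairness measures, and the groups and decisions are made up.

```python
def selection_rates(decisions):
    """Return the share of favorable outcomes per group.

    decisions: iterable of (group, approved) pairs, approved being a bool.
    """
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {group: approved[group] / totals[group] for group in totals}


# Hypothetical model outputs for two groups, "A" and "B".
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(decisions)

# A gap of 0 means both groups are approved at the same rate.
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

A large gap does not by itself prove discrimination, but it flags an output pattern that deserves investigation.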

Ethical Perspectives:

  • Equity: Discretion and fairness in applying justice
  • Social Justice: Equality and solidarity in society
  • Distributive Justice: Appropriate distribution of benefits
  • Procedural Justice: Fair allocation procedures
  • Interactional Justice: Appropriate interpersonal treatment

EU AI HLEG Components:

  • Avoidance of bias
  • Accessibility and universal design
  • Stakeholder participation

3. Accountability

Clear acknowledgement and assumption of responsibility for AI actions, decisions, and impacts.

Three Types of AI Accountability:

  1. System-Level: AI's ability to explain and justify decisions
  2. Individual/Group: Determining who is responsible for AI impacts
  3. Sociotechnical: Broader system accountability for development and deployment

EU AI HLEG Components:

  • Auditability: Systems can be examined and verified
  • Impact Reporting: Minimizing and documenting negative effects
  • Trade-off Documentation: Recording decision rationales
  • Redress Ability: Mechanisms for addressing harm

4. Privacy

The right to control how personal data is collected, stored, modified, used, and exchanged.

Seven Types of Privacy (Finn et al.):

  1. Privacy of the Person: Body functions and characteristics (biometrics, genetics)
  2. Privacy of Behaviour: Sensitive activities (political, religious, sexual preferences)
  3. Privacy of Communication: Private communications protection
  4. Privacy of Data and Image: Control over personal data and images
  5. Privacy of Thoughts and Feelings: Mental privacy rights
  6. Privacy of Location and Space: Movement without tracking
  7. Privacy of Association: Freedom to associate without monitoring

Key Considerations:

  • GDPR Compliance: EU data protection regulations
  • Data Minimization: Using only necessary data
  • Consent Management: Clear user permissions
  • Data Security: Protecting against breaches
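
Data minimization can be sketched as a filter applied before data enters the analysis. In the example below, only the fields needed for the analysis are kept and the direct identifier is replaced with a keyed hash. The field names, the `REQUIRED_FIELDS` set, and the key are hypothetical; a real deployment would need proper key management.

```python
import hashlib
import hmac

# Hypothetical: the only attributes this analysis actually needs.
REQUIRED_FIELDS = {"age_band", "region"}


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, secret_key: bytes) -> dict:
    """Keep only required fields and swap the user ID for a pseudonym."""
    slim = {k: v for k, v in record.items() if k in REQUIRED_FIELDS}
    slim["pseudonym"] = pseudonymize(record["user_id"], secret_key)
    return slim


# Hypothetical input record and key, for illustration only.
record = {
    "user_id": "u-123",
    "email": "person@example.com",
    "age_band": "30-39",
    "region": "EU",
}
minimal = minimize(record, secret_key=b"example-key-not-for-production")
print(sorted(minimal))
```

Pseudonymized records can still be re-identified by anyone holding the key, so this reduces exposure rather than making the data anonymous.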

EU AI HLEG Components:

  • Respect for privacy and data protection
  • Quality and integrity of data
  • Access to data

5. Community Benefit

AI should deliver clear community or government benefits and maximize social value.

Core Requirements:

  • Public Good: Solutions must serve broader community interests
  • Benefit Maximization: Optimizing positive social impact
  • Alternative Consideration: Evaluating AI against other analysis tools
  • Default Principle: Should be standard for all AI solutions

KNIME
