FDA +009

Linear Separability

  • the data is linearly separable if it can be separated
    • by a point, when the data points lie on a one-dimensional line
    • by a line, in a two-dimensional representation of the data points
    • by a plane (a two-dimensional surface), in a three-dimensional representation of the data points
  • If the data is not linearly separable, look at other options for classification.
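
One rough way to probe linear separability is to train a single perceptron and see whether it ever gets through a full pass without a mistake. The sketch below is illustrative only: the function name `perceptron_separates`, the epoch budget and the AND/XOR toy data are assumptions, not part of these notes. A `False` result only means no separator was found within the budget.

```python
def perceptron_separates(points, labels, epochs=100):
    """Return True if a perceptron finds a separating line within `epochs` passes.

    Labels are +1 / -1. False means "not found within the budget",
    which is not a proof that the data is non-separable.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: a separator exists
            return True
    return False


corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]   # logical AND: separable by a line
xor_labels = [-1, 1, 1, -1]    # logical XOR: no single line separates it

and_ok = perceptron_separates(corners, and_labels)
xor_ok = perceptron_separates(corners, xor_labels)
print(and_ok, xor_ok)
```

XOR is the classic case where "other options for classification" are needed, for example a network with a hidden layer.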

Hyperplane

  • the conceptual boundary that divides the data into classes
  • Weight vector: defines the hyperplane; it is represented and generated in weight space.
  • Choosing the hyperplane
    • Minimum distance between samples
    • Least-squares method
    • Gradient Descent
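
As a minimal sketch of how the last two options can work together, the code below fits a hyperplane by running gradient descent on a least-squares error. The data is one-dimensional, so the "hyperplane" is a single threshold point. The function name `fit_hyperplane`, the learning rate, the step count and the toy data are all assumptions made for illustration.

```python
def fit_hyperplane(xs, ys, lr=0.1, steps=500):
    """Fit w*x + b to labels in {-1, +1} by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 3.0, 4.0]
ys = [-1, -1, 1, 1]

w, b = fit_hyperplane(xs, ys)
threshold = -b / w  # the point where w*x + b changes sign
predictions = [1 if w * x + b > 0 else -1 for x in xs]
print(round(threshold, 3), predictions)
```

For this symmetric toy set the threshold settles midway between the two classes, and every sample falls on the correct side.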

Artificial Neural Networks, ANN

  • Strength: handles high-dimensionality problems and complex relations between variables
  • Weaknesses: theoretically complex, computationally intensive, needs large data sets, complicated to implement
  • Kinds of ANN
    • Perceptrons, Multilayer Perceptrons
    • Deep learning neural networks
    • Kohonen networks
    • Convolutional neural networks
    • Radial Basis Functions
    • Recurrent neural networks
    • Support Vector Machines
    • Competitive learning
    • Boltzmann machines

Multilayer Perceptrons, MLP

  • Challenges
    • Decide on the network topology.
      • how many hidden layers are needed
      • how many neurons in each of the hidden layers
    • Find values for the weights which make the network produce the correct output values for the given input values.
  • Neural networks only accept numeric data.
    • categorical attributes need to be converted into numeric ones.
    • One-Hot encoding, Thermometer encoding.
  • large values may need to be scaled into a similar range before they are fed to the neural network
    • a log transform may be needed to pull the values into the target range.
    • [-1, +1] or [0, 1]
  • the number of input neurons should be as small as possible.
    • adding neurons -> more parameters and weights -> amplifies any bias (overtrains the network).
  • one categorical attribute may have many attribute values
    • each value adds a parameter -> adds risk of overtraining
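
The preprocessing steps above can be sketched in a few lines. This is a hedged illustration, not a prescribed pipeline: the helper names (`one_hot`, `thermometer`, `log_min_max`), the example categories and the choice of base-10 logs with a [0, 1] target range are assumptions. The thermometer variant shown (one bit per level, set up to and including the value's rank) is one common convention among several.

```python
import math


def one_hot(value, categories):
    """One input per category; exactly one of them is 1."""
    return [1 if value == c else 0 for c in categories]


def thermometer(value, ordered_levels):
    """For ordered categories: set every bit up to and including the value's rank."""
    rank = ordered_levels.index(value)
    return [1 if i <= rank else 0 for i in range(len(ordered_levels))]


def log_min_max(values):
    """Log-transform positive values, then rescale them linearly into [0, 1]."""
    logged = [math.log10(v) for v in values]
    lo, hi = min(logged), max(logged)
    return [(v - lo) / (hi - lo) for v in logged]


colours = ["red", "green", "blue"]      # unordered attribute
sizes = ["small", "medium", "large"]    # ordered attribute

encoded_colour = one_hot("green", colours)
encoded_size = thermometer("medium", sizes)
scaled = log_min_max([10, 100, 1000, 100000])
print(encoded_colour, encoded_size, scaled)
```

Note that each categorical value becomes its own input neuron here, which is exactly why an attribute with many values inflates the input layer.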

Resilient Propagation, RProp

  • directly adjust the weight step based on the local gradient information
  • introduces a weight update value $u_{ij}$ for each weight $w_{ij}$
  • updates it based on the sign of the partial derivative of the error with respect to the weight
  • update value $u_{ij}$
    • if the sign changes (i.e. jumped over a local minimum) -> $u_{ij}$ slightly decreased.
    • if the sign remains the same -> $u_{ij}$ slightly increased.
  • weight $w_{ij}$
    • if the derivative is positive (i.e. error increasing) -> $w_{ij}$ decreased by $u_{ij}$
    • if the derivative is negative (i.e. error decreasing) -> $w_{ij}$ increased by $u_{ij}$
    • if the derivative changes sign, the last weight update is reverted. (backtracks the last weight update)
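
The rules above can be sketched for a single weight as follows. This is a minimal illustration, not the KNIME implementation: the function name `rprop_step`, the increase and decrease factors (1.2 and 0.5), the bounds on the update value, and the toy error function E(w) = (w - 3)^2 are assumptions chosen for the example.

```python
def sign(x):
    return (x > 0) - (x < 0)


def rprop_step(w, grad, prev_grad, u, prev_dw,
               inc=1.2, dec=0.5, u_min=1e-6, u_max=50.0):
    """One RProp update for a single weight. Only the sign of `grad` is used."""
    if grad * prev_grad > 0:
        # Same sign as last time: grow the update value, keep moving downhill.
        u = min(u * inc, u_max)
        dw = -sign(grad) * u
        w += dw
    elif grad * prev_grad < 0:
        # Sign changed: we jumped over a minimum. Shrink u and revert the last step.
        u = max(u * dec, u_min)
        w -= prev_dw
        dw = 0.0
        grad = 0.0  # forget the sign so the next step is a plain move
    else:
        # No sign information yet (first step, or right after a revert).
        dw = -sign(grad) * u
        w += dw
    return w, grad, u, dw


# Toy error E(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
w, prev_grad, u, prev_dw = 0.0, 0.0, 0.5, 0.0
for _ in range(100):
    grad = 2 * (w - 3)
    w, prev_grad, u, prev_dw = rprop_step(w, grad, prev_grad, u, prev_dw)
print(w, u)
```

Because the magnitude of the derivative is ignored, the step size depends only on how consistently the sign has pointed in one direction, which is what makes the method insensitive to the scale of the error surface.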

KNIME

  • RProp MLP Learner + MultiLayerPerceptron Predictor
  • MultilayerPerceptron + Weka Predictor (back propagation with momentum)