Artificial Neural Networks (ANNs)
 used where the input and output data are simple, but the processing that leads from input to output is complex.
 try to mimic the brain by forming a network of neurons to enable complex decision making.
 input signals arrive at a node, which applies an activation function and may emit output signal(s).
 human brain ~85 billion neurons; rat brain ~1 billion neurons.
Turing Test
A machine is intelligent if we cannot distinguish its behaviour from that of a living creature.
Usages
 handwriting recognition/speech recognition.
 sophisticated models of weather and climate patterns.
 self driving cars/self piloting drones.
Key characteristics of ANN
 Activation Function
 function at each node which transforms a set of input signals/data into the output signal/data.
 Network Topology
 describes the way in which the different neurons are connected to each other.
 Training Algorithm
 trains the network to set up appropriate weights so that the neurons can predict the output from the input.
Activation Function
 the choice of activation function makes the ANN fit a particular kind of data.
 e.g. for binary decisions, a step function makes more sense.
 Activation Threshold function
 if the input meets a threshold criterion, the signal fires; otherwise nothing happens.
Different Activations Functions
 Sigmoid Activation Function
 also called the logistic sigmoid function.
 the most commonly used activation function.
 output varies from 0 to 1.
 called a squashing function because it squashes its input: signals below about -5 are squashed to ~0, and signals above about +5 are squashed to ~1.
 Linear Activation Function
 output is unbounded (proportional to the input).
 mostly results in a model equivalent to linear regression.
 Saturated Linear Activation Function
 Hyperbolic Tangent Activation Function
 varies from -1 to 1.
 Gaussian Activation Function
 mostly results in a model using radial basis functions (RBF networks).
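As a quick illustration (not from the original notes), the step, sigmoid, and tanh activations can be defined and compared in a few lines of base R:

```r
# threshold/step: fires (1) only when the input crosses 0
step <- function(x) ifelse(x >= 0, 1, 0)

# logistic sigmoid: squashes any input into (0, 1)
sigmoid <- function(x) 1 / (1 + exp(-x))

# tanh is built into base R; its output lies in (-1, 1)

x <- c(-6, -1, 0, 1, 6)
step(x)               # 0 0 1 1 1
round(sigmoid(x), 3)  # 0.002 0.269 0.500 0.731 0.998
round(tanh(x), 3)     # -1.000 -0.762 0.000 0.762 1.000
```

Note how inputs beyond roughly +/-5 are already squashed to the extremes of the sigmoid's range.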
Network Topology
determines how the neural network is able to learn.
Characteristics of Network Topology
 No of layers
 direction of information flow
 no of nodes within each layer of the network.
No of layers in the network
 each node in a layer calculates its output from parameters that are independent of the other nodes in that layer.
 if there are multiple layers, the output of the second layer depends on the output of the first.
 most multi-layer networks are fully connected, which means that every node in one layer is connected to every node in the next.
 nodes in the next layer are hidden nodes, because they don't correspond to a raw input feature but to some derived feature computed from the previous layer.
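A fully connected layer reduces to one matrix multiply followed by the activation function. A minimal base-R sketch (layer sizes and inputs chosen arbitrarily for illustration):

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

set.seed(42)
n_in <- 3; n_hidden <- 2
W <- matrix(rnorm(n_hidden * n_in), nrow = n_hidden)  # one weight per edge
b <- rnorm(n_hidden)                                  # one bias per hidden node

x <- c(0.5, -1.0, 2.0)          # one input vector (3 features)
hidden <- sigmoid(W %*% x + b)  # hidden-layer output, each value in (0, 1)
dim(hidden)                     # 2 x 1: one activation per hidden node
```

Stacking another such multiply on top of `hidden` gives a multi-layer network, which is why the output of the second layer depends on the output of the first.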
Direction of information flow
 whether the signals in the network can travel only forward, or both forward and backward.
 feed forward networks: networks where information flows in only one direction, ie forward.
 even in single-direction networks, many variations are possible:
 multiple output nodes
 multiple hidden layers => DEEP Neural Network.
 recurrent network/feedback network
 information flows both ways.
 short term memory/delay is useful in a recurrent network.
 can be used for stock market prediction, speech recognition, weather prediction.
 rarely used in practice.
 multilayered feedforward networks, called MultiLayer Perceptrons (MLP), are the most common neural networks.
DEEP NEURAL NETWORK
 A neural network with multiple hidden layers is called a DEEP NEURAL NETWORK.
 A Deep Neural Network is trained using Deep Learning.
No of nodes in each layer
 no of input nodes is determined by the number of features.
 no of output nodes is determined by the number of possible outcomes.
 no of hidden nodes needs to be decided before training.
 no of hidden nodes depend on
 no of input nodes
 amount of training data
 noise in the data
 complexity of learning etc
 too many hidden nodes often leads to overfitting.
 a smaller network with good performance is preferable to a much larger one that performs only slightly better.
Training with Backpropagation
 a network topology without training is not useful.
 training strengthens/weakens the connections in the topology by assigning weights to the edges.
Backpropagation strategy
 signals are propagated forward, the error is propagated backward, and over many iterations the weights converge.
 the error at the output layer is propagated back through all layers so that the weights of the edges can be adjusted.
 the above process is repeated until the error stops improving (or another stopping criterion is reached).
 uses gradient descent to determine how much each weight should be changed.
learning rate in gradient descent
 the amount by which the algorithm changes the weights to reduce the error.
 higher learning rate => the algorithm finishes faster, but it can overshoot and miss the best optimum.
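The update gradient descent performs is w ← w − η·∂E/∂w, where η is the learning rate. A toy base-R sketch fitting a single weight to squared error (the data and learning rate are illustrative, not from the notes):

```r
# toy data: y = 3 * x, so the ideal weight is 3
x <- c(1, 2, 3, 4)
y <- 3 * x

w   <- 0      # initial weight
eta <- 0.01   # learning rate

for (i in 1:200) {
  pred <- w * x
  grad <- sum(2 * (pred - y) * x)  # d(sum of squared errors)/dw
  w    <- w - eta * grad           # gradient-descent update
}
round(w, 3)  # 3: converges to the ideal weight
```

With a much larger `eta` the same loop diverges instead of converging, which is the "finishes faster but can miss the optimum" trade-off in concrete form.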
Pros of ANN
 can be used for classification or numeric problems
 can identify even complex patterns.
Cons of ANN
 extremely slow to train.
 may lead to overfitting.
 difficult to interpret results.
coding for ANN
 in R, the "neuralnet" package is most commonly used for ANN.
 hidden = 1 (one hidden node) gives a model similar to linear regression.
 increasing the no of hidden nodes can significantly increase the accuracy of a model.
code
 data=read.csv("data.csv")
 str(data)
 normalize=function(x){return((x - min(x)) / (max(x) - min(x)))}
 data_norm=as.data.frame(lapply(data,normalize))
 summary(data_norm$result) => should range from 0 to 1.
 data_train=data_norm[1:1000, ]
 data_test=data_norm[1001:1500, ]
 install.packages("neuralnet")
 library(neuralnet)
 model=neuralnet(target ~ predictors, data=data_train, hidden=1)
 model_result=compute(model,data_test)
 model_predictions=model_result$net.result => compute() returns the predictions in $net.result
 cor(model_predictions, data_test$target) => high correlation means high accuracy.
 plot(model) => shows the neural network
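The min-max normalization used in the script can be sanity-checked on a small vector (a quick illustration, not part of the original notes):

```r
# min-max normalization: rescales any numeric vector into [0, 1]
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

v <- c(10, 20, 40)
normalize(v)          # 0.0000 0.3333 1.0000: min maps to 0, max maps to 1
range(normalize(v))   # always 0 and 1, which is what summary() should show
```

This is why summary() of the normalized column should report a minimum of 0 and a maximum of 1.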