## What is an ANN?

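An artificial neural network (ANN) is a method of classification using a "neural network" setup. ANNs are loosely inspired by the networks of neurons in our brains. The setup consists of layers of nodes connected by edges, like in a graph. ANNs are statistical models, and for classification they are usually trained with supervised learning, i.e. on examples whose correct labels are known. Each layer transforms the pixels of the image (or other input data) into a representation that moves the network closer to a classification.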

## Parts of an ANN

The **input layer** has the same number of nodes as the size of the image (its number of pixels). If the image size is m×n, there are m×n input nodes. The images are read pixel by pixel, so all images fed to the network must be the same size for the number of input nodes to match. Each node of the input layer holds the grey-scale (illumination) value of one pixel, scaled to lie between 0 and 1, which shows how dark or light the pixel is.

The **output layer** (a dense, i.e. fully connected, layer in the code below) has the same number of nodes as the number of possible outcomes. In the digit recognition problem, this would be 10, one for each digit in the set {0,1,2,3,4,5,6,7,8,9}. Each node of this final layer holds the probability that the input belongs to a particular class, and the class with the highest probability is the answer.

The **hidden/intermediate layers** are the most important: they are the layers that do the actual work of classification. In simple words, these layers help "decompose" the image into features. Adding more of these layers can lead to better accuracy. However, too many hidden layers can also lead to **over-fitting**, where the model memorises the training data and performs worse on new data.

The nodes are also called **neurons**, hence giving the name – neural network.

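**Edges** connect the neurons of neighbouring layers, and each edge has a **weight**. In layman's terms, edges help "carry" features between layers. Each neuron multiplies its inputs by the weights of its incoming edges, adds them up (plus a bias), and passes the result through an **activation function**; the result is the neuron's activation value, which is what the network uses to compute its output. Sigmoid functions are commonly used as activation functions. The logistic sigmoid maps any input to a value between 0 and 1, which is why its output is often read as a probability. tanh is a related S-shaped function whose output lies between -1 and 1. Activation functions are used both during training and during testing.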
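To make this concrete, here is a minimal sketch of what a single neuron computes, using NumPy. The input values, weights and bias are made up for illustration; in a real network the weights and bias are learned during training.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU: keeps positive values and replaces negative values with 0.
    return np.maximum(0.0, z)

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the values carried along the incoming edges, plus a bias,
    # passed through the activation function to give the activation value.
    z = np.dot(weights, inputs) + bias
    return activation(z)

x = np.array([0.0, 0.5, 1.0])   # three input pixel values in [0, 1]
w = np.array([0.4, -0.6, 0.9])  # hypothetical edge weights
b = -0.2                         # hypothetical bias

print("sigmoid:", neuron(x, w, b, sigmoid))
print("tanh:   ", neuron(x, w, b, np.tanh))
print("relu:   ", neuron(x, w, b, relu))
```

Only the activation function changes between the three calls; the weighted sum is the same in each case.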

## Example of an ANN

Consider the common digit recognition problem, where you have an image and have to classify the digit as one of {0,1,2,3,4,5,6,7,8,9}. Let's say the image contains a 6 and has been converted to grayscale. Below is an approximate representation of what the ANN would do:

Each node of the ANN corresponds to a curve (a part of a digit) like the ones in the figure. Each digit, such as 6, has its own set of grey-scale weights, learned during training. In the figure, the **node "weights"** for the digit 8 are shown. The probability of the image being an 8 is, roughly, a measure of how closely the image's pixel values match the weights for 8. By making this comparison for every class, the neural network classifies the data.
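The idea of "matching the image against each class's weights" can be sketched as a single dense layer followed by softmax: each class's score is the dot product of the pixel values with that class's weights (plus a bias), so an image that lines up with a class's weights gets a higher score, and softmax turns the scores into probabilities. This is a minimal sketch on a made-up 2×2 "image" with hand-picked, hypothetical weights for three classes, not trained MNIST weights.

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

# A tiny 2x2 "image", flattened to 4 pixel values in [0, 1].
image = np.array([[1.0, 0.0],
                  [0.0, 1.0]]).flatten()

# Hypothetical weights: one row per class, one column per pixel.
weights = np.array([
    [1.0, 0.0, 0.0, 1.0],  # class 0: "diagonal" pattern
    [0.0, 1.0, 1.0, 0.0],  # class 1: "anti-diagonal" pattern
    [0.5, 0.5, 0.5, 0.5],  # class 2: "uniform" pattern
])
biases = np.zeros(3)

scores = weights @ image + biases  # a higher score means a closer match
probs = softmax(scores)
print("probabilities:", probs)
print("predicted class:", np.argmax(probs))
```

Here the image matches the class 0 pattern exactly, so class 0 gets the highest score and therefore the highest probability.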

The number of curves or parts that the image is divided into depends on the type of problem.

Neurons are "illuminated" (activated) to different degrees depending on their inputs. As the model is trained, each class comes to produce its own **unique illumination** pattern across the nodes.

An ANN of this kind is also called **feed-forward** because the data doesn't "feed" back; it only moves forward. Its structure can be compared to a directed **acyclic** graph.
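A feed-forward pass can be sketched in a few lines of NumPy: each layer's output becomes the next layer's input, and nothing flows backwards. This is a minimal sketch assuming one ReLU hidden layer of 32 neurons (an arbitrary, hypothetical size) and a softmax output layer; the weights are random and untrained, so the probabilities are meaningless and only the shapes matter.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, layers):
    """Feed-forward pass: each layer's output is the next layer's input.
    Data only moves forward, so the network has no cycles."""
    activation = x
    for weights, bias, act in layers:
        activation = act(weights @ activation + bias)
    return activation

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 784, 32, 10

# Hypothetical random (untrained) weights: (hidden layer, output layer).
layers = [
    (rng.normal(0, 0.05, (n_hidden, n_inputs)), np.zeros(n_hidden), relu),
    (rng.normal(0, 0.05, (n_classes, n_hidden)), np.zeros(n_classes), softmax),
]

x = rng.random(n_inputs)  # a fake flattened 28x28 image with values in [0, 1)
probs = forward(x, layers)
print(probs.shape, probs.sum())
```

Adding more hidden layers is just a matter of appending more `(weights, bias, activation)` tuples to `layers`.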

### Types of ANN

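Types of ANNs include Convolutional Neural Networks (CNNs), recursive neural networks, recurrent neural networks, etc. Note that recurrent neural networks feed some of their outputs back into the network, so unlike the networks described above they are not purely feed-forward.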

### Probability Node Weights

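The probability node weights of a trained classifier are often saved to a file and made available for download, usually as .xml files. Using these saved weights, we can classify data without training a model ourselves. This can be done in Python simply using the `CascadeClassifier` class in the OpenCV module (`cv2`). It works on the basis of the Haar cascade algorithm. Strictly speaking, a Haar cascade is a cascade of boosted classifiers built on Haar-like features rather than a neural network, but the idea of reusing pre-trained weights is the same.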

## Famous Digit Recognizer Problem

**Note**: For the code below, modules such as numpy, matplotlib and keras are needed. These might not come by default with your Python installation, so please make sure they are installed before running the code below. Keras also needs a backend such as TensorFlow.

This problem is often solved with a Convolutional Neural Network (CNN), which is a type of ANN; the model below, however, uses only dense (fully connected) layers. I'll be explaining more about CNNs in my next blog. For this code, we will be using keras, a Python deep learning library. It includes several well-known datasets, among them MNIST, a large collection of handwritten digits (60,000 training images and 10,000 test images, each 28×28 pixels).

We begin by importing the modules we need.

```
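import numpy as np
from matplotlib import pyplot as plt  # for plotting our data
from keras.datasets import mnist  # MNIST is our data set
from keras.models import Sequential  # Sequential is our baseline model
from keras.layers import Dense  # for adding our dense layers
# For converting the labels to categorical (one-hot) numpy arrays.
# np_utils only exists in older Keras releases; newer releases provide
# to_categorical directly in keras.utils, so fall back to that module.
try:
    from keras.utils import np_utils
except ImportError:
    from keras import utils as np_utils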
```

Let us now plot the first few images from MNIST to see what the data looks like.

```
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# X_train holds the training images and y_train their labels.
# X_test holds the testing images and y_test their labels.

# Plot the first four training images in a 2x2 grid.
# cmap='gray' is the color map: it draws each pixel value as a shade of grey.
for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.imshow(X_train[i], cmap='gray')
plt.show()
```

The output is given below:

Now that we have loaded our data, we cannot classify the images in their current 28×28 form, because our dense network expects each image as a flat vector. We must flatten each image into a 1-D numpy array and bring its pixel values to a uniform span between 0 and 1.

```
# Each image is 28x28 pixels, so flattening gives 28*28 = 784 input values
num_pixels = X_train.shape[1] * X_train.shape[2]  # height * width

# Flatten each image into a 1-D vector of length 784 and convert it to float
X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')

# Normalize the inputs from 0-255 to 0-1 (a uniform span of values)
X_train = X_train / 255
X_test = X_test / 255

# Convert the labels into a binary (one-hot) matrix:
# matrix[i][j] = 1 means that the ith picture belongs to category j
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]  # number of classes (10 digits)
```

Let us now make our model, starting from the empty `Sequential()` model provided by the keras library. We will then add dense layers using the `Dense` class we imported, and compile the model.

```
def digit_model():
    model = Sequential()  # empty model; layers are added in order
    # Hidden dense layer with one neuron per input pixel.
    # input_dim=num_pixels tells Keras the length of each input vector.
    # kernel_initializer="normal" draws the starting weights from a normal
    # distribution (Keras's default initializer is "glorot_uniform").
    # ReLU (Rectified Linear Unit): relu(x) = max(0, x), so negative values become 0.
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer="normal", activation="relu"))
    # Output layer: one neuron per class; softmax turns the outputs into
    # probabilities that sum to 1.
    # Rather than mapping the inputs straight to the outputs, the hidden layer above
    # gives the network an intermediate step; deeper models often shrink layer sizes
    # gradually (e.g. from 10,000 to 5,000, to 2,000 and so on).
    model.add(Dense(num_classes, kernel_initializer="normal", activation="softmax"))
    # compile() binds the model together with its training settings:
    # - loss: how the prediction error is measured; categorical_crossentropy suits
    #   single-label classification with one-hot labels.
    # - optimizer: how the weights are updated; Adam is a common choice, and you can
    #   try others (e.g. "sgd", "rmsprop") and compare the (usually small) change in accuracy.
    # - metrics: what to report when evaluating the model, here accuracy. Other options
    #   include "categorical_accuracy" and "top_k_categorical_accuracy".
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```

Now that we have created our model, we can train it and then test it to find out its accuracy.

```
# Create the model
model = digit_model()

# Train the model:
# - epochs: one epoch is one full pass over the training set. More epochs often help,
#   but too many can cause overfitting, especially if the training data is poor.
# - batch_size: number of samples processed at once before each weight update; with
#   large amounts of training data, using all of it at once is not viable.
# - verbose: 0 = silent, 1 = progress bar, 2 = one line per epoch with loss and accuracy.
# - validation_data: here the test set is used to report accuracy after each epoch.
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# Final evaluation: scores[0] is the loss and scores[1] is the accuracy
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100 - scores[1] * 100))
```

The output when the above code is run is below:

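The model can also be used to test your own images and data. You can do this by opening the image in Python, converting it to a 28×28 grayscale image, and flattening and normalizing it as we did earlier. The function `model.predict(img)`, where `img` is the array of images you want to test, returns the predicted probability of each class for every image; the predicted label is the class with the highest probability, e.g. `np.argmax(model.predict(img), axis=1)`.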
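A minimal sketch of this, assuming you already have the image as a 28×28 NumPy array of grayscale values from 0 to 255. The helper names `prepare_image` and `predict_digit` are hypothetical; `model` is the trained Keras model from above.

```python
import numpy as np

def prepare_image(gray_img):
    """Flatten and normalize a 28x28 grayscale image (values 0-255)
    the same way as the training data."""
    gray_img = np.asarray(gray_img, dtype="float32")
    if gray_img.shape != (28, 28):
        raise ValueError("expected a 28x28 image, got shape %s" % (gray_img.shape,))
    # Shape (1, 784): a batch containing one flattened image, scaled to 0-1.
    return gray_img.reshape(1, 28 * 28) / 255

def predict_digit(model, gray_img):
    """Return the most likely digit and the probability of each class."""
    probs = model.predict(prepare_image(gray_img))  # shape (1, 10)
    return int(np.argmax(probs[0])), probs[0]
```

One way to get such an array, assuming the Pillow library is installed, is `np.array(Image.open("my_digit.png").convert("L").resize((28, 28)))`, where `my_digit.png` is a placeholder. MNIST digits are light on a dark background, so if your digit is dark on a light background, invert it first with `255 - img`.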

The code for the Digit Recognizer Problem can also be found here: https://github.com/PyProjectsIsFun/Machine-Learning/blob/master/digit_recog_cnn.py

As mentioned above, probability node weights can also be used for classification. The face and eye detection code using Haar Cascades is here: https://github.com/PyProjectsIsFun/Machine-Learning/blob/master/face_detect.py

**Note**: Please make sure you have downloaded the face and eye detection Haar cascade .xml files beforehand. These files are freely available online. Other Haar cascade files can be found here: https://github.com/opencv/opencv/tree/master/data/haarcascades
