Deep Learning Demystified
What is Deep Learning?
Traditional machine learning relied on handcrafted features and modality-specific algorithms to classify images, text, or recognize voices. Deep learning, built on neural networks, identifies features and finds patterns automatically. The time needed to build such systems has dropped drastically, and accuracy has improved substantially, because of advances in deep learning. Neural networks were partly inspired by how the roughly 86 billion neurons in the human brain work, but they have since become more of a mathematical and computational problem. By the end of this blog we will see how neural networks can be intuitively understood and implemented as a set of matrix multiplications, cost functions, and optimization algorithms.
Biological Analogy of Neural Network
Mathematical/Functional Analogy of a Neural Network
Note: A neuron in a neural network is also called a unit or a function, since a real human neuron is far more complex than the one we use in our neural networks.
History of Deep Learning
Though DL has been adopted across various enterprises only in the last few years, the theory and techniques date back to the 1940s. DL has been known by various names over the years.
- Deep learning: 2006 onward
GPUs from the gaming industry, on-demand computation power, and advancements in optimization algorithms have made this the right time to build applications using deep learning. The key ingredients are:
- A function: at a high level, DL can be seen as a mathematical function, and in principle there exists a neural network that can approximate any continuous function to arbitrary accuracy. This property is often called universality, though there is no guarantee that we can actually find that network. (More on this: an intuitive explanation from Michael Nielsen)
- An algorithm that fine-tunes the parameters of the function: back propagation and gradient descent, along with its variants, do precisely that.
- Fast hardware: the two steps above turn out to be a set of matrix multiplications and derivative calculations, which can be executed much faster on a GPU. Thanks go to the gaming industry for popularizing and giving us faster GPUs, which were previously used for the matrix operations that manipulate pixels on game screens.
- Computation on demand: cloud providers like Amazon and Microsoft let customers use GPU-based instances to build deep learning models on demand.
The Growth of Deep Learning
A recent talk by Jeff Dean highlighted the massive increase in the use of deep learning at Google, where DL is applied across various departments and its usage has grown rapidly in the last few years.
DL is also being applied to many impactful applications by teams across the globe, which suggests how easy it has become to use. A few of them are:
- Classification of cucumbers: Astonishingly, a farmer from Japan used deep learning and TensorFlow to build a machine that uses image recognition to sort cucumbers, a task that previously took his mother 8 hours of work a day for a few days.
- Classification of skin lesions: Using a pre-trained convolutional neural network, the group created a model that could classify skin lesions with 60% accuracy, a nearly four-fold increase over the previous benchmark of 15.6%. That’s a massive improvement!
- Heart Condition Diagnosis: Two hedge fund analysts built this model to diagnose heart diseases with accuracy matching that of doctors.
- Clothes Classification: This group successfully built a model that could recognize articles of clothing, as well as their style.
The applications being built have the ability to transform the health sector, the fashion industry, and much more.
From Andrew Ng’s point of view: “AI with deep learning is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
Popular Tools that Ease the Use of Deep Learning
Many tools available today help companies adopt deep learning faster. A few of them are:
- Keras: We love Keras; it lets us build and run our DL models very quickly. It provides a clean API in Python and runs on either TensorFlow or Theano. It was built with the aim of making DL models easier and faster to build. It supports convolutional networks, recurrent networks, and combinations of both.
- TensorFlow: TensorFlow is open source software from Google for numerical computing using data flow graphs. It helps researchers experiment with and build new machine learning algorithms and specialized neural network models. In addition, it provides a higher-level tool called TF-learn, which is used for building deep neural networks more quickly. In 2016 Google open sourced the distributed version of TensorFlow, making it scalable. TensorFlow supports shipping models to Android/iOS devices, reducing the time to deploy. It also comes with a serving layer that exposes DL models to other systems.
- Theano: Theano has capabilities similar to those of TensorFlow. We have observed a close tie between the capabilities of TensorFlow and Theano. Time will tell which is going to be better.
Types of Neural Networks
Let us start by understanding how a single neuron works.
In the image above we have a neuron which takes inputs X along with weights W and passes them to a function f, which acts as a classifier. It is very similar to logistic regression or an SVM. There are three important parts to building a neural network:
- Score Function (f): The score function applies the activation function to the weighted inputs. In this example, the activation function is a sigmoid, which outputs values between 0 and 1 that act as probabilities for our binary classifier network.
- Cost Function: To evaluate how the neural network has performed against the ground truth, we need an evaluation metric. We can use binary cross entropy, which tells us how well our network has performed.
- Optimization Algorithm: The input variables of our function remain constant, and the goal is to learn the weights that reduce the cost, thus improving accuracy. There are different ways to do this, such as searching for the weights randomly or using algorithms like SGD (stochastic gradient descent), which help find good weights for the particular problem.
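To make the three parts concrete, here is a minimal sketch of a single neuron in plain Python. It is only an illustration under some assumptions of ours: the toy data set (logical OR), the bias term, the learning rate, and all function names are hypothetical and not part of any library.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Score function f: sigmoid applied to the weighted sum of the inputs.

    The bias term is an assumption added for this sketch.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cost function: penalizes confident wrong predictions heavily."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)  # avoid log(0)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


def mean_cost(data, weights, bias):
    """Average cost over the whole data set."""
    return sum(binary_cross_entropy(y, score(weights, bias, x)) for x, y in data) / len(data)


def train(data, epochs=200, lr=0.5, seed=0):
    """Optimization algorithm: stochastic gradient descent, one sample at a time."""
    rng = random.Random(seed)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            # For sigmoid + binary cross entropy, the gradient of the cost
            # with respect to the weighted sum is simply (prediction - label).
            error = score(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical toy data: the logical OR of two inputs.
DATA = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
weights, bias = train(DATA)
print("cost before training:", mean_cost(DATA, [0.0, 0.0], 0.0))
print("cost after training: ", mean_cost(DATA, weights, bias))
```

Note that the inputs in DATA never change during training; only the weights and the bias are updated, which is exactly the role of the optimization algorithm.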
All the above steps are common to most machine learning algorithms. Let’s discuss a few more terms that are commonly used in the deep learning domain before looking at different types of networks.
- Hidden Layer: The name sounded crazy when I first heard it, but it is quite simple. Any layer between the input and the output is called a hidden layer.
- Deep Networks: Any network that has more than one hidden layer is called a deep network, and as the depth increases the computational complexity also increases.
- Activation Function: Different activation functions are available, such as sigmoid, tanh, the rectified linear unit (ReLU), which applies max(0, x), leaky ReLU, which fixes some drawbacks of ReLU, and maxout. ReLU is the most widely used, and maxout usage has also picked up in recent months.
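The activation functions named above are small enough to write out by hand. The sketch below works on single numbers for readability; the leaky slope of 0.01 and the representation of maxout as a list of (weight, bias) pairs are assumptions of ours, chosen only for illustration.

```python
import math


def sigmoid(x):
    """Outputs values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Outputs values in (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs
    so that the unit never goes completely flat."""
    return x if x > 0 else slope * x


def maxout(x, pieces):
    """Maxout: the maximum over several linear functions w * x + b.

    `pieces` is a list of (w, b) pairs, a representation assumed for this sketch.
    """
    return max(w * x + b for w, b in pieces)


for value in (-2.0, 0.0, 3.0):
    print(value, sigmoid(value), tanh(value), relu(value), leaky_relu(value))
```

One way to see how these relate: maxout with the two pieces (1, 0) and (0, 0) computes max(x, 0), which is exactly ReLU.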
Feed Forward Neural Network
Let us take the example of predicting whether a customer is planning a trip. We build multiple ML models, each capturing different patterns in the data. Then we use the outputs of these models as input to another ML model, which discovers patterns that were not captured by the previous models. The system becomes more complex as we keep adding more models. Let us look at what the stacking of multiple models could look like.
A feed forward neural network does exactly this. It stacks multiple models together and discovers hidden patterns. When observed closely, the above network can be represented as nested matrix multiplications, as below:
- Input layer: X. Weights to hidden layer 1: W1, where W1 and X are represented as matrices.
- Output of hidden layer 1, passed on to hidden layer 2: activation function f1(W1 X).
- Output of hidden layer 2: f2(W2 f1(W1 X)), where W2 holds the weights into hidden layer 2.
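The nesting can be sketched in a few lines of plain Python. We assume here that each layer has its own weight matrix (called w1 and w2 in the code, names of our choosing), that X stores one sample per column, and that f1 is ReLU and f2 is a sigmoid; all of these are illustrative choices rather than requirements.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply(fn, matrix):
    """Apply an activation function to every element of a matrix."""
    return [[fn(value) for value in row] for row in matrix]


def relu(value):
    return max(0.0, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def forward(x, w1, w2, f1=relu, f2=sigmoid):
    """Compute f2(W2 f1(W1 X)): two layers as nested matrix multiplications."""
    hidden = apply(f1, matmul(w1, x))      # f1(W1 X)
    return apply(f2, matmul(w2, hidden))   # f2(W2 f1(W1 X))


# Hypothetical numbers: 2 input features, 2 hidden units, 1 output unit.
X = [[1.0], [2.0]]
W1 = [[0.5, -1.0], [1.0, 1.0]]
W2 = [[1.0, -1.0]]
output = forward(X, W1, W2)
print(output)
```

Adding a further layer only means wrapping the result in one more matrix multiplication and activation, which is why depth is easy to express but adds computation.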
When the network becomes deep, that is, has more than one layer, a new challenge arises: how do we attribute the error measured by the cost function to the weights in multiple layers? The back propagation algorithm propagates the error back to all the weights in the previous layers. The best part is that most of today’s deep learning tools take care of back propagation automatically. All we need to specify is the topology of the network, that is, the number of layers, the number of nodes, which activation functions to use, and how many epochs/iterations to run.
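Although the tools handle it for us, it helps to see back propagation once on the smallest possible network. The sketch below assumes a network with one input, one tanh hidden unit, and one sigmoid output trained with binary cross entropy; the weight names w1 and w2 and the sample values are hypothetical. It applies the chain rule layer by layer and then compares the result with a numerical estimate of the gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """One hidden unit (tanh) followed by one output unit (sigmoid)."""
    h = math.tanh(w1 * x)
    y_hat = sigmoid(w2 * h)
    return h, y_hat


def loss(w1, w2, x, y):
    """Binary cross entropy for a single sample."""
    _, y_hat = forward(w1, w2, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def backprop(w1, w2, x, y):
    """Propagate the error from the output back to both weights."""
    h, y_hat = forward(w1, w2, x)
    d_z2 = y_hat - y           # error at the output's weighted sum
    grad_w2 = d_z2 * h         # gradient for the output weight
    d_h = d_z2 * w2            # error passed back to the hidden unit
    d_z1 = d_h * (1 - h * h)   # through tanh, whose derivative is 1 - tanh^2
    grad_w1 = d_z1 * x         # gradient for the hidden weight
    return grad_w1, grad_w2


def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central-difference estimate, used only to check backprop."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2


w1, w2, x, y = 0.3, -0.8, 1.5, 1
grads = backprop(w1, w2, x, y)
print("backprop: ", grads)
print("numerical:", numerical_grad(w1, w2, x, y))
```

A gradient descent step then subtracts a small multiple of each gradient from the corresponding weight, and repeating this over the data for several epochs is what training a network amounts to.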
Convolutional Neural Network
Feed forward neural networks come with limitations when applied to problems like image recognition. Computers understand an image as a matrix with dimensions height, width, and channels (RGB). To use images in an FFN we need to flatten the matrix into a vector of pixels. On a simple image recognition task like the MNIST data set this gives an accuracy of around 95%, but for complex image recognition the accuracy drops drastically. When images are flattened into vectors, the spatial information in the data is lost. Convolutional networks help capture this spatial information.
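The difference between the two views of an image can be sketched in plain Python. Below, flatten turns a height x width x channels image into one long vector, as an FFN would need, while conv2d slides a small kernel over a single-channel image so that neighbouring pixels are looked at together. The tiny images and the edge-detecting kernel are hypothetical examples of ours, and conv2d is the unpadded, stride-1 sliding-window operation only, not a full convolutional layer.

```python
def flatten(image):
    """Flatten a height x width x channels image into one vector of values."""
    return [value for row in image for pixel in row for value in pixel]


def conv2d(image, kernel):
    """Slide a kernel over a single-channel image (no padding, stride 1).

    Each output value depends only on a small neighbourhood of pixels,
    which is how spatial structure is preserved.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 2 x 2 RGB image becomes a vector of 12 numbers with no notion of "next to".
RGB_IMAGE = [[[255, 0, 0], [0, 255, 0]],
             [[0, 0, 255], [255, 255, 255]]]
print(len(flatten(RGB_IMAGE)))

# A 4 x 4 grayscale image that is dark on the left and bright on the right.
GRAY_IMAGE = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]]
# A kernel that responds where brightness increases from left to right.
EDGE_KERNEL = [[-1, 1],
               [-1, 1]]
print(conv2d(GRAY_IMAGE, EDGE_KERNEL))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the middle column, where the dark-to-bright edge sits, so the result still tells us where in the image the pattern occurred. In a convolutional network the kernel values are not chosen by hand as they are here; they are learned weights.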