Common Applications of Deep Learning

This article reviews some of deep learning's common applications.

Introduction

So far, we have gone from single-layer neural networks to multi-layer models with many hidden layers. We have also reviewed how these neural networks can serve as powerful tools for both classification and regression tasks. However, we have only scratched the surface: beyond image classification and simple regression, deep learning approaches are now ascendant in fields ranging from cybersecurity and transportation to games and robotics.

Critically, deep learning hinges on one fundamental idea: given a dataset and a loss function, descend the gradient of the loss with respect to the model's parameters. Because this recipe is so general, we can apply neural networks to a wide variety of problems. However, specific domains present both unique challenges and useful assumptions.
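To make "descend the gradient" concrete, here is a minimal sketch in Python with NumPy. It is an illustration rather than anything prescribed by this course: it fits a hypothetical line `y = w*x + b` to toy data by repeatedly stepping `w` and `b` against the gradient of the mean squared error loss. Larger neural networks follow the same loop; they simply have many more parameters and compute the gradient with backpropagation.

```python
import numpy as np


def gradient_descent(x, y, lr=0.1, steps=500):
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y            # prediction error on every example
        grad_w = 2 * np.mean(err * x)  # d(MSE)/dw
        grad_b = 2 * np.mean(err)      # d(MSE)/db
        w -= lr * grad_w               # step downhill on the loss surface
        b -= lr * grad_b
    return w, b


x = np.linspace(-1, 1, 50)
y = 3 * x + 1  # noise-free toy data
w, b = gradient_descent(x, y)
```

With this noise-free toy data, `w` and `b` should end up very close to 3 and 1.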

In this article, we will discuss many common applications for deep learning, and highlight how neural networks have been adapted to these respective tasks.

Classification and Prediction in Challenging Domains

Neural networks excel at recognizing complex patterns in data, especially when that data is plentiful. It follows that deep learning is most commonly applied to datasets with many input features or where those features interact in complicated ways. As a result, neural networks have been wildly successful at tackling complex prediction and classification problems in domains including medicine and agriculture.

Consider the task of diagnosing a patient. While doctors may have an entire dossier of patient health records, only a select few of these data will be useful for making a correct prediction. Moreover, meaningful features can be extracted from combinations of raw data. For example, old age, a smoking habit, and a persistent cough together are much more predictive of lung cancer than any of those three features in isolation.

Because neural networks excel at feature selection and extraction, they have successfully tackled many problems in medical subdomains and show promise in others. Practitioners use deep learning to detect abnormalities in medical scans, predict health outcomes, or even combine clinical notes and medical codes to make a diagnosis.

Convolutional Neural Networks can detect abnormalities in chest scans, like pneumonia. This photo shows a chest x-ray, with a discoloration around the bottom left of the ribcage. There is a box around the discoloration, labeling it "pneumonia."

Agronomics, the science of agriculture, is another field where deep nets are blossoming. Agriculture represents a precarious environment: harvests can be derailed by disease, drought, or plant infestations, which emerge from the complex interactions between crops, soil, and climate. In this setting, neural networks can provide powerful forecasting and classification tools to monitor and identify these threats. Additionally, neural networks can provide tools to assist farmers with scaling up production.

A drone hovering over a field of crops.

Neural networks can ingest the rich data coming from sources like moisture sensors, drone footage, satellite imagery, and climate data, and extract relevant features for soil management, crop yield prediction, classification of plant species, livestock monitoring, and drought forecasting.

Sequential Data

Many different tasks can be described as: “Given a sequence of data, how can we predict its next item(s)?” For example, consider the task of predicting the next word in a sentence. In computer science, this specific task is referred to as language modeling.
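As a minimal sketch of the task itself (not of a neural language model), the snippet below predicts the next word by counting which word most often followed the current one in a toy training text. The function names and the corpus are hypothetical; the neural approaches discussed next replace these counts with learned representations that can use far more context than a single previous word.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    # For every word, count which words followed it in the training text.
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    following = counts.get(word.lower())  # .get avoids creating empty entries
    if not following:
        return None  # word never seen with a successor
    return following.most_common(1)[0][0]


model = train_bigram("the cat sat on the mat and the cat slept")
guess = predict_next(model, "the")
```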

Language generation has made exceptional progress with the rise of deep learning. This is largely because there is a lot of language data available (from the complete works of Shakespeare to our text messages) and because the rules of language are very complex (so alternatives to deep learning are hard to come by).

Human-written prompt: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

Model takes over: The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez. Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

(Source: OpenAI: Better Language Models and their Implications)

Many of the neural network architectures that perform best on language modeling tasks exploit language’s sequential nature. One of the most widely applied models for sequential data is called a Recurrent Neural Network (RNN).

A recurrent neural network. At each timestep, an input feature is fed into the model. This feature updates the hidden state of the network, which then returns a new output. Information is passed across timesteps, so each new output is a function of all previous inputs. We can visualize an RNN as a box with an arrow coming out of it and going back into it (representing the hidden state). We can also visualize an RNN as "unrolled": copies of the same box applied to each timestep's input, with arrows moving left to right connecting each box.

Rather than just concatenating our input words together, RNNs process them sequentially. We pass each input word to the model in order, and at each timestep the input is used to update the model’s hidden state. If we only want to predict the last word of a sentence, we feed every word except the last into the model, then use the final hidden state to predict the missing word. Alternatively, if we want to generate an entire sentence, we can feed in a starting word, update the hidden state, generate a new word, then feed that word back into the model as the next input, and so on.
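The loop described above can be sketched in a few lines of NumPy. This is a minimal, untrained vanilla RNN with random weights, assuming words have already been mapped to integer IDs; the class and method names are hypothetical. `predict_next` feeds a prefix through the network one word at a time and reads a prediction off the final hidden state, while `generate` feeds each predicted word back in as the next input. With random weights the outputs are meaningless; training would adjust `W_xh`, `W_hh`, and `W_hy` by gradient descent on a next-word loss.

```python
import numpy as np


class TinyRNN:
    """Minimal vanilla RNN language model with random, untrained weights."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab_size, self.hidden_size = vocab_size, hidden_size
        self.W_xh = rng.normal(0.0, 0.1, (hidden_size, vocab_size))   # input -> hidden
        self.W_hh = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden
        self.W_hy = rng.normal(0.0, 0.1, (vocab_size, hidden_size))   # hidden -> word scores
        self.b_h = np.zeros(hidden_size)
        self.b_y = np.zeros(vocab_size)

    def step(self, token_id, h):
        x = np.zeros(self.vocab_size)
        x[token_id] = 1.0                                       # one-hot current word
        h = np.tanh(self.W_xh @ x + self.W_hh @ h + self.b_h)   # update hidden state
        logits = self.W_hy @ h + self.b_y                       # scores for the next word
        return logits, h

    def predict_next(self, token_ids):
        if not token_ids:
            raise ValueError("need at least one input word")
        h = np.zeros(self.hidden_size)
        for t in token_ids:  # feed the prefix one word at a time
            logits, h = self.step(t, h)
        return int(np.argmax(logits))

    def generate(self, start_id, length):
        h = np.zeros(self.hidden_size)
        out = [start_id]
        for _ in range(length):
            logits, h = self.step(out[-1], h)
            out.append(int(np.argmax(logits)))  # feed the generated word back in
        return out


rnn = TinyRNN(vocab_size=10, hidden_size=16)
next_id = rnn.predict_next([1, 2, 3])
sample = rnn.generate(start_id=1, length=5)
```

Greedy `argmax` decoding keeps the sketch deterministic; sampling from a softmax over `logits` is a common alternative when more varied text is wanted.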

Of course, this approach isn’t just useful for text generation. RNNs can be applied to sequential problems ranging from time series forecasting (like predicting stock prices) to music generation.

Another popular use of neural networks is translating between sequences. For example, to translate from Hindi to English, we can use one RNN to encode the Hindi sentence. Then we can pass this encoded representation to another RNN, which decodes the corresponding English sentence. This is called an Encoder-Decoder network.

An Encoder-Decoder architecture applied to Hindi-English translation. The Hindi sentence is encoded sequentially by the encoder; the visual shows each word being fed in. Then the encoded information is passed to the decoder, which generates the translated English sentence. The visual shows arrows going from the different encoder timesteps to each decoder timestep, resulting in each predicted word.
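Here is a minimal NumPy sketch of the simplest Encoder-Decoder, in which the decoder receives only the encoder's final hidden state (it omits the connections from every encoder timestep drawn in the figure). The weights are random and untrained, the class name is hypothetical, and token IDs 0 and 1 are assumed to be the start and end markers of the target vocabulary. A real translation model would learn all of these parameters from paired sentences.

```python
import numpy as np


def one_hot(index, size):
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def rnn_step(W_x, W_h, b, x, h):
    # One vanilla RNN update: new hidden state from current input and old state.
    return np.tanh(W_x @ x + W_h @ h + b)


class TinySeq2Seq:
    """Untrained Encoder-Decoder with random weights (illustration only)."""

    def __init__(self, src_vocab, tgt_vocab, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.src_vocab, self.tgt_vocab, self.hidden = src_vocab, tgt_vocab, hidden
        # Encoder RNN parameters.
        self.enc_Wx = rng.normal(0.0, 0.1, (hidden, src_vocab))
        self.enc_Wh = rng.normal(0.0, 0.1, (hidden, hidden))
        self.enc_b = np.zeros(hidden)
        # Decoder RNN parameters, plus a projection from hidden state to word scores.
        self.dec_Wx = rng.normal(0.0, 0.1, (hidden, tgt_vocab))
        self.dec_Wh = rng.normal(0.0, 0.1, (hidden, hidden))
        self.dec_b = np.zeros(hidden)
        self.W_out = rng.normal(0.0, 0.1, (tgt_vocab, hidden))

    def encode(self, src_ids):
        h = np.zeros(self.hidden)
        for i in src_ids:  # read the source sentence one word at a time
            h = rnn_step(self.enc_Wx, self.enc_Wh, self.enc_b, one_hot(i, self.src_vocab), h)
        return h  # final hidden state summarizes the whole source sentence

    def decode(self, h, start_id=0, end_id=1, max_len=10):
        output, prev = [], start_id
        for _ in range(max_len):
            h = rnn_step(self.dec_Wx, self.dec_Wh, self.dec_b, one_hot(prev, self.tgt_vocab), h)
            prev = int(np.argmax(self.W_out @ h))  # greedy choice of the next word
            if prev == end_id:
                break
            output.append(prev)
        return output

    def translate(self, src_ids, max_len=10):
        return self.decode(self.encode(src_ids), max_len=max_len)


model = TinySeq2Seq(src_vocab=8, tgt_vocab=12, hidden=16)
output = model.translate([2, 3, 4])  # hypothetical source-word IDs
```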

Encoder-Decoder architectures are used for a variety of tasks, including language translation and summarization. We can even replace the RNN encoder with a convolutional neural network and use the resulting model for image captioning!

Image captioning model. A picture of people crossing the street is encoded by a Convolutional Network, and this is passed on to an RNN which decodes out "People crossing the street <end>"

When we apply neural networks, we usually trade interpretability for effectiveness. This may be most true when working with sequential data, where entire sequences of information are compressed into single vectors. This lack of interpretability raises a few problems: these models can secretly pick up on spurious correlations (false patterns that don’t capture the true meaning of language) or, even worse, secretly encode bias from the text datasets used to train them.

Autoencoders and Anomaly Detection

Let’s imagine we train an Encoder-Decoder architecture to encode an image into a hidden state, then decode out that very same input. If we do this, that hidden, intermediate vector of information will learn to encode the information from that input necessary for re-generating that same picture. In other words, that single intermediate vector will store the “meaning” of the input data.

Now, what happens if we make the intermediate hidden state smaller? In this case, we must compress the information in our input further, while still trying to preserve the features that matter. In order to do this, the encoder must also throw away features that don’t matter.

In this image, the input is fed into two hidden layers that shrink it down to a small box, "the code". Two increasingly large layers then scale it back up to its original size. The first two layers are labeled the "Encoder". The last two layers are labeled the "Decoder".

This is the big idea of autoencoders:

  • The Encoder compresses the input into a smaller latent representation, referred to as the Code.
  • The Decoder tries to reconstruct the input from the Code.
  • Our loss is the difference between the Decoder’s output and the original input. This is called the reconstruction loss.
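The three pieces above fit in a short NumPy sketch. It assumes the simplest possible autoencoder, a linear encoder and decoder with no activation functions, trained by gradient descent on the mean squared reconstruction loss; the function name is hypothetical. The toy data are 3-D points that all lie on one line, so a Code holding a single number is enough to rebuild them. Real autoencoders typically use nonlinear, multi-layer encoders and decoders.

```python
import numpy as np


def train_linear_autoencoder(X, code_size=1, lr=0.1, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, (n_features, code_size))  # Encoder: input -> Code
    W_dec = rng.normal(0.0, 0.1, (code_size, n_features))  # Decoder: Code -> reconstruction
    losses = []
    for _ in range(steps):
        code = X @ W_enc
        recon = code @ W_dec
        err = recon - X
        losses.append(np.mean(err ** 2))      # reconstruction loss (MSE)
        g_recon = 2 * err / err.size          # dLoss/d(recon)
        g_dec = code.T @ g_recon              # gradient for the Decoder weights
        g_enc = X.T @ (g_recon @ W_dec.T)     # gradient for the Encoder weights
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc, W_dec, losses


t = np.linspace(-1, 1, 20)
X = np.outer(t, [1.0, 2.0, -1.0])  # 3-D points that secretly lie on a 1-D line
W_enc, W_dec, losses = train_linear_autoencoder(X)
```

Because these data really are one-dimensional, the loss should fall close to zero. With real data that need more dimensions than the Code provides, the loss would instead level off at whatever the compression cannot preserve.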

So far in this course, we have focused on supervised learning approaches, where we train our model to map an input to a label. Autoencoders are our first example of unsupervised learning: approaches where we learn the patterns and structure of our data without labels.

Autoencoders have many uses. For one, they can be used as a preprocessing step, to compress documents and images. These smaller vectors can also be used in downstream tasks like classification, clustering, or information retrieval.

Without any additional labeled data, autoencoders can also be used for anomaly detection: the identification of rare or suspicious data points (e.g. fake documents or credit card fraud).

GIF depicting how data is encoded by an autoencoder. Four different data points, represented by horizontal blocks, are fed into our encoder. We visualize the resulting encoded representations as points in a grid. Outliers in the data are encoded farther away from normal inputs. We mark the last data point, which is far from the others in the grid, as an "outlier."

As we noted earlier, autoencoders throw away information that is not needed to reconstruct typical training data. As a result, an anomalous input, which differs substantially from the training data, will be harder to reconstruct, so the model will produce a higher reconstruction error for it. By flagging inputs whose reconstruction error exceeds a chosen threshold, this approach has been used to detect anomalies in datasets ranging from accounting data to brain scans.
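A minimal sketch of this idea in NumPy, with one substitution stated plainly: instead of training a neural autoencoder, it fits a linear autoencoder in closed form with an SVD (a linear autoencoder trained on squared error learns the same subspace as principal component analysis). The function names, the toy data, and the 99th-percentile threshold are assumptions for illustration. A point is flagged as an anomaly when its reconstruction error exceeds the largest errors typically seen on the training data.

```python
import numpy as np


def fit_linear_autoencoder(X, code_size):
    # Closed-form linear autoencoder: keep the top principal directions.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:code_size]  # components: shape (code_size, n_features)


def reconstruction_error(X, mean, components):
    code = (X - mean) @ components.T   # encode: project onto the small Code
    recon = code @ components + mean   # decode: map the Code back to input space
    return np.sum((X - recon) ** 2, axis=1)


rng = np.random.default_rng(0)
t = rng.uniform(-2, 2, size=200)
normal = np.column_stack([t, t]) + 0.05 * rng.standard_normal((200, 2))  # typical data near a line

mean, components = fit_linear_autoencoder(normal, code_size=1)
threshold = np.percentile(reconstruction_error(normal, mean, components), 99)

queries = np.array([[1.0, 1.0],     # follows the pattern seen in training
                    [1.5, -1.5]])   # breaks the pattern
flags = reconstruction_error(queries, mean, components) > threshold
```

Here the first query should pass and the second should be flagged, since the Code has no way to represent movement away from the line.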

Reinforcement Learning

For many games, the best player in the world isn’t human. DeepMind’s AlphaZero plays both Chess and Go at a level beyond the best human players. On many Atari games, Deep Q Learning has produced agents that match or outperform expert human players.

In the reinforcement learning framework, an agent takes actions in an environment and receives rewards. These are positive when the agent does good things (e.g. scores a point) and negative when the agent does bad things (e.g. loses health or dies). Using neural networks, combined with reinforcement learning loss functions, we can teach agents complex behaviors from scratch.
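As a small illustration of learning from rewards alone, here is tabular Q-learning, the table-based predecessor of Deep Q Learning (which replaces the table with a neural network). The environment is a hypothetical five-state corridor: the agent starts in state 0, can step left or right, and receives a reward of +1 only on reaching state 4. The hyperparameters are arbitrary choices for this toy problem.

```python
import numpy as np

N_STATES = 5         # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)   # index 0 = step left, index 1 = step right


def step(state, action):
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0  # positive reward only for reaching the goal
    return nxt, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))  # estimated future reward per (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:  # explore: random action
                action = int(rng.integers(len(ACTIONS)))
            else:                       # exploit: best known action, ties broken randomly
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))
            nxt, reward, done = step(state, action)
            # Move Q(s, a) toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * q[nxt].max()
            q[state, action] += alpha * (target - q[state, action])
            state = nxt
    return q


q = q_learning()
policy = np.argmax(q, axis=1)  # greedy action in each state
```

After training, the greedy policy should step right in every non-goal state, even though the agent was never told where the goal is.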

Simulation environments are bridging the divide between these games and real-world applications. For example, deep learning models are being trained via reinforcement learning to drive cars in a virtual setting. These models can then be fine-tuned in the real world!

GANs

Let’s say we want a model that generates pictures of cats. One possibility would be to use reinforcement learning. For example, we could have a human give our model a positive reward or negative reward: positive for good cat generations, negative for bad cat generations. However, getting those human labels would reduce training to a snail’s pace.

Alternatively, what if we train another model to determine whether our network is generating cat-like photos? Now, rather than just a cat-generator network, we introduce a real cat/generated cat classifier, called a discriminator network, and task this model with sorting out generated cats from real ones. This other model can then provide the training signal for the generator.

That’s the big idea behind Generative Adversarial Networks (GANs).

A visualization of a GAN architecture. A square of random noise is fed into our generator, which turns the noise into a generated, blurry cat. There is a collection of real cat images, one of which is selected. There are arrows going from both this real cat image and the generated cat image into the discriminator. After the discriminator there is a check and an "X", indicating the discriminator's decision for each input ("Real" or "Not Real"). This arrow then finally goes to the loss.

More formally, we train a generator network, by repeatedly updating its parameters, to produce images that a discriminator network will classify as ‘real’. At the same time, we train the discriminator to take the latest generations from the generator, along with real images, and classify each one as real or fake.

Here’s how it all fits together, in the case of “cat generation”:

  1. The generator network takes in a random noise vector, and transforms it into a candidate cat image.
  2. The discriminator network is fed both generated images and real images.
  3. The discriminator is trained to differentiate generated cats from real cats.
  4. The generator tries to maximize how much it fools the discriminator.

And voila! GANs.
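The four steps can be traced in a deliberately tiny NumPy sketch. Everything here is a hypothetical toy: the "images" are single numbers drawn from a Gaussian centered at 4, the generator is `G(z) = a*z + b`, and the discriminator is a one-feature logistic regression `D(x) = sigmoid(w*x + c)`, with gradients written out by hand. The generator uses the common non-saturating loss `-log D(G(z))`. A model this small only shows the alternating update structure; it is not expected to converge neatly, since plain GAN updates can oscillate.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-np.clip(u, -500, 500)))  # clip avoids overflow warnings


def train_toy_gan(steps=2000, lr=0.05, batch=64, real_mean=4.0, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # generator G(z) = a*z + b
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w*x + c), the probability x is real
    for _ in range(steps):
        # 1. The generator turns random noise into candidate samples.
        z = rng.standard_normal(batch)
        fake = a * z + b
        # 2. The discriminator is fed both real and generated samples.
        real = real_mean + rng.standard_normal(batch)
        d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
        # 3. Discriminator: gradient ascent on mean[log D(real) + log(1 - D(fake))].
        w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
        c += lr * (np.mean(1 - d_real) - np.mean(d_fake))
        # 4. Generator: gradient descent on mean[-log D(G(z))], i.e. try to fool D.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1 - d_fake) * w  # derivative of -log D(x) with respect to x
        a -= lr * np.mean(grad_fake * z)
        b -= lr * np.mean(grad_fake)
    return a, b, w, c


a, b, w, c = train_toy_gan()
```

Real GANs swap these four scalars for deep convolutional networks and rely on automatic differentiation, but each training iteration keeps the same shape: one discriminator update, then one generator update.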

Excitingly, GANs can work very well, with no hand-engineered features. However, it’s important to note that training can be finicky, and selecting the right hyper-parameters is difficult. Sometimes your discriminator will get too good (and the generator won’t learn), and sometimes your generator will learn to only generate a single cat (mode collapse).

Still, GAN-based approaches have proved wildly successful in applications ranging from image editing to face generation and style transfer. GANs have also been used to generate additional samples to augment existing datasets, including in medical domains.

Yet GANs also introduce several ethical challenges. First, they have more sinister applications. Researchers have used GANs to generate seemingly genuine footage and audio of politicians, stoking fears that GANs offer dangerous tools for producing fake news content: so-called Deep Fakes.

There are other, less explicit dangers to GANs. Because generators are trained to replicate existing data, they can reproduce or exacerbate biases in those datasets. For example, researchers found that Snapchat’s GAN-based filters, trained on imbalanced data, regularly whiten the skin of users.

In sum, GANs are representative of both the power and perils of neural networks. As deep learning practitioners, we should not only appreciate the utility of our models, but also be wary of the implications of our work.

Conclusion

In this article, we have discussed how neural networks are applied to hard classification and prediction problems, tasks involving sequential data, unsupervised anomaly detection, reinforcement learning, and GANs.

These applications represent just a sampling of the ever-developing, vibrant field of deep learning.