Deep learning / Neural Networks




  • ← This looks really good. It's free.
    • How to Learn Deep Learning (when you’re not a computer science PhD)
    • Deep Learning Certificate Part I - Lecture 1
      • This was a great lecture.

        He says deep learning reminds him most of the "fad" in the early 90s that he was raving about: the Internet.
        1:50 - "I think deep learning is going to be more important and more transformational than the Internet."
        2:10 - I want to talk about the way people have viewed computers in the past decade or two.
        - MI: People joked about the idea of computers that could do human things like replying to emails or translating.
        3:30 - But things have changed.

        6:00 - Let's talk about machine learning. Deep learning is a way of doing machine learning.

        It's been around a while. (He gives an example from 1956 of a man who programmed a computer to play against itself to learn how to win.)

        But until now, programming the learning process has been involved: you needed to hire a sophisticated programmer to do "feature engineering".

        Now we have the three pieces that "at least in theory ought to make machine learning universal": an infinitely flexible function, all-purpose parameter fitting, and fast and scalable. Deep learning is an algorithm for doing machine learning which has these three characteristics.

        The infinitely flexible function is the neural network, which has been around for a long time. The all-purpose parameter fitting is back-propagation, which has been around since 1974 but first got serious attention in 1986. But until very recently we didn't have the third piece (fast and scalable). The fast-and-scalable piece is a result of modern GPUs, a "wider variety of data", and "vital" improvements to the algorithms.

        9:00 - He shows a graph from a Jeff Dean talk of the number of directories in Google's codebase that contain deep-learning model description files.

        10:00 - He gives an example of how Google mapped all of the house numbers in France in an hour by training a model to recognize and read house numbers.

        11:00 - He gives the example of how Yahoo at one point dominated the internet, and was then beaten by Google's machine learning algorithm (PageRank), and how Amazon's recommendations helped it beat Barnes and Noble.

        13:00 - He gives an example of Baidu allowing you to upload an image and see similar images (which takes into account the content of the image), and how they have a system that's better at speech recognition than humans.

        14:00 - He gives an example where Microsoft can take an image with parts missing and generate a convincing guess at what the missing parts of the image look like.

        14:50 - He shows an example that takes four images and produces various combinations of them, and another where deep learning changes where the person in a painting or photo is looking. All of it looks totally convincing.

        16:10 - A lot of people think deep learning is just about big data, but it's not.
        He gives an example of someone who trained on 50 labeled training cases of digits and got a model that was 99% accurate.

        17:00 - He shows a program that lets a user draw a crude doodle and select an art style, and it'll generate an impressive-looking painting.

        17:40 - He shows output of a model that can generate descriptions of images that describe the objects and what they're doing (how they relate to each other)

        18:00 - Three years before the talk (so...2013) he left Kaggle and spent a year researching what the best opportunities were. He concluded the #1 opportunity at that time was medicine.

        He started a company with three other experts and after two months they had a model that could classify cancer better than a panel of four of the world's best radiologists.

        20:00 - He makes the point that for entrepreneurs, you don't need to have any expertise about a domain in order to dominate it with deep learning. He had zero medical expertise.

        21:10 - You might be wondering: Where can you learn more? A: Here at the Data Institute.

        21:30 - He then switches into the first lesson.

        22:20 - "One of the things that I strongly believe is that deep learning is easy. It is made hard by people who put way more math into it than is necessary, and also by what I think is a desire for exclusivity amongst the deep learning specialists. They make up crazy new jargon about things that are really very simple."

        22:45 - I want to show you how simple it can be.

        We're going to look at MNIST, a very famous dataset of handwritten numbers. We're going to use the Jupyter notebook.

        He loads the data in: 55,000 images of size 28x28.

        In every dataset you have two things: information you're given, and information you need to derive. In this case we're given the images and the labels.

        24:30 - He makes a matrix called "top" that's 3x3, and it has a black line at the top, a white line in the middle, and a grey line at the bottom. He then shows a diagram of the values you would get if you moved that 3x3 matrix across every possible position on an input image, multiplied the values of the pixels at those points by the values in the 3x3 matrix, and then added them all together. (NW: _Very_ cool visualizations.)

        27:00 - He shows on one of the MNIST numbers what the "top" matrix does: it finds all the top edges.
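        A minimal NumPy sketch of the sliding-window step described at 24:30 (not the lecture's code). It assumes the "top" filter is [[-1,-1,-1],[1,1,1],[0,0,0]] (black, white, and grey rows) and only uses "valid" positions with no padding, so the output is smaller than the input; the lecture's plots may have used a padded library routine instead. The toy 6x6 image is hypothetical.

```python
import numpy as np

# Assumed values for the "top" edge filter: black (-1), white (+1), grey (0) rows.
top = np.array([[-1, -1, -1],
                [ 1,  1,  1],
                [ 0,  0,  0]], dtype=float)

def correlate2d(image, kernel):
    """Slide `kernel` over every valid position of `image`, multiply
    element-wise, and sum. No padding, so the output shrinks by k - 1."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 6x6 image: a bright horizontal bar (rows 2-3) on a dark background.
img = np.zeros((6, 6))
img[2:4, 1:5] = 1.0

edges = correlate2d(img, top)
print(edges)
```

        The bar's top edge gives the strongest positive response (output row 1), while its bottom edge comes out negative (output row 3), which is why this filter "finds all the top edges".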

        28:00 - Before deep learning, this was part of what is called "feature engineering". An engineer would ask, "How can we recognize this number? ... Maybe one of the things we can check for is where the edges are."

        28:30 - It's very easy to rotate our matrix to end up with four matrices that show different rotations of the original matrix.

        28:45 - You're going to hear "convolutional neural networks" a lot. Basically all image recognition today uses them. I think "convolutional" is one of those overly complex words. It means the same thing as "correlation", except that you take the original filter and rotate it by 180 degrees. He shows this by using the "convolve" function with the original "top" matrix rotated by 180 degrees, and it produces exactly the same edge-detection image.
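        A small check of the claim that convolution is correlation with the filter rotated by 180 degrees. Both functions are hand-written "valid"-mode versions (an assumption; the lecture used library functions): `convolve2d` follows the textbook index-flipping definition, and `np.rot90` also produces the four rotations of "top" mentioned at 28:30. The random 8x8 image is hypothetical.

```python
import numpy as np

def correlate2d(image, kernel):
    """Valid-mode correlation: multiply the window by the kernel as-is and sum."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Valid-mode convolution by definition: kernel indices run backwards
    relative to the image indices."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            total = 0.0
            for m in range(kh):
                for n in range(kw):
                    total += image[i + kh - 1 - m, j + kw - 1 - n] * kernel[m, n]
            out[i, j] = total
    return out

top = np.array([[-1, -1, -1],
                [ 1,  1,  1],
                [ 0,  0,  0]], dtype=float)

# Four orientations of the edge filter (np.rot90 rotates counter-clockwise).
rotations = [np.rot90(top, k) for k in range(4)]

rng = np.random.default_rng(0)
img = rng.random((8, 8))

# Convolving with the 180-degree-rotated filter equals correlating with the original.
same = np.allclose(convolve2d(img, np.rot90(top, 2)), correlate2d(img, top))
print(same)  # True
```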

        29:30 - We can do the same thing for diagonal edges. And I can try taking our first image and correlating it with each one of those diagonal edges.

        29:45 - Why have we done all this? A: This is a kind of feature engineering; we've found eight different ways of thinking about this rendition of the number "7". In ML we want to create a fingerprint of "what does a 7 tend to look like, on average?" The way we do that in deep learning is with something called "max pooling". This is another of these complex-sounding things that is actually ridiculously easy; in Python it's a single line of code.

        30:20 - So what we do for this 28x28 image is we take 7x7 areas (so there'll be four of these across the top and four down the side, or 16 total), and for each area we find the value of the brightest pixel. And we do this for each of the edge-detection diagrams we had before. And this becomes kind of like a fingerprint.
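        One way to write max pooling in NumPy, shown as a sketch (the lecture's one-liner may differ, and `max_pool` is a hypothetical helper name). Reshaping a 28x28 image to (4, 7, 4, 7) groups its pixels into a 4x4 grid of 7x7 blocks, and taking the max over the two within-block axes keeps the brightest pixel of each block.

```python
import numpy as np

def max_pool(image, size=7):
    """Non-overlapping max pooling: keep the brightest pixel of each size x size block."""
    h, w = image.shape
    if h % size or w % size:
        raise ValueError("image dimensions must be divisible by the block size")
    # Axes become (block row, row within block, block column, column within block).
    return image.reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.arange(28 * 28, dtype=float).reshape(28, 28)  # stand-in for one filtered image
pooled = max_pool(img)
print(pooled.shape)  # (4, 4)
```

        Applying this to each of the eight edge-filtered versions of a digit gives an 8x4x4 array, which is the "fingerprint" the notes describe.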

        31:00 - So I'm going to now use this to find the difference between 8s and 1s.

        31:30 - He shows the first five 8s and the first five 1s, and then creates the fingerprints for the eights and shows what the top-edge fingerprint looks like for the first five eights.

        32:00 - He then creates the average edge-fingerprints across all of the eights and shows what those look like (there are four diagonal fingerprints and four non-diagonal fingerprints).

        33:00 - He then does all of that same work for the 1s and shows the average fingerprints for the 1s. It shows little activation on the diagonal-edge fingerprints but strong activation on the vertical-edge fingerprints.

        He reiterates that the hope is that we can now use these fingerprints to distinguish between an 8 and a 1.

        Next he correlates the fingerprints of all of the images of the 8s with his final "average" fingerprints of an 8.

        34:45 - He then generates a 2x2 grid of true positives, false positives, true negatives, and false negatives.
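        A sketch of comparing fingerprints against the class averages and tallying the 2x2 grid, under stated assumptions. The notes say the fingerprints are "correlated" with the averages; here each fingerprint is instead assigned to whichever average (8 or 1) it is closer to by sum of squared differences. The fingerprints are synthetic stand-ins with the 8x4x4 shape, not MNIST data, and `sse`/`classify` are hypothetical helper names.

```python
import numpy as np

def sse(a, b):
    """Sum of squared differences between two fingerprints."""
    return float(((a - b) ** 2).sum())

def classify(fingerprint, avg8, avg1):
    """Predict 8 or 1, whichever average fingerprint is closer."""
    return 8 if sse(fingerprint, avg8) < sse(fingerprint, avg1) else 1

rng = np.random.default_rng(0)
# Synthetic fingerprints (8 filters x 4x4 pooled cells). "8"-like ones respond to
# every filter; "1"-like ones respond only to two (hypothetically vertical) filters.
proto8 = np.ones((8, 4, 4))
proto1 = np.zeros((8, 4, 4))
proto1[[1, 3]] = 1.0
sample8 = proto8 + 0.1 * rng.standard_normal((20, 8, 4, 4))
sample1 = proto1 + 0.1 * rng.standard_normal((20, 8, 4, 4))

# Average fingerprint for each class.
avg8, avg1 = sample8.mean(axis=0), sample1.mean(axis=0)

preds8 = [classify(f, avg8, avg1) for f in sample8]
preds1 = [classify(f, avg8, avg1) for f in sample1]

# 2x2 grid with "8" as the positive class: rows = actual, columns = predicted.
tp = sum(p == 8 for p in preds8)
fn = len(preds8) - tp
fp = sum(p == 8 for p in preds1)
tn = len(preds1) - fp
grid = [[tp, fn], [fp, tn]]
print(grid)  # [[20, 0], [0, 20]]
```

        Real MNIST fingerprints overlap far more than these toy ones, so the off-diagonal cells would not be zero in practice.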

        35:40 - So that's the entirety of generating a simple deep learning model.

        35:50 - So how can we make it better? I'm sure you can think of lots of ways we can make it better.

        36:15 - There are a lot of ways we could improve the features, especially the 3x3 matrices, which in deep learning are called "filters".

        Another thing we can do is weight the filters differently depending on which are more important.

        36:40 - It'd be really nice if we had features that looked for more complex shapes, like corners.

        36:55 - Deep learning is something that does all this. It does it by something called "optimization". The way it works is that instead of starting with eight specific filters, we start with eight (or a hundred) *random* filters, and we set up a process that makes those filters better and better and better.
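        A minimal sketch of that optimization loop for a single 3x3 filter, under stated assumptions: the "ideal" filter is taken to be the "top" edge filter, the training image is random (hypothetical), and the target is simply what the ideal filter would output. Starting from a random filter, plain gradient descent on mean squared error (learning rate 0.2 and 5000 steps are arbitrary choices) should recover it. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
img = rng.random((16, 16))                    # hypothetical training image
top = np.array([[-1, -1, -1],
                [ 1,  1,  1],
                [ 0,  0,  0]], dtype=float)   # the filter we hope to recover

# Every 3x3 window of the image flattened to a row: shape (14 * 14, 9).
patches = sliding_window_view(img, (3, 3)).reshape(-1, 9)
# Correlating with a filter is then just a matrix-vector product.
target = patches @ top.ravel()

w = rng.standard_normal(9)                    # start from a random filter
lr = 0.2
for _ in range(5000):
    err = patches @ w - target
    grad = 2 * patches.T @ err / len(err)     # gradient of mean squared error w.r.t. w
    w -= lr * grad

learned = w.reshape(3, 3)
print(np.round(learned, 3))
```

        Real networks learn many filters at once against a classification loss rather than a known target output, but the update loop has the same shape.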

        37:15 - To demonstrate this, we're going to do linear regression "the deep learning way".

        38:30 - So we're going to generate thirty points along a line (where we set "a" and "b" in the equation "y = ax + b"), and then use deep learning to try to figure out what "a" and "b" are.

        38:45 - Figuring out what "a" and "b" are is the equivalent of figuring out what the optimal set of filters is for the image recognition problem. It's exactly the same thing, the only difference is that in the image recognition case we have eight filters, and in the line case we have two (a and b).

        39:10 - Once you know how to do this process for this simple case, you'll know how to do it for the more-complicated case of the image recognition problem.
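        A sketch of the linear-regression demo using full-batch gradient descent on mean squared error. The true values a=3 and b=8, the starting guesses, the learning rate of 0.1, and the 5000 steps are all hypothetical choices; the lecture may use a different update scheme (for example, updating after each point).

```python
import numpy as np

rng = np.random.default_rng(42)
true_a, true_b = 3.0, 8.0          # hypothetical "a" and "b" we pretend not to know
x = rng.random(30)                 # thirty points along the line
y = true_a * x + true_b

a, b = -1.0, 1.0                   # arbitrary starting guesses
lr = 0.1                           # learning rate (an assumption)

for _ in range(5000):
    err = (a * x + b) - y          # prediction error at every point
    # Partial derivatives of the mean squared error with respect to a and b.
    grad_a = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 3), round(b, 3))    # → 3.0 8.0
```

        Here a and b play the role that the filter weights play in the image problem: the loop only needs the error and its gradient, not any knowledge of what the "right" answer looks like.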

    • 2017.07.17 - New course: Computational Linear Algebra







Major sites

Learning resources





  • 2015.11.09 - Slate - What Is “TensorFlow,” and Why Is Google So Excited About It?
  • 2015.11.13 - Wired - Google's TensorFlow alone will not revolutionize AI
    • by Erik Mueller (from IBM's Watson team)
  • 2015.11.13 - Indico - The indico Machine Learning Team’s Take on TensorFlow
    • Before we call out some of the features from TensorFlow that are particularly relevant to deep learning, it is worth emphasizing that the most compelling thing about TensorFlow is the usability and architecture of the project. Even if no individual piece were revolutionary, the fact that all of the pieces work together to let us compose, compute, and visualize models is a real differentiating factor. Much the same way that Amazon’s EC2 itself isn’t revolutionary, the fact that it comes bundled with the suite of AWS niceties makes it an excellent product.

      Here are some especially promising features:

      • Resource allocation.

        Using the abstraction of computation graphs, TensorFlow maps the required computations onto a set of available devices. Graph and queue abstractions are powerful here, and there are many ways to solve the problem of allocating resources to the computation. TensorFlow implements what looks like a pretty sophisticated simulation and greedy allocation algorithm with methods to minimize communication overhead between resources. Other open source libraries, if they even allow you to use more than one compute device, tend to rely on the user to allocate resources statically.

        Why is TensorFlow more promising? The obvious thing is scalability across distributed (possibly heterogeneous) resources, as in a datacenter or cloud deployment. The more subtle consequence is the allocation algorithm frees us from having to manage CPU vs. GPU devices, even on a single workstation…they’re all just resources to be used as greedily as possible.

      • Queues that allow portions of the graph to execute asynchronously.

        This looks particularly useful for pre-fetching the next batch of data while the previous batch is computing. For example, using Titan X GPUs (indico’s weapon of choice) disk I/O is often the limiting factor for some of our models. Although we work around this limitation using threaded I/O, the TensorFlow approach looks even more robust. In addition to being conceptually simpler, putting data manipulation on the computation graph allows for better device utilization.

      • Visualization with TensorBoard.

        As models get more complex, it is all too easy to skimp on model inspection and the practice of validating intuition. We believe visualization is really fundamental to the creative process and our ability to develop better models. So, visualization tools like TensorBoard are a great step in the right direction. We hope this will encourage the machine learning community in general to validate model internals, and drive towards new ways to train models and inspect performance.

      • Computations expressed as stateful dataflow graphs.

        This abstraction allows models to be deployed across heterogeneous resources without rewriting models. Using a single workstation, we can exploit both CPUs and GPUs. This has the added benefit of making it easier to deploy to a heterogeneous compute environment (cloud, datacenter, etc).

      • Mobile Deployment.

        TensorFlow is designed to work on a wide variety of hardware platforms ranging from high end multi-GPU rigs to smart phones. This enables developers to build and deploy machine learning applications on mobile devices. Advanced neural network applications such as language translation can be available without an internet connection.

  • 2015.11.29 - LinkedIn Pulse - Google TensorFlow simple examples -- Think, Understand, IMPLEMENT :-)
  • 2015.11.30 - FastML - What you wanted to know about TensorFlow
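    The indico note above mentions pre-fetching the next batch while the previous one computes, and working around slow disk I/O with threaded loading. The following is a plain-Python sketch of that threaded-prefetch idea using the standard library, not TensorFlow's queue API; `load_batch` and `prefetching_batches` are hypothetical names, and the sleep merely simulates disk latency.

```python
import queue
import threading
import time

def load_batch(i):
    """Hypothetical stand-in for slow disk I/O."""
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))

def prefetching_batches(n_batches, depth=2):
    """Yield batches while a background thread loads up to `depth` ahead."""
    q = queue.Queue(maxsize=depth)   # bounded, so loading never runs far ahead
    done = object()                  # sentinel marking the end of the data

    def producer():
        for i in range(n_batches):
            q.put(load_batch(i))     # blocks when the queue is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is done:
            return
        yield batch                  # compute on this batch while the next one loads

batches = list(prefetching_batches(3))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```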






Misc ideas

  • One thing the ML algorithm could do is to try to constantly predict what is going to happen next, and update its beliefs when its prediction is either confirmed or contradicted.
    • I suspect that's how human brains work.