
Demos

Courses

  • Fast.ai ← This looks really good. It's free.
    • How to Learn Deep Learning (when you’re not a computer science PhD)
    • Deep Learning Certificate Part I - Lecture 1
      • This was a great lecture.

        He says deep learning reminds him most of the "fad" in the early 90s that he was raving about: the Internet.
        1:50 - "I think deep learning is going to be more important and more transformational than the Internet."
        2:10 - I want to talk about the way people have viewed computers in the past decade or two.
        - MI: People joked about the idea of computers that could do human things like replying to emails or translating.
        3:30 - But things have changed.

        6:00 - Let's talk about machine learning. Deep learning is a way of doing machine learning.

        It's been around a while. (He gives an example from 1956, when Arthur Samuel programmed a computer to play checkers against itself to learn how to win.)

        But up until now, programming the learning process has been involved and labor-intensive: you needed to hire sophisticated programmers to do "feature engineering".

        Now we have the three pieces that "at least in theory ought to make machine learning universal": an infinitely flexible function, all-purpose parameter fitting, and fast and scalable. Deep learning is an algorithm for doing machine learning which has these three characteristics.

        The infinitely flexible function is the neural network, which has been around for a long time. The all-purpose parameter fitting is back-propagation, which has been around since 1974 but first got serious attention in 1986. But until very recently we didn't have the third piece (fast and scalable). The fast-and-scalable piece is a result of modern GPUs, a "wider variety of data", and "vital" improvements to the algorithms.

        9:00 - He shows a graph from a Jeff Dean talk of the number of directories in Google's codebase that contain deep-learning model description files.

        10:00 - He gives an example of how Google mapped all of the house numbers in France in an hour by training a model to recognize and read house numbers.

        11:00 - He gives the example of how Yahoo at one point dominated the internet, and was then beaten by Google's machine learning algorithm (PageRank), and how Amazon's recommendations helped it beat Barnes and Noble.

        13:00 - He gives an example of Baidu allowing you to upload an image and see similar images (which takes into account the content of the image), and how they have a system that's better at speech recognition than humans.

        14:00 - He gives an example where Microsoft can take an image with parts missing and generate a convincing guess at what the missing parts of the image look like.

        14:50 - He shows an example where you can take four images and show various combinations of them, and how you can take a painting or picture and use deep learning to change where the person is looking. And all of it looks totally convincing.

        16:10 - A lot of people think deep learning is just about big data, but it's not.
        He gives an example of someone who trained on 50 labeled training cases of digits and got a model that was 99% accurate.

        17:00 - He shows a program that lets a user draw a crude doodle and select an art style, and it'll generate an impressive-looking painting.

        17:40 - He shows output of a model that can generate descriptions of images, describing the objects and how they relate to each other.

        18:00 - Three years before the talk (so...2013) he left Kaggle and spent a year researching what the best opportunities were. He concluded the #1 opportunity at that time was medicine.

        He started a company with three other experts and after two months they had a model that could classify cancer better than a panel of four of the world's best radiologists.

        20:00 - He makes the point that for entrepreneurs, you don't need to have any expertise about a domain in order to dominate it with deep learning. He had zero medical expertise.

        21:10 - You might be wondering: Where can you learn more? A: Here at the Data Institute.

        21:30 - He then switches into the first lesson.

        22:20 - "One of the things that I strongly believe is that deep learning is easy. It is made hard by people who put way more math into it than is necessary, and also by what I think is a desire for exclusivity amongst the deep learning specialists. They make up crazy new jargon about things that are really very simple."

        22:45 - I want to show you how simple it can be.

        We're going to look at MNIST, a very famous dataset of handwritten numbers. We're going to use the Jupyter notebook.

        He loads the data in: 55,000 images of size 28x28.

        In every dataset you have two things: information you're given, and information you need to derive. In this case we're given the images and the labels.

        24:30 - He makes a matrix called "top" that's 3x3, and it has a black line at the top, a white line in the middle, and a grey line at the bottom. He then shows a diagram of the values you would get if you moved that 3x3 matrix across every possible position on an input image, multiplied the values of the pixels at those points by the values in the 3x3 matrix, and then added them all together. (NW: _Very_ cool visualizations.)
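        The sliding-window operation he visualizes can be sketched in a few lines of NumPy. The exact values of the "top" filter below are an assumption (a dark row over a bright row over a neutral row); the matrix in the lecture may differ:

```python
import numpy as np

# Hypothetical "top" edge filter: dark row, bright row, neutral row
# (the lecture's exact values may differ).
top = np.array([[-1, -1, -1],
                [ 1,  1,  1],
                [ 0,  0,  0]])

def apply_filter(img, filt):
    """Slide `filt` over every valid position of `img`, multiply the
    overlapping pixels element-wise, and sum -- the operation he diagrams."""
    fh, fw = filt.shape
    h, w = img.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+fh, j:j+fw] * filt).sum()
    return out

img = np.zeros((8, 8))
img[4:, :] = 1.0            # bottom half bright: one horizontal top-edge
result = apply_filter(img, top)
```

        The output lights up along the row where the dark-to-bright transition sits, which is exactly the edge-detection behavior shown on the MNIST digit.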

        27:00 - He shows on one of the MNIST numbers what the "top" matrix does: it finds all the top edges.

        28:00 - Before deep learning, this was part of what is called "feature engineering". So an engineer would ask, "Hmm...how can we recognize this number? ... Maybe one of the things we can check for is where the edges are."

        28:30 - It's very easy to rotate our matrix to end up with four matrices that show different rotations of the original matrix.


        28:45 - You're going to hear "convolutional neural networks" a lot. Basically all image recognition today uses them. I think "convolutional" is one of those overly-complex words: it means the same thing as "correlation", except that you take the original filter and rotate it by 180 degrees. He shows this by using the "convolve" function with the original "top" matrix rotated by 180 degrees, and it produces the exact same edge-detection image.
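        That equivalence is easy to check with SciPy (a sketch, not the notebook's own code); np.rot90 also produces the four filter rotations mentioned above:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

top = np.array([[-1, -1, -1],
                [ 1,  1,  1],
                [ 0,  0,  0]])

# The four rotations of the filter are just np.rot90(top, k) for k in 0..3.
rng = np.random.default_rng(0)
img = rng.random((10, 10))

# Convolution flips the filter by 180 degrees before sliding it, so
# convolving with the flipped filter equals correlating with the original.
flipped = np.rot90(top, 2)
a = correlate2d(img, top, mode='valid')
b = convolve2d(img, flipped, mode='valid')
```

        For symmetric filters the two operations coincide exactly, which is why the terms are often used interchangeably in deep learning.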

        29:30 - We can do the same thing for diagonal edges. And I can try taking our first image and correlating it with each one of those diagonal edges.

        29:45 - Why have we done all this? A: This is a kind of feature engineering; we've found eight different ways of thinking about this rendition of the number "7". In ML we want to create a fingerprint of "what does a 7 tend to look like, on average". The way we do that in deep learning is with something called "max pooling". This is another of those complex-sounding things that is actually ridiculously easy; in Python it's a single line of code.

        30:20 - So what we do for this 28x28 image is we take 7x7 areas (so there'll be four of these across the top and four down the side, or 16 total), and for each area we find the value of the brightest pixel. And we do this for each of the edge-detection diagrams we had before. And this becomes kind of like a fingerprint.
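        The 7x7 max pooling he describes really can be (essentially) one line of Python. One possible implementation, not necessarily his, is a NumPy reshape trick:

```python
import numpy as np

# A stand-in 28x28 "edge-detection" image for the demo.
img = np.arange(28 * 28, dtype=float).reshape(28, 28)

# 7x7 max pooling: split the 28x28 image into a 4x4 grid of 7x7
# blocks and keep each block's brightest pixel.
pooled = img.reshape(4, 7, 4, 7).max(axis=(1, 3))
```

        Applying this to each of the eight edge-detection images yields eight small 4x4 maps, which together form the "fingerprint" of the digit.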

        31:00 - So I'm going to now use this to find the difference between 8s and 1s.

        31:30 - He shows the first five 8s and the first five 1s, and then creates the fingerprints for the eights and shows what the top-edge fingerprint looks like for the first five eights.

        32:00 - He then creates the average edge-fingerprints across all of the eights and shows what those look like (there are four diagonal fingerprints and four non-diagonal fingerprints).

        33:00 - He then does all of that same work for the 1s and shows the average fingerprints for the 1s. It shows little activation on the diagonal-edge fingerprints but strong activation on the vertical-edge fingerprints.

        He reiterates that the hope is that we can now use these fingerprints to distinguish between an 8 and a 1.

        Next he correlates the fingerprints of all of the images of the 8s with his final "average" fingerprints of an 8.
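        One simple way to do that comparison (a sketch, not necessarily the notebook's exact code; the sse and is_8 helper names here are hypothetical) is to score each fingerprint by sum-of-squared-errors against the two averages and pick whichever is closer:

```python
import numpy as np

def sse(a, b):
    """Sum of squared errors between two fingerprints."""
    return ((a - b) ** 2).sum()

def is_8(fp, avg8, avg1):
    """Classify a fingerprint as an 8 if it is closer to the
    average 8-fingerprint than to the average 1-fingerprint."""
    return sse(fp, avg8) < sse(fp, avg1)

# Tiny synthetic demo: "fingerprints" here are 4x4 pooled maps.
avg8 = np.ones((4, 4))           # pretend average 8-fingerprint
avg1 = np.zeros((4, 4))          # pretend average 1-fingerprint
sample = np.full((4, 4), 0.9)    # closer to the 8 template
print(is_8(sample, avg8, avg1))  # -> True
```

        Running every image through this check is what produces the true/false positives and negatives he shows next.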

        34:45 - He then generates a 4x4 grid of true positives, false positives, true negatives, and false negatives.

        35:40 - So that's the entirety of generating a simple deep learning model.

        35:50 - So how can we make it better? I'm sure you can think of lots of ways we can make it better.

        36:15 - There are a lot of ways we could improve the features, especially the 3x3 matrices, which in deep learning are called "filters".

        Another thing we can do is weight the filters differently depending on which are more important.

        36:40 - It'd be really nice if we had features that looked for more complex shapes, like corners.

        36:55 - Deep learning is something that does all this. It does it by something called "optimization". The way it works is that instead of starting with eight specific filters, we start with eight (or a hundred) *random* filters, and we set up a process that makes those filters better and better and better.

        37:15 - To demonstrate this, we're going to do linear regression "the deep learning way".

        38:30 - So we're going to generate thirty points along a line (where we set "a" and "b" in the equation "y = ax + b"), and then use deep learning to try to figure out what "a" and "b" are.

        38:45 - Figuring out what "a" and "b" are is the equivalent of figuring out what the optimal set of filters is for the image recognition problem. It's exactly the same thing, the only difference is that in the image recognition case we have eight filters, and in the line case we have two (a and b).

        39:10 - Once you know how to do this process for this simple case, you'll know how to do it for the more-complicated case of the image recognition problem.
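        A minimal sketch of that process: generate points along y = ax + b, then recover a and b by gradient descent on mean squared error. The true values, learning rate, and iteration count below are assumed for the demo, not taken from the lecture:

```python
import numpy as np

# Arbitrary "true" line for the demo.
a_true, b_true = 3.0, 8.0
rng = np.random.default_rng(42)
x = rng.random(30)               # thirty points along the line
y = a_true * x + b_true

a, b = 0.0, 0.0                  # start from arbitrary guesses
lr = 0.1                         # assumed learning rate
for _ in range(5000):
    y_pred = a * x + b
    err = y_pred - y
    # Gradients of the mean squared error with respect to a and b.
    a -= lr * 2 * (err * x).mean()
    b -= lr * 2 * err.mean()
```

        After the loop, a and b have converged to the true slope and intercept. Swapping the two parameters (a, b) for eight random filters, and the line-fitting error for a classification error, gives the image-recognition version of the same procedure.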

    • 2017.07.17 - New fast.ai course: Computational Linear Algebra

Tutorials

Books

People

Technologies

Papers

Articles

Videos

Websites

Misc ideas

  • One thing the ML algorithm could do is to try to constantly predict what is going to happen next, and update its beliefs when its prediction is either confirmed or contradicted.
    • I suspect that's how human brains work.

