We expect those wanting to do really well on the course to work through everything here before the tutorial. There are two core questions, with entry boxes on the website, that everyone should attempt. Unlike the in-note questions, you can edit answers until we release answers after the final tutorial group meets. Almost all of you should try more than just these parts.
We strongly recommend that you discuss your tutorial work — and the course in general — with your peers. If you can’t do a part, that’s normal; skip it and move on! After attempting what you can, try to meet another student from the class and pool your understanding. Tutorials run more smoothly if you have agreed with someone what you want to talk about. If you haven’t met people in the class yet, exchange details in your first tutorial!
Your tutorial session usually won’t discuss every question. Tutorials are for discussion, not to confirm all the answers: detailed answers will be made available after the week’s tutorials, and can be discussed further on Hypothesis.
Linear Regression and linear transformations:
Alice fits a function \(f(\bx) = \bw^\top\bx\) to a training set of \(N\) datapoints \(\{\bx^{(n)},y^{(n)}\}\) by least squares. The inputs \(\bx\) are \(D\)-dimensional column vectors. You can assume a unique setting of the weights \(\bw\) minimizes the square error on the training set.
Bob has heard that by transforming the inputs \(\bx\) with a vector-valued function \(\bphi\), he can fit an alternative function, \(g(\bx) = \bv^\top\bphi(\bx)\), with the same fitting code. He decides to use a linear transformation \(\bphi(\bx) = A\bx\), where \(A\) is an invertible matrix.
Show that Bob’s procedure will fit the same function as Alice’s original procedure.
NB You don’t have to do any extensive mathematical manipulation. You don’t need a mathematical expression for the least squares weights.
Hint: reason about the sets of functions that Alice and Bob are choosing their functions from. If necessary, more help is in the footnote [1].
Answer:
By fitting \(g(\bx) = \bv^\top A\bx\), Bob can only set the function equal to linear combinations of the inputs \((\bv^\top A)\bx = \bw^\top\bx\), where \(\bw = A^\top \bv\). Moreover, just like Alice, all linear combinations are available: any function Alice fits can be matched by setting \(\bv = A^{-\top}\bw\).
As both Alice and Bob are selecting the function that best matches the outputs from the same set of functions, and with the same cost function, they will select the same function.
This form of argument often works for more complex models like neural networks. It would be difficult to generalize a brute-force answer that solves for the fitted function using linear algebra.
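For those who like to check such arguments numerically, here is a minimal optional sketch (not part of the required answer). The data and the matrix \(A\) below are made-up random values, purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
N, D = 70, 3
X = rng.standard_normal((N, D))          # rows are the inputs x^(n)
y = rng.standard_normal(N)               # arbitrary outputs, just for illustration
A = rng.standard_normal((D, D))          # almost surely invertible

w_alice = np.linalg.lstsq(X, y, rcond=None)[0]        # fit f(x) = w'x
v_bob = np.linalg.lstsq(X @ A.T, y, rcond=None)[0]    # fit g(x) = v'(Ax), features Phi = X A'

# The two procedures represent the same fitted function: w = A'v
print(np.allclose(w_alice, A.T @ v_bob))   # should print True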
Could Bob’s procedure be better than Alice’s if the matrix \(A\) is not invertible?
[If you need a hint, it may help to remind yourself of the discussion involving invertible matrices in the pre-test answers. Also, think about different possible interpretations of “better”.]
Answer:
First suppose \(A\) is square and not invertible (or rectangular with rank less than \(D\)). Then multiple input points (e.g., \(\bx\) and \(\bx'\)) are transformed to the same vector: \(A\bx = A\bx'\). It’s not possible for Bob to assign different function values to two such inputs, whereas Alice can. Bob can no longer fit all the same functions as Alice: he can only choose from some of the linear combinations of the features \(\bx\), which might not include the combination that minimizes the least-squares cost.
As a result, Bob’s training error might be worse than Alice’s, but can’t be better. However, we don’t usually decide if a model is better based on training error (see w2a), which is one of the most important principles in the course.
If Alice’s fit is well-justified, Bob will also generalize more poorly due to underfitting. However, if \(\bx\) is high-dimensional, or \(N\) is small, Bob’s restricted regression model might avoid overfitting and could generalize better.
There are other ways in which Bob’s model might be “better”. For example, for large dimensions \(D\), setting \(A\) to a \(K\ttimes D\) matrix with \(K\!<\!D\) reduces the dimensionality of the data. As well as regularizing the model, this transformation reduces the time and memory required during the least squares fit.
(For completeness: If \(A\) is \(K\ttimes D\) with \(K\!>\!D\), and has rank \(D\), then no information is lost. The matrix is not invertible, but Bob’s set of functions is still the same as Alice’s, and so he will always get the same fitted function as Alice. There is not a unique way to set the \(K\) parameters \(\bv\) to represent this function, however.)
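As an optional, hedged check of the training-error claim for the dimension-reducing case, the sketch below again uses made-up random data and an example \(K\ttimes D\) matrix with \(K\!<\!D\).

import numpy as np

rng = np.random.default_rng(1)
N, D, K = 70, 5, 2
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
A = rng.standard_normal((K, D))           # K < D, so rank at most K < D

w_alice = np.linalg.lstsq(X, y, rcond=None)[0]
v_bob = np.linalg.lstsq(X @ A.T, y, rcond=None)[0]

err_alice = np.sum((y - X @ w_alice)**2)
err_bob = np.sum((y - X @ A.T @ v_bob)**2)
print(err_bob >= err_alice - 1e-9)        # Bob's training error is never better: True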
Bonus part: Only do this part this week if you have time. Otherwise review it later.
Alice becomes worried about overfitting, adds a regularizer \(\lambda \bw^\top\bw\) to the least-squares error function, and refits the model. Assuming \(A\) is invertible, can Bob choose a regularizer so that he will still always obtain the same function as Alice?
Answer:
For any model fit \(\bw\) that Alice considers, Bob can set \(\bv = A^{-\top}\bw\) and represent the same function. To penalize every function in the same way as Alice, Bob could substitute \(\bw = A^\top\bv\) into the penalty: \[ \lambda \bw^\top \bw = \lambda \bv^\top A A^\top\bv. \] If we allow this form as a regularizer, then Bob can fit an equivalent model.
If we insist on a regularizer of the form \(\lambda \bv^\top\bv\), then we’d need \(A\) to be (a multiple of) an orthogonal matrix, for which, by definition, \(A A^\top\) is (a multiple of) the identity matrix. Otherwise we don’t see a way to ensure the optimal function is the same.
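An optional numerical check of the claimed equivalence, using the closed-form solutions obtained by setting each cost function’s gradient to zero; the random data and \(A\) are again made up for illustration.

import numpy as np

rng = np.random.default_rng(2)
N, D, lam = 70, 3, 0.1
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)
A = rng.standard_normal((D, D))           # invertible (almost surely)

# Alice: minimize (y - Xw)'(y - Xw) + lam*w'w
w_alice = np.linalg.solve(X.T @ X + lam*np.eye(D), X.T @ y)

# Bob: features Phi = X A', penalty lam * v' A A' v
Phi = X @ A.T
v_bob = np.linalg.solve(Phi.T @ Phi + lam*(A @ A.T), Phi.T @ y)

print(np.allclose(A.T @ v_bob, w_alice))  # same fitted function: should print True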
Bonus part: Only do this part this week if you have time. Otherwise review it later.
Suppose we wish to find the vector \(\bv\) that minimizes the function \[\notag
(\by - \Phi\bv)^\top(\by - \Phi\bv) + \bv^\top M \bv.
\]
Show that \(\bv^\top M \bv = \bv^\top(\frac{1}{2}M + \frac{1}{2}M^\top) \bv\), and hence that we can assume without loss of generality that \(M\) is symmetric.
Why would we usually choose \(M\) to be positive semi-definite in a regularizer, meaning that \(\ba^\top M \ba \ge 0\) for all vectors \(\ba\)?
Assume we can find a factorization \(M = AA^\top\). Can we minimize the function above using a standard routine that can minimize \((\bz-X\bw)^\top(\bz-X\bw)\) with respect to \(\bw\)?
Answer:
This is standard linear algebra manipulation (see background material). Expand out: \[ \bv^\top(\frac{1}{2}M + \frac{1}{2}M^\top) \bv = \frac{1}{2}\bv^\top M\bv + \frac{1}{2}\bv^\top M^\top\bv \] The second term is a scalar, so we can take the transpose of it, and then reorder: \[ = \frac{1}{2}\bv^\top M\bv + \frac{1}{2}(\bv^\top M^\top\bv)^\top = \frac{1}{2}\bv^\top M\bv + \frac{1}{2}\bv^\top M\bv = \bv^\top M\bv. \] As \((\frac{1}{2}M + \frac{1}{2}M^\top)\) is symmetric, and has the same effect as \(M\) in the cost function, we can always choose to specify the cost function with a symmetric matrix.
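If you like, you can sanity-check the identity numerically with a random matrix and vector (purely illustrative):

import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
v = rng.standard_normal(4)
# v'Mv equals v'((M + M')/2)v for any M and v:
print(np.allclose(v @ M @ v, v @ (0.5*(M + M.T)) @ v))   # should print True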
Usually the purpose of the regularizer is to prevent large weights \(\bv\). If there is a direction \(\ba\) where \(\ba^\top M \ba < 0\) then we can make the second term as negative as we like by making \(\bv\) a large multiple of \(\ba\). Unless the term with \(\Phi\) and \(\by\) prevents it, the minimum cost could be at infinite \(\bv\). However, the whole point of the regularizer is that we shouldn’t need the data to constrain the parameters. If we want to guarantee that the parameters are constrained, we should use a strictly positive definite matrix.
For information, we can always obtain the factorization \(M\te AA^\top\). Because we can assume the matrix \(M\) is symmetric, the eigendecomposition of the matrix [2] can be manipulated into this form. If the matrix is also positive definite, which it probably should be (why? [3]), we can use a Cholesky decomposition.
We can then use the same data-augmentation trick as in the lecture notes. Add fictitious observations to the data: \[ \tilde{\by} = \begin{bmatrix}\by\\ \mathbf{0}\end{bmatrix} \quad \tilde{\Phi} = \begin{bmatrix}\Phi\\ A^\top\end{bmatrix}, \] and then set \(\bv\) to minimize \((\tilde{\by}-\tilde{\Phi}\bv)^\top(\tilde{\by}-\tilde{\Phi}\bv)\).
You might have given a different answer based on linearly transforming the data so that the regularizer takes the standard form.
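For reference, here is a minimal sketch of the augmentation trick, checked against the closed-form solution \((\Phi^\top\Phi + M)^{-1}\Phi^\top\by\). The data and the factor \(A\) are made up for illustration.

import numpy as np

rng = np.random.default_rng(3)
N, K = 70, 5
Phi = rng.standard_normal((N, K))
yy = rng.standard_normal(N)

A = rng.standard_normal((K, K))
M = A @ A.T                                # a positive (semi-)definite M

# Closed form: the gradient of the cost is zero where (Phi'Phi + M) v = Phi'y
v_direct = np.linalg.solve(Phi.T @ Phi + M, Phi.T @ yy)

# Augmentation: append K fictitious zero observations with "inputs" A'
Phi_tilde = np.vstack([Phi, A.T])
y_tilde = np.concatenate([yy, np.zeros(K)])
v_aug = np.linalg.lstsq(Phi_tilde, y_tilde, rcond=None)[0]

print(np.allclose(v_direct, v_aug))        # should print True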
Intended lessons from Q1:
Going forwards, you should immediately recognize that a sequence of linear transformations can be achieved with a single linear transformation. If we want a function to do anything else, we need to introduce non-linearities. However, linear transformations are an important building block, so you need to be able to manipulate and understand them.
Linear transformations, even simple ones such as scaling individual features, do change the effect of a regularization term. Therefore you may need to think about the scale of your data before setting a regularizer (or a grid of \(\lambda\) to explore). Alternatively shift and scale your data so the numbers have zero mean and unit variance. Then default choices for regularizers might work.
Radial Basis Functions (RBFs):
In this question we form a linear regression model for one-dimensional inputs: \(f(x) = \bw^\top\bphi(x;h)\), where \(\bphi(x;h)\) evaluates the input at 101 basis functions. The basis functions \[\notag \phi_k(x) = e^{-(x-c_k)^2/h^2} \] share a common user-specified bandwidth \(h\), while the positions of the centers are set to make the basis functions overlap: \(c_k \te (k-51)h/\sqrt{2}\), with \(k = 1\ldots 101\). The free parameters of the model are the bandwidth \(h\) and weights \(\bw\).
The model is used to fit a dataset with \(N\te70\) observations each with inputs \(x\in[-1,+1]\). Assume each of the observations has outputs \(y\in[-1,+1]\) also. The model is fitted for any particular \(h\) by transforming the inputs using that bandwidth into a feature matrix \(\Phi\), then minimizing the regularized least squares cost: \[\notag C = (\by-\Phi\bw)^\top(\by-\Phi\bw) + 0.1\bw^\top\bw.\]
You’ll want to understand the arrangement of the RBFs to be able to answer this question. Make sure you know how to sketch the situation with pen and paper. However, this second and final core question asks you instead for a straightforward plot using Python.
Write code to plot all of the \(K\te101\) basis functions \(\phi_k(x)\) with \(h\te0.2\). Don’t plot data or \(f(x)\) functions.
Answer:
import numpy as np
import matplotlib.pyplot as plt

def rbf_tut1_q3(xx, kk, hh):
    """Evaluate RBF number kk with bandwidth hh on input points xx (shape (N,))"""
    c_k = (kk - 51) * hh / np.sqrt(2)
    return np.exp(-(xx - c_k)**2 / hh**2)

# plotting code
K = 101
hh = 0.2
xx = np.linspace(-10, 10, 1000)
plt.clf()
for kk in range(1, K+1):
    plt.plot(xx, rbf_tut1_q3(xx, kk, hh), '-')
plt.show()
There is no data to plot here. We asked for the RBFs (plural) rather than one of them or a combined function. When plotting 1D functions, we usually use a dense, evenly-spaced grid. Using a coarse grid or random input points leads to a jagged, unclear plot.
The question asked for a particular function. Answers that change the name or form of the function are harder for others to deal with. Some answers interleaved basis function computation with a complete least squares fitting demonstration. We instead wanted you to have this simple function to help you answer the rest of the question.
There wasn’t one right answer for the x-axis range here. The code above chose a familiar range that would contain the whole region where the RBFs are large. Some of you chose to look at just the range where we’d have data, presumably noticing that most of the RBFs were then not visible.
Explain why many of the weights will be close to zero when \(h\te0.2\), and why even more weights will probably be close to zero when \(h\te1.0\).
Answer:
When \(h\te0.2\) many of the basis functions are several bandwidths away from any of the data’s input locations. These basis functions won’t have any noticeable effect on the function values for \(x\in[-1,+1]\) unless they have huge weights, which is vetoed by the regularization term. As the settings of the weights for these basis functions have almost no effect on the first term in the cost function, they will be set close to zero to minimize the second term.
As the bandwidth increases, fewer of the centers are within a few bandwidths of \(x\in[-1,+1]\). More basis functions are given near zero weight.
[The plotting code in a) was intended to get you to look at the arrangement of the basis functions.]
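If you want to see this effect numerically, here is a rough, optional sketch. It reuses rbf_tut1_q3 from the part a) answer, invents some training data with \(x\in[-1,+1]\), and counts weights below an arbitrary threshold; you should usually see more near-zero weights for the larger bandwidth.

import numpy as np

rng = np.random.default_rng(4)
N, K, lam = 70, 101, 0.1
x_train = rng.uniform(-1, 1, N)
y_train = np.sin(3 * x_train)              # any smooth made-up function with y in [-1,+1]

for hh in [0.2, 1.0]:
    Phi = np.stack([rbf_tut1_q3(x_train, kk, hh) for kk in range(1, K+1)], axis=1)
    ww = np.linalg.solve(Phi.T @ Phi + lam*np.eye(K), Phi.T @ y_train)
    print(hh, np.sum(np.abs(ww) < 1e-3))   # number of near-zero weights (arbitrary threshold)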
It is suggested that we could choose \(h\) by fitting \(\bw\) for each \(h\) in a grid of values, and pick the \(h\) which led to a fit with the smallest cost \(C\). Explain whether this suggestion is a good idea, or whether you would modify the procedure.
Answer:
Different settings of \(h\) give models that have effectively different numbers of basis functions. While each model has 101 basis functions, many of them will have no effect due to their position and the regularization. It is likely that models with small \(h\), which bring in many active basis functions, will obtain the smallest cost. However, these may not generalize well. Although there is some regularization, making the basis functions narrow, and having more basis functions than data points (\(N\te70\)), is likely to lead to overfitting.
After fitting the weights for each \(h\) on a training set, each model should be evaluated on a validation set to pick the best model.
As we only have \(70\) data points, we might be concerned about having enough data to be able to hold out a validation set. We could consider \(K\)-fold cross validation. Or we might set \(h\) by hand, from experience, to a setting that encourages “reasonable” functions. Or we might try to get more data!
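As an optional illustration of a simple train/validation split, here is a hedged sketch. It reuses rbf_tut1_q3 from earlier, with made-up data; the split and the grid of bandwidths are arbitrary choices.

import numpy as np

rng = np.random.default_rng(5)
N, K, lam = 70, 101, 0.1
xx = rng.uniform(-1, 1, N)
yy = np.sin(3 * xx) + 0.05 * rng.standard_normal(N)   # made-up noisy targets

# Simple split: 50 points for training, 20 held out for validation
x_tr, y_tr = xx[:50], yy[:50]
x_va, y_va = xx[50:], yy[50:]

def design(x, hh):
    """N x K feature matrix for bandwidth hh."""
    return np.stack([rbf_tut1_q3(x, kk, hh) for kk in range(1, K+1)], axis=1)

best = None
for hh in [0.05, 0.1, 0.2, 0.5, 1.0]:
    Phi = design(x_tr, hh)
    ww = np.linalg.solve(Phi.T @ Phi + lam*np.eye(K), Phi.T @ y_tr)
    val_err = np.mean((y_va - design(x_va, hh) @ ww)**2)
    if best is None or val_err < best[0]:
        best = (val_err, hh)
print('picked h =', best[1])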
Another data set with inputs \(x\!\in\![-1,+1]\) arrives, but now you notice that all of the observed outputs are larger, \(y\!\in\![1000,1010]\). What problem would we encounter if we performed linear regression as above to this data? How could this problem be fixed?
Answer:
The outputs can only be matched with large function values, which require large weights. The regularization term may cause the fitted function to pass entirely beneath the observed data. We could turn off the regularization, but then we might fit wildly oscillating functions.
One solution is to simply shift and scale the observed \(y\) values to lie within \([-1,+1]\) as before. The predictions would need to be transformed back to the original range.
Another solution is to work with the data at its native scale, but model the offset with a bias term. If we add a new basis function \(\phi_{102}(x) = 1\), the value of the corresponding weight is likely to be around \(w_{102}\approx 1005\). We should not include this parameter in the regularization, or it won’t be able to take on such a large value. Alternatively we could choose \(\phi_{102}(x)\) to be a larger constant.
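An optional sketch of the shift-and-scale fix, using made-up data with outputs around the stated range, and reusing rbf_tut1_q3 from the part a) answer:

import numpy as np

rng = np.random.default_rng(6)
N, K, lam, hh = 70, 101, 0.1, 0.2
x_tr = rng.uniform(-1, 1, N)
y_tr = 1005 + 2*np.sin(3*x_tr) + 0.1*rng.standard_normal(N)   # outputs around 1000-1010

# Standardize the targets, fit on the transformed scale
mu, sigma = np.mean(y_tr), np.std(y_tr)
Phi = np.stack([rbf_tut1_q3(x_tr, kk, hh) for kk in range(1, K+1)], axis=1)
ww = np.linalg.solve(Phi.T @ Phi + lam*np.eye(K), Phi.T @ ((y_tr - mu)/sigma))

def predict(x_test):
    """Predictions transformed back to the original output scale."""
    Phi_test = np.stack([rbf_tut1_q3(x_test, kk, hh) for kk in range(1, K+1)], axis=1)
    return sigma*(Phi_test @ ww) + mu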
Intended lessons from Q2:
We can’t fit parameters that control the complexity of a model by minimizing training error, because the most flexible functions always get the lowest training error. We can use cross-validation, although that can be expensive. It is also not necessarily obvious which parameters control model complexity. The Bayesian methods that are covered later in the course are one alternative.
We see that the scale of the data affects what regularization is sensible. Thinking about how to transform the regularization is fiddly, which is why transforming data to lie on a standard scale is so common.
Logistic Sigmoids:
Sketch — with pen and paper — a contour plot of the sigmoidal function \[\notag\phi(\bx) = \sigma(\bv^\top\bx + b),\] for \(\bv \te [1~2]^\top\) and \(b \te 5\), where \(\sigma(a) \te 1/(1\tp\exp(-a))\).
Indicate the precise location of the \(\phi\te 0.5\) contour on your sketch, and give at least a rough indication of some other contours. Also mark the vector \(\bv\) on your diagram, and indicate how its direction is related to the contours.
Hints: What happens to \(\phi\) as \(\bx\) moves orthogonal (perpendicular) to \(\bv\)? What happens to \(\phi\) as \(\bx\) moves parallel to \(\bv\)? To draw the \(\phi\te0.5\) contour, it may help to identify special places where it is easy to show that \(\phi\te0.5\).
Answer:
The contours are straight lines, parallel to each other and orthogonal (perpendicular) to \(\bv\). They are symmetrically spaced around the \(\phi\te0.5\) contour, and bunched closer together near that contour.
Any two points where \(\bv^\top\bx + b = 0\) can be used to identify the \(\phi\te0.5\) contour: for example, \(\bx = [-5~~0]^\top\) and \(\bx = [0~~{-2.5}]^\top\).
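A pen-and-paper sketch is all that was asked for, but if you want to check it, the following optional code draws a numerical contour plot for these particular \(\bv\) and \(b\); the plotting ranges and contour levels are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

vv, bb = np.array([1.0, 2.0]), 5.0
x1, x2 = np.meshgrid(np.linspace(-10, 5, 200), np.linspace(-8, 4, 200))
phi = 1.0 / (1.0 + np.exp(-(vv[0]*x1 + vv[1]*x2 + bb)))

plt.clf()
cs = plt.contour(x1, x2, phi, levels=[0.1, 0.3, 0.5, 0.7, 0.9])
plt.clabel(cs)
plt.arrow(-5, 0, vv[0], vv[1], head_width=0.3)   # v, drawn from a point on the 0.5 contour
plt.axis('equal'); plt.xlabel('x1'); plt.ylabel('x2')
plt.show()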
If \(\bx\) and \(\bv\) were three-dimensional, what would the contours of \(\phi\) look like, and how would they relate to \(\bv\)? (A sketch is not expected.)
Answer:
The contours of \(\phi\) are now the planes orthogonal (perpendicular) to \(\bv\). These planes are parallel to each other. As before, the function changes most quickly in the direction \(\bv\), and when we are close to the \(\phi\te0.5\) contour plane.
In general, for any number of dimensions, the function is constant within the hyper-planes orthogonal to \(\bv\).
Intended lessons from Q3:
There will be several important explanatory figures in the course, some of them two-dimensional contour plots. So you need to understand contour plots, and not everyone has seen them. The logistic sigmoid function also comes up a lot in machine learning.
Footnotes:
[1] For any function that Alice might pick, can Bob pick the same function? Can Bob represent any functions that Alice can’t represent? Bob and Alice are both trying to pick a function to minimize square error. Given the options Bob has, why will he neither pick a different function that Alice could have picked, nor a function Alice couldn’t have picked?
This course doesn’t ask for proofs often, and doesn’t require a really formal presentation. But it does require some clear thinking and explanation, and reasoning about when model additions might help is important. If you don’t manage this question, you won’t be alone, but you will need to be able to adapt the reasoning to new situations.
[2] https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Real_symmetric_matrices
The square-root of the central diagonal matrix can be absorbed into the matrices either side.
[3] If \(M\) is only positive semi-definite, then there are directions in which the weight vector can grow arbitrarily big without being penalized.