Week 9, 16–20 November
Welcome to week 9! This week we look at Autoencoders, PCA, and a related application.
As usual, your mark this week comes from: completing the discussion task (10%), attempting the in-note questions (20%), and this week’s assessed questions (70%). Full details, including rules you must know, are on the assessments page.
Office hours:
You can meet us on MS Teams in the Meet-up channel of the MLPR 2020/21 Chat team on Friday at 9:30 AM and 4:30 PM UK time (GMT). One of Arno or Iain will be there. If you want to discuss something individually, please contact us by email: Arno aonken@inf.ed.ac.uk or Iain i.murray@ed.ac.uk.
Here is what you need to do in Week 9:
Any catch-up: If any threads on Hypothesis haven’t been resolved within 48 hours, email Arno and Iain.
Lecture notes: Work through the Week 9 notes, answering all the questions. It’s fine to make mistakes here, but an honest attempt at these by Friday at 4pm (UK time) is required.
Question sheet: Do the week 9 question sheet. This question sheet is assessed and forms the bulk of this week’s marks.
Tutorial group discussions: You should post at least one thing that you’d find it useful to go over with your group and/or tutor. Or, if you’re on top of everything, state in advance that you would be happy to answer questions from the group.
This week’s discussion is about recommendation systems. Create a short summary of your conclusions for your tutor, probably as bullet points. This summary is (lightly) assessed! See the group instructions for details on how to submit the group discussion report.
In the 2006 Netflix prize challenge, the goal was to predict customers’ integer movie ratings from 1 to 5. The w9b note sketches an “SVD-like” approach. User \(n\) has parameters \(\bu^{(n)}\), movie \(m\) has parameters \(\bv^{(m)}\), and the prediction for user \(n\)’s rating of movie \(m\) is \(f(n,m) = \bu^{(n)\top}\bv^{(m)}\). The parameters are fitted by stochastic gradient descent on the squared difference between the predictions \(f\) and the observed ratings.
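To make the setup concrete, here is a minimal runnable sketch of that “SVD-like” model fitted by SGD. The data are synthetic and all sizes, the learning rate, and the number of epochs are illustrative choices, not anything from the Netflix challenge or the w9b note:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative): N users, M movies, K latent dimensions.
N, M, K = 50, 40, 5
U = 0.1 * rng.standard_normal((N, K))  # row n holds user parameters u^(n)
V = 0.1 * rng.standard_normal((M, K))  # row m holds movie parameters v^(m)

# Synthetic observed ratings: (user, movie, rating) triples, ratings in 1-5.
ratings = [(rng.integers(N), rng.integers(M), rng.integers(1, 6))
           for _ in range(2000)]

def mse(U, V, ratings):
    """Mean squared error of f(n,m) = u^(n) . v^(m) on the observed triples."""
    return np.mean([(U[n] @ V[m] - r) ** 2 for n, m, r in ratings])

mse_before = mse(U, V, ratings)

eta = 0.02  # learning rate
for epoch in range(20):
    rng.shuffle(ratings)
    for n, m, r in ratings:
        err = U[n] @ V[m] - r  # f(n, m) minus the observed rating
        # Gradient of (f - r)^2 w.r.t. u^(n) is 2 * err * v^(m), and
        # symmetrically for v^(m); the factor of 2 is folded into eta.
        u_n = U[n].copy()      # update both vectors using the old values
        U[n] -= eta * err * V[m]
        V[m] -= eta * err * u_n

mse_after = mse(U, V, ratings)
```

After a few epochs the training error drops well below its starting value; with random synthetic ratings there is nothing real to generalize to, which is part of why the discussion prompts below ask how you might extend or regularize the model.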
How might we improve or extend this method? Some ideas are below. Try to make progress with as many of these as you can, and/or your own ideas, which could well be better or more interesting. We’re looking for concrete, testable ideas that a fellow final-year or MSc project student could plausibly follow up on:
- Can we use the fact that we know the output is an integer in 1-5 somehow?
- The Netflix data came with the date of each rating, and the title of each movie. Can you include either or both of these in the prediction function that we’ll fit by SGD?
- Is there other data you’d like to use? If so, how?
- What would we do for a new customer, or one who has rated very little? (This is known as the “cold start” problem.) What do we do with new movies?
- Our model doesn’t actually predict which movies a customer will rate; just the rating assuming they rate it. How could we predict what they’ll rate? Could we use customer representations from this task to help the original rating task?
- How might we regularize the model? Are there any tricks (regularization or otherwise) you’ve learned that might be useful in this setting?
- Would you go about the whole problem differently? If so, how and why?
- Could a recommendation system cause harm to anyone: customers, society, or businesses? If so how, and could anything be done about it?
We recommend that you aim to finish the questions (in the notes and question sheet) and submit your discussion report by the end of Thursday. We will assess only what you have submitted by 4pm UK time on Friday.
Under the Informatics late policy, extensions are not available for weekly hand-ins. We expect many students to miss or under-perform on one hand-in, and will discount the one with the lowest mark. If you experience more significant disruption to your studies, you may need to file special circumstances; consult your Personal Tutor or Student Support team. Lecturers on a course cannot make allowances outside these procedures.