One of my friend sent out an email asking for suggestion on an algorithm to cluster data given a similarity matrix. And that led to a discussion and another friend forwarded a link to k-medoids algorithm and suggested using it instead of k-means because its more robust to noise etc. But I was still wondering how to find the right k and even if I find the right k is k-medoids better than k-means. I was going to leave it to my friend to figure it out but was too impatient to wait that long.
So, I decided to investigate more and kept reading the wikipedia pages and research papers pertitnent to the topic. Instead of just reading this time I wanted to challenge myself. So, I kept a But, instead of just reading it I wanted to challenge myself. So, I set a 5 hour deadline, to understand, formalize, implement, test, create a video, upload and finally write this blog post on it. I started a timer and boy I was in for a surprise... It took me 6 hours to get everything done :(
There are three advantages I found in doing this.
1) I learnt about the two methods and related work in a quick time, good enough to implement them and know their limitations.
2) A timer countdown helps you focus and ignore everything else instead of just saying I will do this today!
3) Most important, I wanted to see how fast I could transition from basic knowledge to the point of having a running code and beautiful outputs with a good report and finally a blog entry that will stay forever(or till blogspot's existence)
I started on a whiteboard with todo list(image below) and time estimates and within 4 hours got the code working after which I had to spend time on making the video, writing the report, uploading, creating the blog entry etc..
The youtube video of the final output
Matlab code: code