This summer I am taking a basic statistics course and auditing an advanced one. While the basic one is increasing my knowledge and giving me confidence, the advanced one is killing me (in a very, very good way)! For the first time in a very long while, I have had to spend nights working through various textbooks to understand things, and the joy in doing it is awesome! I am writing this post to share a few thoughts and resources related to the topic.
1) In the past many of my friends have asked me for soft copies of ebooks. I download most of them from the following site:
http://library.nu
I found almost 90% of the textbooks & novels I searched for on this website. I am using the following four statistics books to keep myself busy! Each book uses a different notation, which gets to me more often than not, but as of now the patience is paying off.
a) Probability & Statistics by Athanasios Papoulis
b) Probability & Statistics in Engineering by Hines et al.
c) Theory and Problems of Probability & Statistics (Schaum's Outline series)
d) All of Statistics by Wasserman
2) I hate calculating expectation, covariance, correlation and all that crap for discrete distributions. Firstly, the calculations are cumbersome, and secondly, I hate algebra! So I put together the following script (MATLAB/Octave syntax) that calculates these. Just update the pdf matrix (rows correspond to y values, columns to x values) and the x and y vectors, and you will get all the answers you need.
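% Joint probability table: rows correspond to y = 0..4, columns to x = 0..5
pdf=[11/50, 4/50, 2/50, 1/50, 1/50, 1/50;
8/50, 3/50, 2/50, 1/50, 1/50, 0;
4/50, 3/50, 2/50, 1/50, 0, 0;
3/50, 1/50, 0, 0, 0, 0;
1/50, 0, 0, 0, 0, 0]
x = 0:5
y = 0:4
% Marginal distributions (lines are left without semicolons so every result prints)
fx = sum(pdf)       % sum down each column -> marginal of x
fy = sum(pdf,2)'    % sum across each row -> marginal of y, as a row vector
% Mean and variance of x
ex = sum(x.*fx)
ex2 = sum(x.*x.*fx)
varx = ex2 - ex*ex
% Mean and variance of y
ey = sum(y.*fy)
ey2 = sum(y.*y.*fy)
vary = ey2 - ey*ey
% E[XY] over the full grid; xgrid and ygrid have the same shape as pdf
[xgrid, ygrid] = meshgrid(x,y)
exy = sum(sum(xgrid.*ygrid.*pdf))
% Covariance and correlation coefficient
covxy = exy - ex*ey
% named corrxy so it does not shadow the built-in corr function
corrxy = covxy/(sqrt(varx)*sqrt(vary))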
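For anyone who wants to check the script by hand, here is a sketch of the formulas it evaluates, with the values I get for the table above. The notation is my own shorthand and not taken from any of the four books: p(x_i, y_j) is the entry of the pdf matrix for x = x_i and y = y_j, and f_X, f_Y are the marginals that the script calls fx and fy. The numbers are worked by hand from the table, so treat them as a sanity check and not as an answer key.

```latex
\begin{align*}
f_X(x_i) &= \sum_j p(x_i, y_j), \qquad f_Y(y_j) = \sum_i p(x_i, y_j) \\
E[X] &= \sum_i x_i \, f_X(x_i) = \tfrac{45}{50} = 0.9 \\
E[X^2] &= \sum_i x_i^2 \, f_X(x_i) = \tfrac{119}{50} = 2.38 \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 = 2.38 - 0.81 = 1.57 \\
E[Y] &= \sum_j y_j \, f_Y(y_j) = \tfrac{51}{50} = 1.02 \\
E[Y^2] &= \sum_j y_j^2 \, f_Y(y_j) = \tfrac{107}{50} = 2.14 \\
\operatorname{Var}(Y) &= E[Y^2] - (E[Y])^2 = 2.14 - 1.0404 = 1.0996 \\
E[XY] &= \sum_i \sum_j x_i \, y_j \, p(x_i, y_j) = \tfrac{37}{50} = 0.74 \\
\operatorname{Cov}(X, Y) &= E[XY] - E[X]\,E[Y] = 0.74 - 0.918 = -0.178 \\
\rho_{XY} &= \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}} \approx -0.135
\end{align*}
```

The last line of the script should print roughly the same correlation, and the entries of the pdf matrix should add up to 1 before you trust any of the other numbers.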
3) Proofs have always intrigued me. Although there is a lot of fun in using a formula to get the answer and feeling satisfied, I have always found real satisfaction in deriving the formula. Being a computer science major has deprived me of such opportunities. But thanks to the stats course I get to sit and prove stuff again :-)
I am sharing two of the proofs for which I did not find straightforward solutions online. I hope they help others who are trying to understand the proofs.