[Questions about Machine Learning] Chapter II Machine Learning Fundamentals

In this chapter, we will discuss the fundamentals of machine learning.

Q. What is regression? What is classification?

Q. What is supervised learning? What is semi-supervised learning? What is weakly-supervised learning? What is unsupervised learning?

Q. What are the steps of supervised learning?

Q. What is multi-instance learning?

Q. What is K-Nearest Neighbors or KNN? What is SVM?

Q. What is a neural network? What are the different types of neural networks?

Q. What is local optimum? What is global optimum?

One day, Plato asked Socrates: what is love? Socrates told him to walk across a wheat field and bring back the largest, most golden ear of wheat, with one rule: he could not turn back, and he could pick only once. So Plato went. After a long time, he returned empty-handed. Socrates asked why he came back with nothing. Plato said: as I walked through the field, I saw a few especially fine ears of wheat, but I kept thinking there might be a bigger and better one ahead, so I did not pick them; as I walked on, every ear I saw seemed no better than the ones I had already passed, so in the end I picked nothing. Socrates said meaningfully: that is love.

Q. What are the advantages and disadvantages of common classification algorithms, such as naïve Bayes, decision trees, and SVM?

Q. Is accuracy a comprehensive metric for classification?

Q. If not, what metrics are used for classification algorithms, and what metrics for regression algorithms?

Q. What is a good enough classification algorithm?

Q. What is logistic regression? What is Poisson regression?

Q. What are the differences between logistic regression and naïve bayes?

Q. What are the differences between linear regression and logistic regression?

A: Linear regression: $f(x)=\theta^{T}x=\theta_{1}x_{1}+\theta_{2}x_{2}+\dots+\theta_{n}x_{n}$

Logistic regression: $f(x)=P(y=1|x;\theta)=g(\theta^{T}x)$, where $g(z)=\frac{1}{1+e^{-z}}$
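The two model forms above can be sketched directly in NumPy. This is a minimal illustration of the prediction step only (no training); the parameter and feature values are arbitrary numbers chosen for the example.

```python
import numpy as np

def linear_regression(theta, x):
    # f(x) = theta^T x: an unbounded, real-valued prediction
    return theta @ x

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}) maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(theta, x):
    # f(x) = P(y = 1 | x; theta) = g(theta^T x): a probability
    return sigmoid(theta @ x)

# Arbitrary illustrative parameters and features
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])

print(linear_regression(theta, x))    # -0.5
print(logistic_regression(theta, x))  # sigmoid(-0.5) ≈ 0.3775
```

Note that both models compute the same linear score $\theta^{T}x$; logistic regression simply passes it through the sigmoid to squash it into a probability.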

Q. What is a cost function? Why do we need it?

Q. Why does a cost function work?

Q. Why must a cost function have a lower bound? In other words, why can most cost functions not be negative?

Q. What are some common cost functions?

Q. Why do we use cross-entropy instead of the quadratic cost?

Q. What is a loss function? Why do we need it?

Q. What are some common loss functions?

Q. Why do we use the log loss function in logistic regression?

Q. How does the log loss function measure the loss?

Q. What is gradient descent? Why do we need it?

Q. What are the advantages and drawbacks of gradient descent?

Q. Still unclear? Is there a graph or illustration?

Q. What are the steps of gradient descent?

Q. How can gradient descent be optimized?

Q. What are stochastic gradient descent and batch gradient descent? What are the differences between them?

Q. What is a computation graph? How are its derivatives calculated?

Q. What is Linear Discriminant Analysis or LDA?

Q. What are the steps of LDA?

Q. What is Principal Component Analysis or PCA? What are its steps?

Q. What are the differences between LDA and PCA?

Q. What are the advantages and drawbacks of LDA and PCA?

Q. Why do we need to reduce dimensionality?

Q. What is Kernelized Principal Component Analysis or KPCA?

Q. What are the commonly used evaluation metrics for machine learning models?

Q. What are the relationships among bias, error, variance, and covariance?

Q. What are empirical error and generalization error?

Q. What is overfitting? What is underfitting? How to solve them respectively?

Q. What is the purpose of cross validation?

Q. What is k-fold cross validation?
