
This is the SVM (support vector machine).

As an SVM, it prefers the more general line separator (the one in the middle) over the others, because the others tend to overfit.

This is why an SVM overfits less than most other algorithms.

It finds a line that does not bend toward the data too much, while still classifying it consistently.

So we know that the general function to draw a line is y = mx + b.

The one above is the hyperplane function, y = wᵀx + b: it takes a weight vector w applied to x, and b, the bias unit, is a linear, constant term that moves the plane separator in and out.
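As a minimal sketch of that hyperplane function (the weight vector and bias below are made-up values for illustration):

```python
# Sketch of the linear decision function f(x) = w . x + b for 2-D input.
def decision(w, b, x):
    """Return the signed value of the hyperplane function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w = [2.0, -1.0]   # hypothetical weight vector
b = 0.5           # hypothetical bias unit; shifts the separator in and out

# Points on opposite sides of the plane get values of opposite sign.
print(decision(w, b, [1.0, 1.0]))    # 2 - 1 + 0.5 = 1.5  -> positive side
print(decision(w, b, [-1.0, 1.0]))   # -2 - 1 + 0.5 = -2.5 -> negative side
```

The sign of f(x) gives the predicted class, and changing b slides the separator without rotating it.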

Suppose we have a binary classifier y ∈ {−1, +1}. Then we want the separator that is as far as possible from the data, while still classifying it consistently.

The smaller the norm of w, the greater the distance between x1 and x2 (points on the two margin boundaries): the margin works out to 2/‖w‖.
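The claim that a smaller w gives a bigger margin can be checked numerically: for the two gutters wᵀx + b = +1 and wᵀx + b = −1, the distance between them is 2/‖w‖ (a standard SVM fact), so scaling w down scales the margin up:

```python
import math

def margin_width(w):
    """Distance between the gutters w.x + b = +1 and w.x + b = -1 is 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

print(margin_width([3.0, 4.0]))   # ||w|| = 5   -> margin 0.4
print(margin_width([0.3, 0.4]))   # ||w|| = 0.5 -> margin 4.0
```

Dividing w by 10 multiplies the margin by 10, which is why maximizing the margin is the same as minimizing ‖w‖.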

So, for the dual objective W(α) = Σᵢ αᵢ − ½ Σᵢⱼ αᵢαⱼ yᵢyⱼ xᵢᵀxⱼ, we find the maximum by using quadratic programming.

But here's what matters: because the αᵢ tend to be zero (most are exactly 0), only some of the vectors, the support vectors, matter to the machine.
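Once the αᵢ are known, the machine classifies a new point x using only the terms where αᵢ ≠ 0. The values below are hypothetical, just to show how the non-support vectors drop out of the sum:

```python
def dual_decision(alphas, ys, xs, b, x):
    """f(x) = sum_i alpha_i * y_i * (x_i . x) + b; terms with alpha_i = 0 vanish."""
    dot = lambda u, v: sum(a * c for a, c in zip(u, v))
    return sum(a * y * dot(xi, x)
               for a, y, xi in zip(alphas, ys, xs) if a != 0.0) + b

# Hypothetical training set: only two points (the support vectors) got nonzero alpha.
xs     = [[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-3.0, 0.0]]
ys     = [+1, +1, -1, -1]
alphas = [0.5, 0.0, 0.5, 0.0]   # alpha = 0 for the non-support vectors
b      = 0.0

print(dual_decision(alphas, ys, xs, b, [2.0, 2.0]))   # 4.0 -> positive class
```

No matter how large the training set, prediction cost depends only on the number of support vectors.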

If you look at the formula, we multiply xᵢᵀxⱼ and yᵢyⱼ. That is, we're using the dot product between the points as a measure of how similar they are.

The graph on the bottom right shows data points that don't matter for the line separator; for those, the equation produces α equal to zero. This resembles nearest neighbors, but in an SVM we're doing something locally weighted: only the points within the area of the separator count.

Here's what we do: we fill x and y into a general φ (feature-map) function.

This expands into the polynomial terms of degree 2, which simplify to (xᵀy)², the quadratic kernel.
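This is the kernel trick, and it can be verified directly for 2-D input with the standard degree-2 feature map φ(x) = (x₁², √2·x₁x₂, x₂²): the explicit dot product φ(x)ᵀφ(y) equals (xᵀy)² without ever expanding the features.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, y):
    """K(x, y) = (x . y)^2 -- same value as phi(x) . phi(y), computed directly."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
implicit = quadratic_kernel(x, y)
print(explicit, implicit)   # both 121.0
```

The kernel does in 2 multiplications and a square what the explicit map needs a 3-D expansion for, and the gap grows quickly with dimension and degree.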

A kernel can be any arbitrary function that satisfies the Mercer condition.
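One practical consequence of the Mercer condition: on any finite set of points, a valid kernel's Gram matrix must be symmetric positive semi-definite. A sketch of that necessary check (using NumPy, and the common RBF kernel as the example) might look like:

```python
import numpy as np

def gram_matrix(kernel, points):
    """Gram matrix K[i, j] = kernel(x_i, x_j) over a finite point set."""
    return np.array([[kernel(a, b) for b in points] for a in points])

def looks_mercer(kernel, points, tol=1e-9):
    """Necessary (not sufficient) check: a Mercer kernel's Gram matrix on any
    finite set must be symmetric and positive semi-definite."""
    K = gram_matrix(kernel, points)
    if not np.allclose(K, K.T):
        return False
    return bool(np.min(np.linalg.eigvalsh(K)) >= -tol)

# The RBF (Gaussian) kernel is a classic Mercer kernel, so it passes.
rbf = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2))
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, -1.0)]
print(looks_mercer(rbf, pts))   # True
```

Passing on one point set doesn't prove a function is a Mercer kernel, but failing on any set disproves it.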