What are kernel functions in general?

What is a kernel and what makes it different from other functions?

There seem to be many machine learning algorithms that rely on kernel functions. SVMs and NNs, to name just two. So what is the definition of a kernel function and what requirements must be met for it to be valid?


For x, y in S, certain functions K(x, y) can be expressed as an inner product (usually in a different space). K is often referred to as a kernel or kernel function. The word kernel is used in different ways throughout mathematics, but this is the most common usage in machine learning.

The kernel trick is a way of mapping observations from a general set S into an inner product space V (equipped with its natural norm) without ever having to compute the mapping explicitly, in the hope that the observations in V have a meaningful linear structure. This matters for efficiency (dot products in a very high-dimensional space can be computed very quickly) and for practicality (we can turn linear ML algorithms into nonlinear ML algorithms).
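
To make that concrete, here is a minimal sketch using the degree-2 polynomial kernel on 2-D inputs. The names `quadratic_kernel` and `phi` are made up for this illustration. The point is only that evaluating the kernel directly gives the same number as mapping both points into the 3-D feature space and taking the inner product there.

```python
import numpy as np

def quadratic_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map for 2-D inputs with <phi(x), phi(y)> = (x . y)^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Kernel trick: work in the original 2-D space, never build phi.
k_implicit = quadratic_kernel(x, y)

# Explicit route: map to the 3-D feature space, then take the inner product.
k_explicit = float(np.dot(phi(x), phi(y)))

print(k_implicit, k_explicit)  # the two values agree up to rounding
```

For a degree-2 kernel the saving is small, but the feature space grows quickly with the polynomial degree and the input dimension, while the cost of evaluating the kernel stays that of one dot product in the original space.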

For a function K to be considered a valid kernel, it must satisfy Mercer's conditions. In practical terms, this means we have to make sure that the kernel matrix (the kernel evaluated for every pair of data points you have) is always positive semidefinite. In methods such as SVMs, this ensures that the training objective function is convex, a very important property.
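
As a rough sketch of what that check looks like in practice, the snippet below builds the kernel matrix for a small random sample and inspects its eigenvalues. The helpers `rbf_kernel`, `gram_matrix` and `is_psd`, the tolerance, and the random data are all assumptions made for this example. Keep in mind that passing the check on one sample does not prove a function is a valid kernel; failing it does show that it is not.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def gram_matrix(kernel, points):
    """Kernel matrix: the kernel evaluated for every pair of points."""
    n = len(points)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(points[i], points[j])
    return K

def is_psd(K, tol=1e-8):
    """True if K is symmetric and its smallest eigenvalue is >= -tol."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))  # made-up sample: 20 points in 3-D

K_rbf = gram_matrix(rbf_kernel, points)

# Negative squared distance is symmetric, but it is not a valid kernel:
# its matrix has a zero diagonal, so the eigenvalues cannot all be >= 0.
K_bad = gram_matrix(lambda x, y: -float(np.sum((x - y) ** 2)), points)

print(is_psd(K_rbf), is_psd(K_bad))
```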

From Williams, Christopher K. I., and Carl Edward Rasmussen. "Gaussian Processes for Machine Learning". MIT Press (2006). Page 80.


Also, kernel = kernel function.

Kernels used in machine learning algorithms tend to satisfy additional properties, such as positive semidefiniteness.

I'll try to give a less technical explanation.

First, start with the dot product between two vectors. It tells you how "similar" the vectors are. If the vectors represent points in your data set, the dot product tells you whether they are similar or not.

In some (many) cases, however, the dot product is not the best measure of similarity. For example:

  • Points with a low dot product may still be similar for other reasons.
  • Data items may not be well represented as points.

So instead of using the dot product, you use a "kernel", which is just a function that takes two points and gives you a measure of their similarity. I'm not 100 percent sure what technical requirements a function has to meet in order to be a kernel, but that's the idea.
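
Here is a small sketch of the difference, with three made-up points and a Gaussian (RBF) kernel chosen just for illustration. Points `a` and `b` sit right next to each other but are orthogonal, so their dot product is zero. Point `c` is far from `a` but points in the same direction, so their dot product is large. The kernel ranks the pairs the other way round.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: close to 1 for nearby points, close to 0 for distant ones."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

a = np.array([0.1, 0.0])
b = np.array([0.0, 0.1])   # very close to a, but orthogonal to it
c = np.array([10.0, 0.0])  # far from a, but pointing the same way

dot_ab = float(np.dot(a, b))
dot_ac = float(np.dot(a, c))
rbf_ab = rbf_kernel(a, b)
rbf_ac = rbf_kernel(a, c)

print("dot:", dot_ab, dot_ac)  # dot product says a is more similar to c
print("rbf:", rbf_ab, rbf_ac)  # RBF kernel says a is more similar to b
```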

A very nice thing is that the kernel lets you bring your domain knowledge into the problem: if you know something about the domain, you can encode it as "two points are similar for reasons x, y and z".
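
As one hypothetical illustration of that, suppose the data items are words rather than points, and the domain knowledge is "words that share many letter pairs are similar". The sketch below (the names `ngram_counts` and `ngram_kernel` are invented for this example) counts shared character n-grams. Because it is the inner product of the two n-gram count vectors, it fits the inner-product picture of a kernel without ever writing those vectors out in full.

```python
from collections import Counter

def ngram_counts(s, n=2):
    """Count every run of n consecutive characters in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_kernel(s, t, n=2):
    """Inner product of the n-gram count vectors of s and t."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    # A Counter returns 0 for n-grams that t does not contain.
    return sum(count * ct[gram] for gram, count in cs.items())

print(ngram_kernel("kernel", "kernels"))  # many shared letter pairs
print(ngram_kernel("kernel", "matrix"))   # no shared letter pairs
```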
