Python 2.7 - User-defined SVM kernel with scikit-learn


I ran into a problem when defining my own kernel in scikit-learn. I define a Gaussian kernel myself and am able to fit the SVM, but I cannot use it to make predictions.

More precisely, I have the following code:

    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from sklearn.utils import shuffle
    import scipy.sparse as sparse
    import numpy as np

    digits = load_digits(2)
    x, y = shuffle(digits.data, digits.target)

    gamma = 1.0

    x_train, x_test = x[:100, :], x[100:, :]
    y_train, y_test = y[:100], y[100:]

    m1 = SVC(kernel='rbf', gamma=1)
    m1.fit(x_train, y_train)
    m1.predict(x_test)

    def my_kernel(x, y):
        d = x - y
        c = np.dot(d, d.T)
        return np.exp(-gamma * c)

    m2 = SVC(kernel=my_kernel)
    m2.fit(x_train, y_train)
    m2.predict(x_test)

m1 and m2 should be the same, but m2.predict(x_test) returns this error:

ValueError: operands could not be broadcast together with shapes (260,64) (100,64)

I don't understand the problem.

Furthermore, if x is a single data point, m1.predict(x) gives a +1/-1 result, as expected, but m2.predict(x) gives an array of +1/-1 values. I have no idea why.

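The error is at the x - y line. You cannot subtract the two like that, because their first dimensions may not be equal: during predict the kernel is called with the 260 test rows and the 100 training rows. Here is how the RBF kernel is implemented in scikit-learn, taken from the scikit-learn source (only keeping the essentials):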

    def row_norms(X, squared=False):

        if issparse(X):
            norms = csr_row_norms(X)
        else:
            norms = np.einsum('ij,ij->i', X, X)

        if not squared:
            np.sqrt(norms, norms)
        return norms

    def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False):
        """
        Considering the rows of X (and Y=X) as vectors, compute the
        distance matrix between each pair of vectors.

        [...]

        Returns
        -------
        distances : {array, sparse matrix}, shape (n_samples_1, n_samples_2)
        """
        X, Y = check_pairwise_arrays(X, Y)

        if Y_norm_squared is not None:
            YY = check_array(Y_norm_squared)
            if YY.shape != (1, Y.shape[0]):
                raise ValueError(
                    "Incompatible dimensions for Y and Y_norm_squared")
        else:
            YY = row_norms(Y, squared=True)[np.newaxis, :]

        if X is Y:  # shortcut in the common case euclidean_distances(X, X)
            XX = YY.T
        else:
            XX = row_norms(X, squared=True)[:, np.newaxis]

        distances = safe_sparse_dot(X, Y.T, dense_output=True)
        distances *= -2
        distances += XX
        distances += YY
        np.maximum(distances, 0, out=distances)

        if X is Y:
            # Ensure that distances between vectors and themselves are set to 0.0.
            # This may not be the case due to floating point rounding errors.
            distances.flat[::distances.shape[0] + 1] = 0.0

        return distances if squared else np.sqrt(distances, out=distances)

    def rbf_kernel(X, Y=None, gamma=None):

        X, Y = check_pairwise_arrays(X, Y)
        if gamma is None:
            gamma = 1.0 / X.shape[1]

        K = euclidean_distances(X, Y, squared=True)
        K *= -gamma
        np.exp(K, K)    # exponentiate K in-place
        return K
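To make the shape requirement concrete, here is a small NumPy-only sketch of what happens to the kernel from the question. It uses random arrays as stand-ins for the digits data (an assumption made only to keep the example self-contained), and `broken_kernel` is just an illustrative name for the question's `my_kernel` with `gamma` turned into a parameter.

```python
import numpy as np

rng = np.random.RandomState(0)
x_train = rng.rand(100, 64)   # stand-in for the 100 training rows
x_test = rng.rand(260, 64)    # stand-in for the 260 test rows

def broken_kernel(x, y, gamma=1.0):
    # Same logic as my_kernel in the question.
    d = x - y                  # elementwise: shapes must match or broadcast
    c = np.dot(d, d.T)
    return np.exp(-gamma * c)

# During fit both arguments are the training data, so the subtraction works
# and the result happens to have the expected (100, 100) shape. The values
# are meaningless though: d is all zeros, so every entry is exp(0) == 1.
k_fit = broken_kernel(x_train, x_train)

# During predict the arguments are (260, 64) and (100, 64): no broadcasting.
try:
    broken_kernel(x_test, x_train)
    predict_error = None
except ValueError as exc:
    predict_error = str(exc)

# A single point of shape (64,) broadcasts against (100, 64), so the kernel
# returns a (100, 100) matrix instead of a (1, 100) row.
k_single = broken_kernel(x_test[0], x_train)
```

This would explain both symptoms: fit succeeds only because the two shapes coincide, and a single data point yields a matrix with 100 rows, which probably is why an array of predictions comes back instead of one value.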

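You might want to dig deeper into the code, but look at the comments for the euclidean_distances function: the kernel has to return a matrix of shape (n_samples_1, n_samples_2). A naive implementation of what you're trying to achieve would be this: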

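    def my_kernel(x, y):
        # One row per sample in x, one column per sample in y.
        d = np.zeros((x.shape[0], y.shape[0]))
        for i, row_x in enumerate(x):
            for j, row_y in enumerate(y):
                # Squared Euclidean distance, so that this matches the
                # RBF kernel exp(-gamma * ||x - y||^2).
                d[i, j] = np.exp(-gamma * np.linalg.norm(row_x - row_y) ** 2)

        return d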
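The double loop gets slow for larger inputs. Below is a vectorized sketch of the same Gaussian kernel, using the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b that euclidean_distances relies on. The names `gaussian_kernel` and `naive_kernel` and the `gamma` keyword argument are choices made for this example, not part of scikit-learn; the loop version is included only as a reference to compare against.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Return K with K[i, j] = exp(-gamma * ||x[i] - y[j]||^2)."""
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    xx = np.einsum('ij,ij->i', x, x)[:, np.newaxis]   # squared norms, column
    yy = np.einsum('ij,ij->i', y, y)[np.newaxis, :]   # squared norms, row
    sq_dists = xx + yy - 2.0 * np.dot(x, y.T)         # shape (n_x, n_y)
    np.maximum(sq_dists, 0, out=sq_dists)             # clip rounding noise
    return np.exp(-gamma * sq_dists)

def naive_kernel(x, y, gamma=1.0):
    """Loop-based reference using the squared distance explicitly."""
    k = np.zeros((x.shape[0], y.shape[0]))
    for i, row_x in enumerate(x):
        for j, row_y in enumerate(y):
            diff = row_x - row_y
            k[i, j] = np.exp(-gamma * np.dot(diff, diff))
    return k

# Small random check that both versions agree.
rng = np.random.RandomState(0)
a = rng.rand(5, 3)
b = rng.rand(4, 3)
k_fast = gaussian_kernel(a, b)
k_slow = naive_kernel(a, b)
agree = np.allclose(k_fast, k_slow)
```

A function like this can be passed as SVC(kernel=gaussian_kernel). With gamma=1.0 it should produce the same Gram matrix as kernel='rbf' with gamma=1, up to floating point error.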
