python 2.7 - User defined SVM kernel with scikit-learn
I encounter a problem when defining a kernel myself in scikit-learn. I define the Gaussian kernel myself and am able to fit the SVM, but cannot use it to make a prediction.
More precisely, I have the following code:
    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from sklearn.utils import shuffle
    import scipy.sparse as sparse
    import numpy as np

    digits = load_digits(2)
    X, y = shuffle(digits.data, digits.target)

    gamma = 1.0

    X_train, X_test = X[:100, :], X[100:, :]
    y_train, y_test = y[:100], y[100:]

    # Built-in rbf kernel
    m1 = SVC(kernel='rbf', gamma=1)
    m1.fit(X_train, y_train)
    m1.predict(X_test)

    # User-defined Gaussian kernel
    def my_kernel(x, y):
        d = x - y
        c = np.dot(d, d.T)
        return np.exp(-gamma * c)

    m2 = SVC(kernel=my_kernel)
    m2.fit(X_train, y_train)
    m2.predict(X_test)
m1 and m2 should be the same, but m2.predict(X_test) returns the error:
    operands could not be broadcast together with shapes (260,64) (100,64)
I don't understand the problem.
Furthermore, if x is one data point, m1.predict(x) gives a +1/-1 result, as expected, but m2.predict(x) gives an array of +1/-1... No idea why.
The error is at the x - y line. You cannot subtract the two like that, because the first dimensions of both may not be equal. Here is how the rbf kernel is implemented in scikit-learn, taken from the scikit-learn source (only keeping the essentials):
    def row_norms(X, squared=False):
        if issparse(X):
            norms = csr_row_norms(X)
        else:
            norms = np.einsum('ij,ij->i', X, X)
        if not squared:
            np.sqrt(norms, norms)
        return norms


    def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False):
        """
        Considering the rows of X (and Y=X) as vectors, compute the
        distance matrix between each pair of vectors.

        [...]

        Returns
        -------
        distances : {array, sparse matrix}, shape (n_samples_1, n_samples_2)
        """
        X, Y = check_pairwise_arrays(X, Y)

        if Y_norm_squared is not None:
            YY = check_array(Y_norm_squared)
            if YY.shape != (1, Y.shape[0]):
                raise ValueError(
                    "Incompatible dimensions for Y and Y_norm_squared")
        else:
            YY = row_norms(Y, squared=True)[np.newaxis, :]

        if X is Y:  # shortcut in the common case euclidean_distances(X, X)
            XX = YY.T
        else:
            XX = row_norms(X, squared=True)[:, np.newaxis]

        distances = safe_sparse_dot(X, Y.T, dense_output=True)
        distances *= -2
        distances += XX
        distances += YY
        np.maximum(distances, 0, out=distances)

        if X is Y:
            # Ensure that distances between vectors and themselves are set to 0.0.
            # This may not be the case due to floating point rounding errors.
            distances.flat[::distances.shape[0] + 1] = 0.0

        return distances if squared else np.sqrt(distances, out=distances)


    def rbf_kernel(X, Y=None, gamma=None):
        X, Y = check_pairwise_arrays(X, Y)
        if gamma is None:
            gamma = 1.0 / X.shape[1]

        K = euclidean_distances(X, Y, squared=True)
        K *= -gamma
        np.exp(K, K)    # exponentiate K in-place
        return K
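The shape given in that docstring, (n_samples_1, n_samples_2), is what the question's kernel gets wrong. As a small sketch of this, the snippet below runs the kernel from the question on placeholder arrays that only mimic the shapes in the error message. The name broken_kernel and the zero/one arrays are made up for illustration. The single-row case most likely also explains the second symptom: the kernel returns a (100, 100) matrix where (1, 100) is wanted, so one prediction per row comes out.

```python
import numpy as np

gamma = 1.0

def broken_kernel(x, y):
    # The kernel from the question, unchanged (hypothetical name).
    d = x - y
    c = np.dot(d, d.T)
    return np.exp(-gamma * c)

# Placeholder data with the same shapes as in the question.
train = np.zeros((100, 64))
test = np.ones((260, 64))

try:
    broken_kernel(test, train)
    failed = False
except ValueError:
    # (260, 64) - (100, 64) cannot be broadcast together.
    failed = True

# A single row does broadcast, but the result has the wrong shape:
# (1, 64) - (100, 64) gives (100, 64), and d.dot(d.T) gives (100, 100).
one_point = np.ones((1, 64))
K = broken_kernel(one_point, train)

print(failed)    # True
print(K.shape)   # (100, 100)
```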
You might want to dig deeper into the code, but look at the comments for the euclidean_distances function. A naive implementation of what you're trying to achieve would be this:
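    def my_kernel(x, y):
        # Fill a (len(x), len(y)) kernel matrix one pair of rows at a time.
        d = np.zeros((x.shape[0], y.shape[0]))
        for i, row_x in enumerate(x):
            for j, row_y in enumerate(y):
                # Squared distance, to match the rbf kernel above.
                d[i, j] = np.exp(-gamma * np.linalg.norm(row_x - row_y) ** 2)
        return d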
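The double loop can be slow on larger data. As a rough sketch, the same kernel matrix can be built without Python loops by reusing the norm expansion from the euclidean_distances excerpt (dense arrays only). The value of GAMMA and the unshuffled train/test split are assumptions made for this example, not values from the question.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

GAMMA = 0.001  # illustrative value, not the gamma = 1.0 from the question

def my_kernel(X, Y):
    """Gaussian kernel matrix of shape (len(X), len(Y)), without Python loops."""
    XX = np.einsum('ij,ij->i', X, X)[:, np.newaxis]   # column of ||x||^2
    YY = np.einsum('ij,ij->i', Y, Y)[np.newaxis, :]   # row of ||y||^2
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, clipped at 0 against rounding
    sq_dist = np.maximum(XX - 2.0 * np.dot(X, Y.T) + YY, 0)
    return np.exp(-GAMMA * sq_dist)

digits = load_digits(n_class=2)
X_train, X_test = digits.data[:100], digits.data[100:]
y_train = digits.target[:100]

m1 = SVC(kernel='rbf', gamma=GAMMA).fit(X_train, y_train)
m2 = SVC(kernel=my_kernel).fit(X_train, y_train)

# Fraction of test points on which both models agree.
agreement = np.mean(m1.predict(X_test) == m2.predict(X_test))
# One data point, kept 2-D, now yields a single prediction.
single = m2.predict(X_test[:1])

print(agreement)
print(single.shape)
```

Because the callable returns one row per sample in its first argument and one column per sample in its second, both the full test set and a single data point go through predict without shape errors.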