A Unifying Theory of Learning: DL Meets Kernel Methods

I. de Zarzà

Abstract

We introduce a framework for using kernel approximations in the mini-batch setting with Stochastic Gradient Descent (SGD) as an alternative to Deep Learning. Based on Random Kitchen Sinks, we provide a C++ library for large-scale ML. It contains a CPU-optimized implementation of the algorithm in Le et al. 2013, which allows approximate kernel expansions to be computed in log-linear time. The algorithm requires computing products with Walsh Hadamard matrices. We have developed a cache-friendly Fast Walsh Hadamard transform that achieves compelling speed and outperforms current state-of-the-art methods. McKernel lays the foundation for a new learning architecture that achieves large-scale non-linear classification by combining lightning kernel expansions with a linear classifier. It works in the mini-batch setting, analogously to Neural Networks. We show the validity of our method through extensive experiments on MNIST and FASHION MNIST. We also propose a new architecture to reduce over-parametrization in Neural Networks. It introduces an operand for rapid computation in the framework of Deep Learning that leverages learned weights. The formalism is described in detail, providing an accurate elucidation of both the mechanics and the theoretical implications.
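To make the log-linear claim concrete, here is a minimal sketch of a textbook in-place fast Walsh Hadamard transform. It is not McKernel's cache-friendly implementation; the function name `fwht` and the choice of `std::vector<double>` are assumptions made for illustration only. Each of the log2(n) passes performs n/2 butterfly operations, so multiplying by the n x n Hadamard matrix costs O(n log n) instead of O(n^2).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of McKernel): in-place, unnormalized
// fast Walsh Hadamard transform in natural (Sylvester) ordering.
// The length of `data` must be a power of two.
void fwht(std::vector<double>& data) {
    const std::size_t n = data.size();
    if (n == 0 || (n & (n - 1)) != 0) {
        throw std::invalid_argument("fwht: length must be a power of two");
    }
    // `half` is the distance between the two inputs of each butterfly;
    // it doubles on every pass, giving log2(n) passes in total.
    for (std::size_t half = 1; half < n; half *= 2) {
        for (std::size_t block = 0; block < n; block += 2 * half) {
            for (std::size_t i = block; i < block + half; ++i) {
                const double a = data[i];
                const double b = data[i + half];
                data[i] = a + b;         // sum branch of the butterfly
                data[i + half] = a - b;  // difference branch
            }
        }
    }
}
```

Because the transform is unnormalized, applying `fwht` twice returns the original vector scaled by n. A production version would typically add blocking or SIMD for cache efficiency, which this sketch deliberately leaves out.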
