Computer Science > Machine Learning

arXiv:1703.04757 (cs)
[Submitted on 14 Mar 2017 (v1), last revised 11 Mar 2018 (this version, v3)]

Title: Separation of time scales and direct computation of weights in deep neural networks

Authors: Nima Dehmamy, Neda Rohani, Aggelos Katsaggelos
Abstract: Artificial intelligence is revolutionizing our lives at an ever increasing pace. At the heart of this revolution are the recent advances in deep neural networks (DNNs), which learn to perform sophisticated, high-level tasks. However, training DNNs requires massive amounts of data and is very computationally intensive. Gaining an analytical understanding of the solutions found by DNNs can help us devise more efficient training algorithms to replace the commonly used method of stochastic gradient descent (SGD). We analyze the dynamics of SGD and show that direct computation of the solutions is indeed possible in many cases. We show that a high-performing setup used in DNNs introduces a separation of time scales in the training dynamics, allowing SGD to train layers from the lowest (closest to the input) to the highest. We then show that, for each layer, the distribution of solutions found by SGD can be estimated using a class-based principal component analysis (PCA) of the layer's input. This finding allows us to forgo SGD entirely and directly derive the DNN parameters using this class-based PCA, which can be well estimated using significantly less data than SGD. We implement these results on the image datasets MNIST, CIFAR10 and CIFAR100 and find that layers derived using our class-based PCA perform comparably to, or better than, neural networks of the same size and architecture trained using SGD. We also confirm that the class-based PCA often converges using a fraction of the data required for SGD. Thus, our method can reduce training time both by requiring less training data than SGD and by eliminating layers from the costly backpropagation step of training.
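As a rough illustration of what "class-based PCA of the layer's input" could look like in practice, the sketch below computes a separate PCA for the inputs of each class and stacks the leading principal directions into a weight matrix. This is one possible reading of the abstract, not the authors' procedure: the function name `class_based_pca_weights`, the within-class centering, the choice of a fixed number of components per class, and the use of the directions directly as layer weights are all assumptions made for this example.

```python
import numpy as np


def class_based_pca_weights(X, y, components_per_class):
    """Stack the leading principal directions of each class's inputs.

    X: array of shape (n_samples, n_features), the input to a layer.
    y: array of shape (n_samples,), integer class labels.
    Returns an array of shape (n_classes * components_per_class, n_features),
    provided each class has at least `components_per_class` samples.

    Hypothetical helper: centering within each class and keeping a fixed
    number of components per class are assumptions, not taken from the paper.
    """
    rows = []
    for label in np.unique(y):
        Xc = X[y == label]
        Xc = Xc - Xc.mean(axis=0)  # center the inputs of this class
        # The right singular vectors of the centered data are the
        # eigenvectors of the class covariance, sorted by decreasing variance.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        rows.append(vt[:components_per_class])
    return np.vstack(rows)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))   # synthetic stand-in for a layer's input
    y = np.repeat([0, 1], 30)      # two classes, 30 samples each
    W = class_based_pca_weights(X, y, components_per_class=2)
    print(W.shape)
```

Under the same assumptions, a layer's output could then be formed as `X @ W.T` followed by a nonlinearity of the reader's choice, with no gradient-based training of `W`.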
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Cite as: arXiv:1703.04757 [cs.LG]
  (or arXiv:1703.04757v3 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.1703.04757
arXiv-issued DOI via DataCite

Submission history

From: Nima Dehmamy
[v1] Tue, 14 Mar 2017 22:13:41 UTC (375 KB)
[v2] Wed, 10 Jan 2018 07:53:52 UTC (2,422 KB)
[v3] Sun, 11 Mar 2018 20:30:28 UTC (2,938 KB)