Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

Georganas, Evangelos; Avancha, Sasikanth; Banerjee, Kunal; Kalamkar, Dhiraj; Henry, Greg; Pabst, Hans; Heinecke, Alexander

arXiv:1808.05567 (cs)

[Submitted on 16 Aug 2018 (v1), last revised 20 Aug 2018 (this version, v2)]

Abstract: Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs), which provide state-of-the-art results for tasks such as image recognition, neural machine translation, and speech recognition. The computationally expensive nature of the convolution operation has led to a proliferation of implementations, including matrix-matrix multiplication formulations and direct convolution primarily targeting GPUs. In this paper, we introduce direct convolution kernels for x86 architectures, in particular for Xeon and Xeon Phi systems, which are implemented via a dynamic compilation approach. Our JIT-based implementation shows close to theoretical peak performance, depending on the setting and the CPU architecture at hand. We additionally demonstrate how these JIT-optimized kernels can be integrated into a lightweight multi-node graph execution model. This illustrates that single- and multi-node runs yield high efficiencies and high image throughput when executing state-of-the-art image recognition tasks on CPUs.
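
The two formulations named in the abstract, direct convolution and the matrix-matrix multiplication formulation, compute the same result through different loop structures. The following is a minimal scalar sketch of that equivalence, not the paper's JIT-generated x86 kernels. It assumes stride 1, no padding, and the cross-correlation convention common in deep learning; the function names `conv2d_direct` and `conv2d_gemm` and the tensor layouts are hypothetical choices made for illustration.

```python
def conv2d_direct(inp, wt):
    """Direct convolution with a six-deep loop nest.

    inp: input as nested lists  [C][H][W]
    wt:  weights as nested lists [K][C][R][S]
    Returns output [K][P][Q] with P = H - R + 1, Q = W - S + 1.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(wt), len(wt[0][0]), len(wt[0][0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):                      # output feature maps
        for c in range(C):                  # input feature maps
            for p in range(P):              # output rows
                for q in range(Q):          # output columns
                    for r in range(R):      # filter rows
                        for s in range(S):  # filter columns
                            out[k][p][q] += inp[c][p + r][q + s] * wt[k][c][r][s]
    return out


def conv2d_gemm(inp, wt):
    """Same convolution expressed as one matrix-matrix multiplication.

    The input is first unrolled (im2col) so that every output pixel's
    receptive field becomes one column of a (C*R*S) x (P*Q) matrix.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(wt), len(wt[0][0]), len(wt[0][0][0])
    P, Q = H - R + 1, W - S + 1
    # im2col: rows indexed by (c, r, s), columns indexed by (p, q)
    cols = [[inp[c][p + r][q + s] for p in range(P) for q in range(Q)]
            for c in range(C) for r in range(R) for s in range(S)]
    # Flatten each filter into one row of a K x (C*R*S) matrix
    wmat = [[wt[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    # GEMM: (K x CRS) times (CRS x PQ) gives K x PQ
    flat = [[sum(wmat[k][i] * cols[i][j] for i in range(C * R * S))
             for j in range(P * Q)]
            for k in range(K)]
    # Reshape each flat row back into a P x Q output feature map
    return [[row[p * Q:(p + 1) * Q] for p in range(P)] for row in flat]
```

The GEMM variant materializes an unrolled copy of the input that is roughly R*S times larger than the original, whereas the direct variant reads the input in place.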

Comments:	Accepted to SC18
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:1808.05567 [cs.DC]
	(or arXiv:1808.05567v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1808.05567

Submission history

From: Evangelos Georganas
[v1] Thu, 16 Aug 2018 16:18:44 UTC (397 KB)
[v2] Mon, 20 Aug 2018 21:08:32 UTC (401 KB)
