BigDL: A Distributed Deep Learning Framework for Big Data

Dai, Jason; Wang, Yiheng; Qiu, Xin; Ding, Ding; Zhang, Yao; Wang, Yanzhang; Jia, Xianyan; Zhang, Cherry; Wan, Yan; Li, Zhichao; Wang, Jiao; Huang, Shengsheng; Wu, Zhongyuan; Wang, Yang; Yang, Yuhao; She, Bowen; Shi, Dongjie; Lu, Qi; Huang, Kai; Song, Guoqiong

doi:10.1145/3357223.3362707

arXiv:1804.05839 (cs)

[Submitted on 16 Apr 2018 (v1), last revised 5 Nov 2019 (this version, v4)]

Abstract: This paper presents BigDL, a distributed deep learning framework for Apache Spark, which has been used by a variety of users in the industry for building deep learning applications on production big data platforms. It allows deep learning applications to run on the Apache Hadoop/Spark cluster so that they can directly process the production data and form part of the end-to-end data analysis pipeline for deployment and management. Unlike existing deep learning frameworks, BigDL implements distributed, data-parallel training directly on top of the functional compute model of Spark (with copy-on-write and coarse-grained operations). We also share real-world experience and "war stories" of users that have adopted BigDL to address their challenges (i.e., how to easily build end-to-end data analysis and deep learning pipelines for their production data).
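To make the idea of data-parallel training on a functional compute model more concrete, here is a minimal sketch in plain Python. It is not BigDL's API and does not use Spark: the lists stand in for data partitions, and the names `local_gradient`, `combine`, and `train` are hypothetical. It only illustrates the general pattern the abstract refers to, assuming synchronous mini-batch gradient descent: each iteration applies a coarse-grained map over partitions with the same read-only weights, reduces the gradients, and builds a new weight value instead of mutating the old one.

```python
from functools import reduce

def local_gradient(weights, partition):
    """Pure function: squared-error gradient for y ~ w*x + b on one partition."""
    w, b = weights
    gw = sum(2 * (w * x + b - y) * x for x, y in partition)
    gb = sum(2 * (w * x + b - y) for x, y in partition)
    return (gw, gb, len(partition))

def combine(a, b):
    """Associative merge of two (grad_w, grad_b, count) triples."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def train(partitions, weights=(0.0, 0.0), lr=0.05, iterations=1000):
    for _ in range(iterations):
        # Coarse-grained "map": every partition sees the same immutable weights.
        grads = map(lambda p: local_gradient(weights, p), partitions)
        # "Reduce": aggregate the per-partition gradients.
        gw, gb, n = reduce(combine, grads)
        # Copy-on-write style update: a new tuple replaces the old one.
        weights = (weights[0] - lr * gw / n, weights[1] - lr * gb / n)
    return weights

# Toy data generated from y = 2x + 1, split into two partitions (illustrative only).
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
w, b = train(partitions)
```

On this toy data the fitted `w` and `b` should approach 2 and 1. In a real Spark job the map and reduce steps would run as distributed tasks over an RDD; how BigDL schedules them and synchronizes parameters is described in the paper itself, not in this sketch.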

Comments: In ACM Symposium on Cloud Computing (SoCC) 2019
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:1804.05839 [cs.DC] (or arXiv:1804.05839v4 [cs.DC] for this version)
https://doi.org/10.48550/arXiv.1804.05839
Related DOI: https://doi.org/10.1145/3357223.3362707

Submission history

From: Jason (Jinquan) Dai
[v1] Mon, 16 Apr 2018 12:04:03 UTC (1,140 KB)
[v2] Mon, 23 Apr 2018 03:21:14 UTC (1,324 KB)
[v3] Mon, 25 Jun 2018 02:57:37 UTC (1,318 KB)
[v4] Tue, 5 Nov 2019 13:12:43 UTC (3,468 KB)
