Abstract
Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these systems through reasonable scheduling has become a hot issue in both industry and academia. Data skew is a common phenomenon that can compromise the performance of such distributed computing systems. This paper proposes SMART, which can effectively reduce the JCT through handling the data skew during the reducing phase. SMART predicts the size of reduce tasks based on part of the completed map tasks and then enforces largest-first scheduling in the reducing phase according to the predicted reduce task size. SMART makes minimal modifications to the original Hadoop with only 20 additional lines of code and is readily deployable. The robustness and the effectiveness of SMART have been evaluated with a real-world cluster against a large number of datasets. Experiments show that SMART reduces JCT by up to 6.47%, 9.26%, and 13.66% for Terasort, WordCount and InvertedIndex respectively with the Purdue MapReduce benchmarks suite (PUMA) dataset.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Grandl R, Kandula S, Rao S, Akella A, Kulkarni J. Graphene: Packing and dependency-aware scheduling for data-parallel clusters. In Proc. the 12th USENIX Conference on Operating Systems Design and Implementation, Nov. 2016, pp.81-97.
Chang H, Kodialam M, Kompella R R, Lakshman T V, Lee M, Mukherjee S. Scheduling in MapReduce-like systems for fast completion time. In Proc. the 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, Apr. 2011, pp.3074-3082. DOI: 10.1109/IN-FCOM.2011.5935152.
Peng Y, Chen K, Wang G, Bai W, Zhao Y, Wang H, Geng Y, Ma Z, Gu L. Towards comprehensive traffic forecasting in cloud computing: Design and application. IEEE/ACM Transactions on Networking, 2016, 24(4): 2210-2222. DOI: https://doi.org/10.1109/TNET.2015.2458892.
Ullah I, Khan M S, Amir M, Kim J, Kim S M. LSTPD: Least slack time-based preemptive deadline constraint scheduler for Hadoop clusters. IEEE Access, 2020, 8: 111751-111762. DOI: https://doi.org/10.1109/ACCESS.2020.3002565.
Gao Y, Zhou Y, Zhou B, Shi L, Zhang J. Handling data skew in MapReduce cluster by using partition tuning. Journal of Healthcare Engineering, 2017, 2017: Article No. 1425102. DOI: https://doi.org/10.1155/2017/1425102.
Hammoud M, Sakr M F. Locality-aware reduce task scheduling for MapReduce. In Proc. the 3rd IEEE International Conference on Cloud Computing Technology and Science, Nov. 29-Dec. 1, 2011, pp.570-576. DOI: 10.1109/CloudCom.2011.87.
Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Commun. ACM, 2008, 51(1): 107-113. DOI: https://doi.org/10.1145/1327452.1327492.
Ahmad F, Lee S, Thottethodi M, Vijaykumar T. PUMA: Purdue MapReduce benchmarks suite. Technical Report, Purdue University, 2012. https://engineering.purdue.edu/~puma/puma.pdf, May 2022.
Graham R L. Bounds for certain multiprocessing anomalies. The Bell System Technical Journal, 1966, 45(9): 1563-1581. DOI: https://doi.org/10.1002/j.1538-7305.1966.tb01709.x.
Mosharaf C, Ion S. Coow: A networking abstraction for cluster applications. In Proc. the 11th ACM Workshop on Hot Topics in Networks, Oct. 2012, pp.31-36. DOI: 10.1145/2390231.2390237.
Kwon Y, Balazinska M, Howe B, Rolia J. Skew-Tune: Mitigating skew in MapReduce applications. In Proc. the 2012 ACM SIGMOD International Conference on Management of Data, May 2012, pp.25-36. DOI: 10.1145/2213836.2213840.
Chowdhury M, Zhong Y, Stoica I. Efficient coow scheduling with Varys. ACM SIGCOMM Comput. Commun. Rev., 2014, 44(4): 443-454. DOI: https://doi.org/10.1145/2740070.2626315.
Chowdhury M, Stoica I. Efficient coow scheduling without prior knowledge. In Proc. the 2015 ACM Conference on Special Interest Group on Data Communication, Aug. 2015, pp.393-406. DOI: 10.1145/2785956.2787480.
Mao H, Schwarzkopf M, Venkatakrishnan S B, Meng Z, Alizadeh M. Learning scheduling algorithms for data processing clusters. In Proc. the ACM Special Interest Group on Data Communication, Aug. 2019, pp.270-288. DOI: https://doi.org/10.1145/3341302.3342080.
Nguyen K, Wang K, Bu Y, Fang L, Hu J, Xu G. FACADE: A compiler and runtime for (almost) object-bounded big data applications. In Proc. the 20th International Conference on Architectural Support for Programming Languages and Operating Systems, Mar. 2015, pp.675-690. DOI: 10.1145/2694344.2694345.
Nguyen K, Fang L, Xu G, Demsky B, Lu S, Alamian S, Mutlu O. Yak: A high-performance big-data-friendly garbage collector. In Proc. the 12th USENIX Symposium on Operating Systems Design and Implementation, November 2016, pp.349-365.
Rasmussen A, Lam V T, Conley M, Porter G, Kapoor R, Vahdat A. Themis: An I/O-efficient MapReduce. In Proc. the 3rd ACM Symposium on Cloud Computing, Oct. 2012, Article No. 13. DOI: https://doi.org/10.1145/2391229.2391242.
Rao S, Ramakrishnan R, Silberstein A, Ovsiannikov M, Reeves D. Sailfish: A framework for large scale data processing. In Proc. the 3rd ACM Symposium on Cloud Computing, Oct. 2012, Article No. 4. DOI: https://doi.org/10.1145/2391229.2391233.
Zhang H, Cho B, Seyfe E, Ching A, Freedman M J. Riffle: Optimized shuffle service for large-scale data analytics. In Proc. the 13th EuroSys Conference, Apr. 2018, Article No. 43. DOI: https://doi.org/10.1145/3190508.3190534.
Zaharia M, Borthakur D, Sarma S J, Elmeleegy K, Shenker S, Stoica I. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proc. the 5th European Conference on Computer Systems, Apr. 2010, pp.265-278. DOI: 10.1145/1755913.1755940.
Ibrahim S, Jin H, Lu L, He B, Antoniu G, Wu S. Maestro: Replica-aware map scheduling for MapReduce. In Proc. the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2012, pp.435-442. DOI: 10.1109/CCGrid.2012.122.
Tang Z, Jiang L, Zhou J, Li K, Li K. A self-adaptive scheduling algorithm for reduce start time. Future Generation Computer Systems, 2015, 43/44: 51-60. DOI: https://doi.org/10.1016/j.future.2014.08.011.
Ibrahim S, Jin H, Lu L, Wu S, He B, Qi L. LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud. In Proc. the 2nd IEEE International Conference on Cloud Computing Technology and Science, Nov. 30-Dec. 3, 2010, pp.17-24. DOI: 10.1109/CloudCom.2010.25.
Tan J, Meng X, Zhang L. Coupling task progress for MapReduce resource-aware scheduling. In Proc. the 2013 IEEE INFOCOM, Apr. 2013, pp.1618-1626. DOI: 10.1109/INFCOM.2013.6566958.
Author information
Authors and Affiliations
Corresponding author
Supplementary Information
ESM 1
(PDF 155 kb)
Rights and permissions
About this article
Cite this article
Dong, JQ., He, ZH., Gong, YY. et al. SMART: Speedup Job Completion Time by Scheduling Reduce Tasks. J. Comput. Sci. Technol. 37, 763–778 (2022). https://doi.org/10.1007/s11390-022-2118-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-022-2118-5