POSTER: An intelligent framework to parallelize Hadoop phases

Conti M.
2018

Abstract

Hadoop-stock is a reliable, scalable, open-source implementation of the MapReduce framework for processing data-intensive applications in a distributed, parallel environment. In an environment shared by multiple users running various types of applications, there are fewer resources than jobs, so jobs run in multiple waves. Shuffling, the longest phase of a job, has the most adverse effect on job execution time because of the network traffic it generates. On the one hand, because the shuffle phase depends on the reduce task, it cannot start until the reduce task has been scheduled. On the other hand, static scheduling of reduce tasks wastes reduce slots. This paper presents our ongoing effort to design an intelligent service in which the sort/merge and shuffle phases are completely independent of the map and reduce phases and can run in parallel with them. This parallelism reduces job completion time.
HPDC 2018 - Proceedings of The 27th International Symposium on High-Performance Parallel and Distributed Computing Posters/Doctoral Consortium
ISBN: 9781450358996
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3340663