Design and Implementation of a Cache Hierarchy-Aware Task Scheduling for Parallel Loops on Multicore Architectures

Authors

Nader Khammassi and Jean-Christophe Le Lann, ENSTA Bretagne, France

Abstract

Effective cache utilization is critical to performance in chip-multiprocessor (CMP) systems. Modern CMP architectures are built on hierarchical cache topologies whose private and shared cache configurations vary from level to level. Cache-aware scheduling has become a major design challenge, and many scheduling strategies have been designed to target specific cache configurations. In this paper we introduce a cache hierarchy-aware task scheduling (CHATS) algorithm which adapts to the underlying architecture and its cache topology. The proposed scheduling policy aims to improve cache performance by optimizing spatial and temporal data locality and reducing communication overhead without neglecting load balancing. CHATS has been implemented in the parallel loop construct of the XPU framework introduced in previous work [1,7]. We compared CHATS to several popular scheduling policies, including dynamic and static scheduling and task stealing. Experimental results on synthetic and real workloads show that our scheduling policy achieves up to 25% execution speedup compared to the OpenMP, TBB and Cilk++ parallel loop implementations. We also use our parallel loop implementation in two popular applications from the PARSEC benchmark suite and compare it to the provided OpenMP, TBB and PThreads versions on different architectures.

Keywords

Cache-aware Scheduling, Cache Locality, Parallel Loops, Multicore, Hierarchical Cache

Full Text  Volume 4, Number 2