Energy Efficient Compilation of Irregular Task-Parallel Loops
- 👤 Speaker: Krishna Nandivada, IIT Madras, India
- 📅 Date & Time: Thursday 25 January 2018, 14:00 - 15:00
- 📍 Venue: GS15
Abstract
Energy-efficient compilation is an important problem for multi-core systems. In this context, irregular programs with task-parallel loops present interesting challenges: the threads with lighter work-loads (non-critical-threads) wait at the join-points for the thread with the maximum work-load (critical-thread), which leads to significant energy wastage. This problem becomes more interesting in the context of multi-socket-multi-core (MSMC) systems, where different sockets may run at different frequencies, but all the cores connected to a socket run at a single frequency. In such a configuration, even though the load-imbalance among the cores may be significant, an MSMC-oblivious technique may miss opportunities to reduce energy consumption if the load-imbalance across the sockets is minimal. The problem becomes more challenging still in the presence of mutual-exclusion, where scaling the frequency of a socket executing the non-critical-threads can impact the execution time of the critical-threads. In this paper, we propose a scheme (X10Ergy) to obtain energy gains with minimal impact on the execution time, for task-parallel languages like X10, HJ, and so on. X10Ergy takes as input a loop-chunked program (parallel-loop iterations are divided into chunks and each chunk is executed by a unique thread). X10Ergy follows a mixed compile-time + runtime approach that (i) uses static analysis to efficiently compute the work-load of each chunk at runtime, (ii) computes the “remaining” work-load of the chunks running on the cores of each socket at regular intervals and tunes the frequency of the sockets accordingly, (iii) groups the threads into different sockets (based on the remaining work-load of their respective chunks), and (iv) in the presence of atomic-blocks, models the effect of frequency-scaling on the critical-thread. We implemented X10Ergy for X10 and have obtained encouraging results for the IMSuite kernels.
In addition, I will also briefly talk about our experience in optimizing recursive task parallel (RTP) programs.
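Steps (ii) and (iii) of the abstract can be illustrated with a small sketch. This is not X10Ergy's actual algorithm, which the abstract does not spell out; it is a minimal illustration under loud assumptions: execution time is modelled as work divided by frequency, atomic-blocks are ignored, and the function names `group_by_remaining_work` and `pick_frequencies`, the work-load numbers, and the frequency levels are all hypothetical.

```python
from typing import Dict, List


def group_by_remaining_work(remaining: Dict[int, int],
                            cores_per_socket: int) -> List[List[int]]:
    """Pack threads with similar remaining work onto the same socket.

    Hypothetical helper: sorts thread ids by remaining work (heaviest
    first) and fills sockets in that order.
    """
    ordered = sorted(remaining, key=lambda t: remaining[t], reverse=True)
    return [ordered[i:i + cores_per_socket]
            for i in range(0, len(ordered), cores_per_socket)]


def pick_frequencies(groups: List[List[int]],
                     remaining: Dict[int, int],
                     available_mhz: List[int]) -> List[int]:
    """Choose, per socket, the lowest frequency that does not delay the join.

    Assumed model: time = work / frequency. A socket may run at frequency f
    if its heaviest thread finishes no later than the critical thread
    running at the maximum frequency, i.e.
        socket_max / f <= critical / f_max.
    The check is cross-multiplied to avoid floating-point division.
    """
    f_max = max(available_mhz)
    critical = max(remaining.values())
    chosen = []
    for group in groups:
        socket_max = max(remaining[t] for t in group)
        ok = [f for f in sorted(available_mhz)
              if f * critical >= f_max * socket_max]
        chosen.append(ok[0] if ok else f_max)
    return chosen


# Illustrative numbers only: four threads, two cores per socket.
remaining = {0: 100, 1: 90, 2: 20, 3: 10}
groups = group_by_remaining_work(remaining, cores_per_socket=2)
freqs = pick_frequencies(groups, remaining, [1200, 1800, 2400])
print(groups, freqs)
```

With these made-up inputs, the two heavy threads share a socket that stays at the highest frequency, while the socket holding the two light threads can drop to the lowest level without extending the time to the join-point under the assumed model. In a runtime scheme like the one described, such a computation would be repeated at regular intervals as the remaining work-loads change.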
Series: This talk is part of the Computer Laboratory Programming Research Group Seminar series.