BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Energy Efficient Compilation of Irregular Task-Parallel Loops  - K
 rishna Nandivada\, IIT Madras\, India
DTSTART:20180125T140000Z
DTEND:20180125T150000Z
UID:TALK99271@talks.cam.ac.uk
CONTACT:Alan Mycroft
DESCRIPTION:Energy efficient compilation is an important problem for multi
 -core systems. In this context\, irregular programs with task-parallel lo
 ops present interesting challenges: the threads with lesser work-loads (no
 n-critical-threads) wait at the join-points for the thread with maximum wo
 rk-load (critical-thread)\; this leads to significant energy wastage. This
  problem becomes more interesting in the context of multi-socket-multi-cor
 e (MSMC) systems\, where different sockets may run at different frequencie
 s\, but all the cores connected to a socket run at a single frequency. In 
 such a configuration\, even though the load-imbalance among the cores may
  be significant\, an MSMC-oblivious technique may miss the opportunities t
 o reduce energy consumption\, if the load-imbalance across the sockets is 
 minimal. This problem becomes more challenging in the presence of mutua
 l-exclusion\, where scaling the frequencies of a socket executing the non-
 critical-threads can impact the execution time of the critical-threads. In
  this paper\, we propose a scheme (X10Ergy) to obtain energy gains with mi
 nimal impact on the execution time\, for task-parallel languages like X10\
 , HJ\, and so on. X10Ergy takes as input a loop-chunked program (parallel-
 loop iterations divided into chunks and each chunk is executed by a unique
  thread). X10Ergy follows a mixed compile-time + runtime approach that (i)
  uses static analysis to efficiently compute the work-load of each chunk a
 t runtime\, (ii) computes the “remaining” work-load of the chunks runn
 ing on the cores of each socket at regular intervals and tunes the frequen
 cy of the sockets accordingly\, (iii) groups the threads into different so
 ckets (based on the remaining work-load of their respective chunks)\, and 
 (iv) in the presence of atomic-blocks\, models the effect of frequency-sca
 ling on the critical-thread. We implemented X10Ergy for X10 and have obtai
 ned encouraging results for the IMSuite kernels. \n\nIn addition\, I will 
 briefly talk about our experience in optimizing recursive task parall
 el (RTP) programs.
LOCATION:GS15
END:VEVENT
END:VCALENDAR