Stanford CS149 Parallel Programming - Lectures 5 & 6 - Performance Optimization

Lecture 5

Choosing task granularity is important for dynamic scheduling in parallel programming.

Small granularity gives better workload distribution across workers, but comes with higher synchronization overhead; large granularity reduces overhead but risks load imbalance.
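This trade-off can be sketched with a shared work counter: workers grab the next chunk of iterations atomically, so each grab is a synchronization point. The names (`dynamicSchedule`, `processWork`) and the work function itself are illustrative assumptions, not code from the lecture:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processWork stands in for one unit of work (hypothetical example task).
func processWork(i int) int64 { return int64(i) * int64(i) }

// dynamicSchedule: workers repeatedly grab the next `chunk` iterations
// from a shared atomic counter. Smaller chunks balance load better;
// larger chunks mean fewer atomic operations (less sync overhead).
func dynamicSchedule(n, chunk, workers int) int64 {
	var next int64  // shared work counter; each fetch-add is a sync point
	var total int64 // accumulated result
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				start := atomic.AddInt64(&next, int64(chunk)) - int64(chunk)
				if start >= int64(n) {
					return // no work left
				}
				end := start + int64(chunk)
				if end > int64(n) {
					end = int64(n)
				}
				var local int64
				for i := start; i < end; i++ {
					local += processWork(int(i))
				}
				atomic.AddInt64(&total, local)
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	// Same result either way; only overhead and load balance differ.
	fmt.Println(dynamicSchedule(1000, 1, 4))   // fine-grained: ~1000 counter updates
	fmt.Println(dynamicSchedule(1000, 100, 4)) // coarse-grained: ~10 counter updates
}
```

With `chunk = 1` every iteration pays for an atomic fetch-add; with `chunk = 100` the synchronization cost drops 100x, at the cost of coarser load balancing.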

Lecture 6

Performance optimization: locality, communication, and contention.

Reduce the costs of communication:

  1. between processors.
  2. between processors and memory.

Shared-memory communication. NUMA (non-uniform memory access): memory access latency depends on which processor touches which memory node.

Message passing: blocking sends vs. non-blocking sends.
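The blocking/non-blocking distinction can be illustrated with Go channels (an analogy, not the lecture's message-passing API): an unbuffered channel send blocks until a receiver is ready, while a `select` with `default` makes a send non-blocking:

```go
package main

import "fmt"

// trySend is a non-blocking send: it delivers v if the channel has room
// and otherwise returns false immediately instead of waiting, similar in
// spirit to a non-blocking send in a message-passing system.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // send would block; give up immediately
	}
}

func main() {
	// Unbuffered channel: a plain send blocks until the matching receive,
	// like a blocking send that waits for the receiver.
	blocking := make(chan int)
	go func() { blocking <- 42 }() // blocks until main receives
	fmt.Println(<-blocking)        // 42

	// Buffered channel: non-blocking sends succeed until the buffer fills.
	buffered := make(chan int, 1)
	fmt.Println(trySend(buffered, 1)) // true: buffer had room
	fmt.Println(trySend(buffered, 2)) // false: buffer full
}
```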

Reducing communication is important for achieving maximum CPU utilization: the goal is to keep the CPU busy with useful work rather than stalled waiting on data.

Roofline model (figure omitted: attainable GFLOPS/s vs. arithmetic intensity).

To achieve the maximum computational throughput (GFLOPS/s) of a CPU or GPU, the algorithm must have high arithmetic intensity, i.e., a high FLOPs/byte ratio: many arithmetic operations per byte of memory accessed.
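The roofline bound itself is just a min of two terms. A minimal sketch, using hypothetical machine numbers (1000 GFLOPS/s peak, 100 GB/s bandwidth) chosen only for illustration:

```go
package main

import "fmt"

// attainable returns the roofline bound on throughput in GFLOPS/s:
// the minimum of peak compute throughput and memory bandwidth times
// arithmetic intensity. Machine numbers here are assumed, not measured.
func attainable(peakGflops, bwGBs, flopsPerByte float64) float64 {
	memBound := bwGBs * flopsPerByte
	if memBound < peakGflops {
		return memBound // memory-bandwidth bound
	}
	return peakGflops // compute bound
}

func main() {
	// SAXPY (y[i] = a*x[i] + y[i], float32): 2 FLOPs per 12 bytes moved,
	// an arithmetic intensity of ~0.17 FLOPs/byte -> memory bound.
	fmt.Println(attainable(1000, 100, 2.0/12.0))
	// A compute-heavy kernel at 20 FLOPs/byte hits the compute roof.
	fmt.Println(attainable(1000, 100, 20)) // 1000
}
```

Below the ridge point (here 10 FLOPs/byte) performance scales with intensity; above it, the machine's peak is the limit, which is why high FLOPs/byte is needed to reach peak throughput.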



