Homework submission instructions
• Submit a written report for the homework by email.
Requirements
In this homework you will study the reduction technique in OpenCL. Reduction, which combines many values into a single result such as a sum, is very common in parallel programming. You’ll also learn to use common debugging techniques.
1. Read the attached sample program and answer the following questions.
a. The attached sample program is from AMD APP SDK 3.0. The program includes the following files:
i. [login to view URL]
ii. [login to view URL]
iii. [login to view URL]
iv. [login to view URL]
v. [login to view URL]
vi. [login to view URL]
vii. [login to view URL]
b. You can download and install the AMD APP SDK. Most samples should also work on non-AMD processors.
2. Questions:
a. How many data items are processed?
b. How many work items are created?
c. What is the work group size?
d. How many work groups are created?
e. Briefly describe the key ideas in the reduction process as implemented in Reduction_Kernels.cl. How is the sum calculated? For the work-item with global ID 0, how many additions does it perform?
f. In [login to view URL], what is the purpose of barrier(CL_LOCAL_MEM_FENCE)?
g. Briefly describe how the data array is transferred from host memory to compute device memory. Is the buffer object in compute device memory or host memory? Point out which line of the code actually triggers the data transfer.
h. In [login to view URL], why do we need to add the values in the array outMapPtr? Who provides the values in the array outMapPtr?
output = 0;
for(int i = 0; i < numBlocks * VECTOR_SIZE; ++i)
{
    output += outMapPtr[i];
}