MarxSchmarx
9th October 2010, 07:54
OK, here is a debate i am having with a colleague.
Suppose we have an application that takes X seconds to run on a single-core CPU clocked at 4 GHz.
Now suppose I have to execute this application N different times. So on the single CPU core it takes N*X seconds to run through everything I need to do.
Now, it strikes me that if a GPU has Y cores, each clocked at Z GHz with Z << 4 (e.g., Z = 0.2, i.e., 200 MHz), then if I port the application to run on a single GPU core it will take 4*X/Z seconds to execute. Since Z << 4, this means it will take a lot longer to run on a single GPU core.
OK, so far so good. What I'm arguing is that since the GPU has hundreds or thousands of cores (say Y of them), I can run Y copies of the application at once, so the total time for all N runs is roughly 4*N*X/(Y*Z) (assuming N >= Y, so all the cores stay busy). The GPU wins whenever 4*N*X/(Y*Z) < N*X, i.e., whenever Y*Z > 4. My point being that because the application has to be executed N times, even though each run is slower per core, the savings accrue by virtue of executing the same program many times in parallel across Y cores.
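To make the arithmetic concrete, here's a back-of-envelope sketch. All the numbers (X, N, Y, Z) are made up for illustration, and it ignores real-world effects like memory bandwidth and kernel launch overhead:

```python
# Back-of-envelope comparison: N independent runs done serially on one
# fast CPU core vs. the same N runs spread across many slow GPU cores.
# All numbers are illustrative assumptions, not measurements.

import math

X = 10.0      # seconds per run on a 4 GHz CPU core
N = 10_000    # number of independent runs
Y = 512       # GPU cores
Z = 0.2       # GPU core clock in GHz (i.e., 200 MHz)

cpu_total = N * X                           # N serial runs on the CPU
gpu_per_run = X * 4.0 / Z                   # one run on one slow GPU core
gpu_total = math.ceil(N / Y) * gpu_per_run  # batches of Y runs in parallel

print(f"CPU total: {cpu_total:.0f} s")   # 100000 s
print(f"GPU total: {gpu_total:.0f} s")   # 4000 s
print("GPU wins" if gpu_total < cpu_total else "CPU wins")
```

With these numbers Y*Z = 102.4 >> 4, so the batch-on-GPU approach comes out roughly 25x faster despite each individual run being 20x slower.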
My colleague, however, insists that this isn't the way to go - that we should instead figure out how to reduce the execution time X of a single run by a factor of Y (or potentially somewhat less, let's say Y/2) by parallelizing the application itself across the GPU cores.
What do you all think? What is the resolution to this debate?
Edit: Whoops, Y needs to be in the denominator :þ