I was wondering whether there is some neat way of coding for multithreading.
First of all, in Matlab, "nmutilcore=min([12 feature('numcores')])" can be used to set the number of cores per worker automatically (for the 2012 version). But under GAUSS, I have to declare manually how many threads the local machine should run.
Second, suppose I have some (part of) code that needs to be run simultaneously. So I have to write something like this:
ThreadStat mycode;
ThreadStat mycode;
...
ThreadJoin;
This gets ugly as the number of threads grows. Is there a way to use a loop to simplify the code? For example (supposing mycode can be written as mycode(i)),
for i(1, numberofthreads, 1);
ThreadStat mycode(i);
endfor;
ThreadJoin;
3 Answers
Parallel for loops are currently being implemented in GAUSS. This will, I think, take care of all of your concerns. Until then, here are a couple of helpful things to know:
You can find out how many processors are on your computer by getting the value of a system environment variable. On Windows, the variable is: NUMBER_OF_PROCESSORS.
n_cores = envget("NUMBER_OF_PROCESSORS");
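Note that envget returns a string, so the value needs converting before it can be used in arithmetic. A minimal sketch, assuming a Windows machine where NUMBER_OF_PROCESSORS is set; the cap of 12 only mirrors the Matlab line in the question, and the name n_threads is made up for illustration:

```gauss
// envget returns a string (empty if the variable is not set)
n_cores_str = envget("NUMBER_OF_PROCESSORS");

// Convert the string to a number before using it in arithmetic
n_cores = strtof(n_cores_str);

// Cap at 12, like min([12 feature('numcores')]) in Matlab.
// n_threads is a hypothetical name, not a GAUSS setting.
n_threads = minc(12 | n_cores);
print n_threads;
```

This only computes a number. It does not create threads by itself; the thread blocks still have to be written out explicitly.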
When thinking about setting the maximum number of threads, there are a few separate caps to keep in mind. The most obvious is controlling the number of GAUSS level threads that you create with threadStat, threadBegin, etc.
However, GAUSS will automatically thread many operations with no user involvement. By default, GAUSS dynamically chooses the most efficient number of threads to create, based upon the size of the data, the type of operation (matrix multiply, linear solve, etc.) and the number of cores on your machine.
If you would like to control this yourself, there is an undocumented function that you can use. Here is an example of its usage:
new_cap = { 4, 1 }; old_cap = sysstate(41, new_cap);
This statement sets the internal thread cap (for GAUSS automatic threading, not for threadBegin or threadStat) to 4 threads when you are NOT inside a threadBegin or threadStat statement, and lowers it to 1 thread when you are inside a threadBegin or threadStat.
After calling the sysstate function with that case, you need to create a GAUSS-level thread set (threadBegin, etc.) for the settings to be activated.
I wonder in what sense "Parallel for loops are currently being implemented in GAUSS." I have the latest 64-bit version of GAUSS on a 32-core machine. When I use a "for" loop for simulations, I can see that typically only a tiny percentage of the CPU cores is used. If I do the multithreading manually, by writing 32 "ThreadStat"s to split the total number of simulations, then it uses most of the cores. sysstate(41,1) gives {32,2}, which I assume means the internal thread cap is 32 and that it decreases to 2 inside ThreadStat or ThreadBegin. So the puzzle is why a typical "for" loop does not utilize all 32 cores.
When using "parfor" under Matlab, 12 of the 32 cores are almost always at a full 100% load. (Matlab 2012 only allows a maximum of 12 cores per worker; if I run three instances of the same code with "parfor" loops, then all 32 cores are used at 100% load.)
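For reference, the manual splitting described above can be sketched with two threads instead of 32. This is a minimal sketch, not the actual simulation code: sim_chunk, n_sims, n_half, r1 and r2 are hypothetical names, and the proc body is a deterministic placeholder for real simulation work.

```gauss
n_sims = 1000;
n_half = n_sims / 2;

// Each thread writes only its own result variable (r1 or r2).
// Both threads only read n_half; neither assigns to it.
ThreadBegin;
    r1 = sim_chunk(n_half, 1);
ThreadEnd;
ThreadBegin;
    r2 = sim_chunk(n_half, n_half + 1);
ThreadEnd;

// Wait for both threads to finish before combining results
ThreadJoin;

total = r1 + r2;
print total;

// Hypothetical stand-in for one block of simulations:
// sums the squares of n consecutive integers from start
proc (1) = sim_chunk(n, start);
    local idx;
    idx = seqa(start, 1, n);
    retp(sumc(idx .^ 2));
endp;
```

Each block assigns to a different variable because a variable written in one thread should not be read or written by another thread in the same set. The pattern extends to more threads by adding further ThreadBegin/ThreadEnd blocks before the ThreadJoin.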
Parallel for loops have not yet been implemented; they are being implemented right now, i.e. the coding is underway for a future release.