Computer task switching from an analogy perspective


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

(See also Computer task switching from a programmer perspective)


Have you ever gotten the question, or wanted to ask, "but how many cores and threads do I make this computation use?" or "what's this about MPIs and threads?"


In my technician mind, the answer is something like "maybe you should set processes/MPI to half the number of CPU cores (because people tend to buy Intel for compute for some reason, and HyperThreading means tasks will contend for the FPUs that are shared between HT cores, so you should think in physical cores rather than virtual cores). Maybe two threads per process, because while you get better scheduling, on platter disk you are very soon contending for their spins (though IOPS is an imperfect measure), which is likely to make things worse due to seek overhead, something that doesn't happen at all on SSD so maybe forget about that in the future (and it works out a little differently yet on RAID, where perhaps you have tweaked readahead for your application), though cache contention and memory saturation can also make or break some of that, and delays can work out differently on Xeon boards because of local-versus-QPI-accessed memory, and last but not least the relatively localized CPU caches."

I could go on - I hadn't even started on threads yet.
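
For what it's worth, the one concrete habit worth keeping from all that is counting physical cores rather than virtual ones. A minimal sketch of doing so, assuming Python with the third-party psutil package installed (none of this code comes from the article itself):

 # Count physical cores, not the logical ones HyperThreading doubles.
 # Assumes the third-party 'psutil' package (pip install psutil).
 import os
 import psutil
 logical = os.cpu_count()                    # logical (virtual) cores, HT included
 physical = psutil.cpu_count(logical=False)  # physical cores only; may return None
 # On a HyperThreading Intel machine, logical is typically 2 * physical,
 # and FPU-heavy tasks contend per physical core.
 nprocs = physical or max(1, (logical or 2) // 2)
 print("logical cores: ", logical)
 print("physical cores:", physical)
 print("processes/MPI ranks to try:", nprocs)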


...and then I'll say "about ten processes", and "no threads unless the compute program specifically told you it's useful", because it's a middle-of-the-road answer that will work well enough on all but the most overspecced single hosts, or unusually IO-intensive jobs.
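
What that middle-of-the-road answer looks like when typed in, as a sketch rather than anything this article prescribes: for an MPI program you would pass the rank count on the command line (something like mpirun -np 10 ./my_program, where my_program stands in for the real job), and for a plain Python job you would size a process pool the same way. crunch and its chunks below are made up for illustration:

 # "About ten processes", and no extra threads per worker.
 # 'crunch' is a hypothetical stand-in for the real computation.
 from multiprocessing import Pool
 def crunch(chunk):
     return sum(x * x for x in chunk)  # pretend this is the real work
 if __name__ == "__main__":
     chunks = [range(i * 100_000, (i + 1) * 100_000) for i in range(40)]
     with Pool(processes=10) as pool:  # the middle-of-the-road default
         results = pool.map(crunch, chunks)
     print(sum(results))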


Because if you do give a technical answer, people's eyes just glaze over. And rightly so - not even I really want to think about all that.

And they'll ask again next week, seeing if they can get a clearer answer.


(And the select few who do care to understand what the terms mean, and tweak for best use, or use clusters and want to do so efficiently, will ask more questions anyway)


Analogies

Okay, so....

Before electronic computers, computers were literally a room full of people, doing math on paper, because hiring lots of people is a lot faster than doing it alone (also more expensive, so it was only done this way when it was important).


...and then, like any analogy, we will break it. Pretty hard. And like any analogy, it will be harder to patch up than describing the actual thing in the first place.