Intrinsics



Intrinsics are pieces of code that a compiler handles as a special case, usually for performance reasons.


Intrinsic functions

An intrinsic function is a function call that the compiler replaces with a platform-specific implementation, possibly inlined.


For example, consider the C standard library's strcpy() and memset(): what they do is very well defined, very simple to implement, and they are frequently called.


Without intrinsics, these would be regular function calls into libc.

Aside from the ability to inline them, which saves a few stack pushes/pops and jumps, the compiler can often substitute an implementation that is a little more efficient on the platform you are compiling for.
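

To make that concrete, here is a minimal C sketch, assuming GCC or Clang, where memset() is treated as a builtin. For a small fixed size the compiler will typically emit a couple of inline store instructions rather than a call into libc (you can disable this treatment with -fno-builtin, or request it explicitly via the __builtin_ prefix, a GCC/Clang extension):

 #include <string.h>

 /* GCC and Clang know memset() as a builtin, so for a small constant
    size like this they will usually emit inline stores instead of
    calling into libc. */
 void clear_header(char *buf) {
     memset(buf, 0, 16);
 }

 /* The same thing, requesting the builtin explicitly (GCC/Clang extension). */
 void clear_header_explicit(char *buf) {
     __builtin_memset(buf, 0, 16);
 }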


In addition, intrinsic functions can also refer to

  • a programmer trying their best to get a specific opcode emitted, e.g. NOPs to implement tiny wait times (see the first sketch below)
  • using functions that are somewhat hardware-specific, like __enable_interrupt(), or rather, using such a common name instead of the actual instructions behind it
  • using code the compiler by default wouldn't generate, such as SIMD instructions like MMX, SSE, FMA, and AVX (see the second sketch below).
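

For the specific-opcode case, a minimal sketch using GCC/Clang inline assembly to emit exactly one NOP (strictly speaking this is inline assembly rather than an intrinsic, though some compilers wrap the same idea in a function, like MSVC's __nop()):

 /* Emit exactly one nop instruction; volatile keeps the compiler
    from optimizing it away or reordering it. */
 static inline void tiny_delay(void) {
     __asm__ volatile ("nop");
 }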
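

And for the SIMD case, a sketch using the x86 SSE intrinsics from <immintrin.h> (the intrinsics are real, the function name is hypothetical; it assumes n is a multiple of 4 to stay short). Each _mm_* call maps more or less one-to-one onto an SSE instruction that the compiler would not emit on its own unless asked:

 #include <immintrin.h>   /* x86 intrinsics: MMX/SSE/AVX/FMA */

 /* Add two float arrays, four lanes at a time. */
 void add_floats_sse(const float *a, const float *b, float *out, int n) {
     for (int i = 0; i < n; i += 4) {
         __m128 va = _mm_loadu_ps(a + i);    /* load 4 floats */
         __m128 vb = _mm_loadu_ps(b + i);
         __m128 vsum = _mm_add_ps(va, vb);   /* addps: 4 additions at once */
         _mm_storeu_ps(out + i, vsum);       /* store 4 floats */
     }
 }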


SIMD and such

Most general-purpose languages are more imperative than declarative about number crunching, which implies that their compilers cannot conclusively analyse when SIMD would be useful or faster, or when the data reordering it requires would help or actually hinder speed.


One way to deal with this is to work almost the other way around: you use a library where you tell it exactly what you are doing to a bunch of data.

The compiler can then decide, based on the target CPU selected, whether to back this with a SIMD implementation or with generic loops (much as you would probably have written yourself).
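

A minimal sketch of that idea, assuming GCC or Clang: scale_floats() is a hypothetical helper whose AVX path is only compiled in when the build targets a CPU with AVX (these compilers define __AVX__ when building with -mavx or an equivalent -march); otherwise it falls back to the plain loop you would have written yourself.

 #include <stddef.h>
 #ifdef __AVX__
 #include <immintrin.h>
 #endif

 /* Multiply every element by a factor; the build target decides how.
    Sketch only: the AVX path assumes n is a multiple of 8. */
 void scale_floats(float *data, size_t n, float factor) {
 #ifdef __AVX__
     __m256 vf = _mm256_set1_ps(factor);
     for (size_t i = 0; i < n; i += 8) {
         __m256 v = _mm256_loadu_ps(data + i);
         _mm256_storeu_ps(data + i, _mm256_mul_ps(v, vf));
     }
 #else
     for (size_t i = 0; i < n; i++)
         data[i] *= factor;
 #endif
 }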


This meaning of intrinsics amounts to optimization hints (though a very explicit form, for something called a 'hint').


In theory you could also compile both and decide which runs faster, often at runtime. Such tricks are used in some number-crunching libraries, like FFTW and BLAS, which present fast Fourier transforms and linear algebra (respectively) as an API, but carry a bunch of different implementations and run the one that seems to run fastest on your machine (exactly when and how they decide this varies).
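

A much-simplified sketch of that runtime selection, assuming GCC or Clang on x86 (sum_scalar, sum_avx, and sum_init are hypothetical names; __builtin_cpu_supports() and the target attribute are real GCC/Clang extensions). Real libraries like FFTW go further and actually time their candidate implementations before settling on one:

 #include <immintrin.h>

 /* Portable fallback. */
 static float sum_scalar(const float *a, int n) {
     float s = 0.0f;
     for (int i = 0; i < n; i++) s += a[i];
     return s;
 }

 /* AVX version; the target attribute lets the compiler emit AVX code for
    just this function even if the rest of the file targets a baseline CPU.
    Sketch only: assumes n is a multiple of 8. */
 __attribute__((target("avx")))
 static float sum_avx(const float *a, int n) {
     __m256 acc = _mm256_setzero_ps();
     for (int i = 0; i < n; i += 8)
         acc = _mm256_add_ps(acc, _mm256_loadu_ps(a + i));
     float lanes[8];
     _mm256_storeu_ps(lanes, acc);
     float s = 0.0f;
     for (int i = 0; i < 8; i++) s += lanes[i];
     return s;
 }

 static float (*sum_impl)(const float *, int) = sum_scalar;

 /* Decide once, at startup, based on what this machine supports. */
 void sum_init(void) {
     __builtin_cpu_init();
     if (__builtin_cpu_supports("avx"))
         sum_impl = sum_avx;
 }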


All of this can potentially apply to static compilation, to JIT / AoT compilation, and also, significantly, to vector processors, multiprocessing platforms, and the like.