Compiling and linking

From Helpful
Some fragmented programming-related notes, not meant as introduction or tutorial




Unices

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Note: the below is for C.

Some things are analogous for other languages, some things are not. It's just my own self-educational notes.



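Argument order matters to gcc, mostly when linking. Libraries (-l options) should come after the source and object files that use them, because the linker processes its inputs left to right, and only pulls objects out of a static library to satisfy symbols that are still undefined at the point where that library appears. Put things in a strange order and you may get strange errors, typically about undefined symbols.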


Objects and libraries: Extensions, naming, linking, loading

.o

Object files, which consist of code and symbols (names). You get a .o if you compile with gcc's -c option.

A .c file can call functions defined outside it. The compiler only needs a declaration (usually from a header) to know how to call such a function; the code that goes with it is joined in later, at link time. This is how you split a program into logical chunks, and it is also how you call things from static libraries.
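A minimal sketch of that split, squeezed into one file so it compiles on its own. The names add, sum_three and add.h are made up for illustration: the declaration is what the header would provide, and the definition at the bottom would normally sit in its own add.c.

```c
/* What a header such as a hypothetical add.h would provide: a declaration.
   This is all the compiler needs to generate a call to add(). */
int add(int a, int b);

/* Code that calls add() knowing only the declaration above.
   In a separately compiled file, each call is left in the .o as an
   unresolved reference to the symbol "add", for the linker to fill in. */
int sum_three(int a, int b, int c)
{
    return add(add(a, b), c);
}

/* The definition. Normally this would live in its own add.c, be
   compiled separately (gcc -c add.c), and be joined to its callers at
   link time. It is here only so this sketch compiles and links alone. */
int add(int a, int b)
{
    return a + b;
}
```

Built as real separate files, this would be gcc -c main.c, gcc -c add.c, then gcc main.o add.o; leaving out add.o gets you an undefined reference to add at the link step.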


In the linking step of compilation, all symbols must be resolved. This means the linking step needs to know about all your own object files, plus any static libraries you use.

If you are building a runnable executable, there is usually an implied main symbol that also has to be resolved, because the C runtime startup code that gets linked in calls main.


.a

Static libraries are usually .a files: archives containing .o files, which can be created and altered with the ar utility.


For example, adding objects to an .a:

ar rcs libname.a object1.o object2.o

The options used in this example:

  • r: insert objects, replacing them when already present (other operations include delete (d), print (p), list contents (t), move (m), quick append (q), extract (x))
  • c: create .a file if it didn't exist (to suppress the warning that it creates it as needed)
  • s: write/update index (good style; having it speeds up linking, and having it up to date avoids errors(verify))


When linking, you can pull in objects from .a files, for example

gcc nameuser.c -L. -lname -lc -lm -lbz2

Notes:

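  • -lname means "look for libname", i.e. libname.so or libname.a (for example, -lc means libc, -lbz2 means libbz2). When both exist, GNU ld prefers the .so unless you link with -static.
  • -Ldir adds a directory to search for libraries in. In simple compiles and simple makefiles, this can easily be -L. (the current directory).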


.so

Whereas a static library is copied into a program at link time and has to be fully resolved at that time, a shared object, a.k.a. dynamic library, is loaded in at and after program startup, and its symbols are resolved at that point.

While


Names and conventions

Shared objects have a few names:

  • linker name: soname without version: libname.so (usually a link to the soname)
  • soname has a major version: libname.so.1 (and is often a symbolic link)
  • real name has major, minor and an optional release number: libname.so.1.0[.0] (and is an actual file)

The point of all that is managing versions properly, making dependencies work, allowing multiple versions to be installed, to default to the latest, and to detect a few cases of trying to do something wrong.


'Fully qualified' soname means an absolute-path reference.

Creating .so files

While compiling for a .so, you want to compile to objects but avoid linking. You could do:

gcc -c name.c -fPIC

which will produce name.o, a simple object.

-fPIC makes the compiler generate position-independent code, which can run at whatever address the loader happens to map it to (a shared object does not get a fixed place in memory of its own).


Now you can go from an object to a shared object. The shared object is itself named the real name, and contains its soname in it (which is relevant metadata for the eventual loader system). For example:

ld -shared -soname libname.so.1 -o libname.so.1.0 name.o

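...and any libraries it depends on, say -lc (libc) and -lm (the math library, for math.h things). The more common route is to let gcc drive the linker, as in gcc -shared -Wl,-soname,libname.so.1 -o libname.so.1.0 name.o -lm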


(Note that prepending 'lib' to those .c and .h files that end up being made shared objects makes things more consistent. Large projects have a way of getting messy without consistency like that)
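As an illustration of what might go into name.c before the gcc -c -fPIC step, here is a hypothetical library source (name_scale, name_call_count and clamp_nonnegative are made-up names). It mainly shows the exported-versus-internal distinction: static functions and variables get no exported symbol, while non-static functions end up in the shared object's dynamic symbol table.

```c
/* name.c: a hypothetical library source, meant to be built with
 *   gcc -c name.c -fPIC
 *   ld -shared -soname libname.so.1 -o libname.so.1.0 name.o
 */

/* 'static' gives internal linkage: this helper gets no exported
   symbol, so users of the library cannot come to depend on it. */
static int clamp_nonnegative(int x)
{
    return x < 0 ? 0 : x;
}

/* State that lives inside the library. Also static, so not exported;
   with -fPIC the compiler reaches it using position-independent
   addressing, so it works wherever the loader maps the .so. */
static int call_count = 0;

/* Exported: findable by the linker (-lname) or at runtime via dlsym(). */
int name_scale(int value, int factor)
{
    call_count++;
    return clamp_nonnegative(value) * factor;
}

/* Exported accessor, so callers never touch call_count directly. */
int name_call_count(void)
{
    return call_count;
}
```

Keeping the mutable state static and exposing it only through functions means the library's exported interface (and so its ABI) stays small.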


Installing .so files

You copy the .so into one of the system library paths that ldconfig uses (see /etc/ld.so.conf).

For non-system shared objects it is considered good style to put them in a directory like /usr/local/lib, where you can easily find them in case something goes wrong.

There are also privilege details with the system library directories. (TODO: figure out which)


After copying it in, run ldconfig. To verify that the system saw it, do:

ldconfig -v | grep name

You should also create the symbolic link from the linker name to the (current) soname, like:

ln -s libname.so.1 libname.so

(TODO: figure out more detailed linker/so/real name semantics)


Using .so files

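Usually you use a shared object by linking against it (-lname), after which ld.so loads it at program startup. To load one explicitly at runtime instead (plugins, optional features), there is a predefined API to load, use and unload shared objects. See examples like [1].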

Summarizing, the basic necessities seem to be:

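  • including dlfcn.h (and, on glibc older than 2.34, linking with -ldl)
  • calling dlopen with the library's file name (often the soname, e.g. libname.so.1, since the unversioned linker name tends to be installed only along with development packages)
  • (testing for success)
  • calling dlsym to look up a function by name
  • (testing for success)
  • casting the void pointer that dlsym returns to the function pointer type you know the function has (e.g. from the library's header), and calling it
  • calling dlclose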
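A sketch of those steps, assuming a glibc-based Linux system. call_double_fn is a made-up name, and it works for any library function with the signature double f(double); the test loads libm.so.6 and looks up cos. Casting dlsym's void pointer to a function pointer is not defined by ISO C, but POSIX requires it to work.

```c
#include <dlfcn.h>
#include <stdio.h>

/* The type of the function we expect to find: double f(double). */
typedef double (*double_fn)(double);

/* Load 'libfile', look up 'symbol', call it with x, then unload.
   Returns 0 on success (result stored in *out), -1 on failure. */
int call_double_fn(const char *libfile, const char *symbol, double x, double *out)
{
    /* RTLD_NOW: resolve the library's undefined symbols now, not lazily. */
    void *handle = dlopen(libfile, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* Clear any old error first: checking dlerror() afterwards is the
       reliable test, since NULL can in principle be a valid symbol value. */
    dlerror();
    double_fn fn = (double_fn)dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    *out = fn(x);
    dlclose(handle);  /* fn must not be used after this */
    return 0;
}
```

For example, call_double_fn("libm.so.6", "cos", 0.0, &r) leaves 1.0 in r. On glibc older than 2.34, link this with -ldl.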

See also

C / C++

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

Quite, quite unverified. I'm sure a few things are just wrong.

C/C++ linking

ld: compile time static linker

(Note: a rather different thing from ld.so)

ld is a compile time linker

typically puts together a whole executable from already-compiled parts
e.g. from separate .o files, from static libraries (often .a files, which contain .o files)


Some more notes on ld:

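ld: cannot find -lname means this library isn't installed, or ld is not looking in the directory it is in (a common case of the first: only the runtime package is installed, which provides libname.so.1 but not the unversioned libname.so that -l looks for)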
To see where ld is looking:
ld --verbose
gcc adds some more, so you may also like gcc -print-search-dirs
To fix:
add a library search dir while linking, -L to ld or gcc
you could symlink from a system dir to


ld.so: runtime dynamic linker


(Note: rather a different thing from ld, see above)


ld.so is a runtime linker, meaning

  • when you run a dynamically linked executable, the OS first hands control to ld.so, which
    • fetches the list of required libraries from the executable (you can see roughly the same list using ldd executablename)
    • then looks for each of these in a few places


the places ld.so looks in include (on glibc, roughly in this order):

  • the environment variable LD_LIBRARY_PATH
takes precedence over DT_RUNPATH and the cache below, so it essentially overrides the system configuration (the one thing searched before it is the older DT_RPATH, and only in binaries that have no DT_RUNPATH)
which is why it's a quick fix that you should avoid when doing things properly (it can even break your system)
for security reasons, it is ignored at runtime for executables that have setuid or setgid set
  • the ELF's own specified paths (DT_RUNPATH, DT_RPATH)
used relatively rarely, as it is bad for portability
  • the file /etc/ld.so.cache, a pregenerated list of known libraries, which exists for speed reasons
    • ldconfig updates that cache
      • It looks in the directories listed in /etc/ld.so.conf, which itself often includes other files, for pluggability
      • it also creates and updates the soname symlinks (e.g. libname.so.1 → libname.so.1.0) so that each points at the newest library with that soname. Since a soname carries only the major version, this stays within one major version, which is assumed to be ABI-compatible (while different major versions are not).
  • finally, the default directories, typically /lib and /usr/lib (or their 64-bit variants such as /lib64)
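To make the "first match wins" behaviour concrete, a much simplified sketch of how a colon-separated list like LD_LIBRARY_PATH gets searched. search_path_list is a made-up helper and this is not ld.so's actual code: the real thing also deals with DT_RPATH/DT_RUNPATH, $ORIGIN, the cache, hardware-specific subdirectories, and checks that the file is a loadable ELF object. This sketch also just skips empty entries.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try each directory of a colon-separated list, in order, and report
   the first path at which 'name' exists. Returns 0 and fills 'out' on
   success, -1 if nothing matched. Simplified illustration only. */
int search_path_list(const char *pathlist, const char *name, char *out, size_t outlen)
{
    const char *p = pathlist;
    while (p != NULL && *p != '\0') {
        const char *colon = strchr(p, ':');
        size_t dirlen = colon ? (size_t)(colon - p) : strlen(p);

        if (dirlen > 0) {  /* this sketch skips empty entries */
            int n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, p, name);
            /* first match wins: earlier entries take precedence */
            if (n > 0 && (size_t)n < outlen && access(out, F_OK) == 0)
                return 0;
        }
        p = colon ? colon + 1 : NULL;
    }
    return -1;
}
```

This is also why prepending to LD_LIBRARY_PATH is so forceful: whatever comes first shadows everything after it.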


Recent systems often have an /etc/ld.so.conf that consists mainly of include /etc/ld.so.conf.d/*.conf, to integrate with package management more cleanly (cf. profile.d and others)


Arguably

  • the most predictable way of structuring library preference (if you need that at all) is via ldconfig's config
  • LD_LIBRARY_PATH should only be used for run-time overrides, never system-wide (unless you have a really good argument for it)
it can be useful in isolated environments, temporary installs, some low-privilege installs, user-dir installations
can make sense to use in wrapper scripts
I've seen people try to get back system precedence by prepending system library paths to LD_LIBRARY_PATH. This can work, although it will probably never be quite the same as ld.so.conf
and can make debugging issues a little more fascinating
  • When software insists it takes precedence over the system libraries (LD_LIBRARY_PATH=/its/path:$LD_LIBRARY_PATH)
if it doesn't conflict, putting it on the end instead may avoid some trouble for other executables in that environment
if it does conflict,
...then arguably the cleanest solution is to isolate that precedence to the act of running that specific software, e.g. in a wrapper script, or using environment modules or similar to isolate it to a shell, and to intentional exceptions.
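The "put it on the end instead" advice, as a small C helper such as a wrapper program might use. path_with_appended is a hypothetical name: it appends a directory after the existing entries, so the existing entries keep precedence, and it avoids producing an empty entry when the variable was unset or empty.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'd value with 'dir' appended after the existing
   entries of 'old' (so 'dir' gets the lowest precedence), or NULL on
   allocation failure. If 'old' is NULL or empty, the result is just
   'dir', so no empty entry is created. The caller frees the result. */
char *path_with_appended(const char *old, const char *dir)
{
    size_t dirlen = strlen(dir);

    if (old == NULL || *old == '\0') {
        char *res = malloc(dirlen + 1);
        if (res != NULL)
            memcpy(res, dir, dirlen + 1);  /* includes the terminator */
        return res;
    }

    size_t oldlen = strlen(old);
    char *res = malloc(oldlen + 1 + dirlen + 1);
    if (res == NULL)
        return NULL;
    memcpy(res, old, oldlen);
    res[oldlen] = ':';
    memcpy(res + oldlen + 1, dir, dirlen + 1);  /* includes the terminator */
    return res;
}
```

A wrapper would then pass path_with_appended(getenv("LD_LIBRARY_PATH"), "/its/path") to setenv("LD_LIBRARY_PATH", ..., 1) and exec the real program. This only affects processes started afterwards: ld.so reads LD_LIBRARY_PATH at program startup, so changing it inside an already-running process does not change where that process itself looks.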

See also:

LD_PRELOAD

https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html

extern



Using C in C++


While a lot of C code is valid to compile as C++, code compiled as C and code compiled as C++ do not link together by default.

A large part of the difference lies in symbol mangling. For example, function overloading works by giving each overload a different symbol name, encoding the function's parameter types in that name. The C++ compiler does this to functions whether or not they are overloaded, and to names in namespaces and classes.

Since linking only matches identical names, mangled C++ names and non-mangled C names just won't match.
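The usual way around this is an extern "C" block, which tells a C++ compiler to use C linkage (no mangling) for the declarations inside it. C compilers do not know extern "C", so it is hidden behind #ifdef __cplusplus, and one header then works for both. mylib.h and mylib_square are hypothetical names.

```c
/* What a hypothetical C library header (say, mylib.h) would contain so
   that it can be included from both C and C++ code. */
#ifndef MYLIB_H
#define MYLIB_H

#ifdef __cplusplus
extern "C" {   /* C++ compilers: use C linkage, i.e. no name mangling */
#endif

int mylib_square(int x);

#ifdef __cplusplus
}
#endif

#endif /* MYLIB_H */

/* The implementation, compiled as C (normally in mylib.c). It emits the
   plain symbol mylib_square (on typical ELF platforms; some platforms
   add a leading underscore), which C++ callers now refer to as well. */
int mylib_square(int x)
{
    return x * x;
}
```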


Other name mangling: calling conventions

Name mangling is also used in C, and in other languages. There is e.g. the C/C++-native cdecl, which on some platforms (such as 32-bit Windows) adds a leading underscore, and is the assumed default by most C compilers when no other decoration is specified. There are also _stdcall and _fastcall, which add the argument byte size, in different ways.

...In different ways by different compilers, and depending on circumstance. C++ mangling may be a fine method internally but is wholly unreadable if used as-is in binaries, which is one reason why in MSVC the mangling style even depends on whether you use a DEF file or __declspec(dllexport).


Microsoft's convention, generally followed elsewhere, is:

  • _cdecl: _funcname
  • _stdcall: _funcname@argsize (for example, int _stdcall fu(int i); in a 32-bit compiler would be _fu@4). The Win32 API uses stdcall (it is what WINAPI expands to).
  • _fastcall: @funcname@argsize

The C/C++ default is _cdecl (assumed when omitted), while _stdcall is apparently more common between compilers and languages(verify) and avoids extra work when e.g. building dynamic libraries for general use(verify).


In practice, however, it's a mess.


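Note that the conventions also differ in how arguments are passed: _cdecl and _stdcall pass them on the stack (and differ in who cleans it up: the caller for cdecl, the callee for stdcall), while _fastcall passes the first two in registers (ECX and EDX, in the 32-bit MSVC version).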

Others

Macros:

  • #ifdef, #if defined something, #if defined(something)


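Defined things (note: many of these are only conventions):

  • __cplusplus: defined when compiling as C++
  • _WIN32: defined by compilers targeting Windows, both 32-bit and 64-bit. WIN32 (no underscore) is not defined by MSVC itself but usually by project settings, though some compilers define it anyway
  • _WIN64: defined only when targeting 64-bit Windows. WIN64 is again mostly a project convention
  • _MSC_VER: the MSVC compiler version, often used as an 'are we compiling under MSVC' test
  • DYNAMIC vs. STATIC_LINKED: project-defined, for whether you are creating a .dll or a static .lib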
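A small sketch of these tests in use. Note that _WIN64 is checked before _WIN32, since _WIN32 is also defined on 64-bit Windows. compile_environment is a made-up function; it only reports what the preprocessor saw when this file was compiled.

```c
/* Report what the preprocessor knew at compile time, using both the
   #ifdef X and the #if/#elif defined(X) forms. */
const char *compile_environment(void)
{
#ifdef __cplusplus
    return "C++";
#elif defined(_MSC_VER)
    return "C, MSVC";
#elif defined(_WIN64)
    return "C, 64-bit Windows (non-MSVC compiler)";
#elif defined(_WIN32)
    return "C, 32-bit Windows (non-MSVC compiler)";
#else
    return "C, not Windows";
#endif
}
```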


C Problems / limitations

IO

Directory

Filesystems may be important, but in some ways they are just a database, in that the low-level semantics are largely external to the language.

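While many languages define files (as an openable-stream thing), few deal directly with directories. Standard C has no directory functions at all (C++ only got std::filesystem in C++17).


The details are a per-OS thing. On Unix, directories are dealt with via POSIX (opendir, readdir, closedir from dirent.h); on Windows via the Win32 API (e.g. FindFirstFile, FindNextFile).

Opening a file by its full path mostly works anyway, because you just hand that path to the underlying system calls, which resolve the directories for you.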
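On Unix, the POSIX directory API looks roughly like this. A minimal sketch (count_dir_entries is a made-up name) that counts the entries in a directory, skipping . and .., and reports failure to open it as -1.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, not counting "." and "..",
   using the POSIX <dirent.h> API. Returns -1 if it cannot be opened. */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}
```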


Unicode

You have to do this yourself. Sure, you can use wchar_t strings, but C++ iostream/ios/fstream is basically layer over layer over things that assume ANSI (narrow, code-page) strings, so it can break Unicode wherever it assumes that wide character strings only ever contain ANSI-representable characters.
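As one small example of doing it yourself: strlen on a UTF-8 string counts bytes, not characters. A minimal sketch, assuming the input is valid UTF-8 (utf8_codepoints is a made-up name), counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx. Code points are still not quite the same as user-visible characters.

```c
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx).
   Assumes valid UTF-8; it does not validate its input. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For example, "h\xc3\xa9llo" ("héllo" in UTF-8) is 6 bytes but 5 code points.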

You can disable the ANSI conversion, but not with standard code (there's a function in MS' CRT, for example). Pretty much any solution is a nasty and

pre-compiled headers; StdAfx and .pch, .gch


Pre-compiled headers reduce C/C++ compilation time, assuming your headers/macros are nontrivial (large headers included from many source files are the typical win).


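Visual Studio sets up StdAfx.h and StdAfx.cpp (named pch.h and pch.cpp in newer versions), and the compiled headers get file extension .pch

GCC creates .gch: compiling a header such as name.h produces name.h.gch, which gcc then uses in place of name.h when it finds it alongside and it was built with compatible options.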


See also:

General compilation errors

That is, errors that relate to language, syntax, and other things that aren't specific to compilers.

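invalid types ‘int[int]’ for array subscript

Typically you are subscripting something that is not an array or pointer (e.g. a plain int), or using more subscripts than the array has dimensions.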

invalid type argument of 'unary *'

Typically, the thing you are trying to dereference isn't a pointer.


error: expected constructor, destructor, or type conversion before 'something'

A syntax error. It regularly turns out to be something at global scope (which is easy to mess up), but not necessarily.

May be as simple as a missing semicolon.



invalid combination of multiple type-specifiers

In my case, 'unsigned byte', where byte was already defined as an unsigned char. In other words, I was effectively telling the compiler to use an unsigned unsigned char.
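The fix is to use the typedef as-is. A minimal sketch of the situation (byte and low_byte are illustrative names), with a C11 static assertion documenting the assumption that byte is unsigned, instead of trying to re-add unsigned.

```c
#include <limits.h>

/* The situation described above: 'byte' is already an unsigned type. */
typedef unsigned char byte;

/* 'unsigned byte b;' would be rejected: a typedef name cannot be
   combined with other type specifiers such as 'unsigned'. Use the
   typedef as-is, and let a static assertion document the assumption
   (converting -1 to an unsigned char yields UCHAR_MAX). */
_Static_assert((byte)-1 == UCHAR_MAX, "byte is expected to be unsigned char");

/* Example use: keep only the lowest 8 bits of a value. */
byte low_byte(unsigned int x)
{
    return (byte)(x & 0xFFu);
}
```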


If you are wrapping other people's code and want no-effective-change specifications to Just Work, look at things like Boost's type traits [2] (or, since C++11, std::make_unsigned in <type_traits>).


Gnu errors

Cpp: unrecognized option `-$'


For me, this error turns up in tcc (the traffic shaping config parser), where gcc is used to process the configuration.


-$ refers to the ability to use dollars in variable names.

It seems gcc on most systems supports this feature regardless of whether it recognizes this option, so this is usually not a fatal error.


It seems -$ is related to -fdollars-in-identifiers (perhaps the shorthand that was abandoned for the longer form?(verify)) The -$ option seems to have disappeared in gcc roughly around version 3.3(verify).

Other options may also imply acceptance of dollars (e.g. -E) but that could depend on the specific compiler.


In some cases, it may be practical to switch between different versions of gcc you have installed.

http://www.google.com/codesearch?q=%28gcc%7Ccpp%29+%5B%5C+%5D%5B%5C-%5D%5C%24



Microsoft errors

This is a list of compiler errors I've run into. I try to keep this restricted to the things that can have non-obvious causes or whose resolution isn't explained well in docs.


D (build tools)

D8016 : '/MT' and '/clr' options are incompatible

(C++/CLI projects)

The short version is that you should use /MD or /MDd instead.

See also [3].


D8045 : cannot compile C file something.c with the /clr option

You cannot compile C files, only C++ files.

This error likely comes from your files having the .c extension. The compiler refuses these regardless of whether they have __cplusplus #ifdef wrappers and so could be compiled as C++ without problems.

Either rename them to .cpp or such, or force treatment as C++ with the /TP option ('Advanced' section in Visual Studio options).

C (compiler)

C3862 : 'functionname': cannot compile an unmanaged function with /clr:pure or /clr:safe

(C++/CLI projects)

In my case this was caused by a function optimized with inline x86 assembly.

C4980 : '__gc' : use of this keyword requires /clr:oldSyntax command line option

...or other keywords.

You are trying to use keywords from the outdated Managed Extensions for C++ (MC++) in a project that is set for C++/CLI (the default in managed C++ projects in recent versions of Visual Studio)

Migrate to the new syntax, or force the old syntax with /clr:oldSyntax.

C3389 : __declspec(keyword) cannot be used with /clr:pure or /clr:safe

(C++/CLI projects)

...don't know. The documentation says little more than "don't do this". Maybe compiling to CIL DLLs means you never have to export anything?


LNK (linker)

LNK2001 and LNK2019 : unresolved external symbol [...]

When this complains about existing functions it probably means you're not linking against something.

When it complains about

  • something like __DllMainCRTStartup@12 it means you have an entry point specified in properties, but have not defined it. Either implement one or use /NOENTRY.
  • something like __imp__InitCommonControlsEx@4 (..in function _WinMainN@16), you're probably not linking with comctl32.lib

LNK4001 : No objects specified; libraries used

This tends to mean you have some incorrectly set paths (particularly when co-occurring with LNK2001(verify)), but it's not always trivial to see how.

Other possible causes:

LNK4221 : no public symbols found; archive member will be inaccessible

This means an .obj was added to a .lib without anything in it being externally referable, so it will not provide anything to things that link against this library.

If this is the sole output of a project, it means your library does not expose what it should. This can have various causes, including ifdefs that do not consider the platform being compiled for and effectively filter it out.

Will likely cause LNK2001 and LNK2019.


LNK4031: no subsystem specified; CONSOLE assumed

/SUBSYSTEM mostly controls what the entry point will be.

You'll get this warning if there is no obvious way to determine it (having a WinMain is pretty obvious to the linker, for example)

In visual studio, it's in Project properties → Linker → System → SubSystem.