C and C++ notes / Types, values, some basic libraries

Notes related to C and C++

Note: Some notes describe behaviour common to most variants - for C mostly meaning C89 (and a bit of C99), unless mentioned otherwise.

Literals, initialisation, and such

data/string copying

simple parsing and conversion stuff

strings to integers and floats

int                          atoi(const char *str);
long                         atol(const char *str);
long long                   atoll(const char *str);

long                       strtol(const char *str, char **endptr, int base);
long long                 strtoll(const char *str, char **endptr, int base);
unsigned long int         strtoul(const char *str, char **endptr, int base);
unsigned long long int   strtoull(const char *str, char **endptr, int base);

double                       atof(const char *str);

float                      strtof(const char *str, char **endptr);
double                     strtod(const char *str, char **endptr);
long double               strtold(const char *str, char **endptr);
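
The ato* functions give you no way to tell "0" from "not a number", while the strto* functions report where parsing stopped (via endptr) and signal out-of-range values (via errno). A minimal sketch of stricter parsing built on strtol, where parse_long is a hypothetical helper name and not part of any library:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse all of `str` as a long in the given base.
 * Returns 1 and stores the value in *out on success, 0 on failure. */
int parse_long(const char *str, int base, long *out)
{
    char *end;
    long value;

    assert(str != NULL && out != NULL);

    errno = 0;                       /* strtol only sets errno on failure */
    value = strtol(str, &end, base);

    if (end == str)      return 0;   /* no digits were consumed */
    if (errno == ERANGE) return 0;   /* value does not fit in a long */
    if (*end != '\0')    return 0;   /* trailing junk after the number */

    *out = value;
    return 1;
}
```

With base 0, strtol also accepts a 0x or 0 prefix, so parse_long("-0x1f", 0, &v) stores -31. Whether trailing characters should count as an error is a design choice; this sketch rejects them.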

things to strings

In C, you usually want one of the printf family -- see the next section

In C++, you can choose between printf (there from <cstdio>) and cout (sometimes less bother than printf, sometimes more, though it probably helps your sanity not to mix the two).

Sometimes you'll be given an itoa() and possibly utoa(), but this is not part of any C or C++ standard, so not really portable.

For radix 10, 16, or 8, you can use sprintf, respectively (%d or %i or %u), (%x), and (%o).

For other radices you could copy itoa/utoa's code into your project -- it's only ~15 lines.

For other options, see e.g. [1].

itoa is usually defined as:

char *itoa (int value, char *buffer, int radix);
//and possibly:
char *utoa(unsigned int value, char *buffer, int radix);

Notes:

will write a null-terminated string
radix should be within 2..36 (uses digits and a-z)
The default radix is 10 (i.e. decimal)
on signedness:
- for itoa: If radix!=10, the number is assumed to be unsigned, with 10 it is considered signed.
- utoa (if present) considers everything unsigned
the buffer should be large enough to hold the result. The worst case length is for binary (radix=2): 17 bytes for a 16-bit int, 33 for a 32-bit int (both including the terminator).
the return value is a copy of the buffer pointer
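
If your platform has no itoa/utoa, something like the following may do. This is a sketch of the unsigned case only, under the name my_utoa (a made-up name, to avoid clashing with a library version), and it is not the code of any particular implementation:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-in for utoa(): writes `value` in the given radix
 * (2..36) into `buffer` as a null-terminated string and returns `buffer`.
 * Returns NULL if the radix is out of range. */
char *my_utoa(unsigned int value, char *buffer, int radix)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    /* Worst case is binary: one char per bit, plus the terminator. */
    char tmp[sizeof(unsigned int) * CHAR_BIT + 1];
    char *p = tmp + sizeof tmp - 1;
    unsigned int base;

    assert(buffer != NULL);
    if (radix < 2 || radix > 36)
        return NULL;
    base = (unsigned int)radix;

    /* Fill tmp from the end, least significant digit first. */
    *p = '\0';
    do {
        *--p = digits[value % base];
        value /= base;
    } while (value != 0);

    /* Copy the digits and the terminator to the caller's buffer. */
    memcpy(buffer, p, (size_t)(tmp + sizeof tmp - p));
    return buffer;
}
```

For example, my_utoa(5000u, buf, 2) writes "1001110001000". The caller is still responsible for a large enough buffer, as in the notes above.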

printf and variants

int   printf(const char *format, ...);                          // to stdout
int  fprintf(FILE *stream, const char *format, ...);            // to FILE (file, or stream opened as such)
int  sprintf(char *str, const char *format, ...);               // to target buffer 
int snprintf(char *str, size_t size, const char *format, ...);  // ...same, with call-imposed max chars,
                                                                //          for safety against overflows
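
The useful thing about snprintf (C99) is its return value: the length the complete output would have had, which lets you detect truncation. A small sketch, where format_pair and its "name=value" layout are invented for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats "name=value" into buf.
 * Returns 1 if the whole text fit, 0 if it was truncated or failed. */
int format_pair(char *buf, size_t size, const char *name, int value)
{
    int n;

    assert(buf != NULL && name != NULL && size > 0);

    n = snprintf(buf, size, "%s=%d", name, value);

    /* snprintf returns the length the full output would have had
     * (not counting the terminator), or a negative value on error. */
    return n >= 0 && (size_t)n < size;
}
```

With a 6-byte buffer, format_pair(buf, 6, "width", 80) leaves "width" in the buffer (still null-terminated) and returns 0.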

The same list but with a v prepended:

int   vprintf(const char *format, va_list ap);
int  vfprintf(FILE *stream, const char *format, va_list ap);
int  vsprintf(char *str, const char *format, va_list ap);
int vsnprintf(char *str, size_t size, const char *format, va_list ap);

Same set, but takes arguments using a va_list instead of a variable number of arguments. If you don't know what that means, you probably won't need them.
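
The usual reason to want the v variants is writing your own printf-like function that passes its arguments on. A minimal sketch, assuming a made-up wrapper called format_warning that prefixes a fixed string:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: formats like snprintf, but prefixes "warning: ".
 * Returns the length the full message needs, or -1 on failure. */
int format_warning(char *buf, size_t size, const char *format, ...)
{
    va_list ap;
    int prefix_len, body_len;

    assert(buf != NULL && format != NULL && size > 0);

    prefix_len = snprintf(buf, size, "warning: ");
    if (prefix_len < 0 || (size_t)prefix_len >= size)
        return -1;   /* no room for anything beyond the prefix */

    /* Hand our own variable arguments on as a va_list. */
    va_start(ap, format);
    body_len = vsnprintf(buf + prefix_len, size - (size_t)prefix_len,
                         format, ap);
    va_end(ap);

    if (body_len < 0)
        return -1;
    return prefix_len + body_len;
}
```

Calling format_warning(buf, sizeof buf, "%s has %d items", "list", 3) leaves "warning: list has 3 items" in buf.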

format strings

You know, those things that look like

%d
%-30s
%.3f
%-9.3f
%+-i
%07d
%#o

printf scans the format string for these things, and replaces each with one of the values also handed to it, formatted according to the conversion specifier.

They always start with a %, end with a conversion specifier, and optionally have further format details in between.

The following focuses on the widely supported stuff, shared by standards or at least most C flavours.

required: %
optional flags - mostly for monospaced alignment
- + : always show a sign, i.e. a plus before positive numbers instead of nothing (like the next flag, this lines positive numbers up with negative ones)
- (space) : on positive numbers, prepend a single space
- - : left-justifies the value within the field width, instead of the default right justification
- 0 : pad with zeroes instead of spaces
- # : alternate form:
  - for hex (x/X) this prepends 0x/0X (except to 0)
  - for octal (o), this ensures the number starts with a zero
  - for f/F, e/E, g/G, a decimal point is always present
  - for g/G, trailing zeroes are not removed as they normally are
- some C-variant-specific flags

optional: minimum field width

Often: a little wider than you expect, to get things to align the same way

optional: precision, for which the meaning varies a bit:
- integers (d, i, o, u, x, and X): minimum number of digits
- floats (e, E, f, F, and since C99 a and A): number of digits after the radix character
- floats (g, G): maximum number of significant digits

optional: length modifier

required: conversion specifier

Where the above says conversion specifier:

strings and characters:
- c - an int argument, converted to unsigned char
- lc - from wint_t, converted via wcrtomb
- s - null-terminated string
- ls - null-terminated wide string (wchar_t *), converted via wcrtomb

signed int:
- d, i - signed decimal
unsigned int:
- u - unsigned decimal
- x - unsigned hex, lowercase abcdef
- X - unsigned hex, uppercase ABCDEF
- o - unsigned octal
- The # flag (part of standard C) prints hex and octal with a 0x or 0 prefix, which scanf's %i and strtol with base 0 will read back correctly

float/double:
- e, E
  - [-]d.ddde±dd
  - e/E difference controls whether E in string is lowercase/uppercase
  - exponent always shown, even if 0 (shown as 00)
  - precision defaults to 6.
- f, F (F was added in C99, so older compilers may lack it)
  - [-]ddd.ddd
  - precision defaults to 6.
- g, G
  - precision defaults to 6.
  - precision argument specifies significant digits (unlike e, f). Precision of 0 treated as 1.
  - uses %e style if exponent < -4 or if exponent >= precision, otherwise uses %f style
  - (trailing zeroes stripped. If only zeroes after decimal point, decimal point is stripped too)
also
- % - literal %. (so %% in format string)
- p - pointer (a void *), in an implementation-defined format; often hex, much like %#x or %#lx

Examples:

    value            %13.4f     %-15.6e           %-+13.6g               % -11.4f              %#13g
      0.3    '       0.3000'  '3.000000e-01   '    '+0.3         '      ' 0.3000    '    '     0.300000'
    0.005    '       0.0050'  '5.000000e-03   '    '+0.005       '      ' 0.0050    '    '   0.00500000'
  -.00007    '      -0.0001'  '-7.000000e-05  '    '-7e-05       '      '-0.0001    '    ' -7.00000e-05'
     5000    '    5000.0000'  '5.000000e+03   '    '+5000        '      ' 5000.0000 '    '      5000.00'

# more flag stuff
    value                 %x               %#x                 %o                 %#o              %07d
        0                '0'                '0'                '0'                '0'          '0000000'
     5000             '1388'           '0x1388'            '11610'           '011610'          '0005000'
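
A few of the rows above can be checked with something like the following sketch. formats_as is a hypothetical helper, and the expected strings assume the default "C" locale and a two-digit exponent (as glibc prints it):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats `value` with `format` (which must contain
 * exactly one floating-point conversion) and reports whether the result
 * equals `expected`. */
int formats_as(const char *expected, const char *format, double value)
{
    char buf[64];

    assert(expected != NULL && format != NULL);

    snprintf(buf, sizeof buf, format, value);
    return strcmp(buf, expected) == 0;
}
```

For example, formats_as("       0.3000", "%13.4f", 0.3) and formats_as("-7e-05       ", "%-+13.6g", -.00007) both return 1. Passing a format string through a variable is only reasonable in a test helper like this, where you control every format yourself.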

TODO: more examples

Note that localization may create representations that don't necessarily convert back with C's own string-to-number functions, particularly if communicated to another locale. The simplest example is a language's preference for a period or comma as its decimal point.

This includes the use of the ' flag (a POSIX extension, not standard C) to get locale-specific thousands grouping. To steal an example, printf("%'.2f", 1234567.89) gives:

in POSIX: 1234567.89
in nl_NL: 1234567,89
in da_DK: 1.234.567,89

More complex parsing and conversion

strtok, strtok_r, wcstok

char *       strtok(   char *str,    const char *delim);
char *     strtok_r(   char *str,    const char *delim,    char **saveptr);
wchar_t *    wcstok(wchar_t *wcs, const wchar_t *delim, wchar_t **ptr);


strtok splits a string into a sequence of tokens, which approximately means "writes a NUL character ('\0') over the delimiter that ends the next token, and returns a pointer to that newly terminated token." You'll often call it until it returns NULL (which signals that there are no more tokens).

When you want to tokenize a string, you should hand that string into only the first strtok call. Successive calls should specify NULL for str; specifying a string there signals you want to start over and parse that string instead.

It keeps state between calls, in a hidden static variable. This means that you cannot start a second strtok() tokenization while you're still working on another (e.g. tokenizing a token in a nested for loop), and that it is not thread-safe.

If you want strtok calls that can't affect other strtok calls, you'll probably want strtok_r (exists on POSIX), a reentrant version whose extra saveptr argument stores the position, so that each string you are tokenizing keeps its own state.

You can't strtok string literals or other constant strings, because it writes into the string. If you want to keep an unmodified original, you'll probably want to work on a copy, e.g. one made with strdup() (and free() it afterwards).

wcstok() is the wchar_t version of strtok_r
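
The nested case is where strtok_r earns its keep. A minimal sketch, assuming input shaped like "a=1;b=2" and a made-up helper called count_fields:

```c
#define _POSIX_C_SOURCE 200809L   /* asks the headers for strtok_r */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the fields in a string such as "a=1;b=2",
 * splitting on ';' in the outer loop and on '=' in the inner loop.
 * Modifies `text` in place, since strtok_r writes terminators into it. */
int count_fields(char *text)
{
    char *outer_state, *inner_state;
    char *pair, *field;
    int count = 0;

    assert(text != NULL);

    for (pair = strtok_r(text, ";", &outer_state);
         pair != NULL;
         pair = strtok_r(NULL, ";", &outer_state)) {
        /* The inner tokenizer has its own state, so it does not
         * disturb the outer one (plain strtok would). */
        for (field = strtok_r(pair, "=", &inner_state);
             field != NULL;
             field = strtok_r(NULL, "=", &inner_state)) {
            count++;
        }
    }
    return count;
}
```

Given a writable array holding "a=1;b=2;c", this returns 5, and the array afterwards starts with just "a" because the first delimiter was overwritten. Empty tokens are skipped, so ";;" yields 0.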

strsep

string searching and similar utils

searching for characters and substrings

void *  memchr(const void *str, int c, size_t n);
void * memrchr(const void *str, int c, size_t n);   // GNU extension, not standard C

char *  strchr(const char *str, int c);
char * strrchr(const char *str, int c);
char *   index(const char *str, int c);       // legacy name, removed in POSIX.1-2008; use strchr
char *  rindex(const char *str, int c);       // legacy name, removed in POSIX.1-2008; use strrchr

char *  strstr(const char *haystack, const char *needle);

char * strpbrk(const char *str, const char *accept);
size_t  strspn(const char *str, const char *accept);
size_t strcspn(const char *str, const char *reject);
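
These combine well for small parsing jobs that must not modify the input (unlike strtok). A sketch assuming lines shaped like "key = value", with split_key_value being a made-up helper:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: given "key = value", copies the key (without
 * surrounding blanks) into `key` and returns a pointer to the start of
 * the value inside `line`, or NULL if there is no '='. */
const char *split_key_value(const char *line, char *key, size_t key_size)
{
    const char *eq, *start, *value;
    size_t len;

    assert(line != NULL && key != NULL && key_size > 0);

    eq = strchr(line, '=');                /* first '=', or NULL */
    if (eq == NULL)
        return NULL;

    start = line + strspn(line, " \t");    /* skip leading blanks */
    len = strcspn(start, " \t=");          /* key runs until blank or '=' */
    if (len >= key_size)
        len = key_size - 1;                /* truncate to fit */
    memcpy(key, start, len);
    key[len] = '\0';

    value = eq + 1;
    return value + strspn(value, " \t");   /* skip blanks before the value */
}
```

For "  name = widget" this stores "name" in key and returns a pointer to "widget". The others work along the same lines: strstr("haystack", "st") points at "stack", strrchr("a/b/c", '/') at the last "/c", and strpbrk("hello, world", ",;") at the comma.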