Programming language typology and glossary: Difference between revisions

Latest revision as of 16:00, 11 September 2023

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Typing (glossary)

type annotation, type checking, and type hinting

Type declaration is putting types on function arguments, return values, and variables

which usually implies static typing

Type checking is checking that these declared types are adhered to.

In static, compiled languages this can be done at compile time (though static languages that allow dynamic casts can still subvert the check).
In dynamic languages this check can only be done at runtime, and whether and when it is done depends on the language and/or the programmer.

Type annotation is a name we may prefer to use when types are mentioned, but not checked - really just inline documentation of what we expect.

This applies only to dynamic languages: in statically typed languages it is an error to leave types unspecified (unless the language infers them). Even when unchecked, type annotation can be very useful: editors can use it to hint at what to hand in, and it allows certain kinds of checks.

Type hinting is a vaguer term, and can refer to things like:

type annotation as described above (not enforced in any way)

mainly there so that IDEs can show this to programmers

e.g. Python 3[1] (see also Python_notes_-_syntax_and_language#Type_annotation)

type annotation that is enforced at runtime, but optional to specify

e.g. PHP (though it calls it type declaration) [2]

around generics, it can refer to requesting which types should be precompiled

(a specific case of a compiler hint)

e.g. tensorflow[3]

Some languages add some further distinctions, like Java's 'declaration annotation' [4][5]

Conversion, casting, coercing

The shorthand terms (like conversion, casting, and coercing) are somewhat fuzzy

Or, rather, context-dependent. It's often defined very clearly within a language's specs -- but different languages may use the terms differently, so once you step outside a single language, definitions vary.

More verbose terms may be more specific, but even they can vary between specific languages' type systems (and sensibly so within each).

The result is that people will often use the terms consistent with the language they know best, so it really helps to understand the underlying concepts. Context then solves most things.

More pragmatically

Programmers often deal with:

having to explicitly convert values (explicit converting type cast)

e.g. (float)intvar

cases where that's done for you (implicit converting type cast)

e.g. expressions like 2 + 2.2

e.g. a function taking a float, which you can call with an int because the language's coercion rules specifically allow that as an implicit conversion and do it for you

the typing system, and its coercion rules (which may be alterable)

that e.g. says "math mixing integers and floats will always become float" or "it becomes the type of the left value"

that e.g. emits "this is a common source of mistakes" warnings, e.g. around signedness, or pointer-type conversion (C)

that raises compiler errors like "no you can't turn a float to a string directly" or "wrong pointer type"

that makes some things implicit and forces other things to be explicit

seeing underlying bytes in a different way (explicit non-converting type cast)

often hackish, sometimes useful.

For example, consider the C code:

int i1 = 10;
int i2 = 4;
float f1 = i1 / i2;
float f2 = (float)i1 / i2;

f1 will store 2.0, because i1 / i2 is an integer/integer division resulting in 2, and only then is that result implicitly converted to a float for the assignment.

The f2 line explicitly converts (only) i1 to a float. Coercion then converts i2 to a float as well, so the division happens as a float/float division, and the result 2.5 can be assigned without further conversion.
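The same rules apply at function boundaries. As a minimal sketch (the function names are made up for illustration, not taken from any library), the following C wraps both divisions in functions and adds one function that takes a float but can be handed an int:

```c
#include <assert.h>

/* Hypothetical helper: takes a float, returns half of it. */
float half(float x) {
    return x / 2.0f;
}

/* int / int happens first (truncating); the result is then
   implicitly converted to float for the return value. */
float ratio_truncated(int a, int b) {
    assert(b != 0);
    return a / b;
}

/* Explicit converting cast on a; b is then implicitly converted
   to float as well, so this is a float / float division. */
float ratio_exact(int a, int b) {
    assert(b != 0);
    return (float)a / b;
}
```

Calling half(5) compiles without a cast: the int argument is implicitly converted because the prototype asks for a float. With the values from the example above, ratio_truncated(10, 4) gives 2.0 and ratio_exact(10, 4) gives 2.5.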

More technically

(these terms are a lot clearer, but still not universal)

a converting type cast, sometimes type conversion

changes the underlying bits according to known interpretation of both old and new types

often to get the best possible, or most useful representation, in another type

e.g. int as float (typically accurate enough),

or float as string (typically rounded for human feedback)

a non-converting type cast (always explicit)

does not change the underlying bits, but sees those same bits with a different interpretation

e.g. "see these four adjacent bytes as one int32"

This sometimes makes sense, mostly for speed reasons,

but it is often neither necessary nor safe, so not all languages expose the ability, or make it easy

always has to be done explicitly
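To make the contrast concrete, here is a small C sketch with illustrative function names. It assumes that float is a 32-bit IEEE 754 value, which holds on most current platforms but is not guaranteed by the C standard. The byte copy via memcpy is one well-defined way to reinterpret bits in C; a pointer cast would often be used for the same purpose but can run into aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Converting cast: the bits change so that the value is kept
   as well as the new type allows (the fraction is dropped). */
uint32_t convert_value(float f) {
    return (uint32_t)f;
}

/* Non-converting: the bytes are copied unchanged and simply
   read as a different type. Assumes float is 32 bits wide. */
uint32_t reinterpret_bits(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the IEEE 754 assumption, convert_value(1.0f) is 1, while reinterpret_bits(1.0f) is 0x3F800000, the bit pattern that encodes 1.0.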

A converting type cast:

explicit type conversion

e.g. when you do (float)intvalue

implicit type conversion, often called coercion

the language's type system allowing certain conversions and doing them for you, things like

e.g. handing an integer to a function expecting a float, or widening smaller integers to larger ones,

e.g. expressions like 2 + 2.2

e.g. in the 2+2.2 case many languages have coercion rules that effectively say "any integer-float mix becomes a float", while some others always focus on the left value

This will sometimes emit "this is a common source of mistakes" warnings

e.g. C around signedness, or pointer-type conversion, both of which make sense

Coercion mostly dictates how implicit conversion can work, and as such is often used as a near-synonym for it.
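A short C sketch of two such coercion rules, again with made-up function names. The first is the convenient int/float mix; the second is the signedness case that compilers commonly warn about (for example with GCC's -Wsign-compare).

```c
#include <assert.h>

/* Mixed int/double arithmetic: a is implicitly converted to
   double before the addition, so nothing is truncated. */
double mixed_sum(int a, double b) {
    return a + b;
}

/* Mixed signed/unsigned comparison: s is implicitly converted
   to unsigned int first, so a negative s becomes a huge value. */
int signed_less_than_unsigned(int s, unsigned int u) {
    return s < u;
}

/* Small self-check; call it from main() to try this out. */
void check_coercion(void) {
    assert(mixed_sum(2, 2.2) > 4.0);
    assert(!signed_less_than_unsigned(-1, 1u));
}
```

The second function is why the warning exists: -1 < 1u evaluates to false in C, because the comparison is done on unsigned values.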

Typing (typology)

strong typing, weak typing

Mainly describes how easily types are coerced to another within an expression.

Strongly typed languages have stricter and more enforced rules about mixing types, meaning you need to do more conversions explicitly.

Weak typing often means there are more type-operator-type combinations predefined - and that they won't always do what you expect.

Consider 2 + "2"

In strong languages, this will give an error

In weak ones, it will do... something. Depending on the language, it may be 4, it may be "22".
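C itself shows a mild form of this with characters rather than strings. The sketch below (function names are illustrative only) relies on a character literal being a small integer in C, so mixing it with a number is accepted silently and does "something":

```c
#include <assert.h>

/* '2' is the character code of the digit 2, so this is plain
   integer addition and yields the character code of '4'. */
int char_plus_int(void) {
    return '2' + 2;
}

/* Getting the number 4 takes an explicit digit-to-value step. */
int digit_plus_int(char digit, int n) {
    assert(digit >= '0' && digit <= '9');
    return (digit - '0') + n;
}
```

So char_plus_int() is neither 4 nor "22": it equals '4', which only works out because C guarantees that the digit characters are consecutive.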

Strong and weak typing is actually a sliding scale.

Some things are more easily coerced in most languages.

In particular, most languages will allow mixing of ints and floats, so 2 + 2.2 is usually valid (and usually becomes a float), because it's pretty convenient.

Overly weak typing is often disliked, in part because weakly typed languages are often also dynamically typed, meaning there are a lot of hidden rules to the typing, like "order matters a lot" or "actually in that case it coerces via integers and not strings like everything else" or other "dunno, specs say so" arguments.

This amounts to "the correctness depends on whether you have internalized this particular language's typing model".

It's potentially more opaque around variables rather than literals, because there is no clear indication of the type it currently has.

Overly strong typing is also disliked, in that it makes you type out absolutely everything, needlessly.

It's a balance between verbosity and how obvious mistakes are.

dynamic typing, static typing

In statically typed languages, variables have types.

In dynamically typed languages, values do.

One way to look at it is that variables in statically typed languages make some space according to the type, and you can store things in it, e.g.

int i = 0;
i = "foo";  // invalid: i was declared as an int

Whereas in dynamically typed languages, variables are just temporary names that point at a value-that-comes-with-a-type.

i=0
i='foo' # i now points to a string instead
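One way to picture "the value carries the type" is to model it in a static language. The following C sketch is only an illustration of the idea, not how any particular dynamic language is implemented; Value, TypeTag, make_int and make_string are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* A value that carries its own type tag. */
typedef enum { TYPE_INT, TYPE_STRING } TypeTag;

typedef struct {
    TypeTag tag;
    union {
        int as_int;
        const char *as_string;
    } data;
} Value;

Value make_int(int n) {
    Value v;
    v.tag = TYPE_INT;
    v.data.as_int = n;
    return v;
}

Value make_string(const char *s) {
    Value v;
    v.tag = TYPE_STRING;
    v.data.as_string = s;
    return v;
}

/* The name i can be rebound; the type travels with the value. */
void check_rebinding(void) {
    Value i = make_int(0);
    assert(i.tag == TYPE_INT);
    i = make_string("foo");
    assert(i.tag == TYPE_STRING);
    assert(strcmp(i.data.as_string, "foo") == 0);
}
```

Any type check in such a model has to look at the tag at runtime, which matches the point that dynamic languages can only check types while the program runs.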

Note that this is independent of strong/weak typing. Strong typing may still be in place.

Static is liked because the explicit typing is clearer to both people and compilers, and because it lets you know about typing errors at compile time.

Static is disliked when it makes you write things out too much. Implicit typing is a nice feature some languages have to lessen this.

Dynamic is liked for its flexibility and often shorter code

Dynamic is disliked for its ability to hide bugs, and for some of those bugs to only become discovered at runtime.

Manifest typing, implicit typing

Imperative, declarative, functional, etc.

Imperative, declarative

functional

pass by value, pass by reference

@@ Line 1: / Line 1: @@
+{{programming}}
 {{stub}}
@@ Line 73: / Line 75: @@
 For example, consider the C code:
-<code lang="c">
+<syntaxhighlight lang="c">
 int i1 = 10;
 int i2 = 4;
 float f1 = i1 / i2;
 float f2 = (float)i1 / i2;
-</code>
+</syntaxhighlight >
 f1 will store 2.0, because that expression is an ''integer/integer'' division resulting in 2, followed by you happening to want to store it into a float so an implicit conversion to suit that.
@@ Line 213: / Line 215: @@
 -->
+==Imperative, declarative, functional, etc.==
+===Imperative, declarative===
+<!--
+Imperative is "this is a full list of the concrete steps I want done"
+Declarative is "this is a description of what I want to happen, you figure out some of the implied details".
+Declarative often means there are a lot of rules of evaluation already present.
+This is ''somewhat'' of a sliding scale.
+Say, infix notation expressions in ''any'' language rely on some compiler/interpreter rules.
+But we usually use the term to refer to e.g. functional languages,
+where you only need to write a complete-enough set of facts,
+and the actual execution is implied from that.
+Or to refer to a domain-specific language, which makes a singular task easier to express and comes with a bunch of extra code to evaluate that DSL.
+-->
+===functional===
+<!--
+Functions in the mathematical sense {{comment|(sometimes 'pure functions')}} are not about computation, but about mapping.
+That mapping does not involve side effects, and no mutation.
+They will always do the same thing - which makes testing simple and conclusive.
+It also means composing functions is well-behaved by definition.
+It also means parallelism is much easier -- because the lack of mutability avoids the most common mistake, that read-modify-write easily leads to races.
+But it does shift around how you do things at all, and the more purist you are, the more impractical it is, so while it is really useful to understand how this paradigm works, in practice it's better to only spice it in.
+----
+Functional gets mild discomfort from various people who learned imperative first, which is most people.
+Imperative gets you really used to thinking in "throw down a pile of related things in little islands we call scope, until they make sense together".
+This approach is something that isn't optional to unlearn in functional,
+where there isn't so much scope as parameters.
+(at least, if functional isn't your first style, or learned in parallel)
+And sometimes one is cleaner than the other.
+But that also depends a lot on habits - a thing that also exists within much more usual languages.
+You can clearly tell when people e.g. try to write C in Python, or Java in Python, and are having a hard time with it. That too will pass.
+-->
+==pass by value, pass by reference==
+<!--
+In any language with pointers or references
+Pass by value passes a copy of the complete value
+: passes that value on the stack
+Pass by reference passes a pointer to the data
+: passes a pointer on the stack, to a value typically in the heap
+It is relatively common for languages to
+: pass scalars by value (by default)
+: pass structs by reference
+The reason to do either is a mix of efficiency and function/semantics.
+Efficiency in that
+* passing things larger than the pointer size via the stack
+** is more copying than often necessary
+** makes that stack overflow more easily
+Anything passed by pointer becomes shared data.
+Which can be good for efficiency, but bad for e.g. safety.
+...because that is generally the most efficient way of doing both.
+Passing a pointer to something smaller than 64 bits
+-->
 [[Category:Programming]]