Some abstractions around programming
{{#addbodyclass:tag_tech}}
{{#addbodyclass:tag_prog}}
{{programming}}




===In math===
{{stub}}
In math, invariants are usually used to point out the concept of 'something that remains unchanged under certain transformations'


(and in the cases you care to use it, it often likes to be as broad as possible, saying that entire ''classes'' of transformations don't affect it)




https://mathworld.wolfram.com/Invariant.html


 


===In more practical engineering===
<!--
In pragmatic terms, invariants are related to the assumptions we're making.


For example, in 3D sensing, it can be useful to point out you are assuming detection is more or less invariant with direction.


More so when you ''know'' this isn't true, to point that out,
and particularly to e.g. say "okay, this isn't quite as invariant as it should be, but close enough and our solution stays a lot simpler this way"
 
 
-->
===In programming===
{{stub}}
Invariants are expressions that do not change in value.
: In some cases, that's "things that should test as true no matter when or where I test them".


: In other cases, it only matters that they are true at the start and end of a function.


...the difference matters to things like recursion, and to [[concurrent]] and/or [[parallel]] execution.


Often a boolean expression that should stay true.
{{comment|(in practice, they often contain equality expressions, sometimes contain counts, relatively rarely more complex things)}}


If the invariant changes, then the code is probably wrong.






Invariants are ''also'' useful to
: describe behaviour in broad terms,
: describe the correctness of individual operations,
: and to test these assumptions, during debugging and sometimes in production (a small sketch of that below)
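
For example, a minimal sketch of checking such an invariant with asserts (the account/transfer names are made up, purely for illustration):

 def transfer(accounts, src, dst, amount):
   # invariant: the total amount of money stays the same across a transfer
   total_before = sum(accounts.values())
   accounts[src] -= amount
   # note: halfway through, the total is briefly off - this particular invariant
   # only needs to hold at the start and end, not at every point in between
   accounts[dst] += amount
   assert sum(accounts.values()) == total_before, 'money appeared or disappeared'
 
 accounts = {'savings': 100, 'checking': 50}
 transfer(accounts, 'savings', 'checking', 30)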




<!--


Uses
* a bunch of [[assert]] statements are technically invariants - things that should stay true, often placed in [[assert]] statements to check them
:: particularly in the more critical state management, that should halt a program if any mistakes are suspected


-->


====Class invariant====
{{stub}}
 
 


A [[class invariant]], also known as '''type invariant''' (and apparently '''object invariant'''{{verify}}) is something that should stay true throughout an object's lifetime, regardless of its state.




This isn't any different from the above concept, just points out you can often think of an object as a sort of state machine that you may change on any call - and that you should think about correctness on any operation that changes its state.
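
For example, a rough sketch (a made-up inventory class, purely to illustrate where the checks would sit):

 class Inventory:
   def __init__(self):
     self.items = {}   # name -> count
     self.total = 0    # class invariant: total equals the sum of all counts, and no count is negative
 
   def _check_invariant(self):
     assert self.total == sum(self.items.values()), 'count bookkeeping is off'
     assert all(c >= 0 for c in self.items.values()), 'negative amount of some item'
 
   def add(self, name, amount=1):
     self.items[name] = self.items.get(name, 0) + amount
     self.total += amount
     self._check_invariant()   # should hold after every state-changing call
 
   def remove(self, name, amount=1):
     self.items[name] = self.items.get(name, 0) - amount
     self.total -= amount
     self._check_invariant()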
 
 
 


==Memoization==
<!--
Take the classic example, computing Fibonacci numbers recursively:

 def fib(n):
   if n <= 1:
     return n
   else:
     return fib(n-1) + fib(n-2)
which is basically the mathematical definition in code form.
 
 


Now call <tt>fib(n)</tt> for n in 1, 2, 3, 4 (the n-th number in the series).


This works, but each call spends an increasing amount of work just recursing -- most of which is not only work that was already done for the previous n, but ''many'' calls within any one top-level call are duplicates of each other, because the -1 and -2 branches end up asking for the same n.

It turns out it's exponential ({{search|J Roberts, "How Many Recursive Calls Does a Recursive Function Make?"|the actual number is slightly funny to calculate}}). Which also means the amount of time taken increases exponentially:

 import time
 for i in range(38):
   start = time.time()
   v = fib(i)
   print( 'took %.3f sec to calculate %d-th number: %d'%(time.time()-start, i, v))

 took 0.000 sec to calculate 0-th number: 0
 ...
 took 0.000 sec to calculate 10-th number: 55
 ...
 took 0.005 sec to calculate 20-th number: 6765
 ...
 took 0.036 sec to calculate 25-th number: 75025
 ...
 took 0.455 sec to calculate 30-th number: 832040
 ...
 took 4.454 sec to calculate 35-th number: 9227465
 took 7.221 sec to calculate 36-th number: 14930352
 took 11.837 sec to calculate 37-th number: 24157817

...and soon after that you can probably beat the computer with a pencil.




Yes, calculating it this way '''is''' somewhat contrived (and it turns out you don't want it this way ''anyway'') -- but if you wanted to optimize this, you would probably do it via memoization. Consider:


 answers = {}
 
 def fib(n):
   if n not in answers:              # only compute each n once
     answers[n] = n if n <= 1 else fib(n-1) + fib(n-2)
   return answers[n]                 # ...otherwise reuse the stored answer


Now when you run the same timing tests, all answers are more or less instant, because almost all of these calls almost immediately hit on a number we already calculated and stored. Roughly, what they end up doing is what you would do on paper, or in a smarter implementation: the calls only recurse until they hit an answer that is already stored.




{{skipme|
 
The reason you ''still'' might not want to do it ''quite'' that way is that most languages have a maximum recursion depth (as a guard against programming mistakes), so you can only get so far.
Say,
  fib(1000)
will complain. You can sort of work around it if you do something like:
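(one way, as a rough sketch: warm the cache up from the bottom first, so no single call ever has to recurse very deep)
  for i in range(1001):
    fib(i)             # each call finds fib(i-1) and fib(i-2) already stored
  print( fib(1000) )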
}}
{{skipme|
The above still keeps ''n'' things around.
This is pretty cheap on PCs, but if you were more constrained, and your task is only to print the ''series'' (and not an arbitrary n-th item from the series), then you can do with memorizing only two numbers. The last two. That's the definition, after all.
}}
-->




In comparison, the power button on most computers ''toggles'' {{comment|(and is only a ''request'' to change - unless held for 4 seconds as per ATX specs)}}
: ...so you won't know entirely for sure about its state afterwards
:: so it's ''not'' idempotent.
* HTTP HEAD, HTTP OPTIONS, and HTTP TRACE are supposed to be idempotent (in that they do not attempt to make changes)


* a HTTP GET is ''expected'' not to - you ''should'' use HTTP POST if it probably changes state
:: though yeah, a lot of web designers don't strictly adhere to this -- but that's mostly within login sessions so it doesn't invalidate mechanical use much

* use of HTTP POST often signals that it's expected not to be idempotent (but might be in some cases)
 


* an SQL SELECT is idempotent in that it doesn't even ''try'' to change state


* HTTP PUT and HTTP DELETE, and SQL DELETE, are often assumed to be idempotent
:: DELETE because if it's not there, it still won't be there afterwards
:: PUT because putting the same thing there again leaves the same result
* ...though of course in a distributed system there are other clients






* an SQL UPDATE is expected to often not be idempotent
https://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation
-->
=Scope and namespaces=
<!--
If you think at machine code level, you just have a lot of memory you could write and read.
It's your job to think about what bits of code touches it when, and whether that makes sense.
You have to do absolutely everything yourself. You ''can'', and early programming did exactly that, but you can imagine people started adding abstractions that made life simpler on them.
For example, the function call is probably the singular most useful tool.
Aside from the purely functional part of giving a useful fragment of code a name so that it can be easily reused,
it also turns out to be ''really convenient'' if you can also give it its own little scratch pad that won't interact with the rest.
...more specifically, if the language you work in ensures that a variable name defined in a function is, every time you call it, usable only within that function (...we can worry about where that memory comes from elsewhere. We ''have to'', but that also becomes a separate topic).
That means anything outside can't accidentally interfere with it (by happening to use a variable name that someone else also used elsewhere).
This compartmentalization makes both names less messy, and makes memory access less messy.
That function's '''local scope''' is distinct from everything around it.
Yes, you have to deal with the fact a function might now refer to both things in local scope and in global scope,
and that even introduces some new problems, but generally the benefit outweighs the cost (particularly if you set some extra rules for yourself).
Yes, you can typically break the abstraction that is scoping, but only intentionally.
Similarly, once you add a lot of different functions, you start having more and more names,
and they will at some point conflict.
''Variable'' names may have the benefit of often being local to functions,
but most ''function'' names are just on a heap of ''everything that exists that you can draw on''
When you choose to (or have to) put names within a specific namespace,
you can only get to those names by explicitly poking into the namespace.
This is often done to avoid ambiguity, particularly when your names might clash with other names.
Namespacing is a general term.
Some languages use the term for a specific concept, e.g. in C++ {{inlinecode|namespace}} is a keyword, a named block with names inside.
Other languages do most of their name organization in other ways. For example, python organizes its code mostly through packages and modules.
: namespaces are still a thing, but it's more about "built-ins come from a special-cased namespace", "your own imports go into a program-specific namespace", and "actually that also happens at function definition level"  https://realpython.com/python-namespaces-scope/#namespaces-in-python
...which runs a little into scoping
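
A quick python sketch of both ideas (made-up names, purely to illustrate):

 counter = 0                     # a global (module-level) name
 
 def count_words(text):
   counter = len(text.split())   # a *local* name; it shadows, and does not touch, the global one
   return counter
 
 print( count_words('one two three') )   # 3
 print( counter )                        # still 0 - the function had its own scratch pad
 
 # namespacing: the same name can live in different modules without clashing
 import math
 import cmath
 print( math.sqrt(2) )     # 1.4142...
 print( cmath.sqrt(-1) )   # 1j - same name 'sqrt', different namespace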
-->
=Encapsulation, Information Hiding=
<!--
Encapsulation is often explained as bundling data, and the things that alter that data, in the same ''thing''.
OO is a common context ''now'', and classes/objects themselves the means,
yet OO is not the origin, nor the only way of doing encapsulation.
'''Information hiding''' is a similar idea but more aimed at ''segregating'' design decisions
of things that are likely to change.
In OO, information hiding is often done by nesting types so that you only see the interface of the outermost type (until you dig deeper).
The terms overlap and there is no clean separation;
: some argue information hiding is the idea, encapsulation is the means, (and both are compartmentalizing)
: others argue they are distinct goals that happen to overlap in how you implement them
It further overlaps with the whole idea of [[interfaces]] and APIs
being the thing you talk to so that implementation changes
have no effect on clients of those interfaces.
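
A small python sketch of that (a made-up class; python hides things mostly by convention, but the idea is the same):

 class Temperature:
   def __init__(self, celsius=0.0):
     self._celsius = celsius            # 'hidden' detail: we happen to store celsius internally
 
   def set_fahrenheit(self, f):
     self._celsius = (f - 32) * 5 / 9   # callers talk to the interface, not to the stored form
 
   def as_fahrenheit(self):
     return self._celsius * 9 / 5 + 32
 
 t = Temperature()
 t.set_fahrenheit(212)
 print( t.as_fahrenheit() )   # 212.0 - we could switch internal storage to kelvin without breaking callers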
https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)
https://en.wikipedia.org/wiki/Information_hiding
-->
=Exceptions=






[[Category:Programming]]




Mathy concepts, state

Invariants

In math

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In math, invariants are usually used to point out the concept of 'something that remains unchanged under certain transformations'

(and in the cases you care to use it, it often likes to be as broad as possible, saying that entire classes of transformations don't affect it)


This is potentially useful to

  • classify mathematical objects into groups, and name those groups
  • characterize certain objects.
    Say, the Pythagorean theorem is an invariant property of right triangles
    (and is still invariant under any operation that maintains the property of being a right triangle)
    Going a little further, Fundamental invariants of triangles
  • mention what to watch out for in more pragmatic use.
    Say, programmers (around graphics, geography, and more) may care to know things like:
    the area of a triangle is invariant under translation, rotation and reflection, but not under magnification (a quick check of this below),
    while angles are invariant under all of those
  • It is sometimes useful to explain the constraints of a given problem or puzzle.
  • It's useful around various topology.
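
For example, a quick numerical check of that area claim (a sketch using the shoelace formula; the numbers are arbitrary):

 import math
 
 def tri_area(a, b, c):
     # shoelace formula for the (unsigned) area of a triangle given as 2D points
     (x1, y1), (x2, y2), (x3, y3) = a, b, c
     return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
 
 def rotate(p, angle):
     x, y = p
     return (x * math.cos(angle) - y * math.sin(angle),
             x * math.sin(angle) + y * math.cos(angle))
 
 tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
 rotated = [rotate(p, 1.1) for p in tri]
 scaled = [(2 * x, 2 * y) for (x, y) in tri]
 
 print( tri_area(*tri) )       # 6.0
 print( tri_area(*rotated) )   # ~6.0  (invariant under rotation)
 print( tri_area(*scaled) )    # 24.0  (not invariant under magnification)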


https://mathworld.wolfram.com/Invariant.html


In more practical engineering

In programming

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Invariants are expressions that do not change in value.

In some cases, that's "things that should test as true no matter when or where I test them".
In other cases, it only matters that they are true at the start and end of a function.

...the difference matters to things like recursion, and to concurrent and/or parallel execution.


Often a boolean expression that should stay true. (in practice, they often contain equality expressions, sometimes contain counts, relatively rarely more complex things)

If the invariant changes, then the code is probably wrong.


Invariants are also useful to

describe behaviour in broad terms,
describe the correctness of individual operations,
to test these assumptions, during debugging and sometimes production


Class invariant

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A class invariant, also known as type invariant (and apparently object invariant(verify)) is something that should stay true throughout an object's lifetime, regardless of its state.


This isn't any different from the above concept, just points out you can often think of an object as a sort of state machine that you may change on any call - and that you should think about correctness on any operation that changes its state.

Memoization

In programming, a memo function is a function that remembers the output for a specific set of input parameters it has previously processed, and can answer from that memory rather than from calculation.

Using this as a speed/memory optimization tradeoff is called memoization.


It becomes effective when the duplicated calculation is nontrivial, and/or when a function is often called with the same input.


A memo function should not have any intentional side effects, or (in general) depend on any external state.


In some cases, memoization is something the language/syntax itself provides, particularly in more functional languages (e.g. haskell), but in procedural languages it's often done with a more explicit cache, which requires some management, e.g. to keep the cache of answers in check, and possibly even be smart about retaining entries that are requested more than others.
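
In python, for example, that explicit-cache bookkeeping is common enough that the standard library wraps it up for you: functools.lru_cache keeps a bounded cache of recent answers (a quick sketch; the slow_square function is made up):

 import functools
 
 @functools.lru_cache(maxsize=1000)   # remembers up to 1000 distinct inputs, evicts least-recently-used
 def slow_square(n):
     print('actually computing', n)
     return n * n
 
 slow_square(12)                     # computes (and prints)
 slow_square(12)                     # answered from the cache, no print
 print( slow_square.cache_info() )   # hits/misses/current size - useful to keep the cache in check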

Side effects

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In computing, a side effect tends to mean altering some state outside the environment in consideration.


You may see this most often in programming, where e.g.

  • a function that can only refer to its own local variables can never have side effects (...on other code) - (see also pure function)
  • a function that alters global state may easily have an effect on other runs of that same function -- or on anything else that happens to rely on it
  • a function that shares state with others by design tends to alter that shared state more intentionally (a small sketch below)
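
A tiny sketch of that difference (made-up functions):

 seen = []                        # shared, global state
 
 def mean(values):                # pure: only looks at its argument, no side effects
     return sum(values) / len(values)
 
 def record_and_mean(values):     # has a side effect: it also appends to the global list
     seen.extend(values)
     return sum(seen) / len(seen)
 
 print( mean([4, 5, 6]) )              # 5.0, and will be 5.0 every time you ask
 print( record_and_mean([1, 2, 3]) )   # 2.0
 print( record_and_mean([4, 5, 6]) )   # 3.5, not 5.0 - the earlier call's leftover state affects this one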



More practically, you usually talk about side effects to talk about when you share state, particularly when you unintentionally share state, and particularly when this can lead to issues like race conditions.


...and also when introducing the abstractions meant to avoid such issues.


Such abstractions include

clearer/singular owners of state,
exclusive access to state,
or e.g. concurrent data structures, which exist to tame such issues.


Side effects are not necessarily negative, as long as there is an implicit/explicit contract pointing out how to deal with them. For example, concurrent data structures exist partly to have things work predictably even in the presence of side effects.



https://en.wikipedia.org/wiki/Side_effect_(computer_science)

Idempotency

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In most contexts, the point of an idempotent operation is that it has no additional effect if it is done more than once.


In general, in machines

For a tactile example, consider an industrial machine that has a separate 'on' and a separate 'off' button.

Regardless of the state before, if you press the 'on' button, it is going to be on afterwards
Regardless of the state before, if you press the 'off' button, it is going to be off afterwards
...also regardless of how often you press that same button afterwards.


In comparison, the power button on most computers toggles (and is only a request to change - unless held for 4 seconds as per ATX specs)

...so you won't know entirely for sure about its state afterwards
so it's not idempotent.
(the fact that the action depends on the current state happens to be the reason here, but state-dependence itself is not the defining part of non-idempotency)


Stoplight crosswalk buttons are idempotent (with few exceptions)

in that you can press them more than once, but it'll just stay in 'requested' mode until the system acts on it.
(and sometimes nullipotent, meaning the light will schedule you regardless of whether you pressed, and it's there just to make you feel in control)
  • elevator call and floor buttons are idempotent (with few exceptions)
...even if people like to hold on to their magical thinking about special modes that make it go faster


In math, in computing


In math and computing, you're saying the output doesn't change if called more than once (with the same input parameters).

For example

  • anything that makes no changes
    e.g. HTTP GET, HEAD, OPTIONS, TRACE
    though people do abuse GET (e.g. allowing state changes from GET requests, not just POST, is moderately common)
  • adding an item to, or removing it from, a set
    e.g. HTTP DELETE
  • various string operations, e.g. uppercasing a string

...and many more that we usually don't name specifically.
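
A quick sketch of that 'doing it again changes nothing' idea:

 s = 'Hello'
 print( s.upper() == s.upper().upper() )   # True - uppercasing an already-uppercased string changes nothing
 
 tags = set()
 tags.add('new')
 tags.add('new')          # adding the same item to a set again has no further effect
 print( tags )            # {'new'}
 
 count = 0
 count += 1
 count += 1               # incrementing is *not* idempotent - doing it again keeps changing things
 print( count )           # 2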


Such a property happens to be useful, in design, and in talking about design, in particular APIs, because you're typically modeling what should happen on the other end, and are usually not inspecting or policing every step.

Say, it's really useful to know that if an identical operation is sent to a server several times, it leaves the server in the same intended state.

Around updating state, this can e.g. simplify complex state management to 'just retry until it says OK'.
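
For example, compare an absolute 'set it to this value' update with a relative one (a sketch; think SQL's SET x = 5 versus SET x = x + 5):

 profile = {'credits': 10}
 
 def set_credits(profile, value):     # idempotent: safe to retry if you're not sure it arrived
     profile['credits'] = value
 
 def add_credits(profile, amount):    # not idempotent: a blind retry double-applies it
     profile['credits'] += amount
 
 for _ in range(3):                   # pretend we retried because the acknowledgement got lost
     set_credits(profile, 50)
 print( profile['credits'] )          # 50, same as after a single call
 
 for _ in range(3):
     add_credits(profile, 50)
 print( profile['credits'] )          # 200, not 100 - the retries piled up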


For the HTTP examples,

It's just defined that way so that HTTP clients can assume certain things.
it's not that you can't break that, by mistake or even by choice. In particular GET is sometimes not idempotent in practice.



Around parallelism, this relates to things like reentrancy


Scope and namespaces

Encapsulation, Information Hiding

Exceptions