Automation, remote management, configuration management




Even in cloudy environments, the platform only really creates the instance.

If image-based, that instance may come mostly populated with software.

In some cases it makes sense to bake configuration in there as well, but in a lot of cases that makes the image too specific, and you want to inject configuration to say what precisely each instance is for.

That often means you need to hand some things in at instancing time, and/or poke the instance once you've started it up.


There are a lot of vague, overlapping terms that mean "managing the inside of VMs/containers", among them

  • configuration management
  • infrastructure as code
  • automation
  • remote execution
  • data-driven orchestration
  • orchestration
  • event-driven infrastructure
  • reactive provisioning


It all just refers to various details of duct taping things together, and when it comes to software, different packages may be aimed at a larger or smaller area of what you might include in that.

It may be more useful to describe the things such software isn't for:

  • stuff you end up still doing manually on an everyday basis
  • runtime configuration changes (though this may be something you avoid in the first place, e.g. by just restarting containers instead)


Entirely regular jobs

Cron

Cron runs programs at scheduled times.

It is commonly used for maintenance like cleaning/checking databases, running things like updatedb (for the locate command), collecting data at regular intervals, rebuilding reports, and anything else that is handy to do regularly and/or on a schedule.


Cron is basic and many implementations are stateless - it runs a thing when it's time, and that's it.

Some cron daemons will catch up on jobs scheduled to be run when the computer was off, some won't(verify).


If you want details on runs, rerun failed things, etc. then look at something like Airflow.


crontab

Crontab is the configuration file for cron.

Some cron daemons have only a system crontab that only an admin can alter - at /etc/crontab
Others additionally support user crontabs (in which case root's crontab acts as the system one).


To edit crontab, it is suggested you run

crontab -e

This will

  • pick up the user crontab where relevant (verify)
  • create a temporary file to be edited
  • run your configured EDITOR
  • check the last line isn't missing a newline (usually(verify))
  • and cause your new crontab to be loaded into cron
some crons will reload based on mtime anyway, but others may not

(...but yeah, in a lot of cases directly editing /etc/crontab is almost the same.)


Notes:

  • (...which is easier than editing /etc/crontab and restarting/reloading the cron daemon)
  • If your cron daemon supports user crontabs, this command will work for non-root users too.
if you're root, you can edit a specific user's crontab with crontab -u username -e


crontab format and examples

The lines consist of whitespace-separated

  • minute
0-59
  • hour
0-23
  • day of month
1-31
  • month
1-12
  • day of week
three-letter English names: sun, mon, tue, wed, thu, fri, sat
or 0-6 for sun through sat. On most systems 7 is also valid and means sun
  • the command to run


Also: (TODO: detail support of each)

  • * is a wildcard meaning it should match all values
  • n,m,o means match these values only
  • */n means match every-nth value
  • n-m means match every value in a range

The syntax was extended over time. Most of the above seems standard or at least widely supported(verify), but some cron daemons are more permissive than others, so check that things work. For example, 1-5,31-35 would be accepted by some and rejected by others.


Examples:

#min  hour dom m dow   cmd
*     *      * * *     echo "Every minute"
*/4   *      * * *     echo "Every four minutes"
*/30  *      * * *     echo "Every thirty minutes"
1,31  *      * * *     echo "Every thirty minutes (more adjustable)"
10    0,8,16 * * *     echo "Thrice daily - 00:10, 08:10, and 16:10"
5     4      * * Sat   echo "Saturday 04:05 maintenance"
0     *      * * *     echo "Every hour, on the hour"
0     0      * * *     echo "Midnight daily"
0     0     13 * 5     echo "The 13th, and every Friday (dom and dow are ORed, so not just Friday the 13th)"
0     9-17   * * 1-5   echo "Whole hours between 9AM and 5PM, from monday 'til friday"
30    *      1 * *     echo "Every half-past-an-hour on the first day of each month"
0     0      2 6 *     echo "It's June second, John's birthday"
0     4      1 * *     echo "First of every month, 4AM"
0     0      2 6 1     echo "June second, and every Monday in June (dom and dow are ORed)"
1-5/2 0      * * *     echo "One, three, and five minutes past midnight"


Notes:

  • you can often set extra variables at the top of the crontab
these seem to end up in the environment the job's shell gets, so you can set PATH, SHELL, and such (verify)
some have special meaning to the cron daemon itself - e.g. anacron has MAILTO, MAILFROM, RANDOM_DELAY, and START_HOURS_RANGE
(for an example, see the snippet after these notes)


  • Since stdout of cronjobs usually gets mailed to you, entries like the above ought to be a good test of whether they get triggered at the times you expect.
  • Don't forget a newline on the last line, or that line may not be used.
Editing via crontab -e will usually check that and complain
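
For example, the top of a crontab might look something like this (hypothetical values; which variables are honored varies per cron implementation):

MAILTO=admin@example.com
SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin

0 3 * * *   /usr/local/bin/nightly-maintenance.sh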


See also things like:

Running specific commands as specific users

Logging / mail

If a command produces output it is mailed to you. Which, unless you've done more setup, means a local mailbox.


Depending on the specific cron, it may care about only stdout, only stderr, or both. Check how yours works / is configured.

You may wish to do some redirection, e.g. throw away stdout but keep stderr.

>/dev/null                  # throw away stdout;  stderr is still mailed
>>/home/mydir/cronlog       # append stdout to a file in the user's directory;  stderr is still mailed
>/dev/null 2>&1             # throw away both stdout and stderr;  nothing gets mailed
2>&1                        # Mail both stderr and stdout. Note that since both are probably buffered,
                            #  they can get interleaved


It seems that (at least for anacron(verify))

  • if there is a MAILTO in crontab, then it is sent to the listed address(es)
  • if there is a MAILTO="", no mail is sent
  • if there is no MAILTO, it defaults to the running user, which for system crontab will typically be root.

Crons will typically invoke sendmail, so ensure that the mail setup it ends up using can actually send to the internet at large (this is nontrivial to set up; the most basic config of a mail server might only deliver to local users).


Keep in mind:

  • mail daemons will reject overly large mail.
that means some huge dumps may never get delivered
Logging details to a file is the safer option, but you may still want to mail a summary.
Or store these things elsewhere (e.g. intercepting sendmail calls is one option; piping each relevant crontab line to a logger is another - see the example below)
This also argues for sending watchdog emails.
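
For illustration, a hypothetical crontab entry that pipes both streams to syslog (via logger) rather than letting them go to mail:

0 3 * * *   /usr/local/bin/nightly-backup.sh 2>&1 | logger -t nightly-backup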



Note: You can disable logging by starting the cron entry with a tilde (~) (verify)

Debugging

Cron and environment

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Cron runs in a restricted environment, and scripts run as, well, scripts, so assume no implicit inclusion of profile/bashrc stuff.

This is intentional, good for security, and to avoid a mess.


Any script that needs more environment should either

  • draw it in itself (e.g. by sourcing something)
  • be given that explicitly
some cron implementations allow env vars to be set in the crontab (global to all cronjobs)
the more generic solution is to put exports before the command, though this may be messy.
the cleaner solution may be to source such exports from a file (see the examples below).
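
For example (hypothetical paths), both approaches as crontab entries:

0 2 * * *   PATH=/usr/local/bin:/usr/bin:/bin /home/myuser/scripts/report.sh
0 2 * * *   . /home/myuser/cron-env.sh && /home/myuser/scripts/report.sh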


While debugging you may like a temporary:

* * * * *  printenv > /tmp/cronenv

There's a suggestion that you can then get a good imitation of that environment, to test scripts in outside of cron:

env - `cat /tmp/cronenv` /bin/sh


Note also that cron runs commands with /bin/sh, which might e.g. be dash rather than bash or whatever else you prefer.

So write your scripts with shebangs.

For loose commands to work under a specific shell,

  • you may wish to explicitly name a shell,
  • or set SHELL in the crontab to have cron use a specific shell instead (see the example below).
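
For example, setting SHELL at the top of the crontab (hypothetical command; set -o pipefail is a bashism that plain sh/dash would reject):

SHELL=/bin/bash

30 1 * * *   set -o pipefail; /usr/local/bin/dump-db.sh | gzip > /var/backups/db.gz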

systemd timers

See Systemd_notes#timer

Airflow

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Workflow management system - something like a fancier cron: it allows more complex tasks, has existing modules for common things you need to connect to, and supports distributed runs, monitoring, per-task retries, and messaging on failure.


Its parts:

  • Runs a scheduler daemon.

Execution can be local, or via a Celery or Kubernetes executor.

  • Needs a database backend (postgres or mysql; sqlite in tutorials) to store worker state and history.
  • Has a web interface, mostly to show that history and some statistics.


A DAG (directed acyclic graph) groups tasks/operators, letting you have sequences and conditions within an overall execution (having it be a DAG avoids some weirdness).


Example - a basic "run maintenance SQL on midnight":

from datetime import datetime

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator

dag = DAG('basic-sql-dag',
          default_args={'owner': 'airflow', 'start_date': datetime(2020, 6, 28)},
          schedule_interval='0 0 * * *')

maintenance_task = PostgresOperator(
                           dag=dag,
                           postgres_conn_id='postgres_default',
                           task_id='maintenance_task',
                           sql='<path>/maint.sql',
                           params={'user_id': 5},
)

maintenance_task   # a single task, so there are no dependencies to declare

Notes:

  • this DAG happens to have only a single task
in many real ones you may link together two or three (see the chaining sketch below)
  • That SQL file will be a template processed by jinja, filling in {{ params.user_id }}.
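
A minimal sketch of what such chaining looks like (hypothetical task names and commands, using the same 1.x-style imports as the example above):

from airflow.operators.bash_operator import BashOperator

extract = BashOperator(task_id='extract', bash_command='echo extracting', dag=dag)
report  = BashOperator(task_id='report',  bash_command='echo reporting',  dag=dag)

# run report only after extract has succeeded
extract >> report    # equivalent to report.set_upstream(extract)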


Backfill and catchup

Because various people use this for what you'd call ETL (e.g. "do work on data for a specified interval", like archiving old data or compiling nontrivial daily reports), and because you may sometimes wish to pause a particular DAG (e.g. while changing or fixing something), there is often a need to deal with past intervals.

Pausing a DAG means tasks are still planned regularly, but not executed.


Backfill and catchup refer to executing DAGs for intervals planned in the past (which means their planned interval is passed in as data, which can be very useful for things like reports).

The two terms are sometimes used interchangeably. When used more specifically, catchup refers to airflow automatically doing this for you as per airflow/DAG configuration, and backfill to you manually running it for an interval using airflow backfill (which can also mark previous intervals as successful - useful e.g. when you just created a DAG and your start_date was a little further back than you were thinking).

Catchup can be disabled in a DAG when there is no need, or no point (e.g. tasks that actually do "process everything necessary").

You may sometimes care about depends_on_past: a task instance runs only if the previously scheduled instance of that task succeeded.(verify)
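
A sketch of where these knobs live (catchup is a DAG argument, depends_on_past a task/default_args argument; names and values here are just illustrative):

from datetime import datetime
from airflow import DAG

dag = DAG('no-catchup-dag',
          default_args={'owner': 'airflow',
                        'start_date': datetime(2020, 6, 28),
                        'depends_on_past': True},   # a task instance waits for its previous run to have succeeded
          schedule_interval='@daily',
          catchup=False)                            # don't plan runs for intervals between start_date and now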




More on its model

Operators are existing templates of common jobs, like BashOperator, PythonOperator, HTTPOperator, DockerOperator, BigQueryOperator, PostgresOperator, MySqlOperator, SlackAPIPostOperator, EmailOperator

You can often do everything you need with a few of these existing ones
Operators plus parameters become concrete tasks.

Hooks fetch extra things for a task, like

connections (managed by airflow itself, which avoids having to put auth details in DAGs, and lets you reuse them between tasks)
other necessary information, like fetching from S3. There are also more generic ones, for database/HTTP fetches.

Sensors are operators that wait for some condition (e.g. a file appearing, a partition existing) before dependent tasks run.

XComs communicate small pieces of data between tasks
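
A rough sketch of XCom use between two PythonOperator tasks (Airflow 1.x style; the task names are made up, and a dag object like the earlier example's is assumed):

from airflow.operators.python_operator import PythonOperator

def produce(**context):
    # a plain return value gets pushed to XCom under this task's id
    return {'row_count': 123}

def consume(**context):
    value = context['ti'].xcom_pull(task_ids='produce_task')
    print(value['row_count'])

produce_task = PythonOperator(task_id='produce_task', python_callable=produce,
                              provide_context=True, dag=dag)
consume_task = PythonOperator(task_id='consume_task', python_callable=consume,
                              provide_context=True, dag=dag)

produce_task >> consume_task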




See also:

On-demand, remote, and such

Puppet

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Platform: windows, *nix, osx

Dependencies:

  • being installed on all nodes(verify)
  • WinRM on windows, SSH on *nix (or an enterprise-only protocol instead of either?(verify))
  • (optional) PuppetDB

Puppet tries to keep checking whether things are in the desired state.


Chef

Dependencies: being installed on all nodes.


Configuration management, though somewhat more on the orchestration side(verify)


Ansible

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


From a distance, Ansible is basically a way of doing stuff on a lot of hosts,

specified via YAML (playbooks; the host inventory can be INI or YAML).

Ansible keeps an inventory of the hosts it manages, and applies changes when you run it.

Unlike Puppet, the free version doesn't keep checking that hosts stay in the desired state.


Ansible can, somewhat more easily than some others, be used for provisioning, though at scale you would probably always use more refined/integrated orchestration tools (say, kubernetes).

(But you might still use Ansible to set up things within new Kubernetes hosts)


Dependencies:

Python on the controlling side (most modules also need Python on the managed *nix nodes).
SSH on *nix, WinRM on windows
(agentless in that while this requires some config, it doesn't require installing an agent on the nodes)

Salt/Saltstack

Dependencies:

  • being installed on all nodes
on *nix this means python with a handful of modules
on windows it has its own installer

Communication:

  • ZeroMQ



Configuration management tool.


CFEngine

Terraform

https://en.wikipedia.org/wiki/Terraform_(software)


WinRM