Security notes / Message signing notes

<!--
When you send a message,
you can also send a small piece of extra information that lets the other side check that we are talking about the same message.
This is typically just a hash value.


When its ''purpose'' is two parties checking we're talking about the same data,
people start saying 'message digest' or just 'digest'.


This is essentially the same idea as the data transfer integrity check,
but signals that the ''practical purpose'' is to refer to a unique message,
almost like an identifier of that data.


There are similar uses.
Say, package installation managers may communicate the index of all packages
and mention their digest, as both a reference of "you can fetch the content via this because our package server answers to it"
and "after download you can verify that content's validity via this".


You should assume that digests like this have no security value, particularly when sent over the same channel,
for the dumb reason that anyone who can tamper with one can probably tamper with the other,
and there is no (often cannot be a) shared secret that makes it hard to fake.
(When that matters, packages ''will'' be signed, often with asymmetric cryptography.)


{{comment|(MIC (Message Integrity Code) is also sometimes used, but 'integrity' may suggest more than it really does, and e.g. RFC 4949 discourages using MIC and 'integrity' for that reason. Also, some people mix up MIC and MAC. So consider that terms like 'checksum', 'error detection code', 'hash', 'keyed hash', 'Message Authentication Code' (MAC), 'protected checksum' are more clearly distinct terms.)}}
-->



Revision as of 16:51, 18 March 2024



Message integrity

When you read about hash functions, you may find mention that a good hash function is hard to reverse - that it's hard to guess the message based on its hash.


This, in itself, is useful for message integrity: you compute a hash on the source side, have the receiving side compute the same hash, and compare the two.

This is generally a good indicator of both success on success and of failure on failure, and needs only a tiny amount of added transfer.
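The compute-on-both-sides idea can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the agreed hash (the `digest` helper is hypothetical, not from any particular protocol):

```python
import hashlib

def digest(data: bytes) -> str:
    # Both sides must agree on the same hash function.
    return hashlib.sha256(data).hexdigest()

payload = b"example payload"
sent_digest = digest(payload)        # computed on the sending side, sent along

# The receiving side recomputes the digest over what it received and compares:
assert digest(payload) == sent_digest

# Any change to the data almost certainly changes the digest:
assert digest(payload + b"\x00") != sent_digest
```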

Requirements:

  • speed: as fast as possible
  • robustness: hash should have avalanche effect

Doesn't care much about:

  • keyspace size
  • being a secure hash (but it's sometimes a nice bonus)



Checks against obvious garbled data

Detecting unintentional corruption of transmitted data is useful to avoid some very basic misbehaviour, and the check doesn't have to be complex at all to significantly lower the chance of such corruption going unnoticed.


There are some things so simple they are barely hashes, but serve the same function. Examples:

  • Serial ports can add a parity bit to each character - four different methods (even, odd, mark, space), but all necessarily simple. Odd is arguably the best, but not by much.
These are not actually used a lot - they give so little protection, and no automatic fix, that you're better off adding your own checking (and possible correction) in your protocol.
  • GPS, specifically the NMEA 0183 text protocol:
per line, all characters between the $ and the * are XORed together. The resulting byte is appended as hex, e.g. the 06 in $GPGSA,A,2,,,,,,,,,,,,,50.0,50.0,50.0*06
which is weak protection, but enough to ignore most garbled lines.
  • Barcodes tend to have a check digit. That digit does not carry any useful data, it just lets you check that some math involving all digits checks out. This avoids a lot of incorrect scans.
For example, ISBNs (used on books) have their last digit as a check digit.
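The NMEA-style XOR check is small enough to show in full. A sketch, using the example sentence from above:

```python
def nmea_checksum(sentence: str) -> str:
    # NMEA 0183 checksum: XOR of all characters between the leading '$' and the '*'.
    body = sentence[1:sentence.index('*')]
    value = 0
    for ch in body:
        value ^= ord(ch)
    return f"{value:02X}"

print(nmea_checksum("$GPGSA,A,2,,,,,,,,,,,,,50.0,50.0,50.0*06"))  # prints 06
```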


More serious but still relatively simple hashes include:

  • IPv4 uses a 16-bit checksum (of the header only) that is basically just addition[1]
  • TCP over IPv4 uses a 16-bit checksum (of header and payload) that is basically just addition[2]
  • TCP over IPv6 uses a 16-bit checksum (of header and payload) that is basically just addition[3]
  • XMODEM used a sum of bytes, a later variant used 16-bit CRC instead
  • YMODEM used a 16-bit CRC
  • ZMODEM used 16-bit or 32-bit CRC


Most of these are mainly meant to make it fairly likely that a garbled transmission is detected as bad, but they are...

  • not focused on the best probability of that detection (CRC is decent, the rest is not)
  • not error correction (without retransmission)
  • not strong against ill intent
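The "basically just addition" in the IPv4/TCP checksums is a one's-complement sum of 16-bit words. A minimal sketch of the idea, not an RFC-precise implementation (for instance, it ignores the pseudo-header TCP mixes in):

```python
def internet_checksum(data: bytes) -> int:
    # One's-complement sum of 16-bit big-endian words, then complemented.
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (~total) & 0xFFFF

print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # prints 0x220d
```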


Slightly more robust variants

On keyspace size

Keyspace size matters in that the probability of getting the right hash with the wrong data lowers with larger spaces.

Luckily, said probabilities lower very quickly. While 16 bits probably isn't enough, and 32-bit is meh, there are simple and fast hashes like MD5 (128 bit) or SHA-1 (160 bit) that are arguably already overkill for this use, even if you're verifying gigabytes of data at a time.


Say,

  • that serial port check bit has a roughly 1 in 2 chance of matching garbled data, which is why we rarely use it.
  • that GPS byte, no matter how much smarter you could calculate it, when garbled still has a roughly 1 in 256 chance of matching.
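To put numbers on how quickly that probability drops with check size:

```python
# The chance that random wrong data still matches an n-bit check value is 1 in 2**n.
for bits in (1, 8, 16, 32, 128):
    print(f"{bits:>3} bits: 1 in {2 ** bits}")
```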



On avalanche effect

The avalanche effect roughly means that the calculation is designed so that a small, local change leads to larger changes throughout the hash.

This protects against errors that happen locally and might check out for the wrong reasons.


For example, one of the simplest hash functions is a simple bytewise XOR, which is usually blindingly fast to execute. But if, say, data was transmitted over a 7-bit serial line (one that chopped off the highest bit of each byte), the XOR would still check out perfectly, because the transmitted XOR byte would also have that bit missing. 0 XOR 0 = 0.
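That 7-bit serial line scenario can be sketched directly. XOR works bit-position by bit-position, so stripping the high bit from every byte - data and checksum alike - still checks out:

```python
def xor_checksum(data: bytes) -> int:
    # One of the simplest possible hashes: XOR all bytes together.
    value = 0
    for b in data:
        value ^= b
    return value

original = bytes([0x41, 0xC2, 0x83, 0x37])
# A 7-bit line chops the highest bit off every transmitted byte,
# the checksum byte included:
stripped_data = bytes(b & 0x7F for b in original)
stripped_checksum = xor_checksum(original) & 0x7F

# Two bytes were corrupted, yet the check still passes:
assert xor_checksum(stripped_data) == stripped_checksum
```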

Sure, that's a pathological example, but hashes with at least some avalanche effect are simple to make, and solve this particular issue.


It's also why throwing a cryptographic hash at a problem can rarely hurt, even if you otherwise don't need it.

message authenticity; message signing; digital signatures

For repeated context, message integrity is just making a hash of the also-public contents.


Consider what happens if you include a shared secret in that hashing process, and in that hashing process only.

Add something you won't send, but the other party already knows.

If the other side finds that the public content plus that shared secret corresponds to the hash also sent, that strongly asserts the message must have come from someone with that secret.


In a simple symmetric key scheme, this might be as simple as appending a secret to the public data before hashing.
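A sketch of that symmetric idea, using HMAC from Python's standard library rather than plain concatenation (naive hash-of-secret-plus-data has known pitfalls, e.g. length extension, that HMAC avoids):

```python
import hashlib
import hmac

secret = b"shared-secret"      # known to both sides, never sent
message = b"the public message"

# The sender computes a tag and sends it alongside the message.
tag = hmac.new(secret, message, hashlib.sha256).hexdigest()

# The receiver recomputes the tag with its own copy of the secret and compares.
check = hmac.new(secret, message, hashlib.sha256).hexdigest()
assert hmac.compare_digest(tag, check)

# Someone without the secret cannot produce a matching tag.
forged = hmac.new(b"wrong-secret", message, hashlib.sha256).hexdigest()
assert not hmac.compare_digest(tag, forged)
```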

But keeping that secret a secret is annoying for dumbly practical reasons, so we often implement this with asymmetric keys instead, because then we can give out the means of verification without giving out the means of signing.

Yet practically the result is still much the same: you can verify that this must have come from a specific source,

now because verifying with the public key very strongly asserts it was made with the corresponding private key.
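The asymmetric version, sketched with Ed25519 signatures. This assumes the third-party pyca/cryptography package is available; it is one possible implementation choice, not the only one:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()   # kept by the signer
public_key = private_key.public_key()        # handed out freely

message = b"the public message"
signature = private_key.sign(message)

# Anyone with the public key can verify; only the private key could have signed.
public_key.verify(signature, message)        # raises InvalidSignature on mismatch

try:
    public_key.verify(signature, b"a tampered message")
except InvalidSignature:
    print("tampering detected")
```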


Digital signature refers to the same thing.

It's the term we seem to prefer when we're talking about specific implementations / setups, and often more automated ones, like

  • documents we gave to each other that the viewer can check for us
  • software we downloaded that the installation manager can check for us
  • automatic updates being verified as authentic


https://en.wikipedia.org/wiki/Digital_signature

Hash as identifier

Message digest

Supporting data structures

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Authenticity via hashes

See also:




Code signing

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.