Binary files, text files: Difference between revisions

From Helpful
===What do these terms even mean?===


'''A "binary file" is, arguably, a file where there is useful data that is ''not'' text - and where that's probably ''most'' of the contents.'''


'''Binary data''' / '''binary file''' usually means raw data that can take any values. The term often also signals that the data is not trivially human-readable or human-editable, unlike (plain) text.
Pragmatically,
* text file = "All data is useful as text"
: characters in a sequence that you could edit at will in the simplest "one character after another" style of editor
: human-interpretable, human-editable


{{comment|(Note that there is no direct relation to binary in the on-off, base-two sense. '''Byte data''' may be a little clearer (or not) in that it avoids that association)}}
* binary file = "not just text". It's a catch-all.
: a binary file is one you probably can't edit without severely breaking the present structure
:: and where it probably wouldn't occur to you, e.g. because the most useful data isn't text to start with.
: probably not human-readable, probably not human-editable


<!--
Even if text is involved, you can't be entirely sure of how to interpret or edit it without
parsing the file according to whatever standard the file is encoded to (which may be de facto, or even non-portable serialization).
-->




Even that needs footnotes, and we haven't even gotten technical yet.


'''A text file is, arguably, a file where the only useful data is text.'''


There is arguably no such thing as ''plain'' text.
That is, there are too many variations that are indistinguishable except by guessing hard.

'Binary' seems to come from a time before a lot of different file formats existed,
where computer use was computer programming,
and where we mostly had code that humans wrote,
and code in compiled, machine-readable form.


The compiler output was often called 'the binary', and that is still used.
So arguably it's short for 'a binary executable' or some such term.








If a file or (byte)string contains only text (particularly if in a common encoding like ASCII, ISO 8859, UTF-8) it would often be called '''plain text'''.




<!--
'Binary data' or 'binary file' is actually a fairly empty and dumb name, because in this context it means "could be anything, but not just text".
-->
<!--
And more pedantically, everything is just as much made of ones and zeroes as anything else when stored, ''and'' [[the ones and zeroes thing is a dumb trope|we hardly ever look at data that way to start with]].
-->


A '''bytestring''' (sometimes binary string) is a sequence of bytes that can contain any value, not just readable characters.
<!--(More structured documents formats have solved this decades ago)-->


Around C/C++ and some others, this also implies that the length is stored separately, because C's historic strings are NUL-terminated, which does ''not'' allow a string to contain the value 0x00 (NUL) unless its length is tracked some other way.
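
A small sketch of that difference, using Python's standard <code>ctypes</code> module to get a C-style view of the same bytes. The variable names are made up for illustration; this only shows the idea, it is not how any particular C program handles it.

```python
import ctypes

# A Python bytes object stores its length separately,
# so 0x00 is just another value in it.
payload = b"ab\x00cd"
print(len(payload))        # 5

# create_string_buffer copies the bytes into a C char array
# and appends one terminating NUL.
buf = ctypes.create_string_buffer(payload)

# .value reads it the way C string functions would: up to the first NUL.
c_view = buf.value
print(c_view)              # b'ab'

# .raw returns the whole array, which works only because its size is known.
raw_view = buf.raw
print(raw_view)            # b'ab\x00cd\x00'
```

The NUL-terminated view silently loses everything after the first 0x00, which is why bytestrings need their length carried along separately.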
<!--
If a file or (byte)string contains only text (particularly if in a common coding like ASCII, ISO8859, UTF8) it would often be called '''plain text'''.
-->






A '''string''' in the wide sense refers to a string/array of values but {{comment|(since we have words like array and list)}} it usually refers specifically to a string of readable characters (unless terms like bytestring are used).
<!--
In programming
* A '''string''' in the wide sense refers to an array of values
: ''usually'' to a string of readable characters (unless terms like bytestring are used).
:: {{comment|(...in part because we have words like array and list for numbers and other things)}}


* A '''bytestring''' (sometimes binary string) is a sequence of bytes that can contain any value, not just readable characters.
: Around C/C++ and some others, a string is terminated by a ''value'' -- which means that value cannot appear in the data. That means that for bytestrings, you must store the length separately.




'''A binary''' usually refers to a program in executable (and often compiled) form.
-->
 
It usually seems to be considered short for 'a binary executable'.

Revision as of 14:10, 16 January 2024

What do these terms even mean?

Pragmatically,

  • text file = "All data is useful as text"
      characters in a sequence that you could edit at will in the simplest "one character after another" style of editor
      human-interpretable, human-editable
  • binary file = "not just text". It's a catch-all.
      a binary file is one you probably can't edit without severely breaking the present structure
          and where it probably wouldn't occur to you, e.g. because the most useful data isn't text to start with.
      probably not human-readable, probably not human-editable
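
Because the split above is pragmatic rather than formal, tools that need to decide can only guess. A minimal sketch of one such guess follows. The function name <code>looks_like_text</code> and the two rules it uses (no NUL bytes, decodes as UTF-8) are assumptions chosen for illustration, not a standard test.

```python
def looks_like_text(data: bytes) -> bool:
    """Guess whether a chunk of bytes is text. A heuristic, not a definition."""
    # NUL bytes almost never appear in 8-bit text encodings.
    if b"\x00" in data:
        return False
    # Assume UTF-8 (which includes ASCII); other encodings will be misjudged.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_text(b"hello\n"))                      # True
print(looks_like_text(b"\x89PNG\r\n\x1a\n\x00\x00"))    # False
print(looks_like_text("héllo".encode("latin-1")))       # False, though it is text
print(looks_like_text("héllo".encode("utf-16")))        # False, though it is text
```

The last two cases are the footnotes in practice: perfectly good text in another encoding is reported as "not text", so any check like this is only as good as its assumptions. A chunk cut off in the middle of a multi-byte character would also fail the decode.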


Even that needs footnotes, and we haven't even gotten technical yet.


'Binary' seems to come from a time before a lot of different file formats existed, where computer use was computer programming, and where we mostly had code that humans wrote, and code in compiled, machine-readable form.

The compiler output was often called 'the binary', and that is still used. So arguably it's short for 'a binary executable' or some such term.