Newlines


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


For context, the concepts involved:

  • The carriage refers to a typewriter (or printer) carriage: the part that puts characters on paper and is moved across the paper
  • Carriage return (CR) means 'return the carriage to the start of the line', i.e. movement across
  • Line feed (LF) means shifting the paper to the next line, i.e. movement down


In typewriters, teleprinters, and earlier line printers, there were reasons you might do one without the other.

So while you often did both at once (return the carriage and move the paper down, CR+LF), the two were kept separate.


So in that context, a 'newline', 'line ending', 'end of line', 'next line' (NEL) or 'line break' could mean either one or, usually, both: CR+LF.

These days, we typically mean both at the same time.



ASCII byte values

Line separators in plain-text files are encoded by:

  • \n is LF (LineFeed): 0x0a in hex, 10 in decimal, 12 in octal
  • \r is CR (Carriage Return): 0x0d in hex, 13 in decimal, 15 in octal
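
The values above are easy to check yourself. A minimal sketch in Python (the helper name describe is made up for this illustration):

```python
# Numeric values of the two line-ending characters.
LF = "\n"
CR = "\r"

def describe(ch):
    """Return the (hex, decimal, octal) notation for a single character."""
    code = ord(ch)
    return (hex(code), str(code), oct(code))

print(describe(LF))  # ('0xa', '10', '0o12')
print(describe(CR))  # ('0xd', '13', '0o15')
```

Python writes octal with a 0o prefix, so 0o12 and 0o15 correspond to the octal 12 and 15 listed above.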


How they are used

Different systems use the two in different ways:

  • Most unices use LF (\n, 0x0A) by itself.
  • DOS and various Windows programs use CRLF (\r\n, 0x0D 0x0A)
  • Macs up to OS9 used CR (\r, 0x0D) by itself -- OSX sometimes uses this, sometimes unix style.

I have seen mention of \n\r, but this seems to be confusion about which character is which.


Note that most programming languages use 'newline' to mean specifically LF (\n, 0x0a), regardless of OS. This applies mostly to output, though there may also be some newline handling/translating code for reading, most commonly for file reading.
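
As one example of such translating code on the reading side, Python's built-in open() takes a newline argument: with the default (None), CRLF and lone CR are both turned into \n while reading in text mode, and with newline="" the line endings are passed through untouched. A small sketch of that behaviour (read_both_ways is a hypothetical helper written for this note, and the sample bytes are made up):

```python
import os
import tempfile

def read_both_ways(raw):
    """Write raw bytes to a temp file, then read it back in text mode twice:
    once with newline translation, once without."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # newline=None (the default): \r\n and lone \r are translated to \n
        with open(path, "r", newline=None, encoding="ascii") as f:
            translated = f.read()
        # newline="": no translation, line endings come through as stored
        with open(path, "r", newline="", encoding="ascii") as f:
            untranslated = f.read()
    finally:
        os.remove(path)
    return translated, untranslated

translated, untranslated = read_both_ways(b"unix\nwindows\r\nold mac\r")
print(repr(translated))    # 'unix\nwindows\nold mac\n'
print(repr(untranslated))  # 'unix\nwindows\r\nold mac\r'
```

Other languages and libraries make different choices here, so treat this as an illustration of the idea and not as a general rule.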

Many Windows programs will understand both \r\n and \n, though some won't.

In *nix, many utilities will read lines, absorb CRLF and print as LF without you needing to worry about it. Those that do not often show CR as ^M. Many programs do not know about the Mac way.



Translating

There are some utilities to convert these, though most may not be installed. It can be useful to know some tricks with standard utilities.


General solutions aren't very simple.

Very specific cases often are, though. For example, if you have a wordlist that you want only LFs in, you can do:

cat wordlist.crlf | tr -s '\r' '\n' > wordlist.lfonly

This is not a general solution: without that squeeze you double-space the file, and with it you remove empty lines, so it would be more accurate to convert every byte sequence of '\r\n' into '\n'.
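
To make the difference concrete, here is a rough Python sketch that works on raw bytes. The function names and the sample data are invented for this example; tr_like only imitates what tr '\r' '\n' does without the squeeze.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF byte pair with a single LF; leave everything else alone."""
    return data.replace(b"\r\n", b"\n")

def tr_like(data: bytes) -> bytes:
    """Imitate tr '\\r' '\\n' (no squeeze): every CR becomes an LF."""
    return data.replace(b"\r", b"\n")

# A CRLF file with an intentional empty line between the two words.
sample = b"one\r\n\r\ntwo\r\n"

print(crlf_to_lf(sample))  # b'one\n\ntwo\n'       (empty line preserved)
print(tr_like(sample))     # b'one\n\n\n\ntwo\n\n' (double-spaced)
```

Because the replacement only matches the exact pair, lone CR or lone LF bytes elsewhere in the data are left as they were.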


Applications that do not absorb \r will usually see it as just another (control) character, so you can usually say that you want to remove a \r when it is last on a line. For example, the following effectively converts CRLF to LF (this relies on sed understanding the \r escape, which GNU sed does; other sed versions may not):

sed 's/\r$//'


For more automatic handling and conversion, from and to all formats, you'll have to detect what a file actually contains.
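
A minimal sketch of what such detection and conversion could look like in Python, working on bytes. All names here are invented for the example, and "most frequent style wins" is an assumption made for simplicity; files with mixed endings may deserve more careful treatment.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in the data."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract those.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}

def guess_style(data: bytes):
    """Return the most frequent style, or None if there are no line endings.
    Ties go to whichever style comes first in the dict (an arbitrary choice)."""
    counts = count_line_endings(data)
    if not any(counts.values()):
        return None
    return max(counts, key=counts.get)

def convert(data: bytes, target: bytes) -> bytes:
    """Normalize all three styles to LF first, then rewrite as target."""
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", target)

sample = b"a\r\nb\r\nc\n"
print(count_line_endings(sample))  # {'CRLF': 2, 'CR': 0, 'LF': 1}
print(guess_style(sample))         # CRLF
print(convert(sample, b"\n"))      # b'a\nb\nc\n'
```

Normalizing to LF before converting keeps the order of replacements from mattering, at the cost of one extra pass over the data.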


See also

Related software: