Newlines
For context, the underlying concepts:
- Carriage return (CR)
- the carriage refers to a typewriter (or printer) carriage, the part that puts characters on paper and is moved across the paper
- carriage return means 'return it to the start of the line'
- i.e. movement across
- Line feed (LF) shifts the paper to the next line
- i.e. movement down
In typewriters, teleprinters, and earlier line printers,
there are reasons you might do one without the other.
So while you usually did both together, returning the carriage and advancing the paper (CR+LF), the two were kept as separate operations.
So in that context, a 'newline', 'line ending', 'end of line',
'next line' (NEL) or 'line break' could mean either one, or usually both (CR+LF).
These days, we typically mean both at the same time.
ASCII byte values
Line separators in plain-text files are encoded by:
- \n is LF (LineFeed): 0x0a in hex, 10 in decimal, 12 in octal
- \r is CR (Carriage Return): 0x0d in hex, 13 in decimal, 15 in octal
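As a quick check of the values listed above, here is a small Python sketch that prints each character's code in the same three bases. The helper name describe is made up for this illustration.

```python
def describe(ch: str) -> str:
    """Return the code of a single character in hex, decimal and octal."""
    code = ord(ch)
    return f"hex {code:#04x}, decimal {code}, octal {code:o}"

print("LF:", describe("\n"))  # LF: hex 0x0a, decimal 10, octal 12
print("CR:", describe("\r"))  # CR: hex 0x0d, decimal 13, octal 15
```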
How they are used
Different systems use the two in different ways:
- Most unices use LF (\n, 0x0A) by itself.
- DOS and various Windows programs use CRLF (\r\n, 0x0D 0x0A)
- Macs up to OS9 used CR (\r, 0x0D) by itself (verify). OSX sometimes uses this, sometimes unix style.
I have seen mention of \n\r, but this seems to be confusion about which character is which.
Note that most programming languages use newline to mean specifically LF (\n, 0x0a), regardless of OS.
This applies mostly to output, though there may also be some newline handling/translating code for reading, most commonly for file reading.
Many Windows programs will understand both \r\n and \n, though some won't.
In *nix, many utilities will read lines, absorb CRLF and print as LF without you needing to worry about it. Those that do not often show CR as ^M. Many programs do not know about the Mac way.
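To illustrate the kind of newline translation a language may do when reading files, here is a minimal Python sketch. It relies on the newline parameter of Python's built-in open(); the function name roundtrip and the sample bytes are made up for this example, and other languages handle this differently.

```python
import os
import tempfile

def roundtrip(raw: bytes) -> tuple:
    """Write raw bytes to a temp file, then read them back in text mode two ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # newline=None (the default): \r\n and lone \r are both translated to \n.
        with open(path, "r", newline=None, encoding="ascii") as f:
            translated = f.read()
        # newline="": no translation, the original line endings are preserved.
        with open(path, "r", newline="", encoding="ascii") as f:
            untouched = f.read()
    finally:
        os.remove(path)
    return translated, untouched

translated, untouched = roundtrip(b"dos\r\nunix\nmac\r")
print(repr(translated))  # 'dos\nunix\nmac\n'
print(repr(untouched))   # 'dos\r\nunix\nmac\r'
```

So in this case the program only ever sees \n unless it explicitly asks for the raw line endings.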
Translating
There are some utilities to convert these, though most may not be installed. It can be useful to know some tricks with standard utilities.
General solutions aren't very simple.
Very specific cases often are, though. For example, if you have a wordlist that you want only LFs in, you can do:
cat wordlist.crlf | tr -s '\r' '\n' > wordlist.lfonly
This is not a general solution: without that squeeze you double-space the file, and with it you remove empty lines, so it would be more accurate to convert every byte sequence of '\r\n' into '\n'.
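The more accurate variant, replacing each '\r\n' byte sequence with '\n', can be sketched in Python as follows. This is an illustration rather than a standard tool, and the function name crlf_to_lf is made up here.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; lone CRs and LFs are left alone."""
    return data.replace(b"\r\n", b"\n")

# Empty lines survive, unlike with the squeezing tr command.
print(crlf_to_lf(b"apple\r\n\r\nbanana\r\n"))  # b'apple\n\nbanana\n'
```

To convert a file, read it in binary mode ("rb"), pass the contents through the function, and write the result in binary mode ("wb"), so that no other newline translation gets in the way.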
Applications that do not absorb \r will usually see it as just another (control) character, so you can usually say that you want to remove a \r when it is last on a line.
For example, the following effectively converts crlf to lf (GNU sed understands \r; other sed implementations may not):
sed 's/\r$//'
For more automatic handling and conversion, from and to all formats, you'll have to detect what a file actually contains.
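A rough sketch of what such detection and conversion could look like in Python, working on raw bytes. The function names count_line_endings and convert_line_endings are made up for this illustration, and it assumes a single-byte or UTF-8 encoding (in UTF-16, for example, the bytes 0x0A and 0x0D can be part of other characters).

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
    }

def convert_line_endings(data: bytes, target: bytes = b"\n") -> bytes:
    """Normalize every line ending to LF first, then emit the target style."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified if target == b"\n" else unified.replace(b"\n", target)

sample = b"one\r\ntwo\r\nthree\nfour\r"
print(count_line_endings(sample))             # {'CRLF': 2, 'CR': 1, 'LF': 1}
print(convert_line_endings(sample, b"\r\n"))  # b'one\r\ntwo\r\nthree\r\nfour\r\n'
```

A file with mixed counts, like the sample, is a hint that it has passed through several editors or systems. What to do with such a file is a policy decision this sketch does not make for you.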
See also
- http://en.wikipedia.org/wiki/Newline (goes into a lot more detail)
Related software:
- http://ccrma-www.stanford.edu/~craig/utility/flip/
- fixdos' crlf (only CR→CRLF and CRLF→CR) (verify)
- dos2unix (maybe (verify). Its man page doesn't actually describe what it does, so I suspect it does various other things)