Newline

2008-01-28 17:42:59 / Category: Technical Notes

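To run Linux programs and scripts on my own PC, I installed cygwin on Windows XP. At first I did not pay attention to the difference between the Unix and DOS/Windows text file formats, and that caused some problems. For example, a shell script that has been edited with Notepad on Windows fails when it is run under cygwin: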

./test.sh: line 2: $'\r': command not found
./test.sh: line 3: syntax error near unexpected token `$'\r''

The same text file does not show this problem when it is copied from a Linux server to Windows with scp, because scp transfers the bytes unchanged and the LF line endings survive.
This is the so-called newline problem, so I read the relevant Wikipedia entry. Some excerpts follow:

Adapted from: http://en.wikipedia.org/wiki/Newline

In computing, a newline (also known as a line break or end-of-line / EOL character) is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line, that is, on the next line below the text immediately preceding the newline. The actual codes representing a newline vary across hardware platforms and operating systems, which can be a problem when exchanging data between systems with different representations.

There is also some confusion as to whether newlines terminate or separate lines. If a newline is considered a separator, there will be no newline after the last line of a file. The general convention on most systems is to add a newline even after the last line, i.e., to treat newline as a line terminator. Some programs have problems processing the last line of a file if it isn't newline-terminated. Conversely, programs that expect newline to be used as a separator will interpret a final newline as starting a new (empty) line. This can result in a different line count being reported for the file, but is otherwise generally harmless.
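
The terminator-versus-separator distinction can be seen in a few lines of Python. This is only a minimal sketch: the helper names count_as_separator and count_as_terminator are made up for illustration, and the text is assumed to use LF only.

```python
def count_as_separator(text: str) -> int:
    """Count lines when LF separates lines: a trailing LF starts an empty line."""
    return len(text.split("\n"))


def count_as_terminator(text: str) -> int:
    """Count lines when LF terminates lines: an unterminated tail still counts."""
    count = text.count("\n")
    if text and not text.endswith("\n"):
        count += 1
    return count


sample = "first\nsecond\n"
print(count_as_separator(sample))   # 3
print(count_as_terminator(sample))  # 2
```

The two functions agree on a file without a final newline and differ by one on a file with it, which is the line-count discrepancy described above.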

Software applications and operating systems usually represent a newline with one or two control characters:

Systems based on ASCII or a compatible character set use either LF (Line feed, 0Ah) or CR (Carriage Return, 0Dh) individually, or CR followed by LF (CR+LF, 0Dh 0Ah); see below for the historical reason for the CR+LF convention. These characters are based on printer commands: The line feed indicated that one line of paper should feed out of the printer, and a carriage return indicated that the printer carriage should return to the beginning of the current line.
LF: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, etc.), BeOS, Amiga, RISC OS, and others
CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS, OS/2, Microsoft Windows
CR: Commodore machines, Apple II family and Mac OS up to version 9
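
To check which of these conventions a given file uses, one can count the byte sequences directly. The following is a rough sketch, not a standard tool: the function names count_newlines and guess_convention are hypothetical, and the data is assumed to be in ASCII or a compatible encoding.

```python
def count_newlines(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs and lone CRs in raw file contents."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def guess_convention(data: bytes) -> str:
    """Return 'LF', 'CRLF', 'CR', 'mixed' or 'none' for the given bytes."""
    used = [name for name, n in count_newlines(data).items() if n > 0]
    if not used:
        return "none"
    if len(used) > 1:
        return "mixed"
    return used[0]


print(guess_convention(b"echo hello\r\nexit 0\r\n"))  # CRLF
print(guess_convention(b"echo hello\nexit 0\n"))      # LF
```

For a real file, read it in binary mode first, for example `guess_convention(open("test.sh", "rb").read())`, so that no newline translation happens before counting.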

The C programming language provides the escape sequences '\n' (newline) and '\r' (carriage return). However, contrary to popular belief, these are in fact not required to be equivalent to the ASCII LF and CR control characters. The C standard only guarantees two things:

1. Each of these escape sequences maps to a unique implementation-defined number that can be stored in a single char value.
2. When writing a file in text mode, '\n' is transparently translated to the native newline sequence used by the system, which may be longer than one character. (Note that a C implementation is allowed to not store newline characters in files. For example, the lines of a text file could be stored as rows of a SQL table or as fixed-length records.) When reading in text mode, the native newline sequence is translated back to '\n'. In binary mode, the second mode of I/O supported by the C library, no translation is performed, and the internal representation of any escape sequence is output directly.
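
Python is not C, but its built-in open() makes a similar split between text mode and binary mode, so it can serve as a rough analogy. The sketch below assumes CR+LF as the "native" sequence by passing it explicitly through the newline parameter; the roundtrip helper is a hypothetical name used only for this illustration.

```python
import os
import tempfile


def roundtrip(text: str, native_newline: str = "\r\n"):
    """Write text in text mode, then read it back in binary and in text mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: each '\n' written is translated to native_newline.
        with open(path, "w", encoding="utf-8", newline=native_newline) as f:
            f.write(text)
        # Binary mode: no translation, the stored bytes are returned as-is.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode with newline=None: '\r\n' and '\r' are read back as '\n'.
        with open(path, "r", encoding="utf-8", newline=None) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = roundtrip("one\ntwo\n")
print(raw)            # b'one\r\ntwo\r\n'
print(repr(decoded))  # 'one\ntwo\n'
```

The program only ever handles '\n'; the two-byte sequence exists only in the stored file, which mirrors the text-mode behaviour described in point 2.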

The different newline conventions often cause text files that have been transferred between systems of different types to be displayed incorrectly. For example, files originating on Unix or Apple Macintosh systems may appear as a single long line on a Windows system. Conversely, when viewing a file from a Windows computer on a Unix system, the extra CR may be displayed as ^M at the end of each line or as a second line break.

Conversion utilities
1. The UltraEdit editor on Windows
2. Some UNIX utilities: dos2unix, unix2dos, mac2unix, unix2mac, mac2dos, dos2mac
3. grep utility
   grep -PL '\r$' myfile.txt # show UNIX style file (LF terminated)
   grep -Pl '\r$' myfile.txt # show DOS style file (CRLF terminated)
4. tr utility
   tr -d '\r' < inputfile > outputfile # convert a DOS file to UNIX file
5. others
   sed -e 's/$/\r/' inputfile > outputfile  # UNIX to DOS (adding CRs; \r requires GNU sed)
   sed -e 's/\r$//' inputfile > outputfile  # DOS to UNIX (removing CRs; \r requires GNU sed)
   perl -p -e 's/(\r\n|\n|\r)/\r\n/g' inputfile > outputfile  # Convert to DOS
   perl -p -e 's/(\r\n|\n|\r)/\n/g' inputfile > outputfile  # Convert to UNIX
   perl -p -e 's/(\r\n|\n|\r)/\r/g' inputfile > outputfile  # Convert to old Mac
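
Where none of these tools is installed, the same normalization as the perl one-liners can be sketched in Python. This is an illustrative sketch only: convert_newlines is a hypothetical helper, and it works on raw bytes so that no implicit translation interferes.

```python
import re

# Match CRLF first so that a CR+LF pair is replaced once, not twice.
_ANY_NEWLINE = re.compile(rb"\r\n|\n|\r")


def convert_newlines(data: bytes, target: bytes = b"\n") -> bytes:
    """Replace every CRLF, LF or CR in data with the target sequence."""
    if target not in (b"\n", b"\r\n", b"\r"):
        raise ValueError("target must be LF, CRLF or CR")
    # A function is used as the replacement so the target is inserted literally.
    return _ANY_NEWLINE.sub(lambda _match: target, data)


mixed = b"a\r\nb\nc\rd"
print(convert_newlines(mixed))             # b'a\nb\nc\nd'
print(convert_newlines(mixed, b"\r\n"))    # b'a\r\nb\r\nc\r\nd'
```

To convert a file, read it with open(path, "rb") and write the result with open(path, "wb"); binary mode keeps Python from translating the newlines a second time.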

That is all for this summary. These points are basic, but they matter.

