diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2002-02-12 21:25:41 +0000 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2002-02-12 21:25:41 +0000 |
commit | 9832a235c5d809b4cba1d6ff2b3e20f4eda2b396 (patch) | |
tree | d55ab539650e9c4fa5f7d8b143403ba78969114d /doc/src | |
parent | c16ef167df36a6364a981e1a9f61a8ab8003fe94 (diff) | |
download | postgresql-9832a235c5d809b4cba1d6ff2b3e20f4eda2b396.tar.gz postgresql-9832a235c5d809b4cba1d6ff2b3e20f4eda2b396.zip |
Modify COPY TO to emit carriage returns and newlines as backslash escapes
(backslash-r, backslash-n) for protection against newline-conversion
munging. In future we will also tweak COPY FROM, but this part of the
change should be backwards-compatible. Per pghackers discussion.
Also, update COPY reference page to describe the backslash conversions
more completely and accurately.
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/ref/copy.sgml | 142 |
1 files changed, 97 insertions, 45 deletions
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 850af1f0775..b4a226876a9 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.27 2002/01/20 22:19:56 petere Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.28 2002/02/12 21:25:34 tgl Exp $ PostgreSQL documentation --> @@ -74,7 +74,7 @@ COPY [ BINARY ] <replaceable class="parameter">table</replaceable> [ WITH OIDS ] <term><replaceable class="parameter">filename</replaceable></term> <listitem> <para> - The absolute Unix file name of the input or output file. + The absolute Unix path name of the input or output file. </para> </listitem> </varlistentry> @@ -225,7 +225,7 @@ ERROR: <replaceable>reason</replaceable> By default, a text copy uses a tab ("\t") character as a delimiter between fields. The field delimiter may be changed to any other single character with the keyword phrase USING DELIMITERS. Characters - in data fields which happen to match the delimiter character will + in data fields that happen to match the delimiter character will be backslash quoted. </para> @@ -265,8 +265,8 @@ ERROR: <replaceable>reason</replaceable> by the <application>PostgreSQL</application> user (the user ID the server runs as), not the client. <command>COPY</command> naming a file is only allowed to database - superusers, since it allows writing on any file that the backend has - privileges to write on. + superusers, since it allows reading or writing any file that the backend + has privileges to access. <tip> <para> @@ -297,57 +297,109 @@ ERROR: <replaceable>reason</replaceable> <title>File Formats</title> <refsect2> <refsect2info> - <date>2001-01-02</date> + <date>2002-02-12</date> </refsect2info> <title>Text Format</title> <para> - When <command>COPY TO</command> is used without the BINARY option, - the file generated will have each row (instance) on a single line, with each - column (attribute) separated by the delimiter character. Embedded - delimiter characters will be preceded by a backslash character - ("\"). The attribute values themselves are strings generated by the - output function associated with each attribute type. The output - function for a type should not try to generate the backslash - character; this will be handled by <command>COPY</command> itself. + When <command>COPY</command> is used without the BINARY option, + the file read or written is a text file with one line per table row. + Columns (attributes) in a row are separated by the delimiter character. + The attribute values themselves are strings generated by the + output function, or acceptable to the input function, of each + attribute's data type. The specified null-value string is used in + place of attributes that are NULL. </para> <para> - The actual format for each instance is - <programlisting> -<attr1><<replaceable class=parameter>separator</replaceable>><attr2><<replaceable class=parameter>separator</replaceable>>...<<replaceable class=parameter>separator</replaceable>><attr<replaceable class="parameter">n</replaceable>><newline> - </programlisting> - Note that the end of each row is marked by a Unix-style newline - ("\n"). <command>COPY FROM</command> will not behave as desired - if given a file containing DOS- or Mac-style newlines. + If WITH OIDS is specified, the OID is read or written as the first column, + preceding the user data columns. (An error is raised if WITH OIDS is + specified for a table that does not have OIDs.) </para> <para> - The OID is emitted as the first column if WITH OIDS is specified. - (An error is raised if WITH OIDS is specified for a table that does not - have OIDs.) + End of data can be represented by a single line containing just + backslash-period (<literal>\.</>). An end-of-data marker is + not necessary when reading from a Unix file, since the end of file + serves perfectly well; but an end marker must be provided when copying + data to or from a client application. </para> <para> - If <command>COPY TO</command> is sending its output to standard - output instead of a file, after the last row it will send a backslash ("\") - and a period (".") followed by a newline. - Similarly, if <command>COPY FROM</command> is reading - from standard input, it will expect a backslash ("\") and a period - (".") followed by a newline, as the first three characters on a - line to denote end-of-file. However, <command>COPY FROM</command> - will terminate correctly (followed by the backend itself) if the - input connection is closed before this special end-of-file pattern is - found. + Backslash characters (<literal>\</>) may be used in the + <command>COPY</command> data to quote data characters that might otherwise + be taken as row or column delimiters. In particular, the following + characters <emphasis>must</> be preceded by a backslash if they appear + as part of an attribute value: backslash itself, newline, and the current + delimiter character. </para> <para> - The backslash character has other special meanings. A literal backslash - character is represented as two - consecutive backslashes ("\\"). A literal tab character is represented - as a backslash and a tab. (If you are using something other than tab - as the column delimiter, backslash that delimiter character to include - it in data.) A literal newline character is - represented as a backslash and a newline. When loading text data - not generated by <application>PostgreSQL</application>, - you will need to convert backslash - characters ("\") to double-backslashes ("\\") to ensure that they - are loaded properly. + The following special backslash sequences are recognized by + <command>COPY FROM</command>: + + <informaltable> + <tgroup cols="2"> + <thead> + <row> + <entry>Sequence</entry> + <entry>Represents</entry> + </row> + </thead> + + <tbody> + <row> + <entry><literal>\b</></entry> + <entry>Backspace (ASCII 8)</entry> + </row> + <row> + <entry><literal>\f</></entry> + <entry>Form feed (ASCII 12)</entry> + </row> + <row> + <entry><literal>\n</></entry> + <entry>Newline (ASCII 10)</entry> + </row> + <row> + <entry><literal>\r</></entry> + <entry>Carriage return (ASCII 13)</entry> + </row> + <row> + <entry><literal>\t</></entry> + <entry>Tab (ASCII 9)</entry> + </row> + <row> + <entry><literal>\v</></entry> + <entry>Vertical tab (ASCII 11)</entry> + </row> + <row> + <entry><literal>\</><replaceable>digits</></entry> + <entry>Backslash followed by one to three octal digits specifies + the character with that numeric code</entry> + </row> + </tbody> + </tgroup> + </informaltable> + + Presently, <command>COPY TO</command> will never emit an octal-digits + backslash sequence, but it does use the other sequences listed above + for those control characters. + </para> + <para> + Never put a backslash before a data character <literal>N</> or period + (<literal>.</>). Such pairs will be mistaken for the default null string + or the end-of-data marker, respectively. Any other backslashed character + that is not mentioned in the above table will be taken to represent itself. + </para> + <para> + It is strongly recommended that applications generating COPY data convert + data newlines and carriage returns to the <literal>\n</> and + <literal>\r</> sequences respectively. At present + (<productname>PostgreSQL</productname> 7.2 and older versions) it is + possible to represent a data carriage return without any special quoting, + and to represent a data newline by a backslash and newline. However, + these representations will not be accepted by default in future releases. + </para> + <para> + Note that the end of each row is marked by a Unix-style newline + ("\n"). Presently, <command>COPY FROM</command> will not behave as + desired if given a file containing DOS- or Mac-style newlines. + This is expected to change in future releases. </para> </refsect2> |