aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2002-02-12 21:25:41 +0000
committerTom Lane <tgl@sss.pgh.pa.us>2002-02-12 21:25:41 +0000
commit9832a235c5d809b4cba1d6ff2b3e20f4eda2b396 (patch)
treed55ab539650e9c4fa5f7d8b143403ba78969114d /doc/src
parentc16ef167df36a6364a981e1a9f61a8ab8003fe94 (diff)
downloadpostgresql-9832a235c5d809b4cba1d6ff2b3e20f4eda2b396.tar.gz
postgresql-9832a235c5d809b4cba1d6ff2b3e20f4eda2b396.zip
Modify COPY TO to emit carriage returns and newlines as backslash escapes
(backslash-r, backslash-n) for protection against newline-conversion munging. In future we will also tweak COPY FROM, but this part of the change should be backwards-compatible. Per pghackers discussion. Also, update COPY reference page to describe the backslash conversions more completely and accurately.
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/ref/copy.sgml142
1 files changed, 97 insertions, 45 deletions
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 850af1f0775..b4a226876a9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -1,5 +1,5 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.27 2002/01/20 22:19:56 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.28 2002/02/12 21:25:34 tgl Exp $
PostgreSQL documentation
-->
@@ -74,7 +74,7 @@ COPY [ BINARY ] <replaceable class="parameter">table</replaceable> [ WITH OIDS ]
<term><replaceable class="parameter">filename</replaceable></term>
<listitem>
<para>
- The absolute Unix file name of the input or output file.
+ The absolute Unix path name of the input or output file.
</para>
</listitem>
</varlistentry>
@@ -225,7 +225,7 @@ ERROR: <replaceable>reason</replaceable>
By default, a text copy uses a tab ("\t") character as a delimiter
between fields. The field delimiter may be changed to any other single
character with the keyword phrase USING DELIMITERS. Characters
- in data fields which happen to match the delimiter character will
+ in data fields that happen to match the delimiter character will
be backslash quoted.
</para>
@@ -265,8 +265,8 @@ ERROR: <replaceable>reason</replaceable>
by the <application>PostgreSQL</application> user (the user ID the
server runs as), not the client.
<command>COPY</command> naming a file is only allowed to database
- superusers, since it allows writing on any file that the backend has
- privileges to write on.
+ superusers, since it allows reading or writing any file that the backend
+ has privileges to access.
<tip>
<para>
@@ -297,57 +297,109 @@ ERROR: <replaceable>reason</replaceable>
<title>File Formats</title>
<refsect2>
<refsect2info>
- <date>2001-01-02</date>
+ <date>2002-02-12</date>
</refsect2info>
<title>Text Format</title>
<para>
- When <command>COPY TO</command> is used without the BINARY option,
- the file generated will have each row (instance) on a single line, with each
- column (attribute) separated by the delimiter character. Embedded
- delimiter characters will be preceded by a backslash character
- ("\"). The attribute values themselves are strings generated by the
- output function associated with each attribute type. The output
- function for a type should not try to generate the backslash
- character; this will be handled by <command>COPY</command> itself.
+ When <command>COPY</command> is used without the BINARY option,
+ the file read or written is a text file with one line per table row.
+ Columns (attributes) in a row are separated by the delimiter character.
+ The attribute values themselves are strings generated by the
+ output function, or acceptable to the input function, of each
+ attribute's data type. The specified null-value string is used in
+ place of attributes that are NULL.
</para>
<para>
- The actual format for each instance is
- <programlisting>
-&lt;attr1&gt;&lt;<replaceable class=parameter>separator</replaceable>&gt;&lt;attr2&gt;&lt;<replaceable class=parameter>separator</replaceable>&gt;...&lt;<replaceable class=parameter>separator</replaceable>&gt;&lt;attr<replaceable class="parameter">n</replaceable>&gt;&lt;newline&gt;
- </programlisting>
- Note that the end of each row is marked by a Unix-style newline
- ("\n"). <command>COPY FROM</command> will not behave as desired
- if given a file containing DOS- or Mac-style newlines.
+ If WITH OIDS is specified, the OID is read or written as the first column,
+ preceding the user data columns. (An error is raised if WITH OIDS is
+ specified for a table that does not have OIDs.)
</para>
<para>
- The OID is emitted as the first column if WITH OIDS is specified.
- (An error is raised if WITH OIDS is specified for a table that does not
- have OIDs.)
+ End of data can be represented by a single line containing just
+ backslash-period (<literal>\.</>). An end-of-data marker is
+ not necessary when reading from a Unix file, since the end of file
+ serves perfectly well; but an end marker must be provided when copying
+ data to or from a client application.
</para>
<para>
- If <command>COPY TO</command> is sending its output to standard
- output instead of a file, after the last row it will send a backslash ("\")
- and a period (".") followed by a newline.
- Similarly, if <command>COPY FROM</command> is reading
- from standard input, it will expect a backslash ("\") and a period
- (".") followed by a newline, as the first three characters on a
- line to denote end-of-file. However, <command>COPY FROM</command>
- will terminate correctly (followed by the backend itself) if the
- input connection is closed before this special end-of-file pattern is
- found.
+ Backslash characters (<literal>\</>) may be used in the
+ <command>COPY</command> data to quote data characters that might otherwise
+ be taken as row or column delimiters. In particular, the following
+ characters <emphasis>must</> be preceded by a backslash if they appear
+ as part of an attribute value: backslash itself, newline, and the current
+ delimiter character.
</para>
<para>
- The backslash character has other special meanings. A literal backslash
- character is represented as two
- consecutive backslashes ("\\"). A literal tab character is represented
- as a backslash and a tab. (If you are using something other than tab
- as the column delimiter, backslash that delimiter character to include
- it in data.) A literal newline character is
- represented as a backslash and a newline. When loading text data
- not generated by <application>PostgreSQL</application>,
- you will need to convert backslash
- characters ("\") to double-backslashes ("\\") to ensure that they
- are loaded properly.
+ The following special backslash sequences are recognized by
+ <command>COPY FROM</command>:
+
+ <informaltable>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Sequence</entry>
+ <entry>Represents</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><literal>\b</></entry>
+ <entry>Backspace (ASCII 8)</entry>
+ </row>
+ <row>
+ <entry><literal>\f</></entry>
+ <entry>Form feed (ASCII 12)</entry>
+ </row>
+ <row>
+ <entry><literal>\n</></entry>
+ <entry>Newline (ASCII 10)</entry>
+ </row>
+ <row>
+ <entry><literal>\r</></entry>
+ <entry>Carriage return (ASCII 13)</entry>
+ </row>
+ <row>
+ <entry><literal>\t</></entry>
+ <entry>Tab (ASCII 9)</entry>
+ </row>
+ <row>
+ <entry><literal>\v</></entry>
+ <entry>Vertical tab (ASCII 11)</entry>
+ </row>
+ <row>
+ <entry><literal>\</><replaceable>digits</></entry>
+ <entry>Backslash followed by one to three octal digits specifies
+ the character with that numeric code</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ Presently, <command>COPY TO</command> will never emit an octal-digits
+ backslash sequence, but it does use the other sequences listed above
+ for those control characters.
+ </para>
+ <para>
+ Never put a backslash before a data character <literal>N</> or period
+ (<literal>.</>). Such pairs will be mistaken for the default null string
+ or the end-of-data marker, respectively. Any other backslashed character
+ that is not mentioned in the above table will be taken to represent itself.
+ </para>
+ <para>
+ It is strongly recommended that applications generating COPY data convert
+ data newlines and carriage returns to the <literal>\n</> and
+ <literal>\r</> sequences respectively. At present
+ (<productname>PostgreSQL</productname> 7.2 and older versions) it is
+ possible to represent a data carriage return without any special quoting,
+ and to represent a data newline by a backslash and newline. However,
+ these representations will not be accepted by default in future releases.
+ </para>
+ <para>
+ Note that the end of each row is marked by a Unix-style newline
+ ("\n"). Presently, <command>COPY FROM</command> will not behave as
+ desired if given a file containing DOS- or Mac-style newlines.
+ This is expected to change in future releases.
</para>
</refsect2>