ECPG - Embedded SQL in Cembedded SQLin C
This chapter describes the embedded SQL package
for PostgreSQL. It works with
C and C++. It was written by
Linus Tolke (linus@epact.se) and Michael Meskes
(meskes@postgresql.org).
Admittedly, this documentation is quite incomplete. But since this
interface is standardized, additional information can be found in
many resources about SQL.
The Concept
An embedded SQL program consists of code written in an ordinary
programming language, in this case C, mixed with SQL commands in
specially marked sections. To build the program, the source code
is first passed to the embedded SQL preprocessor, which converts it
to an ordinary C program, and afterwards it can be processed by a C
compilation tool chain.
Embedded SQL has advantages over other methods
for handling SQL commands from C code. First, it
takes care of the tedious passing of information to and from
variables in your C program. Secondly, embedded
SQL in C is defined in the SQL standard and supported by many other
SQL databases. The PostgreSQL implementation is designed to match
this standard as much as possible, and it is usually possible to
port embedded SQL programs written for other
RDBMS to PostgreSQL
with relative ease.
As indicated, programs written for the embedded SQL interface are
normal C programs with special code inserted to perform
database-related actions. This special code always has the form
EXEC SQL ...;
These statements syntactically take the place of a C statement.
Depending on the particular statement, they may appear in the
global context or within a function. Embedded SQL statements
follow the case-sensitivity rules of normal SQL code, and not those
of C.
The following sections explain all the embedded SQL statements.
Connecting to the Database Server
One connects to a database using the following statement:
EXEC SQL CONNECT TO targetAS connection-nameUSER user-name;
The target can be specified in the
following ways:
dbname>@hostname>:port>tcp:postgresql://hostname>:port>/dbname>?options>unix:postgresql://hostname>:port>/dbname>?options>character variablecharacter stringDEFAULT
There are also different ways to specify the user name:
useriduserid/passworduserid IDENTIFIED BY passworduserid USING password
The userid and
password may be a constant text, a
character variable, or a character string.
The connection-name is used to handle
multiple connections in one program. It can be omitted if a
program uses only one connection.
Closing a Connection
To close a connection, use the following statement:
EXEC SQL DISCONNECT [connection];
The connection can be specified
in the following ways:
connection-nameDEFAULTCURRENTALLRunning SQL Commands
Any SQL command can be run from within an embedded SQL application.
Below are some examples of how to do that.
Creating a table:
EXEC SQL CREATE TABLE foo (number integer, ascii char(16));
EXEC SQL CREATE UNIQUE INDEX num1 ON foo(number);
EXEC SQL COMMIT;
Inserting rows:
EXEC SQL INSERT INTO foo (number, ascii) VALUES (9999, 'doodad');
EXEC SQL COMMIT;
Deleting rows:
EXEC SQL DELETE FROM foo WHERE number = 9999;
EXEC SQL COMMIT;
Singleton Select:
EXEC SQL SELECT foo INTO :FooBar FROM table1 WHERE ascii = 'doodad';
Select using Cursors:
EXEC SQL DECLARE foo_bar CURSOR FOR
SELECT number, ascii FROM foo
ORDER BY ascii;
EXEC SQL FETCH foo_bar INTO :FooBar, DooDad;
...
EXEC SQL CLOSE foo_bar;
EXEC SQL COMMIT;
Updates:
EXEC SQL UPDATE foo
SET ascii = 'foobar'
WHERE number = 9999;
EXEC SQL COMMIT;
The tokens of the form
:something are
host variables, that is, they refer to
variables in the C program. They are explained in the next
section.
In the default mode, statements are committed only when
EXEC SQL COMMIT is issued. The embedded SQL
interface also supports autocommit of transactions (as known from
other interfaces) via the command-line option
to ecpg (see below) or via the EXEC SQL
SET AUTOCOMMIT TO ON statement. In autocommit mode, each
query is automatically committed unless it is inside an explicit
transaction block. This mode can be explicitly turned off using
EXEC SQL SET AUTOCOMMIT TO OFF.
Passing Data
To pass data from the program to the database, for example as
parameters in a query, or to pass data from the database back to
the program, the C variables that are intended to contain this data
need to be declared in a specially marked section, so the embedded
SQL preprocessor is made aware of them.
This section starts with
EXEC SQL BEGIN DECLARE SECTION;
and ends with
EXEC SQL END DECLARE SECTION;
Between those lines, there must be normal C variable declarations, such as
int x;
char foo[16], bar[16];
The declarations are also echoed to the output file as a normal C
variables, so there's no need to declare them again. Variables
that are not intended to be used with SQL commands can be declared
normally outside these special sections.
The definition of a structure or union also must be listed inside a
DECLARE> section. Otherwise the preprocessor cannot
handle these types since it does not know the definition.
The special types VARCHAR and VARCHAR2
are converted into a named struct> for every variable. A
declaration like:
VARCHAR var[180];
is converted into:
struct varchar_var { int len; char arr[180]; } var;
This structure is suitable for interfacing with SQL datums of type
VARCHAR.
To use a properly declared C variable in an SQL statement, write
:varname> where an expression is
expected. See the previous section for some examples.
Error Handling
The embedded SQL interface provides a simplistic and a complex way
to handle exceptional conditions in a program. The first method
causes a message to printed automatically when a certain condition
occurs. For example:
EXEC SQL WHENEVER sqlerror sqlprint;
or
EXEC SQL WHENEVER not found sqlprint;
This error handling remains enabled throughout the entire program.
This is not an exhaustive example of usage
for the EXEC SQL WHENEVER statement. Further
examples of usage may be found in SQL manuals (e.g.,
The LAN TIMES Guide to SQL> by Groff and Weinberg).
For a more powerful error handling, the embedded SQL interface
provides a struct> and a variable with the name
sqlca as follows:
struct sqlca
{
char sqlcaid[8];
long sqlabc;
long sqlcode;
struct
{
int sqlerrml;
char sqlerrmc[70];
} sqlerrm;
char sqlerrp[8];
long sqlerrd[6];
/* 0: empty */
/* 1: OID of processed tuple if applicable */
/* 2: number of rows processed in an INSERT, UPDATE */
/* or DELETE statement */
/* 3: empty */
/* 4: empty */
/* 5: empty */
char sqlwarn[8];
/* 0: set to 'W' if at least one other is 'W' */
/* 1: if 'W' at least one character string */
/* value was truncated when it was */
/* stored into a host variable. */
/* 2: empty */
/* 3: empty */
/* 4: empty */
/* 5: empty */
/* 6: empty */
/* 7: empty */
char sqlext[8];
} sqlca;
(Many of the empty fields may be used in a future release.)
If no error occurred in the last SQL statement,
sqlca.sqlcode will be 0
(ECPG_NO_ERROR>). If sqlca.sqlcode is
less that zero, this is a serious error, like the database
definition does not match the query. If it is greater than zero, it
is a normal error like the table did not contain the requested row.
sqlca.sqlerrm.sqlerrmc will contain a string
that describes the error. The string ends with the line number in
the source file.
These are the errors that can occur:
-12, Out of memory in line %d.
Should not normally occur. This indicates your virtual memory
is exhausted.
-200 (ECPG_UNSUPPORTED): Unsupported type %s on line %d.
Should not normally occur. This indicates the preprocessor has
generated something that the library does not know about.
Perhaps you are running incompatible versions of the
preprocessor and the library.
-201 (ECPG_TOO_MANY_ARGUMENTS): Too many arguments line %d.
This means that the server has returned more arguments than we
have matching variables. Perhaps you have forgotten a couple
of the host variables in the INTO
:var1,:var2 list.
-202 (ECPG_TOO_FEW_ARGUMENTS): Too few arguments line %d.
This means that the server has returned fewer arguments than we
have host variables. Perhaps you have too many host variables
in the INTO :var1,:var2 list. -203 (ECPG_TOO_MANY_MATCHES): Too many matches line %d.
This means the query has returned several rows but the
variables specified are not arrays. The
SELECT command was not unique.
-204 (ECPG_INT_FORMAT): Not correctly formatted int type: %s line %d.
This means the host variable is of type int and
the field in the PostgreSQL database
is of another type and contains a value that cannot be
interpreted as an int. The library uses
strtol() for this conversion.
-205 (ECPG_UINT_FORMAT): Not correctly formatted unsigned type: %s line %d.
This means the host variable is of type unsigned
int and the field in the
PostgreSQL database is of another
type and contains a value that cannot be interpreted as an
unsigned int. The library uses
strtoul() for this conversion.
-206 (ECPG_FLOAT_FORMAT): Not correctly formatted floating-point type: %s line %d.
This means the host variable is of type float and
the field in the PostgreSQL database
is of another type and contains a value that cannot be
interpreted as a float. The library uses
strtod() for this conversion.
-207 (ECPG_CONVERT_BOOL): Unable to convert %s to bool on line %d.
This means the host variable is of type bool and
the field in the PostgreSQL database
is neither 't'> nor 'f'>.
-208 (ECPG_EMPTY): Empty query line %d.
The query was empty. (This cannot normally happen in an
embedded SQL program, so it may point to an internal error.)
-209 (ECPG_MISSING_INDICATOR): NULL value without indicator in line %d.
A null value was returned and no null indicator variable was
supplied.
-210 (ECPG_NO_ARRAY): Variable is not an array in line %d.
An ordinary variable was used in a place that requires an
array.
-211 (ECPG_DATA_NOT_ARRAY): Data read from backend is not an array in line %d.
The database returned an ordinary variable in a place that
requires array value.
-220 (ECPG_NO_CONN): No such connection %s in line %d.
The program tried to access a connection that does not exist.
-221 (ECPG_NOT_CONN): Not connected in line %d.
The program tried to access a connection that does exist but is
not open.
-230 (ECPG_INVALID_STMT): Invalid statement name %s in line %d.
The statement you are trying to use has not been prepared.
-240 (ECPG_UNKNOWN_DESCRIPTOR): Descriptor %s not found in line %d.
The descriptor specified was not found. The statement you are
trying to use has not been prepared.
-241 (ECPG_INVALID_DESCRIPTOR_INDEX): Descriptor index out of range in line %d.
The descriptor index specified was out of range.
-242 (ECPG_UNKNOWN_DESCRIPTOR_ITEM): Descriptor %s not found in line %d.
The descriptor specified was not found. The statement you are trying to use has not been prepared.
-243 (ECPG_VAR_NOT_NUMERIC): Variable is not a numeric type in line %d.
The database returned a numeric value and the variable was not
numeric.
-244 (ECPG_VAR_NOT_CHAR): Variable is not a character type in line %d.
The database returned a non-numeric value and the variable was
numeric.
-400 (ECPG_PGSQL): Postgres error: %s line %d.
Some PostgreSQL error. The message
contains the error message from the
PostgreSQL backend.
-401 (ECPG_TRANS): Error in transaction processing line %d.PostgreSQL signaled that we cannot
start, commit, or rollback the transaction.
-402 (ECPG_CONNECT): Could not connect to database %s in line %d.
The connect to the database did not work.
100 (ECPG_NOT_FOUND): Data not found line %d.
This is a normal error that tells you that what
you are querying cannot be found or you are at the end of the
cursor.
Including Files
To include an external file into your embedded SQL program, use:
EXEC SQL INCLUDE filename;
The embedded SQL preprocessor will look for a file named
filename.h,
preprocess it, and include it in the resulting C output. Thus,
embedded SQL statements in the included file are handled correctly.
Note that this is not the same as
#include <filename.h>
because the file would not be subject to SQL command preprocessing.
Naturally, you can continue to use the C
#include directive to include other header
files.
The include file name is case-sensitive, even though the rest of
the EXEC SQL INCLUDE command follows the normal
SQL case-sensitivity rules.
Processing Embedded SQL Programs
Now that you have an idea how to form embedded SQL C programs, you
probably want to know how to compile them. Before compiling you
run the file through the embedded SQL
C preprocessor, which converts the
SQL statements you used to special function
calls. After compiling, you must link with a special library that
contains the needed functions. These functions fetch information
from the arguments, perform the SQL query using
the libpq interface, and put the result
in the arguments specified for output.
The preprocessor program is called ecpg and is
included in a normal PostgreSQL installation. Embedded SQL
programs are typically named with an extension
.pgc. If you have a program file called
prog1.pgc, you can preprocess it by simply
calling
ecpg prog1.pgc
This will create a file called prog1.c. If
your input files do not follow the suggested naming pattern, you
can specify the output file explicitly using the
option.
The preprocessed file can be compiled normally, for example
cc -c prog1.c
The generated C source files include headers files from the
PostgreSQL installation, so if you installed PostgreSQL in a
location that is not searched by default, you have to add an option
such as -I/usr/local/pgsql/include to the
compilation command line.
To link an embedded SQL program, you need to include the
libecpg library, like so:
cc -o myprog prog1.o prog2.o ... -lecpg
Again, you might have to add an option like
-L/usr/local/pgsql/lib to that command line.
If you manage the build process of a larger project using
make, it may be convenient to include
the following implicit rule to your makefiles:
ECPG = ecpg
%.c: %.pgc
$(ECPG) $<
The complete syntax of the ecpg command is
detailed in the &cite-reference;.
Library Functions
The libecpg library primarily contains
hidden functions that are used to implement the
functionality expressed by the embedded SQL commands. But there
are some functions that can usefully be called directly. Note that
this makes your code unportable.
ECPGdebug(int on, FILE
*stream) turns on debug
logging if called with the first argument non-zero. Debug logging
is done on stream. Most
SQL statement log their arguments and results.
The most important function, ECPGdo, logs
all SQL statements with both the expanded
string, i.e. the string with all the input variables inserted,
and the result from the PostgreSQL
server. This can be very useful when searching for errors in your
SQL statements.
ECPGstatus() This method returns true if we
are connected to a database and false if not.
Porting From Other RDBMS Packages
The design of ecpg follows the SQL
standard. Porting from a standard RDBMS should not be a problem.
Unfortunately there is no such thing as a standard RDBMS. Therefore
ecpg tries to understand syntax
extensions as long as they do not create conflicts with the
standard.
The following list shows all the known incompatibilities. If you
find one not listed please notify the developers. Note, however,
that we list only incompatibilities from a preprocessor of another
RDBMS to ecpg and not
ecpg features that these RDBMS do not
support.
Syntax of FETCHFETCH>embedded SQL>
The standard syntax for FETCH is:
FETCH direction>> amount>> IN|FROM cursorOracle>>
Oracle, however, does not use the
keywords IN or FROM. This
feature cannot be added since it would create parsing conflicts.
For the Developer
This section explain how ecpg works
internally. This information can occasionally be useful to help
users understand how to use ecpg.
The Preprocessor
The first four lines written by ecpg to the
output are fixed lines. Two are comments and two are include
lines necessary to interface to the library. Then the
preprocessor reads through the file and writes output. Normally
it just echoes everything to the output.
When it sees an EXEC SQL statement, it
intervenes and changes it. The command starts with exec
sql and ends with ;. Everything in
between is treated as an SQL statement and
parsed for variable substitution.
Variable substitution occurs when a symbol starts with a colon
(:). The variable with that name is looked up
among the variables that were previously declared within a
EXEC SQL DECLARE> section. Depending on whether the
variable is being use for input or output, a pointer to the
variable is output to allow access by the function.
For every variable that is part of the SQL
query, the function gets other arguments:
The type as a special symbol.
A pointer to the value or a pointer to the pointer.
The size of the variable if it is a char or varchar.
The number of elements in the array (for array fetches).
The offset to the next element in the array (for array fetches).
The type of the indicator variable as a special symbol.
A pointer to the value of the indicator variable or a pointer
to the pointer of the indicator variable.
0
Number of elements in the indicator array (for array fetches).
The offset to the next element in the indicator array (for
array fetches).
Note that not all SQL commands are treated in this way. For
instance, an open cursor statement like
EXEC SQL OPEN cursor;
is not copied to the output. Instead, the cursor's
DECLARE> command is used because it opens the cursor as
well.
Here is a complete example describing the output of the
preprocessor of a file foo.pgc (details may
change with each particular version of the preprocessor):
EXEC SQL BEGIN DECLARE SECTION;
int index;
int result;
EXEC SQL END DECLARE SECTION;
...
EXEC SQL SELECT res INTO :result FROM mytable WHERE index = :index;
is translated into:
/* Processed by ecpg (2.6.0) */
/* These two include files are added by the preprocessor */
#include <ecpgtype.h>;
#include <ecpglib.h>;
/* exec sql begin declare section */
#line 1 "foo.pgc"
int index;
int result;
/* exec sql end declare section */
...
ECPGdo(__LINE__, NULL, "SELECT res FROM mytable WHERE index = ? ",
ECPGt_int,&(index),1L,1L,sizeof(int),
ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EOIT,
ECPGt_int,&(result),1L,1L,sizeof(int),
ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EORT);
#line 147 "foo.pgc"
(The indentation in this manual is added for readability and not
something the preprocessor does.)
The Library
The most important function in the library is
ECPGdo. It takes a variable number of
arguments. Hopefully there are no computers that limit the number
of variables that can be accepted by a
varargs() function. This can easily add up to
50 or so arguments.
The arguments are:
A line number
This is a line number of the original line; used in error
messages only.
A string
This is the SQL query that is to be issued.
It is modified by the input variables, i.e. the variables that
where not known at compile time but are to be entered in the
query. Where the variables should go the string contains
?.
Input variables
As described in the section about the preprocessor, every
input variable gets ten arguments.
ECPGt_EOIT>
An enum> telling that there are no more input
variables.
Output variables
As described in the section about the preprocessor, every
input variable gets ten arguments. These variables are filled
by the function.
ECPGt_EORT>
An enum> telling that there are no more variables.