<application>ECPG</application> - Embedded <acronym>SQL</acronym> in C

<application>ECPG</application> - Embedded <acronym>SQL</acronym> in C embedded SQLin C This chapter describes the embedded SQL package for PostgreSQL. It works with C and C++. It was written by Linus Tolke (linus@epact.se) and Michael Meskes (meskes@postgresql.org). Admittedly, this documentation is quite incomplete. But since this interface is standardized, additional information can be found in many resources about SQL. The Concept An embedded SQL program consists of code written in an ordinary programming language, in this case C, mixed with SQL commands in specially marked sections. To build the program, the source code is first passed to the embedded SQL preprocessor, which converts it to an ordinary C program, and afterwards it can be processed by a C compilation tool chain. Embedded SQL has advantages over other methods for handling SQL commands from C code. First, it takes care of the tedious passing of information to and from variables in your C program. Secondly, embedded SQL in C is defined in the SQL standard and supported by many other SQL databases. The PostgreSQL implementation is designed to match this standard as much as possible, and it is usually possible to port embedded SQL programs written for other RDBMS to PostgreSQL with relative ease. As indicated, programs written for the embedded SQL interface are normal C programs with special code inserted to perform database-related actions. This special code always has the form EXEC SQL ...; These statements syntactically take the place of a C statement. Depending on the particular statement, they may appear in the global context or within a function. Embedded SQL statements follow the case-sensitivity rules of normal SQL code, and not those of C. The following sections explain all the embedded SQL statements. Connecting to the Database Server One connects to a database using the following statement: EXEC SQL CONNECT TO target AS connection-name USER user-name; The target can be specified in the following ways: dbname@hostname:port tcp:postgresql://hostname:port/dbname?options unix:postgresql://hostname:port/dbname?options character variable character string DEFAULT There are also different ways to specify the user name: userid userid/password userid IDENTIFIED BY password userid USING password The userid and password may be a constant text, a character variable, or a character string. The connection-name is used to handle multiple connections in one program. It can be omitted if a program uses only one connection. Closing a Connection To close a connection, use the following statement: EXEC SQL DISCONNECT [connection]; The connection can be specified in the following ways: connection-name DEFAULT CURRENT ALL Running SQL Commands Any SQL command can be run from within an embedded SQL application. Below are some examples of how to do that. Creating a table: EXEC SQL CREATE TABLE foo (number integer, ascii char(16)); EXEC SQL CREATE UNIQUE INDEX num1 ON foo(number); EXEC SQL COMMIT; Inserting rows: EXEC SQL INSERT INTO foo (number, ascii) VALUES (9999, 'doodad'); EXEC SQL COMMIT; Deleting rows: EXEC SQL DELETE FROM foo WHERE number = 9999; EXEC SQL COMMIT; Singleton Select: EXEC SQL SELECT foo INTO :FooBar FROM table1 WHERE ascii = 'doodad'; Select using Cursors: EXEC SQL DECLARE foo_bar CURSOR FOR SELECT number, ascii FROM foo ORDER BY ascii; EXEC SQL FETCH foo_bar INTO :FooBar, DooDad; ... EXEC SQL CLOSE foo_bar; EXEC SQL COMMIT; Updates: EXEC SQL UPDATE foo SET ascii = 'foobar' WHERE number = 9999; EXEC SQL COMMIT; The tokens of the form :something are host variables, that is, they refer to variables in the C program. They are explained in the next section. In the default mode, statements are committed only when EXEC SQL COMMIT is issued. The embedded SQL interface also supports autocommit of transactions (as known from other interfaces) via the command-line option to ecpg (see below) or via the EXEC SQL SET AUTOCOMMIT TO ON statement. In autocommit mode, each query is automatically committed unless it is inside an explicit transaction block. This mode can be explicitly turned off using EXEC SQL SET AUTOCOMMIT TO OFF. Passing Data To pass data from the program to the database, for example as parameters in a query, or to pass data from the database back to the program, the C variables that are intended to contain this data need to be declared in a specially marked section, so the embedded SQL preprocessor is made aware of them. This section starts with EXEC SQL BEGIN DECLARE SECTION; and ends with EXEC SQL END DECLARE SECTION; Between those lines, there must be normal C variable declarations, such as int x; char foo[16], bar[16]; The declarations are also echoed to the output file as a normal C variables, so there's no need to declare them again. Variables that are not intended to be used with SQL commands can be declared normally outside these special sections. The definition of a structure or union also must be listed inside a DECLARE section. Otherwise the preprocessor cannot handle these types since it does not know the definition. The special types VARCHAR and VARCHAR2 are converted into a named struct for every variable. A declaration like: VARCHAR var[180]; is converted into: struct varchar_var { int len; char arr[180]; } var; This structure is suitable for interfacing with SQL datums of type VARCHAR. To use a properly declared C variable in an SQL statement, write :varname where an expression is expected. See the previous section for some examples. Error Handling The embedded SQL interface provides a simplistic and a complex way to handle exceptional conditions in a program. The first method causes a message to printed automatically when a certain condition occurs. For example: EXEC SQL WHENEVER sqlerror sqlprint; or EXEC SQL WHENEVER not found sqlprint; This error handling remains enabled throughout the entire program. This is not an exhaustive example of usage for the EXEC SQL WHENEVER statement. Further examples of usage may be found in SQL manuals (e.g., The LAN TIMES Guide to SQL by Groff and Weinberg). For a more powerful error handling, the embedded SQL interface provides a struct and a variable with the name sqlca as follows: struct sqlca { char sqlcaid[8]; long sqlabc; long sqlcode; struct { int sqlerrml; char sqlerrmc[70]; } sqlerrm; char sqlerrp[8]; long sqlerrd[6]; /* 0: empty */ /* 1: OID of processed tuple if applicable */ /* 2: number of rows processed in an INSERT, UPDATE */ /* or DELETE statement */ /* 3: empty */ /* 4: empty */ /* 5: empty */ char sqlwarn[8]; /* 0: set to 'W' if at least one other is 'W' */ /* 1: if 'W' at least one character string */ /* value was truncated when it was */ /* stored into a host variable. */ /* 2: empty */ /* 3: empty */ /* 4: empty */ /* 5: empty */ /* 6: empty */ /* 7: empty */ char sqlext[8]; } sqlca; (Many of the empty fields may be used in a future release.) If no error occurred in the last SQL statement, sqlca.sqlcode will be 0 (ECPG_NO_ERROR). If sqlca.sqlcode is less that zero, this is a serious error, like the database definition does not match the query. If it is greater than zero, it is a normal error like the table did not contain the requested row. sqlca.sqlerrm.sqlerrmc will contain a string that describes the error. The string ends with the line number in the source file. These are the errors that can occur: -12, Out of memory in line %d. Should not normally occur. This indicates your virtual memory is exhausted. -200 (ECPG_UNSUPPORTED): Unsupported type %s on line %d. Should not normally occur. This indicates the preprocessor has generated something that the library does not know about. Perhaps you are running incompatible versions of the preprocessor and the library. -201 (ECPG_TOO_MANY_ARGUMENTS): Too many arguments line %d. This means that the server has returned more arguments than we have matching variables. Perhaps you have forgotten a couple of the host variables in the INTO :var1,:var2 list. -202 (ECPG_TOO_FEW_ARGUMENTS): Too few arguments line %d. This means that the server has returned fewer arguments than we have host variables. Perhaps you have too many host variables in the INTO :var1,:var2 list. -203 (ECPG_TOO_MANY_MATCHES): Too many matches line %d. This means the query has returned several rows but the variables specified are not arrays. The SELECT command was not unique. -204 (ECPG_INT_FORMAT): Not correctly formatted int type: %s line %d. This means the host variable is of type int and the field in the PostgreSQL database is of another type and contains a value that cannot be interpreted as an int. The library uses strtol() for this conversion. -205 (ECPG_UINT_FORMAT): Not correctly formatted unsigned type: %s line %d. This means the host variable is of type unsigned int and the field in the PostgreSQL database is of another type and contains a value that cannot be interpreted as an unsigned int. The library uses strtoul() for this conversion. -206 (ECPG_FLOAT_FORMAT): Not correctly formatted floating-point type: %s line %d. This means the host variable is of type float and the field in the PostgreSQL database is of another type and contains a value that cannot be interpreted as a float. The library uses strtod() for this conversion. -207 (ECPG_CONVERT_BOOL): Unable to convert %s to bool on line %d. This means the host variable is of type bool and the field in the PostgreSQL database is neither 't' nor 'f'. -208 (ECPG_EMPTY): Empty query line %d. The query was empty. (This cannot normally happen in an embedded SQL program, so it may point to an internal error.) -209 (ECPG_MISSING_INDICATOR): NULL value without indicator in line %d. A null value was returned and no null indicator variable was supplied. -210 (ECPG_NO_ARRAY): Variable is not an array in line %d. An ordinary variable was used in a place that requires an array. -211 (ECPG_DATA_NOT_ARRAY): Data read from backend is not an array in line %d. The database returned an ordinary variable in a place that requires array value. -220 (ECPG_NO_CONN): No such connection %s in line %d. The program tried to access a connection that does not exist. -221 (ECPG_NOT_CONN): Not connected in line %d. The program tried to access a connection that does exist but is not open. -230 (ECPG_INVALID_STMT): Invalid statement name %s in line %d. The statement you are trying to use has not been prepared. -240 (ECPG_UNKNOWN_DESCRIPTOR): Descriptor %s not found in line %d. The descriptor specified was not found. The statement you are trying to use has not been prepared. -241 (ECPG_INVALID_DESCRIPTOR_INDEX): Descriptor index out of range in line %d. The descriptor index specified was out of range. -242 (ECPG_UNKNOWN_DESCRIPTOR_ITEM): Descriptor %s not found in line %d. The descriptor specified was not found. The statement you are trying to use has not been prepared. -243 (ECPG_VAR_NOT_NUMERIC): Variable is not a numeric type in line %d. The database returned a numeric value and the variable was not numeric. -244 (ECPG_VAR_NOT_CHAR): Variable is not a character type in line %d. The database returned a non-numeric value and the variable was numeric. -400 (ECPG_PGSQL): Postgres error: %s line %d. Some PostgreSQL error. The message contains the error message from the PostgreSQL backend. -401 (ECPG_TRANS): Error in transaction processing line %d. PostgreSQL signaled that we cannot start, commit, or rollback the transaction. -402 (ECPG_CONNECT): Could not connect to database %s in line %d. The connect to the database did not work. 100 (ECPG_NOT_FOUND): Data not found line %d. This is a normal error that tells you that what you are querying cannot be found or you are at the end of the cursor. Including Files To include an external file into your embedded SQL program, use: EXEC SQL INCLUDE filename; The embedded SQL preprocessor will look for a file named filename.h, preprocess it, and include it in the resulting C output. Thus, embedded SQL statements in the included file are handled correctly. Note that this is not the same as #include <filename.h> because the file would not be subject to SQL command preprocessing. Naturally, you can continue to use the C #include directive to include other header files. The include file name is case-sensitive, even though the rest of the EXEC SQL INCLUDE command follows the normal SQL case-sensitivity rules. Processing Embedded SQL Programs Now that you have an idea how to form embedded SQL C programs, you probably want to know how to compile them. Before compiling you run the file through the embedded SQL C preprocessor, which converts the SQL statements you used to special function calls. After compiling, you must link with a special library that contains the needed functions. These functions fetch information from the arguments, perform the SQL query using the libpq interface, and put the result in the arguments specified for output. The preprocessor program is called ecpg and is included in a normal PostgreSQL installation. Embedded SQL programs are typically named with an extension .pgc. If you have a program file called prog1.pgc, you can preprocess it by simply calling ecpg prog1.pgc This will create a file called prog1.c. If your input files do not follow the suggested naming pattern, you can specify the output file explicitly using the option. The preprocessed file can be compiled normally, for example cc -c prog1.c The generated C source files include headers files from the PostgreSQL installation, so if you installed PostgreSQL in a location that is not searched by default, you have to add an option such as -I/usr/local/pgsql/include to the compilation command line. To link an embedded SQL program, you need to include the libecpg library, like so: cc -o myprog prog1.o prog2.o ... -lecpg Again, you might have to add an option like -L/usr/local/pgsql/lib to that command line. If you manage the build process of a larger project using make, it may be convenient to include the following implicit rule to your makefiles: ECPG = ecpg %.c: %.pgc $(ECPG) $< The complete syntax of the ecpg command is detailed in the &cite-reference;. Library Functions The libecpg library primarily contains hidden functions that are used to implement the functionality expressed by the embedded SQL commands. But there are some functions that can usefully be called directly. Note that this makes your code unportable. ECPGdebug(int on, FILE *stream) turns on debug logging if called with the first argument non-zero. Debug logging is done on stream. Most SQL statement log their arguments and results. The most important function, ECPGdo, logs all SQL statements with both the expanded string, i.e. the string with all the input variables inserted, and the result from the PostgreSQL server. This can be very useful when searching for errors in your SQL statements. ECPGstatus() This method returns true if we are connected to a database and false if not. Porting From Other <acronym>RDBMS</acronym> Packages The design of ecpg follows the SQL standard. Porting from a standard RDBMS should not be a problem. Unfortunately there is no such thing as a standard RDBMS. Therefore ecpg tries to understand syntax extensions as long as they do not create conflicts with the standard. The following list shows all the known incompatibilities. If you find one not listed please notify the developers. Note, however, that we list only incompatibilities from a preprocessor of another RDBMS to ecpg and not ecpg features that these RDBMS do not support. Syntax of FETCH FETCHembedded SQL The standard syntax for FETCH is: FETCH direction amount IN|FROM cursor Oracle Oracle, however, does not use the keywords IN or FROM. This feature cannot be added since it would create parsing conflicts. For the Developer This section explain how ecpg works internally. This information can occasionally be useful to help users understand how to use ecpg. The Preprocessor The first four lines written by ecpg to the output are fixed lines. Two are comments and two are include lines necessary to interface to the library. Then the preprocessor reads through the file and writes output. Normally it just echoes everything to the output. When it sees an EXEC SQL statement, it intervenes and changes it. The command starts with exec sql and ends with ;. Everything in between is treated as an SQL statement and parsed for variable substitution. Variable substitution occurs when a symbol starts with a colon (:). The variable with that name is looked up among the variables that were previously declared within a EXEC SQL DECLARE section. Depending on whether the variable is being use for input or output, a pointer to the variable is output to allow access by the function. For every variable that is part of the SQL query, the function gets other arguments: The type as a special symbol. A pointer to the value or a pointer to the pointer. The size of the variable if it is a char or varchar. The number of elements in the array (for array fetches). The offset to the next element in the array (for array fetches). The type of the indicator variable as a special symbol. A pointer to the value of the indicator variable or a pointer to the pointer of the indicator variable. 0 Number of elements in the indicator array (for array fetches). The offset to the next element in the indicator array (for array fetches). Note that not all SQL commands are treated in this way. For instance, an open cursor statement like EXEC SQL OPEN cursor; is not copied to the output. Instead, the cursor's DECLARE command is used because it opens the cursor as well. Here is a complete example describing the output of the preprocessor of a file foo.pgc (details may change with each particular version of the preprocessor): EXEC SQL BEGIN DECLARE SECTION; int index; int result; EXEC SQL END DECLARE SECTION; ... EXEC SQL SELECT res INTO :result FROM mytable WHERE index = :index; is translated into: /* Processed by ecpg (2.6.0) */ /* These two include files are added by the preprocessor */ #include <ecpgtype.h>; #include <ecpglib.h>; /* exec sql begin declare section */ #line 1 "foo.pgc" int index; int result; /* exec sql end declare section */ ... ECPGdo(__LINE__, NULL, "SELECT res FROM mytable WHERE index = ? ", ECPGt_int,&(index),1L,1L,sizeof(int), ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EOIT, ECPGt_int,&(result),1L,1L,sizeof(int), ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EORT); #line 147 "foo.pgc" (The indentation in this manual is added for readability and not something the preprocessor does.) The Library The most important function in the library is ECPGdo. It takes a variable number of arguments. Hopefully there are no computers that limit the number of variables that can be accepted by a varargs() function. This can easily add up to 50 or so arguments. The arguments are: A line number This is a line number of the original line; used in error messages only. A string This is the SQL query that is to be issued. It is modified by the input variables, i.e. the variables that where not known at compile time but are to be entered in the query. Where the variables should go the string contains ?. Input variables As described in the section about the preprocessor, every input variable gets ten arguments. ECPGt_EOIT An enum telling that there are no more input variables. Output variables As described in the section about the preprocessor, every input variable gets ten arguments. These variables are filled by the function. ECPGt_EORT An enum telling that there are no more variables.