path: root/src
Commit message | Author | Age
* Use symbolic names not octal constants for file permission flags.Tom Lane2010-12-10
| | | | | | | | Purely cosmetic patch to make our coding standards more consistent --- we were doing symbolic some places and octal other places. This patch fixes all C-coded uses of mkdir, chmod, and umask. There might be some other calls I missed. Inconsistency noted while researching tablespace directory permissions issue.
* Fix efficiency problems in tuplestore_trim().Tom Lane2010-12-10
| | | | | | | | | | | | | | | | | | | | | | The original coding in tuplestore_trim() was only meant to work efficiently in cases where each trim call deleted most of the tuples in the store. Which, in fact, was the pattern of the original usage with a Material node supporting mark/restore operations underneath a MergeJoin. However, WindowAgg now uses tuplestores and it has considerably less friendly trimming behavior. In particular it can attempt to trim one tuple at a time off a large tuplestore. tuplestore_trim() had O(N^2) runtime in this situation because of repeatedly shifting its tuple pointer array. Fix by avoiding shifting the array until a reasonably large number of tuples have been deleted. This can waste some pointer space, but we do still reclaim the tuples themselves, so the percentage wastage should be pretty small. Per Jie Li's report of slow percent_rank() evaluation. cume_dist() and ntile() would certainly be affected as well, along with any other window function that has a moving frame start and requires reading substantially ahead of the current row. Back-patch to 8.4, where window functions were introduced. There's no need to tweak it before that.
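The deferred-shift idea can be sketched as follows. This is not the tuplestore code: `PtrStore`, its fields, and the one-eighth threshold are assumptions chosen for illustration. The point is that trimming only advances a counter, and the pointer array is slid down only once a reasonable fraction of it is dead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a store's in-memory pointer array. */
typedef struct PtrStore
{
	void  **items;		/* pointer array; slots before 'removed' are dead */
	size_t	removed;	/* leading dead slots not yet compacted away */
	size_t	count;		/* slots in use, including the dead prefix */
} PtrStore;

/* Fetch the i'th live item, skipping over the dead prefix. */
void *
store_get(const PtrStore *s, size_t i)
{
	assert(i < s->count - s->removed);
	return s->items[s->removed + i];
}

/* Drop the first ntrim live items, shifting the array only occasionally. */
void
store_trim(PtrStore *s, size_t ntrim)
{
	size_t	live;

	assert(ntrim <= s->count - s->removed);

	/* Forget the items themselves (a real store would free them here). */
	for (size_t i = 0; i < ntrim; i++)
		s->items[s->removed + i] = NULL;
	s->removed += ntrim;

	/* Assumed threshold: do not shift until 1/8 of the array is dead. */
	if (s->removed < s->count / 8)
		return;

	/* Slide the live pointers down to the start of the array. */
	live = s->count - s->removed;
	memmove(s->items, s->items + s->removed, live * sizeof(void *));
	s->count = live;
	s->removed = 0;
}
```

With one-at-a-time trimming, each shift is paid for by the many cheap trims that preceded it, so the total work stays roughly linear instead of quadratic.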
* Eliminate O(N^2) behavior in parallel restore with many blobs.Tom Lane2010-12-09
| | | | | | | | | | | | | | | | | | | | | | | | | With hundreds of thousands of TOC entries, the repeated searches in reduce_dependencies() become the dominant cost. Get rid of that searching by constructing reverse-dependency lists, which we can do in O(N) time during the fix_dependencies() preprocessing. I chose to store the reverse dependencies as DumpId arrays for consistency with the forward-dependency representation, and keep the previously-transient tocsByDumpId[] array around to locate actual TOC entry structs quickly from dump IDs. While this fixes the slow case reported by Vlad Arkhipov, there is still a potential for O(N^2) behavior with sufficiently many tables: fix_dependencies itself, as well as mark_create_done and inhibit_data_for_failed_table, are doing repeated searches to deal with table-to-table-data dependencies. Possibly this work could be extended to deal with that, although the latter two functions are also used in non-parallel restore where we currently don't run fix_dependencies. Another TODO is that we fail to parallelize restore of multiple blobs at all. This appears to require changes in the archive format to fix. Back-patch to 9.0 where the problem was reported. 8.4 has potential issues as well; but since it doesn't create a separate TOC entry for each blob, it's at much less risk of having enough TOC entries to cause real problems.
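A rough sketch of building reverse-dependency lists in one preprocessing step is shown here. The `Entry` struct, `by_id` lookup array, and `build_reverse_deps` are hypothetical names and not the pg_restore data structures; the sketch only assumes that IDs are small integers usable as array indexes, as the message suggests for dump IDs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical TOC-like entry with forward and reverse dependency arrays. */
typedef struct Entry
{
	int		id;			/* ID in the range 1..max_id */
	int	   *deps;		/* IDs this entry depends on */
	int		ndeps;
	int	   *revdeps;	/* IDs of entries that depend on this one */
	int		nrevdeps;
} Entry;

/*
 * Fill in revdeps for every entry. by_id[id] is the entry with that ID or
 * NULL. Runs in time proportional to entries plus dependency links.
 * Returns 0 on success, -1 if memory allocation fails.
 */
int
build_reverse_deps(Entry **by_id, int max_id)
{
	/* Pass 1: count how many entries depend on each entry. */
	for (int id = 1; id <= max_id; id++)
		if (by_id[id] != NULL)
		{
			by_id[id]->revdeps = NULL;
			by_id[id]->nrevdeps = 0;
		}
	for (int id = 1; id <= max_id; id++)
	{
		Entry  *e = by_id[id];

		for (int i = 0; e != NULL && i < e->ndeps; i++)
		{
			int		dep = e->deps[i];

			if (dep >= 1 && dep <= max_id && by_id[dep] != NULL)
				by_id[dep]->nrevdeps++;
		}
	}

	/* Allocate exact-size arrays, then reuse nrevdeps as a fill cursor. */
	for (int id = 1; id <= max_id; id++)
	{
		Entry  *e = by_id[id];

		if (e == NULL || e->nrevdeps == 0)
			continue;
		e->revdeps = malloc(sizeof(int) * (size_t) e->nrevdeps);
		if (e->revdeps == NULL)
			return -1;
		e->nrevdeps = 0;
	}

	/* Pass 2: record each dependent in the entry it depends on. */
	for (int id = 1; id <= max_id; id++)
	{
		Entry  *e = by_id[id];

		for (int i = 0; e != NULL && i < e->ndeps; i++)
		{
			int		dep = e->deps[i];

			if (dep >= 1 && dep <= max_id && by_id[dep] != NULL)
				by_id[dep]->revdeps[by_id[dep]->nrevdeps++] = e->id;
		}
	}
	return 0;
}
```

Once these lists exist, finishing an entry only requires visiting its own `revdeps` rather than searching every pending entry.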
* Self review of previous patch. Fix assumption that xmax >= xmin.Simon Riggs2010-12-09
|
* Reduce spurious Hot Standby conflicts from never-visible records.Simon Riggs2010-12-09
| | | | | | | | | | | | Hot Standby conflicts only with tuples that were visible at some point. So ignore tuples from aborted transactions, and tuples updated/deleted during the inserting transaction, when generating the conflict transaction ids. Following detailed analysis and test case by Noah Misch. The original report covered btree delete records; Heikki Linnakangas correctly observed that this applies to other cases also. Fix covers all sources of cleanup records via common code.
* Force default wal_sync_method to be fdatasync on Linux.Tom Lane2010-12-08
| | | | | | | | | | | | | | | | | | | | | | | Recent versions of the Linux system header files cause xlogdefs.h to believe that open_datasync should be the default sync method, whereas formerly fdatasync was the default on Linux. open_datasync is a bad choice, first because it doesn't actually outperform fdatasync (in fact the reverse), and second because we try to use O_DIRECT with it, causing failures on certain filesystems (e.g., ext4 with data=journal option). This part of the patch is largely per a proposal from Marti Raudsepp. More extensive changes are likely to follow in HEAD, but this is as much change as we want to back-patch. Also clean up confusing code and incorrect documentation surrounding the fsync_writethrough option. Those changes shouldn't result in any actual behavioral change, but I chose to back-patch them anyway to keep the branches looking similar in this area. In 9.0 and HEAD, also do some copy-editing on the WAL Reliability documentation section. Back-patch to all supported branches, since any of them might get used on modern Linux versions.
* Optimize commit_siblings in two ways to improve group commit.Simon Riggs2010-12-08
| | | | | | | | First, avoid scanning the whole ProcArray once we know there are at least commit_siblings active; second, skip the check altogether if commit_siblings = 0. Greg Smith
* Fix bugs in the hot standby known-assigned-xids tracking logic. If there'sHeikki Linnakangas2010-12-07
| | | | | | | | | | | | | | | | | | | | | | | an old transaction running in the master, and a lot of transactions have started and finished since, and a WAL-record is written in the gap between creating the running-xacts snapshot and WAL-logging it, recovery will fail with "too many KnownAssignedXids" error. This bug was reported by Joachim Wieland on Nov 19th. In the same scenario, when fewer transactions have started so that all the xids fit in KnownAssignedXids despite the first bug, a more serious bug arises. We incorrectly initialize the clog code with the oldest still running transaction, and when we see the WAL record belonging to a transaction with an XID larger than one that committed already before the checkpoint we're recovering from, we zero the clog page containing the already committed transaction, leading to data loss. In hindsight, trying to track xids in the known-assigned-xids array before seeing the running-xacts record was too complicated. To fix that, hold XidGenLock while the running-xacts snapshot is taken and WAL-logged. That ensures that no transaction can begin or end in that gap, so that in recovery we know that the snapshot contains all transactions running at that point in WAL.
* Add a stack overflow check to copyObject().Tom Lane2010-12-06
| | | | | | | | | | | | | | | There are some code paths, such as SPI_execute(), where we invoke copyObject() on raw parse trees before doing parse analysis on them. Since the bison grammar is capable of building heavily nested parsetrees while itself using only minimal stack depth, this means that copyObject() can be the front-line function that hits stack overflow before anything else does. Accordingly, it had better have a check_stack_depth() call. I did a bit of performance testing and found that this slows down copyObject() by only a few percent, so the hit ought to be negligible in the context of complete processing of a query. Per off-list report from Toshihide Katayama. Back-patch to all supported branches.
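The general shape of guarding a recursive copy can be sketched as below. Note the substitution: this sketch uses an explicit depth counter with a hypothetical limit (`MAX_COPY_DEPTH`), whereas the commit adds a call to check_stack_depth(), whose mechanism is not reproduced here. `Node`, `copy_tree`, and `free_tree` are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_COPY_DEPTH 1000		/* hypothetical nesting limit */

/* Hypothetical singly nested tree node. */
typedef struct Node
{
	int			 value;
	struct Node *child;
} Node;

/* Free a chain of nodes iteratively, so freeing cannot overflow either. */
void
free_tree(Node *n)
{
	while (n != NULL)
	{
		Node   *child = n->child;

		free(n);
		n = child;
	}
}

/*
 * Recursively copy a tree. If nesting exceeds MAX_COPY_DEPTH or memory
 * runs out, set *failed and stop descending; the caller should then
 * discard the partial result with free_tree().
 */
Node *
copy_tree(const Node *src, int depth, int *failed)
{
	Node   *dst;

	assert(failed != NULL);
	if (src == NULL)
		return NULL;
	if (depth >= MAX_COPY_DEPTH)
	{
		*failed = 1;			/* check before recursing any deeper */
		return NULL;
	}
	dst = malloc(sizeof(Node));
	if (dst == NULL)
	{
		*failed = 1;
		return NULL;
	}
	dst->value = src->value;
	dst->child = copy_tree(src->child, depth + 1, failed);
	return dst;
}
```

The check sits at the top of the recursive function, so it fires on the first level that goes too deep, whichever caller handed in the heavily nested tree.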
* Allow the low level COPY routines to read arbitrary numbers of fields.Andrew Dunstan2010-12-06
| | | | | | | This doesn't involve any user-visible change in behavior, but will be useful when the COPY routines are exposed to allow their use by Foreign Data Wrapper routines, which will be able to use these routines to read irregular CSV files, for example.
* Fix two typos, by Fujii Masao.Heikki Linnakangas2010-12-06
|
* Put only single space after "Sort Method:", for consistencyPeter Eisentraut2010-12-06
|
* Reduce memory consumption inside inheritance_planner().Tom Lane2010-12-05
| | | | | | | | | | | | | | | Avoid eating quite so much memory for large inheritance trees, by reclaiming the space used by temporary copies of the original parsetree and range table, as well as the workspace needed during planning. The cost is needing to copy the finished plan trees out of the child memory context. Although this looks like it ought to slow things down, my testing shows it actually is faster, apparently because fewer interactions with malloc() are needed and/or we can do the work within a more readily cacheable amount of memory. That result might be platform-dependent, but I'll take it. Per a gripe from John Papandriopoulos, in which it was pointed out that the memory consumption actually grew as O(N^2) for sufficiently many child tables, since we were creating N copies of the N-element range table.
* Fix two small bugs in new gistget.c logic.Tom Lane2010-12-04
| | | | | | | | | | | | | 1. Complain, rather than silently doing nothing, if an "invalid" tuple is found on a leaf page. Per off-list discussion with Heikki. 2. Fix oversight in code that removes a GISTSearchItem from the search queue: we have to reset lastHeap if this was the last heap item in the parent GISTSearchTreeItem. Otherwise subsequent additions will do the wrong thing. This was probably masked in early testing because in typical cases the parent item would now be completely empty and would be deleted on next call. You'd need a queued non-leaf page at exactly the same distance as a heap tuple to expose the bug.
* Make output width consistent for all ways of invoking a regression testPeter Eisentraut2010-12-04
| | | | | run_schedule() and run_single_test() were using different output widths, which would show up in bigcheck/bigtest, for example.
* Update comment to match later code changes.Tom Lane2010-12-04
|
* Add external documentation for KNNGIST.Tom Lane2010-12-03
|
* Put back gistgettuple's check for backwards scan request.Tom Lane2010-12-03
| | | | | On reflection it's a bad idea for the KNNGIST patch to have removed that. We don't want it silently returning incorrect answers.
* KNNGIST, otherwise known as order-by-operator support for GIST.Tom Lane2010-12-03
| | | | | | | | | | | | | | | This commit represents a rather heavily editorialized version of Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1 patches. I redid the opclass API to add a separate Distance method instead of turning the Consistent method into an illogical mess, fixed some bit-rot in the rbtree interfaces, and generally worked over the code style and comments. There's still no non-code documentation to speak of, but I'll work on that separately. Some contrib-module changes are also yet to come (right now, point <-> point is the only KNN-ified operator). Teodor Sigaev and Tom Lane
* Remove now-outdated mention of quotes being required in recovery.conf.Robert Haas2010-12-03
| | | | Noted by Itagaki Takahiro.
* Use GUC lexer for recovery.conf parsing.Robert Haas2010-12-03
| | | | | | | | This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted. Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.
* Remove misleading comments. Move _Clone and _DeClone functions beforeHeikki Linnakangas2010-12-03
| | | | the "END OF FORMAT CALLBACKS" comment, because they are format callbacks too.
* Remove unnecessary string null-termination in pg_convert.Itagaki Takahiro2010-12-03
| | | | We can directly verify the unterminated input with pg_verify_mbstr_len.
* Create core infrastructure for KNNGIST.Tom Lane2010-12-02
| | | | | | | | | | | | | | | | | | | This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane
* Move private struct declaration to compress_io.cAlvaro Herrera2010-12-02
| | | | Keep only the typedef in the header file.
* Remove trailing whitespaceAlvaro Herrera2010-12-02
|
* Remove useless struct declarationAlvaro Herrera2010-12-02
|
* Silence compilerAlvaro Herrera2010-12-02
|
* Refactor the pg_dump zlib code from pg_backup_custom.c to a separate file,Heikki Linnakangas2010-12-02
| | | | | | | | | | | | | to make it easier to reuse that code. There are no user-visible changes. This is in preparation for the patch to add a new archive format, a directory, to perform a custom-like dump but with each table being dumped to a separate file (that in turn is a prerequisite for parallel pg_dump). This also makes it easier to add new compression methods in the future, and makes the pg_backup_custom.c code easier to read, when the compression-related code is factored out. Joachim Wieland, with heavy editorialization by me.
* Prevent inlining a SQL function with multiple OUT parameters.Tom Lane2010-12-01
| | | | | | | | | | | | | There were corner cases in which the planner would attempt to inline such a function, which would result in a failure at runtime due to loss of information about exactly what the result record type is. Fix by disabling inlining when the function's recorded result type is RECORD. There might be some sub-cases where inlining could still be allowed, but this is a simple and backpatchable fix, so leave refinements for another day. Per bug #5777 from Nate Carson. Back-patch to all supported branches. 8.1 happens to avoid a core-dump here, but it still does the wrong thing.
* Simplify and speed up mapping of index opfamilies to pathkeys.Tom Lane2010-11-29
| | | | | | | | | | | | | | | | | | | | | | Formerly we looked up the operators associated with each index (caching them in relcache) and then the planner looked up the btree opfamily containing such operators in order to build the btree-centric pathkey representation that describes the index's sort order. This is quite pointless for btree indexes: we might as well just use the index's opfamily information directly. That saves syscache lookup cycles during planning, and furthermore allows us to eliminate the relcache's caching of operators altogether, which may help in reducing backend startup time. I added code to plancat.c to perform the same type of double lookup on-the-fly if it's ever faced with a non-btree amcanorder index AM. If such a thing actually becomes interesting for production, we should replace that logic with some more-direct method for identifying the corresponding btree opfamily; but it's not worth spending effort on now. There is considerably more to do pursuant to my recent proposal to get rid of sort-operator-based representations of sort orderings, but this patch grabs some of the low-hanging fruit. I'll look at the remainder of that work after the current commitfest.
* Move call to GetTopTransactionId() earlier in LockAcquire(),Simon Riggs2010-11-29
| | | | | | | | | | removing an infrequently occurring race condition in Hot Standby. An xid must be assigned before a lock appears in shared memory, rather than immediately after, else GetRunningTransactionLocks() may see InvalidTransactionId, causing assertion failures during lock processing on standby. Bug report and diagnosis by Fujii Masao, fix by me.
* In libpq/Makefile, use OBJS += as a way to break up long link lines intoBruce Momjian2010-11-27
| | | | something that can be documented.
* On further testing, PQping also needs an explicit check for AUTH_REQ.Tom Lane2010-11-27
| | | | | The pg_fe_sendauth code might fail if it can't handle the authentication request message type --- if so, ping should still say the server is up.
* Rewrite PQping to be more like what we agreed to last week.Tom Lane2010-11-27
| | | | | | | | | | | | | | | | | | | | | | | Basically, we want to distinguish all cases where the connection was not made from those where it was. A convenient proxy for this is to see if we got a message with a SQLSTATE code back from the postmaster. This presumes that the postmaster will always send us a SQLSTATE in a failure message, which is true for 7.4 and later postmasters in every case except fork failure. (We could possibly complicate the postmaster code to do something about that, but it seems not worth the trouble, especially since pg_ctl's response for that case should be to keep waiting anyway.) If we did get a SQLSTATE from the postmaster, there are basically only two cases, as per last week's discussion: ERRCODE_CANNOT_CONNECT_NOW and everything else. Any other error code implies that the postmaster is in principle willing to accept connections, it just didn't like or couldn't handle this particular request. We want to make a special case for ERRCODE_CANNOT_CONNECT_NOW so that "pg_ctl start -w" knows it should keep waiting. In passing, pick names for the enum constants that are a tad less likely to present collision hazards in future.
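The decision rule described in this message can be sketched as a small classifier. The enum and function names are hypothetical and deliberately do not reuse libpq's identifiers; the sketch assumes the caller already knows whether the connection succeeded and, if not, whether the failure message carried a SQLSTATE. The code "57P03" is the SQLSTATE for cannot_connect_now.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
typedef enum
{
	PING_OK,			/* server is up and accepting connections */
	PING_REJECT,		/* server is up but cannot accept connections now */
	PING_NO_RESPONSE	/* nothing identifiable came back from a server */
} PingResult;

#define SQLSTATE_CANNOT_CONNECT_NOW "57P03"

/*
 * Classify a connection attempt. 'sqlstate' is the code from the server's
 * failure message, or NULL if no SQLSTATE was received.
 */
PingResult
classify_ping(int connected, const char *sqlstate)
{
	if (connected)
		return PING_OK;

	/* No SQLSTATE: we cannot tell that a postmaster answered at all. */
	if (sqlstate == NULL)
		return PING_NO_RESPONSE;

	/* The one special case: a waiting caller should keep waiting. */
	if (strcmp(sqlstate, SQLSTATE_CANNOT_CONNECT_NOW) == 0)
		return PING_REJECT;

	/* Any other error means the server is up; it disliked this request. */
	return PING_OK;
}
```

A caller in the style of "pg_ctl start -w" would loop while the result is `PING_REJECT` or `PING_NO_RESPONSE` and stop once it sees `PING_OK`.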
* Clean up IPv4 vs IPv6 bogosity in connectFailureMessage().Tom Lane2010-11-26
| | | | Newly added code was supposing that "struct sockaddr_in" applies to IPv6.
* Fix portability issues in new src/port/inet_net_ntop.c file.Tom Lane2010-11-26
| | | | | | | | | | 1. Don't #include postgres.h in a frontend build. 2. Don't assume that the backend's symbol PGSQL_AF_INET6 has anything to do with the constant that will be used by system library functions (because, in point of fact, it usually doesn't). Fortunately, PGSQL_AF_INET is equal to AF_INET, so we can just cater for both sets of values in one case construct without fear of conflict.
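Point 2 can be illustrated with a case construct that accepts both an application-private family constant and the system one. `APP_AF_INET`, `APP_AF_INET6`, and `addr_bits` are hypothetical names; defining the private constants as offsets from `AF_INET` is an assumption made so that the IPv4 values coincide, as the message says they do.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical application-private address family codes. */
#define APP_AF_INET		(AF_INET + 0)
#define APP_AF_INET6	(AF_INET + 1)

/* Return the address width in bits for either set of family codes. */
int
addr_bits(int af)
{
	switch (af)
	{
		case APP_AF_INET:		/* same value as AF_INET */
			return 32;
#if AF_INET6 != APP_AF_INET6
		case AF_INET6:			/* system value, if it differs */
#endif
		case APP_AF_INET6:
			return 128;
		default:
			return -1;
	}
}
```

The preprocessor guard avoids a duplicate case label on any platform where the two IPv6 constants happen to be equal.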
* Add more ALTER <object> .. SET SCHEMA commands.Robert Haas2010-11-26
| | | | | | | | This adds support for changing the schema of a conversion, operator, operator class, operator family, text search configuration, text search dictionary, text search parser, or text search template. Dimitri Fontaine, with assorted corrections and other kibitzing.
* Remove bogus use of PGDLLIMPORT.Tom Lane2010-11-26
| | | | | That macro should be attached to extern declarations, not actual definitions of variables.
* Add inet_net_ntop.c as needed by MSVC, per Magnus.Bruce Momjian2010-11-26
|
* Use conn->raddr consistently for non-connect libpq error reporting.Bruce Momjian2010-11-26
|
* Update comment that says we only report last libpq connection failure,Bruce Momjian2010-11-26
| | | | per Peter.
* Use only addr_cur when reporting connection failures in libpq.Bruce Momjian2010-11-26
|
* Abandon use of Makefile variables in libpq/Makefile because MSVC scrapesBruce Momjian2010-11-26
| | | | | | the OBJS lines from that file. Clean up where possible.
* In libpq/Makefile, merge PERM_PGPORT and OPT_PGPORT into a singleBruce Momjian2010-11-26
| | | | Makefile variable PGPORT, for clarity.
* Improve pg_ctl "cannot connect" spacing, per Tom, and wording.Bruce Momjian2010-11-26
|
* Improve pg_ctl "cannot connect" warning, per suggestion from Magnus.Bruce Momjian2010-11-25
|
* For libpq/Makefile OPT_PGPORT, remove .o extension after we testBruce Momjian2010-11-25
| | | | configure's LIBOBJS. Should fix buildfarm failures.
* Add PQping and PQpingParams to libpq to allow detection of the server'sBruce Momjian2010-11-25
| | | | | | | | | status, including a status where the server is running but refuses a postgres connection. Have pg_ctl use this new function. This fixes the case where pg_ctl reports that the server is not running (cannot connect) but in fact it is running.
* Fix getaddrinfo() in pgport to use proper parameters, as detected byBruce Momjian2010-11-25
| | | | Win32 buildfarm members.