path: root/src

...
* Limit the number of rel sets considered in consider_index_join_outer_rels.  (Tom Lane, 2012-11-01)
  In bug #7626, Brian Dunavant exposes a performance problem created by commit 3b8968f25232ad09001bf35ab4cc59f5a501193e: that commit attempted to consider *all* possible combinations of indexable join clauses, but if said clauses join to enough different relations, there's an exponential increase in the number of outer-relation sets considered.

  In Brian's example, all the clauses come from the same equivalence class, which means it's redundant to use more than one of them in an indexscan anyway. So we can prevent the problem in this class of cases (which is probably the majority of real examples) by rejecting combinations that would only serve to add a known-redundant clause.

  But that still leaves us exposed to exponential growth of planning time when the query has a lot of non-equivalence join clauses that are usable with the same index. I chose to prevent such cases by setting an upper limit on the number of relation sets considered, equal to ten times the number of index clauses considered so far. (This sliding limit still allows new relsets to be added on as we move to additional index columns, which is probably more important than considering even more combinations of clauses for the previous column.) This should keep the amount of work done roughly linear rather than exponential in the apparent query complexity.

  This part of the fix is pretty ad-hoc; but without a clearer idea of real-world cases for which this would result in markedly inferior plans, it's hard to see how to do better.

* Have make never delete intermediate files automatically  (Peter Eisentraut, 2012-10-31)
  Several hacks in certain modes already thought this was a bad idea, so just disable it globally.

* Fix erroneous choice of timeline variable, too  (Alvaro Herrera, 2012-10-31)

* Fix erroneous choices of segNo variables  (Alvaro Herrera, 2012-10-31)
  Commit dfda6eba (which changed segment numbers to use a single 64 bit variable instead of log/seg) introduced a couple of bogus choices of exactly which log segment number variable to use in each case.

  This is currently pretty harmless; in one place, the bogus number was only being used in an error message for a pretty unlikely condition (failure to fsync a WAL segment file). In the other, it was using a global variable instead of the local variable; but all callsites were passing the value of the global variable anyway.

  No need to backpatch because that commit is not on earlier branches.

* Fix ALTER EXTENSION / SET SCHEMA  (Alvaro Herrera, 2012-10-31)
  In its original conception, it was leaving some objects in the old schema, but without their proper pg_depend entries; this meant that the old schema could be dropped, causing future pg_dump calls to fail on the affected database. This was originally reported by Jeff Frost as #6704; there have been other complaints elsewhere that can probably be traced to this bug.

  To fix, be more consistent about altering a table's subsidiary objects along with the table itself; this requires some restructuring in how tables are relocated when altering an extension -- hence the new AlterTableNamespaceInternal routine which encapsulates it for both the ALTER TABLE and the ALTER EXTENSION cases.

  There was another bug lurking here, which was unmasked after fixing the previous one: certain objects would be reached twice via the dependency graph, and the second attempt to move them would cause the entire operation to fail. Per discussion, it seems the best fix for this is to do more careful tracking of objects already moved: we now maintain a list of moved objects, to avoid attempting to do it twice for the same object.

  Authors: Alvaro Herrera, Dimitri Fontaine
  Reviewed by Tom Lane

* Preserve intermediate .c files in coverage mode  (Peter Eisentraut, 2012-10-28)
  The introduction of the .y -> .c pattern rule causes some .c files such as bootparse.c to be considered intermediate files in the .y -> .c -> .o rule chain, which make would automatically delete. But in coverage mode, the processing tools such as genhtml need those files, so mark them as "precious" so that make preserves them.

* Throw error if expiring tuple is again updated or deleted.  (Kevin Grittner, 2012-10-26)
  This prevents surprising behavior when a FOR EACH ROW trigger BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or deletes the old row. Prior to this patch the requested action on the row could be silently ignored while all triggered actions based on the occurrence of the requested action could be committed. One example of how this could happen is if the BEFORE DELETE trigger for a "parent" row deleted "children" which had trigger functions to update summary or status data on the parent.

  This also prevents similar surprising problems if the query has a volatile function which updates a target row while it is already being updated. There are related issues present in FOR UPDATE cursors and READ COMMITTED queries which are not handled by this patch. These issues need further evaluation to determine what change, if any, is needed.

  Where the new error messages are generated, in most cases the best fix will be to move code from the BEFORE trigger to an AFTER trigger. Where this is not feasible, the trigger can avoid the error by re-issuing the triggering statement and returning NULL.

  Documentation changes will be submitted in a separate patch.

  Kevin Grittner and Tom Lane with input from Florian Pflug and Robert Haas, based on problems encountered during conversion of Wisconsin Circuit Court trigger logic to plpgsql triggers.

* Prefer actual constants to pseudo-constants in equivalence class machinery.  (Tom Lane, 2012-10-26)
  generate_base_implied_equalities_const() should prefer plain Consts over other em_is_const eclass members when choosing the "pivot" value that all the other members will be equated to. This makes it more likely that the generated equalities will be useful in constraint-exclusion proofs.

  Per report from Rushabh Lathia.

* In pg_dump, dump SEQUENCE SET items in the data not pre-data section.  (Tom Lane, 2012-10-26)
  Represent a sequence's current value as a separate TableDataInfo dumpable object, so that it can be dumped within the data section of the archive rather than in pre-data. This fixes an undesirable inconsistency between the meanings of "--data-only" and "--section=data", and also fixes dumping of sequences that are marked as extension configuration tables, as per a report from Marko Kreen back in July.

  The main cost is that we do one more SQL query per sequence, but that's probably not very meaningful in most databases.

  Back-patch to 9.1, since it has the extension configuration issue even though not the --section switch.

* Tweak genericcostestimate's fudge factor for index size.  (Tom Lane, 2012-10-24)
  To provide some bias against using a large index when a small one would do as well, genericcostestimate adds a "fudge factor", which for a long time was random_page_cost * index_pages/10000. However, this can grow to be the dominant term in indexscan cost estimates when the index involved is large enough, a behavior that was never intended.

  Change to a ln(1 + n/10000) formulation, which has nearly the same behavior up to a few hundred pages but tails off significantly thereafter. (A log curve seems correct on first principles, since what we're trying to account for here is index descent costs, which are typically logarithmic.)

  Per bug #7619 from Niko Kiirala.

  Possibly this change should get back-patched, but I'm hesitant to mess with cost estimates in stable branches.

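  For illustration only, a standalone sketch comparing the two formulas described above, assuming the default random_page_cost of 4.0 (this is not the genericcostestimate code itself, just the arithmetic):

      #include <math.h>
      #include <stdio.h>

      int
      main(void)
      {
          const double random_page_cost = 4.0;   /* default GUC setting */
          const double pages[] = {100, 10000, 1000000, 100000000};

          for (int i = 0; i < 4; i++)
          {
              double n = pages[i];
              double old_fudge = random_page_cost * n / 10000.0;  /* linear in index size */
              double new_fudge = log(1.0 + n / 10000.0);          /* tails off for big indexes */

              printf("%12.0f pages: old %12.2f   new %6.2f\n",
                     n, old_fudge, new_fudge);
          }
          return 0;
      }

  The output shows why the old term could dominate: at a hundred million pages the old fudge factor is tens of thousands of cost units, while the log form stays in single digits.
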
* When converting a table to a view, remove its system columns.  (Tom Lane, 2012-10-24)
  Views should not have any pg_attribute entries for system columns. However, we forgot to remove such entries when converting a table to a view. This could lead to crashes later on, if someone attempted to reference such a column, as reported by Kohei KaiGai.

  Patch in HEAD only. This bug has been there forever, but in the back branches we will have to defend against existing mis-converted views, so it doesn't seem worthwhile to change the conversion code too.

* Add context info to OAT_POST_CREATE security hook  (Alvaro Herrera, 2012-10-23)
  ... and have sepgsql use it to determine whether to check permissions during certain operations. Indexes that are being created as a result of REINDEX, for instance, do not need to have their permissions checked; they were already checked when the index was created.

  Author: KaiGai Kohei, slightly revised by me

* Correct predicate locking for DROP INDEX CONCURRENTLY.  (Kevin Grittner, 2012-10-21)
  For the non-concurrent case there is an AccessExclusiveLock on both the index and the heap at a time during which no other process is using either, before which the index is maintained and used for scans, and after which the index is no longer used or maintained. Predicate locks can safely be moved from the index to the related heap relation under the protection of these locks. This was done prior to the introduction of DROP INDEX CONCURRENTLY and continues to be done for non-concurrent index drops.

  For concurrent index drops, the predicate locks must be moved when there are no index scans in progress on that index and no more can subsequently start, and before heap inserts stop maintaining the index. As long as these conditions are guaranteed when the TransferPredicateLocksToHeapRelation() function is called, stronger locks are not needed for correctness.

  Kevin Grittner based on questions by Tom Lane in reviewing the DROP INDEX CONCURRENTLY patch and in cooperation with Andres Freund and Simon Riggs.

* Fix pg_dump's handling of DROP DATABASE commands in --clean mode.  (Tom Lane, 2012-10-20)
  In commit 4317e0246c645f60c39e6572644cff1cb03b4c65, I accidentally broke this behavior while rearranging code to ensure that --create wouldn't affect whether a DATABASE entry gets put into archive-format output. Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is either useless or dangerous depending on the usage scenario. It should not do that, and no longer does.

  A bright spot is that this refactoring makes it easy to allow the combination of --clean and --create to work sensibly, ie, emit DROP DATABASE then CREATE DATABASE before reconnecting. Ordinarily we'd consider that a feature addition and not back-patch it, but it seems silly to not include the extra couple of lines required in the 9.2 version of the code.

  Per report from Guillaume Lelarge, though this is slightly more extensive than his proposed patch.

* Fix UtilityContainsQuery() to handle CREATE TABLE AS EXECUTE correctly.  (Tom Lane, 2012-10-19)
  The code seems to have been written to handle the pre-parse-analysis representation, where an ExecuteStmt would appear directly under CreateTableAsStmt. But in reality the function is only run on already-parse-analyzed statements, so there will be a Query node in between. We'd not noticed the bug because the function is generally not used at all except in extended query protocol.

  Per report from Robert Haas and Rushabh Lathia.

* Fix hash_search to avoid corruption of the hash table on out-of-memory.  (Tom Lane, 2012-10-19)
  An out-of-memory error during expand_table() on a palloc-based hash table would leave a partially-initialized entry in the table. This would not be harmful for transient hash tables, since they'd get thrown away anyway at transaction abort. But for long-lived hash tables, such as the relcache hash, this would effectively corrupt the table, leading to crash or other misbehavior later.

  To fix, rearrange the order of operations so that table enlargement is attempted before we insert a new entry, rather than after adding it to the hash table.

  Problem discovered by Hitoshi Harada, though this is a bit different from his proposed patch.

* Fix ruleutils to print "INSERT INTO foo DEFAULT VALUES" correctly.  (Tom Lane, 2012-10-19)
  Per bug #7615 from Marko Tiikkaja. Apparently nobody ever tried this case before ...

* Fix orphan on cancel of drop index concurrently.  (Simon Riggs, 2012-10-19)
  Canceling DROP INDEX CONCURRENTLY during wait could allow an orphaned index to be left behind which could not be dropped.

  Backpatch to 9.2

  Andres Freund, tested by Abhijit Menon-Sen

* Further cleanup of catcache.c ilist changes.  (Tom Lane, 2012-10-18)
  Remove useless duplicate initialization of bucket headers, don't use a dlist_mutable_iter in a performance-critical path that doesn't need it, make some other cosmetic changes for consistency's sake.

* Remove unnecessary "head" arguments from some dlist/slist functions.Tom Lane2012-10-18
| | | | | | | | dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after do not need access to the list header, and indeed insisting on that negates one of the main advantages of a doubly-linked list. In consequence, revert addition of "cache_bucket" field to CatCTup.
* Code review for inline-list patch.  (Tom Lane, 2012-10-18)
  Make foreach macros less syntactically dangerous, and fix some typos in evidently-never-tested ones. Add missing slist_next_node and slist_head_node functions. Fix broken dlist_check code. Assorted comment improvements.

* Further tweaking of the readfile() function in pg_ctl.  (Heikki Linnakangas, 2012-10-18)
  Don't leak a file descriptor if the file is empty or we can't read its size.

  Expect there to be a newline at the end of the last line, too. If there isn't, ignore anything after the last newline. This makes it a tiny bit more robust in case the file is appended to concurrently, so that we don't return the last line if it hasn't been fully written yet. And this makes the code a bit less obscure, anyway. Per Tom Lane's suggestion.

  Backpatch to all supported branches.

* Isolation test for DROP INDEX CONCURRENTLY  (Simon Riggs, 2012-10-18)
  for recent concurrent changes.

  Abhijit Menon-Sen

* Re-think guts of DROP INDEX CONCURRENTLY.  (Simon Riggs, 2012-10-18)
  Concurrent behaviour was flawed when using a two-step process, so add an additional phase of processing to ensure concurrency for both SELECTs and INSERT/UPDATE/DELETEs.

  Backpatch to 9.2

  Andres Freund, tweaked by me

* Fix planning of non-strict equivalence clauses above outer joins.  (Tom Lane, 2012-10-18)
  If a potential equivalence clause references a variable from the nullable side of an outer join, the planner needs to take care that derived clauses are not pushed to below the outer join; else they may use the wrong value for the variable. (The problem arises only with non-strict clauses, since if an upper clause can be proven strict then the outer join will get simplified to a plain join.) The planner attempted to prevent this type of error by checking that potential equivalence clauses aren't outerjoin-delayed as a whole, but actually we have to check each side separately, since the two sides of the clause will get moved around separately if it's treated as an equivalence. Bugs of this type can be demonstrated as far back as 7.4, even though releases before 8.3 had only a very ad-hoc notion of equivalence clauses.

  In addition, we neglected to account for the possibility that such clauses might have nonempty nullable_relids even when not outerjoin-delayed; so the equivalence-class machinery lacked logic to compute correct nullable_relids values for clauses it constructs. This oversight was harmless before 9.2 because we were only using RestrictInfo.nullable_relids for OR clauses; but as of 9.2 it could result in pushing constructed equivalence clauses to incorrect places. (This accounts for bug #7604 from Bill MacArthur.)

  Fix the first problem by adding a new test check_equivalence_delay() in distribute_qual_to_rels, and fix the second one by adding code in equivclass.c and called functions to set correct nullable_relids for generated clauses. Although I believe the second part of this is not currently necessary before 9.2, I chose to back-patch it anyway, partly to keep the logic similar across branches and partly because it seems possible we might find other reasons why we need valid values of nullable_relids in the older branches.

  Add regression tests illustrating these problems. In 9.0 and up, also add test cases checking that we can push constants through outer joins, since we've broken that optimization before and I nearly broke it again with an overly simplistic patch for this problem.

* pg_dump: Output functions deterministically sorted  (Alvaro Herrera, 2012-10-18)
  Implementation idea from Tom Lane

  Author: Joel Jacobson
  Reviewed by Joachim Wieland

* Revert tests for drop index concurrently.  (Simon Riggs, 2012-10-18)

* Add isolation tests for DROP INDEX CONCURRENTLY.  (Simon Riggs, 2012-10-18)
  Backpatch to 9.2 to ensure bugs are fixed.

  Abhijit Menon-Sen

* Close un-owned SMgrRelations at transaction end.  (Tom Lane, 2012-10-17)
  If an SMgrRelation is not "owned" by a relcache entry, don't allow it to live past transaction end. This design allows the same SMgrRelation to be used for blind writes of multiple blocks during a transaction, but ensures that we don't hold onto such an SMgrRelation indefinitely.

  Because an SMgrRelation typically corresponds to open file descriptors at the fd.c level, leaving it open when there's no corresponding relcache entry can mean that we prevent the kernel from reclaiming deleted disk space. (While CacheInvalidateSmgr messages usually fix that, there are cases where they're not issued, such as DROP DATABASE. We might want to add some more sinval messaging for that, but I'd be inclined to keep this type of logic anyway, since allowing VFDs to accumulate indefinitely for blind-written relations doesn't seem like a good idea.)

  This code replaces a previous attempt towards the same goal that proved to be unreliable. Back-patch to 9.1 where the previous patch was added.

* Revert "Use "transient" files for blind writes, take 2".Tom Lane2012-10-17
| | | | | | | | | | This reverts commit fba105b1099f4f5fa7283bb17cba6fed2baa8d0c. That approach had problems with the smgr-level state not tracking what we really want to happen, and with the VFD-level state not tracking the smgr-level state very well either. In consequence, it was still possible to hold kernel file descriptors open for long-gone tables (as in recent report from Tore Halset), and yet there were also cases of FDs being closed undesirably soon. A replacement implementation will follow.
* Embedded list interface  (Alvaro Herrera, 2012-10-17)
  Provide a common implementation of embedded singly-linked and doubly-linked lists. "Embedded" in the sense that the nodes' next/previous pointers exist within some larger struct; this design choice reduces memory allocation overhead. Most of the implementation uses inlineable functions (where supported), for performance.

  Some existing uses of both types of lists have been converted to the new code, for demonstration purposes. Other uses can (and probably will) be converted in the future. Since dllist.c is unused after this conversion, it has been removed.

  Author: Andres Freund
  Some tweaks by me
  Reviewed by Tom Lane, Peter Geoghegan

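  A minimal standalone sketch of the "embedded" idea: the link node lives inside the containing struct, so list membership costs no extra allocation, and the container is recovered with an offsetof-based macro. The struct and macro names below are made up for illustration; the real interface lives in src/include/lib/ilist.h and differs in detail.

      #include <stddef.h>
      #include <stdio.h>

      /* Link node embedded in the containing struct: no per-element allocation. */
      typedef struct list_node
      {
          struct list_node *prev;
          struct list_node *next;
      } list_node;

      typedef struct Widget
      {
          int        id;
          list_node  links;          /* embedded list membership */
      } Widget;

      /* Recover the containing Widget from a pointer to its embedded node. */
      #define node_to_widget(ptr) \
          ((Widget *) ((char *) (ptr) - offsetof(Widget, links)))

      int
      main(void)
      {
          Widget      a = {1}, b = {2};
          list_node  *n;

          /* Build a two-element circular list by hand: a <-> b. */
          a.links.next = &b.links;
          a.links.prev = &b.links;
          b.links.next = &a.links;
          b.links.prev = &a.links;

          n = &a.links;
          do
          {
              printf("widget %d\n", node_to_widget(n)->id);
              n = n->next;
          } while (n != &a.links);

          return 0;
      }
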
* When outputting the session id in log_line_prefix (%c) or in CSV log output mode, cause the hex digits after the period to always be at least four hex digits, with zero-padding.  (Bruce Momjian, 2012-10-16)

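  Roughly, the padding change looks like the following sketch, assuming the session id is formed from the backend start time and PID printed in hex (illustrative only; the server's exact format string may differ):

      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>

      int
      main(void)
      {
          long start_time = (long) time(NULL);
          int  pid = (int) getpid();

          printf("before: %lx.%x\n", start_time, pid);    /* part after the period can be short */
          printf("after:  %lx.%04x\n", start_time, pid);  /* zero-padded to at least four digits */
          return 0;
      }
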
* alter_generic regression test cannot run concurrently with privileges test.  (Tom Lane, 2012-10-15)
  ... because the latter plays games with the privileges for language SQL. It looks like running alter_generic in parallel with "misc" is OK though.

  Also, adjust serial_schedule to maintain the same test ordering (up to parallelism) as parallel_schedule.

* Fix typo in comment.  (Heikki Linnakangas, 2012-10-15)
  Fujii Masao

* Remove comment that is no longer true.  (Heikki Linnakangas, 2012-10-15)
  AddToDataDirLockFile() supports out-of-order updates of the lockfile nowadays.

* Fix race condition in pg_ctl reading postmaster.pid.  (Heikki Linnakangas, 2012-10-15)
  If postmaster changed postmaster.pid while pg_ctl was reading it, pg_ctl could overrun the buffer it allocated for the file. Fix by reading the whole file to memory with one read() call.

  initdb contains an identical copy of the readfile() function, but the files that initdb reads are static, not modified concurrently. Nevertheless, add a simple bounds-check there, if only to silence static analysis tools.

  Per report from Dave Vitek. Backpatch to all supported branches.

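  The one-read() approach is roughly the following (a simplified sketch under those assumptions, not the actual pg_ctl readfile() code, which also splits the buffer into lines and handles Windows specifics):

      #include <fcntl.h>
      #include <stdlib.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <unistd.h>

      /* Read an entire small file into a freshly malloc'd, NUL-terminated buffer. */
      static char *
      slurp_file(const char *path)
      {
          struct stat st;
          int         fd;
          char       *buf;
          ssize_t     nread;

          fd = open(path, O_RDONLY);
          if (fd < 0 || fstat(fd, &st) != 0)
          {
              if (fd >= 0)
                  close(fd);
              return NULL;
          }

          buf = malloc((size_t) st.st_size + 1);
          if (buf == NULL)
          {
              close(fd);
              return NULL;
          }

          /* One read of the whole file, so a concurrent rewrite cannot make us
           * run past the end of the buffer we sized a moment ago. */
          nread = read(fd, buf, (size_t) st.st_size);
          close(fd);
          if (nread < 0)
          {
              free(buf);
              return NULL;
          }
          buf[nread] = '\0';
          return buf;
      }
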
* Split up process latch initialization for more-fail-soft behavior.  (Tom Lane, 2012-10-14)
  In the previous coding, new backend processes would attempt to create their self-pipe during the OwnLatch call in InitProcess. However, pipe creation could fail if the kernel is short of resources; and the system does not recover gracefully from a FATAL error right there, since we have armed the dead-man switch for this process and not yet set up the on_shmem_exit callback that would disarm it. The postmaster then forces an unnecessary database-wide crash and restart, as reported by Sean Chittenden.

  There are various ways we could rearrange the code to fix this, but the simplest and sanest seems to be to split out creation of the self-pipe into a new function InitializeLatchSupport, which must be called from a place where failure is allowed. For most processes that gets called in InitProcess or InitAuxiliaryProcess, but processes that don't call either but still use latches need their own calls.

  Back-patch to 9.1, which has only a part of the latch logic that 9.2 and HEAD have, but nonetheless includes this bug.

* Fix oversight in new code for printing rangetable aliases.  (Tom Lane, 2012-10-12)
  In commit 11e131854f8231a21613f834c40fe9d046926387, I missed the case of a CTE RTE that doesn't have a user-defined alias, but does have an alias assigned by set_rtable_names(). Per report from Peter Eisentraut.

  While at it, refactor slightly to reduce code duplication.

* In our source code, make a copy of getopt's 'optarg' string arguments, rather than just storing a pointer.  (Bruce Momjian, 2012-10-12)

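  The pattern being adopted looks roughly like this hypothetical fragment (illustrative only, not code from the tree; backend code would use pstrdup rather than strdup):

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      static char *dbname = NULL;

      static void
      parse_options(int argc, char *argv[])
      {
          int c;

          while ((c = getopt(argc, argv, "d:")) != -1)
          {
              if (c == 'd')
                  dbname = strdup(optarg);   /* copy the string, don't keep the pointer */
          }
      }

      int
      main(int argc, char *argv[])
      {
          parse_options(argc, argv);
          if (dbname != NULL)
              printf("dbname = %s\n", dbname);
          return 0;
      }
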
* Get rid of COERCE_DONTCARE.  (Tom Lane, 2012-10-12)
  We don't need this hack any more.

* Make equal() ignore CoercionForm fields for better planning with casts.  (Tom Lane, 2012-10-12)
  This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it.

  Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.)

  Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.

* Unbreak MSVC builds after recent Makefile refactoring.  (Andrew Dunstan, 2012-10-11)
  Based on a suggestion by Peter Eisentraut.

* Fix cross-type case in partial row matching for hashed subplans.  (Tom Lane, 2012-10-11)
  When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)", findPartialMatch() attempted to match rows using the hashtable's internal equality operators, which of course are for x and y's datatypes. What we need to use are the potentially cross-type operators for a=x, b=y, etc. Failure to do that leads to wrong answers or even crashes. The scope for problems is limited to cases where we have different types with compatible hash functions (else we'd not be using a hashed subplan), but for example int4 vs int8 can cause the problem.

  Per bug #7597 from Bo Jensen. This has been wrong since the hashed-subplan code was written, so patch all the way back.

* Improve replication connection timeouts.  (Heikki Linnakangas, 2012-10-11)
  Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly.

  It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end.

  Amit Kapila, heavily edited by me.

* Refactor flex and bison make rules  (Peter Eisentraut, 2012-10-11)
  Numerous flex and bison make rules have appeared in the source tree over time, and they are all virtually identical, so we can replace them by pattern rules with some variables for customization. Users of pgxs will also be able to benefit from this.

* Remove _FORTIFY_SOURCE  (Peter Eisentraut, 2012-10-10)
  Apparently, on some glibc versions this causes warnings when optimization is not enabled. Altogether, there appear to be too many incompatibilities surrounding this.

* Update obsolete comment.  (Tom Lane, 2012-10-10)
  We no longer use GetNewOidWithIndex on pg_largeobject; rather, pg_largeobject_metadata's regular OID column is considered the repository of OIDs for large objects. The special functionality is still needed for TOAST tables however.

* Set procost to 10 for each of the pg_foo_is_visible() functions.  (Tom Lane, 2012-10-10)
  The idea here is to make sure the planner will evaluate these functions last not first among the filter conditions in psql pattern search and tab-completion queries. We've discussed this several times, and there was consensus to do it back in August, but we didn't want to do it just before a release. Now seems like a safer time.

  No catversion bump, since this catalog change doesn't create a backend incompatibility nor any regression test result changes.

* Fix PGXS support for building loadable modules on AIX.  (Tom Lane, 2012-10-09)
  Building a shlib on AIX requires use of the mkldexport.sh script, but we failed to install that, preventing its use from non-source-tree contexts. Also, Makefile.aix had the wrong idea about where to find the installed copy of the postgres.imp symbol file used by AIX.

  Per report from John Pierce. Patch all the way back, since this has been broken since the beginning of PGXS.

* Remove unnecessary overhead in backend's large-object operations.  (Tom Lane, 2012-10-09)
| | | | | | | | | | | | | | | | | | Do read/write permissions checks at most once per large object descriptor, not once per lo_read or lo_write call as before. The repeated tests were quite useless in the read case since the snapshot-based tests were guaranteed to produce the same answer every time. In the write case, the extra tests could in principle detect revocation of write privileges after a series of writes has started --- but there's a race condition there anyway, since we'd check privileges before performing and certainly before committing the write. So there's no real advantage to checking every single time, and we might as well redefine it as "only check the first time". On the same reasoning, remove the LargeObjectExists checks in inv_write and inv_truncate. We already checked existence when the descriptor was opened, and checking again doesn't provide any real increment of safety that would justify the cost.