aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
...
* Use SET TRANSACTION READ ONLY in pg_dump, if server supports it.Tom Lane2013-01-19
| | | | | | | | | | This currently does little except serve as documentation. (The one case where it has a performance benefit, SERIALIZABLE mode in 9.1 and up, was already using READ ONLY mode.) However, it's possible that it might have performance benefits in future, and in any case it seems like good practice since it would catch any accidentally non-read-only operations. Pavan Deolasee
* Modernize string literal syntax in tutorial example.Tom Lane2013-01-19
| | | | | | | | | | Un-double the backslashes in the LIKE patterns, since standard_conforming_strings is now the default. Just to be sure, include a command to set standard_conforming_strings to ON in the example. Back-patch to 9.1, where standard_conforming_strings became the default. Josh Kupershmidt, reviewed by Jeff Janes
* Make pgxs build executables with the right suffix.Andrew Dunstan2013-01-19
| | | | | | | | | | | | | | | Complaint and patch from Zoltán Böszörményi. When cross-compiling, the native make doesn't know about the Windows .exe suffix, so it only builds with it when explicitly told to do so. The native make will not see the link between the target name and the built executable, and might this do unnecesary work, but that's a bigger problem than this one, if in fact we consider it a problem at all. Back-patch to all live branches.
* Protect against SnapshotNow race conditions in pg_tablespace scans.Tom Lane2013-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use of SnapshotNow is known to expose us to race conditions if the tuple(s) being sought could be updated by concurrently-committing transactions. CREATE DATABASE and DROP DATABASE are particularly exposed because they do heavyweight filesystem operations during their scans of pg_tablespace, so that the scans run for a very long time compared to most. Furthermore, the potential consequences of a missed or twice-visited row are nastier than average: * createdb() could fail with a bogus "file already exists" error, or silently fail to copy one or more tablespace's worth of files into the new database. * remove_dbtablespaces() could miss one or more tablespaces, thus failing to free filesystem space for the dropped database. * check_db_file_conflict() could likewise miss a tablespace, leading to an OID conflict that could result in data loss either immediately or in future operations. (This seems of very low probability, though, since a duplicate database OID would be unlikely to start with.) Hence, it seems worth fixing these three places to use MVCC snapshots, even though this will someday be superseded by a generic solution to SnapshotNow race conditions. Back-patch to all active branches. Stephen Frost and Tom Lane
* Rename new latex longtable function name, for consistencyBruce Momjian2013-01-18
|
* Unbreak lock conflict detection for Hot Standby.Robert Haas2013-01-18
| | | | | | | | | | This got broken in the original fast-path locking patch, because I failed to account for the fact that Hot Standby startup process might take a strong relation lock on a relation in a database to which it is not bound, and confused MyDatabaseId with the database ID of the relation being locked. Report and diagnosis by Andres Freund. Final form of patch by me.
* Fix off-by-one bug in xlog reading logicAlvaro Herrera2013-01-18
| | | | | | Bug reported by Michael Paquier Author: Andres Freund
* psql latex fixesBruce Momjian2013-01-18
| | | | | | Remove extra line at bottom of table for new 'latex' mode border=3. Also update 'latex'-longtable 'tableattr' docs to say 'whitespace-separated' instead of 'space'.
* Now that START_REPLICATION returns the next timeline's ID after reaching endHeikki Linnakangas2013-01-18
| | | | | | | | | | | | | | of timeline, take advantage of that in walreceiver. Startup process is still in control of choosign the target timeline, by scanning the timeline history files present in pg_xlog, but walreceiver now uses the next timeline's ID to fetch its history file immediately after it has finished streaming the old timeline. Before, the standby would first try to restart streaming on the old timeline, which fetches the missing timeline history file as a side-effect, and only then restart from the new timeline. This patch eliminates the extra iteration, which speeds up the timeline switch and reduces the noise in the log caused by the extra restart on the old timeline.
* Use the right timeline when beginning to stream from master.Heikki Linnakangas2013-01-18
| | | | | | | | | | | | | | | | | | | | | The xlogreader refactoring broke the logic to decide which timeline to start streaming from. XLogPageRead() uses the timeline history to check which timeline the requested WAL position falls into. However, after the refactoring, XLogPageRead() is always first called with the first page in the segment, to verify the segment header, and only then with the actual WAL position we're interested in. That first read of the segment's header made XLogPageRead() to always start streaming from the old timeline containing the segment header, not the timeline containing the actual record, if there was a timeline switch within the segment. I thought I fixed this yesterday, but that fix was too narrow and only fixed this for the corner-case that the timeline switch happened in the first page of the segment. To fix this more robustly, pass explicitly the position of the record we're actually interested in to XLogPageRead, and use that to decide which timeline to read from, rather than deduce it from the page and offset. Per report from Fujii Masao.
* When xlogreader asks the callback function to read a page, make sure weHeikki Linnakangas2013-01-17
| | | | | | | | | get a large enough part of the page to include the beginning of the next record we're interested in. The XLogPageRead callback uses the requested length to decide which timeline to stream WAL from, and if the first call is short, and the page contains a timeline switch, we'll repeatedly try to stream that page from the old timeline, and never get across the timeline switch.
* I added a result set to START_STREAMING command, but neglected walreceiver.Heikki Linnakangas2013-01-17
| | | | | | | The patch to allow pg_receivexlog to switch timeline added a result set after copy has ended in START_STREAMING command, to return the next timeline's ID to the client. But walreceived didn't get the memo, and threw an error on the unexpected result set. Fix.
* Accelerate end-of-transaction dropping of relationsAlvaro Herrera2013-01-17
| | | | | | | | | | | | | | | | | | | | | | When relations are dropped, at end of transaction we need to remove the files and clean the buffer pool of buffers containing pages of those relations. Previously we would scan the buffer pool once per relation to clean up buffers. When there are many relations to drop, the repeated scans make this process slow; so we now instead pass a list of relations to drop and scan the pool once, checking each buffer against the passed list. When the number of relations is larger than a threshold (which as of this patch is being set to 20 relations) we sort the array before starting, and bsearch the array; when it's smaller, we simply scan the array linearly each time, because that's faster. The exact optimal threshold value depends on many factors, but the difference is not likely to be significant enough to justify making it user-settable. This has been measured to be a significant win (a 15x win when dropping 100,000 relations; an extreme case, but reportedly a real one). Author: Tomas Vondra, some tweaks by me Reviewed by: Robert Haas, Shigeru Hanada, Andres Freund, Álvaro Herrera
* Make pg_receivexlog and pg_basebackup -X stream work across timeline switches.Heikki Linnakangas2013-01-17
| | | | | | | | | | | | | | | | | | | | | | This mirrors the changes done earlier to the server in standby mode. When receivelog reaches the end of a timeline, as reported by the server, it fetches the timeline history file of the next timeline, and restarts streaming from the new timeline by issuing a new START_STREAMING command. When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the last segment on the old timeline. This helps you to tell apart a partial segment left in the directory because of a timeline switch, and a completed segment. If you just follow a single server, it won't make a difference, but it can be significant in more complicated scenarios where new WAL is still generated on the old timeline. This includes two small changes to the streaming replication protocol: First, when you reach the end of timeline while streaming, the server now sends the TLI of the next timeline in the server's history to the client. pg_receivexlog uses that as the next timeline, so that it doesn't need to parse the timeline history file like a standby server does. Second, when BASE_BACKUP command sends the begin and end WAL positions, it now also sends the timeline IDs corresponding the positions.
* Improve memory space management in tuplesort and tuplestore.Tom Lane2013-01-17
| | | | | | | | | | | | | | | | | | | | | The code originally just doubled the size of the tuple-pointer array so long as that would fit in allowedMem. This could result in failing to use as much as half of allowedMem, if (as is typical) the last doubling attempt didn't quite fit. Worse, we might double the array size but be unable to use most of the added slots, because there was no room left within the allowedMem limit for tuples the slots should point to. To fix, double only so long as we've used less than half of allowedMem in total. Then do one more array enlargement, but scale it based on total memory consumption so far. This will work nicely as long as the average tuple size is reasonably stable, and in any case should be better than the old method. This change will result in large sort operations consuming a larger fraction of work_mem than they typically did in the past. The release notes should mention that users may want to revisit their work_mem settings, if they'd tuned those settings based on the old behavior of sorting. Jeff Janes, reviewed by Peter Geoghegan and Robert Haas
* Fix a couple of error-handling bugs in the xlogreader patch.Heikki Linnakangas2013-01-17
| | | | | | | | | | | | XLogReadRecord should reset its state on every error, to make sure it re-reads the page on next call. It was inconsistent in that some errors did that, but some did not. In ReadRecord(), don't give up on an error if we're in standby mode. The loop was set up to retry, but the checks within the loop broke out of the loop on any error. Andres Freund, with some tweaking by me.
* Add a latex-longtable output format to psqlBruce Momjian2013-01-17
| | | | | latex longtable is more powerful than the 'tabular' output format 'latex' uses. Also add border=3 support to 'latex'.
* Silence compiler warningsMagnus Hagander2013-01-17
|
* Make GiST indexes on-disk compatible with 9.2 again.Heikki Linnakangas2013-01-17
| | | | | | | | | | | The patch that turned XLogRecPtr into a uint64 inadvertently changed the on-disk format of GiST indexes, because the NSN field in the GiST page opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that field back to the two-field struct that XLogRecPtr was before. This is the same we did to LSNs in the page header to avoid changing on-disk format. Bump catversion, as this invalidates any existing GiST indexes built on 9.3devel.
* Base the default SSL ciphers on DEFAULT instead of ALLMagnus Hagander2013-01-17
| | | | | | | | It's better to start from what the OpenSSL people consider a good default and then remove insecure things (low encryption, exportable encryption and md5 at this point) from that, instead of starting from everything that exists and remove from that. We trust the OpenSSL people to make good choices about what the default is.
* Make size-output fixed length in pg_basebackup verbose modeMagnus Hagander2013-01-17
| | | | | This way the line doesn't shift right as the amount of data processed increases.
* Truncate filenames in the leadning end in pg_basebackup verbose outputMagnus Hagander2013-01-17
| | | | | | | | When truncating at the end, like before, the output would often end up just showing the path instead of the filename. Also increase the length of the filename by 5, which still keeps us at less than 80 characters in most outputs.
* Support multiple -t/--table arguments for more commandsMagnus Hagander2013-01-17
| | | | | | | | On top of the previous support in pg_dump, add support to specify multiple tables (by using the -t option multiple times) to pg_restore, clsuterdb, reindexdb and vacuumdb. Josh Kupershmidt, reviewed by Karl O. Pinc
* Get rid of pg_dump's READMEPeter Eisentraut2013-01-16
| | | | | | | | It was largely full of outdated and incorrect information. Move the few notes which were still relevant into header comments of pg_backup_tar.c and pg_dumpall.c. Josh Kupershmidt
* Split out XLog reading as an independent facilityAlvaro Herrera2013-01-16
| | | | | | | | | | | | | | | | | This new facility can not only be used by xlog.c to carry out crash recovery, but also by external programs. By supplying a function to read XLog pages from somewhere, all the WAL reading can be used for completely different purposes. For the standard backend use, the behavior should be pretty much the same as previously. As for non-backend programs, an hypothetical pg_xlogdump program is now closer to reality, but some more backend support is still necessary. This patch was originally submitted by Andres Freund in a different form, but Heikki Linnakangas opted for and authored another design of the concept. Andres has advanced the patch since Heikki's initial version. Review and some (mostly cosmetics) changes by me.
* Make \? help message more clear when not connected.Heikki Linnakangas2013-01-15
| | | | | | | On second thought, "none" could mislead to think that you're connected a database with that name. Duplicate the whole string, so that it can be more easily translated. In back-branches, thought, just use an empty string in place of the database name, to avoid adding a translatable string.
* Don't pass NULL to fprintf, if not currently connected to a database.Heikki Linnakangas2013-01-15
| | | | | Backpatch all the way to 8.3. Fixes bug #7811, per report and diagnosis by Meng Qingzhong.
* Rework order of checks in ALTER / SET SCHEMAAlvaro Herrera2013-01-15
| | | | | | | | | | | | | | | | | | | | | | | When attempting to move an object into the schema in which it already was, for most objects classes we were correctly complaining about exactly that ("object is already in schema"); but for some other object classes, such as functions, we were instead complaining of a name collision ("object already exists in schema"). The latter is wrong and misleading, per complaint from Robert Haas in CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com To fix, refactor the way these checks are done. As a bonus, the resulting code is smaller and can also share some code with Rename cases. While at it, remove use of getObjectDescriptionOids() in error messages. These are normally disallowed because of translatability considerations, but this one had slipped through since 9.1. (Not sure that this is worth backpatching, though, as it would create some untranslated messages in back branches.) This is loosely based on a patch by KaiGai Kohei, heavily reworked by me.
* Give a proper error message if connecting to incompatible server.Heikki Linnakangas2013-01-15
| | | | | The WAL streaming message format changed in 9.3, so 9.3 pg_basebackup or pg_receivelog won't work against older servers.
* Fix hash_update_hash_key() to handle same-bucket case correctly.Tom Lane2013-01-14
| | | | | | Original coding would corrupt the hashtable if the item being updated was at the end of its bucket chain and the new hash key hashed to that same bucket. Diagnosis and fix by Heikki Linnakangas.
* Return value of lseek() can be negative on failure.Heikki Linnakangas2013-01-15
| | | | | | | | Because the return value of lseek() was assigned to an unsigned size_t variable, we'd fail to notice an error return code -1. Compiler gave a warning about this. Andres Freund
* Fix obsolete SQL syntax in comment.Tom Lane2013-01-14
| | | | | | | This was legal back in the days of add_missing_from, though perhaps never good style. It's not legal anymore ... Jan Urbański
* Reject out-of-range dates in to_date().Tom Lane2013-01-14
| | | | | | | | | Dates outside the supported range could be entered, but would not print reasonably, and operations such as conversion to timestamp wouldn't behave sanely either. Since this has the potential to result in undumpable table data, it seems worth back-patching. Hitoshi Harada
* Add new timezone abbrevation "FET".Tom Lane2013-01-14
| | | | | | | This seems to have been invented in 2011 to represent GMT+3, non daylight savings rules, as now used in Europe/Kaliningrad and Europe/Minsk. There are no conflicts so might as well add it to the Default list. Per bug #7804 from Ruslan Izmaylov.
* Remove spurious spaceAlvaro Herrera2013-01-14
| | | | Andres Freund
* Prevent very-low-probability PANIC during PREPARE TRANSACTION.Tom Lane2013-01-13
| | | | | | | | | | | | | | | | | | | | | The code in PostPrepare_Locks supposed that it could reassign locks to the prepared transaction's dummy PGPROC by deleting the PROCLOCK table entries and immediately creating new ones. This was safe when that code was written, but since we invented partitioning of the shared lock table, it's not safe --- another process could steal away the PROCLOCK entry in the short interval when it's on the freelist. Then, if we were otherwise out of shared memory, PostPrepare_Locks would have to PANIC, since it's too late to back out of the PREPARE at that point. Fix by inventing a dynahash.c function to atomically update a hashtable entry's key. (This might possibly have other uses in future.) This is an ancient bug that in principle we ought to back-patch, but the odds of someone hitting it in the field seem really tiny, because (a) the risk window is small, and (b) nobody runs servers with maxed-out lock tables for long, because they'll be getting non-PANIC out-of-memory errors anyway. So fixing it in HEAD seems sufficient, at least until the new code has gotten some testing.
* Make spelling more uniformPeter Eisentraut2013-01-13
|
* Update comments for elog_start().Tom Lane2013-01-13
| | | | Forgot I was going to do this as part of the previous patch ...
* Improve handling of ereport(ERROR) and elog(ERROR).Tom Lane2013-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 71450d7fd6c7cf7b3e38ac56e363bff6a681973c, we added code to inform suitably-intelligent compilers that ereport() doesn't return if the elevel is ERROR or higher. This patch extends that to elog(), and also fixes a double-evaluation hazard that the previous commit created in ereport(), as well as reducing the emitted code size. The elog() improvement requires the compiler to support __VA_ARGS__, which should be available in just about anything nowadays since it's required by C99. But our minimum language baseline is still C89, so add a configure test for that. The previous commit assumed that ereport's elevel could be evaluated twice, which isn't terribly safe --- there are already counterexamples in xlog.c. On compilers that have __builtin_constant_p, we can use that to protect the second test, since there's no possible optimization gain if the compiler doesn't know the value of elevel. Otherwise, use a local variable inside the macros to prevent double evaluation. The local-variable solution is inferior because (a) it leads to useless code being emitted when elevel isn't constant, and (b) it increases the optimization level needed for the compiler to recognize that subsequent code is unreachable. But it seems better than not teaching non-gcc compilers about unreachability at all. Lastly, if the compiler has __builtin_unreachable(), we can use that instead of abort(), resulting in a noticeable code savings since no function call is actually emitted. However, it seems wise to do this only in non-assert builds. In an assert build, continue to use abort(), so that the behavior will be predictable and debuggable if the "impossible" happens. These changes involve making the ereport and elog macros emit do-while statement blocks not just expressions, which forces small changes in a few call sites. Andres Freund, Tom Lane, Heikki Linnakangas
* Extend and improve use of EXTRA_REGRESS_OPTS.Andrew Dunstan2013-01-12
| | | | | | | | | | This is now used by ecpg tests, and not clobbered by pg_upgrade tests. This change won't affect anything that doesn't set this environment variable, but will enable the buildfarm to control exactly what port regression test installs will be running on, and thus to detect possible rogue postmasters more easily. Backpatch to release 9.2 where EXTRA_REGRESS_OPTS was first used.
* Redesign the planner's handling of index-descent cost estimation.Tom Lane2013-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically we've used a couple of very ad-hoc fudge factors to try to get the right results when indexes of different sizes would satisfy a query with the same number of index leaf tuples being visited. In commit 21a39de5809cd3050a37d2554323cc1d0cbeed9d I tweaked one of these fudge factors, with results that proved disastrous for larger indexes. Commit bf01e34b556ff37982ba2d882db424aa484c0d07 fudged it some more, but still with not a lot of principle behind it. What seems like a better way to address these issues is to explicitly model index-descent costs, since that's what's really at stake when considering diferent indexes with similar leaf-page-level costs. We tried that once long ago, and found that charging random_page_cost per page descended through was way too much, because upper btree levels tend to stay in cache in real-world workloads. However, there's still CPU costs to think about, and the previous fudge factors can be seen as a crude attempt to account for those costs. So this patch replaces those fudge factors with explicit charges for the number of tuple comparisons needed to descend the index tree, plus a small charge per page touched in the descent. The cost multipliers are chosen so that the resulting charges are in the vicinity of the historical (pre-9.2) fudge factors for indexes of up to about a million tuples, while not ballooning unreasonably beyond that, as the old fudge factor did (even more so in 9.2). To make this work accurately for btree indexes, add some code that allows extraction of the known root-page height from a btree. There's no equivalent number readily available for other index types, but we can use the log of the number of index pages as an approximate substitute. This seems like too much of a behavioral change to risk back-patching, but it should improve matters going forward. In 9.2 I'll just revert the fudge-factor change.
* Detect Windows perl linkage parameters in configure script.Andrew Dunstan2013-01-09
| | | | | | This means we can now construct a configure test for the library presence. Previously these parameters were only figured out at build time in plperl's GnuMakefile.
* Properly install ecpg_compat and pgtypes libraries on msvcMagnus Hagander2013-01-09
| | | | JiangGuiqing
* Don't attempt to write recovery.conf when -R is not specifiedMagnus Hagander2013-01-09
| | | | | | Fixes segmentation fault during regular use. Fujii Masao
* Fix potential corruption of lock table in CREATE/DROP INDEX CONCURRENTLY.Tom Lane2013-01-08
| | | | | | | | | | | | | | | | If VirtualXactLock() has to wait for a transaction that holds its VXID lock as a fast-path lock, it must first convert the fast-path lock to a regular lock. It failed to take the required "partition" lock on the main shared-memory lock table while doing so. This is the direct cause of the assert failure in GetLockStatusData() recently observed in the buildfarm, but more worryingly it could result in arbitrary corruption of the shared lock table if some other process were concurrently engaged in modifying the same partition of the lock table. Fortunately, VirtualXactLock() is only used by CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY, so the opportunities for failure are fewer than they might have been. In passing, improve some comments and be a bit more consistent about order of operations.
* Fix typoPeter Eisentraut2013-01-07
|
* Fix a logic bug in pgindent.Andrew Dunstan2013-01-07
|
* Fix incorrect error message when schema-CREATE permission is absent.Robert Haas2013-01-07
| | | | Report by me. Fix by KaiGai Kohei.
* Add support for generating minimal recovery.conf when doing base backupsMagnus Hagander2013-01-05
| | | | | | | | | Adds commandline option -R to pg_basebackup that creates a recovery.conf which enables standby mode using the same parameters that pg_basebackup used to connect to the master, and writes it into the output directory (or injects it in the tar file when tar format is used). Zoltan Boszormenyi, modified by Magnus Hagander, reviewed by Amit Kapila & Fujii Masao
* Centralize single quote escaping in src/port/quotes.cMagnus Hagander2013-01-05
| | | | | | For code-reuse in upcoming functionality in pg_basebackup. Zoltan Boszormenyi