Commit log for src/backend/storage/buffer/bufmgr.c, newest first (commit message, author, date):
* Update copyright for the year 2010. (Bruce Momjian, 2010-01-02)
* Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. (Robert Haas, 2009-12-15)
  This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information.
  Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew. (Bruce Momjian, 2009-06-11)
* Add a comment documenting the question of whether PrefetchBuffer should try to protect an already-existing buffer from being evicted. (Tom Lane, 2009-04-03)
  This was left as an open issue when the posix_fadvise patch was committed. I'm not sure there's any evidence to justify more work in this area, but we should have some record about it in the source code.
* Modify the relcache to record the temp status of both local and nonlocal temp relations; this is no more expensive than before, now that we have pg_class.relistemp. (Tom Lane, 2009-03-31)
  Insert tests into bufmgr.c to prevent attempting to fetch pages from nonlocal temp relations. This provides a low-level defense against bugs-of-omission allowing temp pages to be loaded into shared buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop. While at it, tweak a bunch of places to use new relcache tests (instead of expensive probes into pg_namespace) to detect local or nonlocal temp tables.
* More fixes for 8.4 DTrace probes. (Tom Lane, 2009-03-23)
  Remove useless BUFFER_HIT/BUFFER_MISS probes --- the BUFFER_READ_DONE probe provides the same information and more besides. Expand the LOCK_WAIT_START/DONE probe arguments so that there's actually some chance of telling what is being waited for. Update and clean up the documentation.
* Add isExtend to the parameters of the buffer_read_start and buffer_read_done DTrace probes, so that ordinary reads can be distinguished from relation extension operations. (Tom Lane, 2009-03-22)
  Move buffer_read_start probe to before the smgrnblocks() call that's needed in the isExtend case, since really that step should be charged as part of the time needed for the extension operation. (This makes it slightly harder to match the read_start with the associated read_done, since now you can't match them on blockNumber, but it should still be possible since isExtend operations on the same relation can never be interleaved.) Per recent discussion.
  In passing, add the page identity (forkNum/blockNum) to the parameters of the buffer_flush_start/buffer_flush_done probes, which were unaccountably lacking the info.
* Restore previous ordering of BUFFER_FLUSH_START probe. (Tom Lane, 2009-03-13)
  I had wanted to make it include the time for the possible smgropen() call, but that results in a null pointer dereference :-(. An alternative solution would be to fetch the buffer tag instead of looking at *reln, but I'll just put it back as it was for the moment. BTW, this indicates that DTrace probes evaluate their arguments even when nominally inactive. What was that about "zero cost", again?
* Code review for dtrace probes added (so far) to 8.4. (Tom Lane, 2009-03-11)
  Adjust placement of some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.
* Implement prefetching via posix_fadvise() for bitmap index scans. (Tom Lane, 2009-01-12)
  A new GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl)
  Greg Stark
* Update copyright for 2009. (Bruce Momjian, 2009-01-01)
* The attached patch contains a couple of fixes in the existing probes and includes a few new ones. (Bruce Momjian, 2008-12-17)
  - Fixed compilation errors on OS X for probes that use typedefs
  - Fixed a number of probes to pass ForkNumber per the relation forks patch
  - The new probes are those that were taken out from the previously submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch.
  Robert Lor
* Rethink the way FSM truncation works. (Heikki Linnakangas, 2008-11-19)
  Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from a modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to a new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.
* Change error messages to print the physical path, like "base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. (Heikki Linnakangas, 2008-11-11)
  I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.
* Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function that takes the strategy and mode as arguments. (Heikki Linnakangas, 2008-10-31)
  There are three modes: RBM_NORMAL, which is the default used by plain ReadBuffer(); RBM_ZERO, which replaces ZeroOrReadBuffer; and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happen if we crash after extending an FSM file and the new page is "torn".
  Add fork number to some error messages in bufmgr.c that still lacked it.
* Properly access a buffer's LSN using existing access macros instead of abusing knowledge of page layout. (Alvaro Herrera, 2008-10-20)
  Stolen from Jonah Harris' CRC patch
* Allow ShowBufferUsage() to report the number of reads/writes that have occurred to temporary files. (Tom Lane, 2008-09-17)
  This replaces the unused NDirectFileRead/NDirectFileWrite counters.
  Itagaki Takahiro
* Introduce the concept of relation forks. (Heikki Linnakangas, 2008-08-11)
  An smgr relation can now consist of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.
* In ReadOrZeroBuffer (and related entry points), don't bother to call PageHeaderIsValid when we zero the buffer instead of reading the page in. (Tom Lane, 2008-08-05)
  The actual performance improvement is probably marginal since this function isn't very heavily used, but a cycle saved is a cycle earned.
  Zdenek Kotala
* Add a few more DTrace probes to the backend. (Alvaro Herrera, 2008-08-01)
  Robert Lor
* Clean up the use of some page-header-access macros: principally, use SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. (Tom Lane, 2008-07-13)
  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.
* Improve our #include situation by moving pointer types away from the corresponding struct definitions. (Alvaro Herrera, 2008-06-19)
  This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause fewer unnecessary dependencies.
* Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation forks. (Heikki Linnakangas, 2008-06-12)
  XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c are gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.
* Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. (Alvaro Herrera, 2008-06-08)
  It is more logical that way, and also it reduces the number of unnecessary includes in bufpage.h, which is widely used.
  Zdenek Kotala. My previous patch to bufpage.h should also have credited him as author, but I forgot (sorry about that).
* Put back bufmgr.h in bufpage.h -- it is needed by some macros. (Alvaro Herrera, 2008-05-12)
  Remove #include bufmgr.h from (most?) source files which already include bufpage.h.
* Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. (Alvaro Herrera, 2008-05-12)
  Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Update copyrights in source tree to 2008. (Bruce Momjian, 2008-01-01)
* pgindent run for 8.3. (Bruce Momjian, 2007-11-15)
* Dept. of second thoughts: fix loop in BgBufferSync so that the exit when bgwriter_lru_maxpages is exceeded leaves the loop variables in the expected state. (Tom Lane, 2007-09-25)
  In the original coding, we'd fail to advance next_to_clean, causing that buffer to be probably-uselessly rechecked next time, and also have an off-by-one idea of the number of buffers scanned.
* Just-in-time background writing strategy. (Tom Lane, 2007-09-25)
  This code avoids re-scanning buffers that cannot possibly need to be cleaned, and estimates how many buffers it should try to clean based on moving averages of recent allocation requests and density of reusable buffers. The patch also adds a couple more columns to pg_stat_bgwriter to help measure the effectiveness of the bgwriter.
  Greg Smith, building on his own work and ideas from several other people, in particular a much older patch from Itagaki Takahiro.
* HOT updates. (Tom Lane, 2007-09-20)
  When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however.
  Pavan Deolasee, with help from a bunch of other people.
* Improve logging of checkpoints. (Tom Lane, 2007-06-30)
  Patch by Greg Smith, worked over by Heikki and a little bit by me.
* Implement "distributed" checkpoints in which the checkpoint I/O is spread over a fairly long period of time, rather than being spat out in a burst. (Tom Lane, 2007-06-28)
  This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled.
  Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.
* Update obsolete comment: it's no longer the case that mdread() will allow reads beyond EOF, except by special coercion. (Tom Lane, 2007-06-18)
* Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. (Tom Lane, 2007-05-30)
  Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done.
  The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API.
  This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers.
  Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches.
  Original patch by Simon, reworked by Heikki and again by Tom.
* Fix up pgstats counting of live and dead tuples to recognize that committed and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. (Tom Lane, 2007-05-27)
  Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.
* Dept. of second thoughts: add comments cautioning against using ReadOrZeroBuffer to fetch pages from beyond physical EOF. (Tom Lane, 2007-05-02)
  This would usually work, but would cause problems for md.c if writes occurred beyond a segment boundary when the previous segment file hadn't been fully extended.
* During WAL recovery, when reading a page that we intend to overwrite completely from the WAL data, don't bother to physically read it; just have bufmgr.c return a zeroed-out buffer instead. (Tom Lane, 2007-05-02)
  This speeds recovery significantly, and also avoids unnecessary failures when a page-to-be-overwritten has corrupt page headers on disk. This replaces a former kluge that accomplished the latter by pretending zero_damaged_pages was always ON during WAL recovery; which was OK when the kluge was put in, but is unsafe when restoring a WAL log that was written with full_page_writes off.
  Heikki Linnakangas
* Add some instrumentation to the bgwriter, through the stats collector. (Magnus Hagander, 2007-03-30)
  New view pg_stat_bgwriter, and the functions required to build it.
* Wording cleanup for error messages. Also change can't -> cannot. (Bruce Momjian, 2007-02-01)
  Standard English uses "may", "can", and "might" in different ways:
    may - permission, "You may borrow my rake."
    can - ability, "I can lift that log."
    might - possibility, "It might rain today."
  Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
* Update CVS HEAD for 2007 copyright. (Bruce Momjian, 2007-01-05)
  Back branches are typically not back-stamped for this.
* Remove an unnecessary HOLD_INTERRUPTS/RESUME_INTERRUPTS pair. (Tom Lane, 2006-10-22)
  This was required back when RESUME_INTERRUPTS could actually execute ProcessInterrupts, but that hasn't been true since 2001...
* pgindent run for 8.2. (Bruce Momjian, 2006-10-04)
* Add a check to prevent overwriting valid data if smgrnblocks() gives a wrong answer, as has been seen to occur with a buggy Linux kernel. (Tom Lane, 2006-09-25)
  Not really our bug, but it's a simple test in a seldom-used control path, so might as well have a defense.
* Marginal cleanup in arrangements for ensuring StrategyHintVacuum is cleared after an error during VACUUM. (Tom Lane, 2006-09-17)
  We have a PG_TRY block anyway around the only call sites, so just reset it in the CATCH clause instead of having AtEOXact_Buffers blindly do it during xact end. I think the old code was actively wrong for the case of a failure during ANALYZE inside a subtransaction --- the flag wouldn't get cleared until main transaction end. Probably not worth back-patching though.
* Split the buffer mapping table into multiple separately lockable partitions, as per discussion. (Tom Lane, 2006-07-23)
  Passes functionality checks, but I don't have any performance data yet.
* Remove 576 references of include files that were not needed. (Bruce Momjian, 2006-07-14)
* Repair a low-probability race condition identified by Qingqing Zhou. (Tom Lane, 2006-04-14)
  If a process abandons a wait in LockBufferForCleanup (in practice, only happens if someone cancels a VACUUM) just before someone else sends it a signal indicating the buffer is available, it was possible for the wakeup to remain in the process' semaphore, causing misbehavior next time the process waited for an lmgr lock. Rather than try to prevent the race condition directly, it seems best to make the lock manager robust against leftover wakeups, by having it repeat waiting on the semaphore if the lock has not actually been granted or denied yet.
* Clean up WAL/buffer interactions as per my recent proposal. (Tom Lane, 2006-03-31)
  Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.
* Clean up and document the API for XLogOpenRelation and XLogReadBuffer. (Tom Lane, 2006-03-29)
  This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.
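The 2009-01-12 prefetching commit has bitmap heap scans keep up to effective_io_concurrency block prefetch requests in flight. The bookkeeping can be sketched as a prefetch cursor kept a bounded distance ahead of the read cursor. This is a simplified illustration, not the executor's code: the `hint_fn` callback stands in for an OS hint such as posix_fadvise() with POSIX_FADV_WILLNEED, and `PrefetchState`, `prefetch_advance`, `scan_next` and `count_hint` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an OS read-ahead hint (hypothetical callback type). */
typedef void (*hint_fn)(unsigned block, void *arg);

typedef struct PrefetchState
{
    const unsigned *blocks;       /* blocks the scan will read, in order */
    size_t          nblocks;
    size_t          scan_pos;     /* index of the next block to read */
    size_t          prefetch_pos; /* index of the next block to hint */
    size_t          target;       /* max hinted-but-unread blocks */
} PrefetchState;

/* Hint blocks until `target` hinted-but-unread blocks are outstanding
 * or the list is exhausted.  Returns the number of hints issued. */
static size_t
prefetch_advance(PrefetchState *st, hint_fn hint, void *arg)
{
    size_t issued = 0;

    assert(st->scan_pos <= st->nblocks);
    if (st->prefetch_pos < st->scan_pos)
        st->prefetch_pos = st->scan_pos;    /* never hint behind the reader */

    while (st->prefetch_pos < st->nblocks &&
           st->prefetch_pos - st->scan_pos < st->target)
    {
        hint(st->blocks[st->prefetch_pos], arg);
        st->prefetch_pos++;
        issued++;
    }
    return issued;
}

/* Top up the prefetch window, then return the next block via *out.
 * Returns 0 once the scan is finished. */
static int
scan_next(PrefetchState *st, hint_fn hint, void *arg, unsigned *out)
{
    prefetch_advance(st, hint, arg);
    if (st->scan_pos >= st->nblocks)
        return 0;
    *out = st->blocks[st->scan_pos++];
    return 1;
}

/* Trivial hint function for trying the sketch out: it only counts calls. */
static void
count_hint(unsigned block, void *arg)
{
    (void) block;
    (*(size_t *) arg)++;
}
```

With a target of 2, the first `scan_next` hints two blocks and each later call tops the window up by one, so every block is hinted exactly once before it is read. A target of 0 issues no hints at all.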
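The 2008-11-11 commit switches error messages to physical paths such as "base/11517/3767_fsm". Below is a minimal sketch of composing such a path from a database OID, a relfilenode and a fork number. It assumes a relation in the default tablespace and models only the main and FSM forks; the visibility-map fork envisioned in the 2008-08-11 commit is left out. The function names are hypothetical; PostgreSQL's own path-building code is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Only the main and FSM forks are modelled (assumption). */
typedef enum ForkNumber { MAIN_FORKNUM = 0, FSM_FORKNUM = 1 } ForkNumber;

static const char *const fork_suffix[] = { "", "_fsm" };

/* Compose "base/<db>/<relfilenode><suffix>" for a relation in the default
 * tablespace.  Returns the snprintf() result (length that would be written). */
static int
rel_fork_path(char *buf, size_t len, unsigned db_oid, unsigned rel_node,
              ForkNumber fork)
{
    assert(fork == MAIN_FORKNUM || fork == FSM_FORKNUM);
    return snprintf(buf, len, "base/%u/%u%s", db_oid, rel_node, fork_suffix[fork]);
}

/* Inverse direction: recover the fork from a file name's suffix. */
static ForkNumber
fork_from_path(const char *path)
{
    size_t n = strlen(path);
    size_t s = strlen(fork_suffix[FSM_FORKNUM]);

    if (n >= s && strcmp(path + n - s, fork_suffix[FSM_FORKNUM]) == 0)
        return FSM_FORKNUM;
    return MAIN_FORKNUM;
}
```

For database 11517 and relfilenode 3767 this yields "base/11517/3767" for the main fork and "base/11517/3767_fsm" for the FSM fork, matching the example in the commit message.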
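The 2008-10-31 commit folds three entry points into ReadBufferExtended with a mode argument. The behaviour of the three modes, as the commit message describes it, can be sketched standalone. The enum names follow the commit message. Everything else is a hypothetical stand-in, not bufmgr.c code: the magic-byte validity check, `read_page`, and the memcpy standing in for the storage-manager read.

```c
#include <assert.h>
#include <string.h>

#define BLCKSZ 8192                 /* PostgreSQL's default page size */
#define PAGE_MAGIC 0x5A             /* hypothetical "valid header" marker */

typedef enum ReadBufferMode
{
    RBM_NORMAL,                     /* read; a corrupt page is an error */
    RBM_ZERO,                       /* skip the read; return a zeroed page */
    RBM_ZERO_ON_ERROR               /* read; zero the page if it is corrupt */
} ReadBufferMode;

/* Hypothetical stand-in for page-header validation. */
static int
page_header_is_valid(const unsigned char *page)
{
    return page[0] == PAGE_MAGIC;
}

/* Fill `buf` from `disk_page` according to `mode`.  Returns 0 on success
 * and -1 where the real code would raise an "invalid page header" error. */
static int
read_page(unsigned char *buf, const unsigned char *disk_page, ReadBufferMode mode)
{
    assert(buf != NULL);
    if (mode == RBM_ZERO)
    {
        memset(buf, 0, BLCKSZ);     /* no I/O at all */
        return 0;
    }
    assert(disk_page != NULL);
    memcpy(buf, disk_page, BLCKSZ); /* stands in for the smgr read */
    if (page_header_is_valid(buf))
        return 0;
    if (mode == RBM_ZERO_ON_ERROR)
    {
        memset(buf, 0, BLCKSZ);     /* treat the torn/corrupt page as new */
        return 0;
    }
    return -1;
}
```

RBM_ZERO never touches the disk page, which is why the caller may pass NULL for it. RBM_ZERO_ON_ERROR is what lets the FSM survive a torn page after a crash.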
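The 2007-09-25 just-in-time commit says the bgwriter estimates how many buffers to clean from moving averages of recent allocation requests and the density of reusable buffers. Below is a minimal sketch of that kind of estimate, assuming a simple exponential moving average. `AllocEstimate` and `buffers_to_write` are hypothetical names, and the real BgBufferSync logic differs in detail. `multiplier` and `max_writes` play roughly the roles of the bgwriter_lru_multiplier and bgwriter_lru_maxpages settings.

```c
#include <assert.h>

typedef struct AllocEstimate
{
    double smoothed_alloc;     /* moving average of allocations per round */
    int    smoothing_samples;  /* how many rounds of history to weigh in */
} AllocEstimate;

/*
 * Fold this round's allocation count into the moving average, then decide
 * how many buffers to write: enough to cover the expected allocations
 * (scaled by `multiplier`) beyond the reusable buffers already known to be
 * ahead of the clock sweep, capped at `max_writes` per round.
 */
static int
buffers_to_write(AllocEstimate *est, int recent_alloc, int reusable_ahead,
                 double multiplier, int max_writes)
{
    double need;

    assert(est->smoothing_samples > 0);
    est->smoothed_alloc +=
        (recent_alloc - est->smoothed_alloc) / est->smoothing_samples;

    need = est->smoothed_alloc * multiplier - reusable_ahead;
    if (need < 0)
        need = 0;
    if (need > max_writes)
        need = max_writes;
    return (int) need;
}
```

Starting from zero with 16 samples, a round of 160 allocations moves the average to 10. With a multiplier of 2 and 5 reusable buffers already ahead, that asks for 15 writes. A burst is damped by the average and then clipped by the per-round cap.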
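The 2007-06-28 commit spreads checkpoint I/O over a long period instead of a burst. One way to pace such writes, sketched here under stated assumptions, is to compare the fraction of buffers written against the fraction of the checkpoint interval that has elapsed. The written fraction is scaled by a completion target (compare the checkpoint_completion_target setting). The writer naps while it is ahead and keeps writing while it is behind. The function name is hypothetical, and only a time-based check is modelled.

```c
#include <assert.h>

/*
 * Returns 1 if the checkpoint is on or ahead of schedule (the writer may
 * nap before writing more), 0 if it is behind and should keep writing.
 * Progress is scaled by completion_target so that writing every buffer
 * lines up with completion_target of the interval having elapsed.
 */
static int
checkpoint_on_schedule(int written, int to_write, double elapsed_secs,
                       double interval_secs, double completion_target)
{
    double progress;

    assert(interval_secs > 0 && completion_target > 0 && completion_target <= 1);
    if (to_write == 0)
        return 1;               /* nothing to write: trivially on schedule */
    progress = (double) written / to_write * completion_target;
    return progress >= elapsed_secs / interval_secs;
}
```

With a target of 0.5 and a 300-second interval, having written half the buffers (scaled progress 0.25) is on schedule at 60 s (0.2 of the interval) but behind at 90 s (0.3).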
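The 2007-05-30 ring-buffer commit combines two rules: usage_count now advances when a buffer is first pinned, and the clock sweep skips pinned buffers without decrementing them. A scan using a BufferAccessStrategy recycles a small ring of buffers instead of sweeping the whole pool. The single-threaded sketch below assumes tiny sizes and a usage_count cap of 5. Reusing a ring slot only when the buffer is unpinned with usage_count at most 1 is an assumed policy ("nobody else has touched it since we did"). All names are hypothetical.

```c
#include <assert.h>

#define NBUFFERS   8
#define MAX_USAGE  5   /* usage_count cap (assumed value) */
#define RING_SIZE  4   /* tiny ring for illustration */

typedef struct BufDesc
{
    int refcount;      /* pins held */
    int usage_count;   /* clock-sweep popularity counter */
} BufDesc;

static BufDesc bufs[NBUFFERS];
static int     next_victim = 0;

/* Per the commit, usage_count advances when a buffer is first pinned. */
static void
pin(int id)
{
    if (bufs[id].refcount++ == 0 && bufs[id].usage_count < MAX_USAGE)
        bufs[id].usage_count++;
}

static void
unpin(int id)
{
    assert(bufs[id].refcount > 0);
    bufs[id].refcount--;
}

/* Clock sweep: pinned buffers are passed over without touching their
 * usage_count; unpinned ones are decremented until one reaches zero.
 * Returns -1 if no victim is found (e.g. every buffer is pinned). */
static int
clock_sweep(void)
{
    for (int tries = NBUFFERS * (MAX_USAGE + 1); tries > 0; tries--)
    {
        int id = next_victim;

        next_victim = (next_victim + 1) % NBUFFERS;
        if (bufs[id].refcount > 0)
            continue;
        if (bufs[id].usage_count == 0)
            return id;
        bufs[id].usage_count--;
    }
    return -1;
}

typedef struct Ring
{
    int ids[RING_SIZE];   /* buffer ids, -1 for an unused slot */
    int cur;              /* slot used most recently */
} Ring;

/* Reuse the ring's next slot if its buffer is unpinned and has
 * usage_count <= 1; otherwise take a buffer from the clock sweep and
 * remember it in that slot. */
static int
ring_get_buffer(Ring *r)
{
    int id;

    r->cur = (r->cur + 1) % RING_SIZE;
    id = r->ids[r->cur];
    if (id >= 0 && bufs[id].refcount == 0 && bufs[id].usage_count <= 1)
        return id;
    id = clock_sweep();
    r->ids[r->cur] = id;
    return id;
}
```

A scan that pins and releases each buffer once leaves it at usage_count 1, so on its next lap it gets the same four buffers back. If another backend bumps one of them, the ring falls back to the clock sweep for that slot.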
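The 2006-07-23 commit splits the buffer mapping table into separately lockable partitions. The idea: hash a buffer tag once, and use that hash to pick a partition (and hence a lock). Lookups that land in different partitions then never contend. This sketch assumes 16 partitions and uses FNV-1a as a stand-in hash function. A flag array stands in for real locks, and `BufferTag`, `buftag_hash`, `lock_partition_for` and the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BUFFER_PARTITIONS 16   /* assumed partition count */

typedef struct BufferTag
{
    uint32_t spc, db, rel;         /* which relation */
    uint32_t block;                /* which block within it */
} BufferTag;

/* FNV-1a over the tag's bytes, low byte first (stand-in hash). */
static uint32_t
buftag_hash(const BufferTag *t)
{
    uint32_t fields[4] = { t->spc, t->db, t->rel, t->block };
    uint32_t h = 2166136261u;

    for (int i = 0; i < 4; i++)
        for (int s = 0; s < 32; s += 8)
        {
            h ^= (fields[i] >> s) & 0xFFu;
            h *= 16777619u;
        }
    return h;
}

static int
buf_partition(uint32_t hash)
{
    return (int) (hash % NUM_BUFFER_PARTITIONS);
}

/* One "lock" per partition (a flag here, single-threaded). */
static int partition_locked[NUM_BUFFER_PARTITIONS];

static int
lock_partition_for(const BufferTag *t)
{
    int p = buf_partition(buftag_hash(t));

    assert(!partition_locked[p]);
    partition_locked[p] = 1;
    return p;
}

static void
unlock_partition(int p)
{
    assert(partition_locked[p]);
    partition_locked[p] = 0;
}
```

Because the FNV prime is odd and 16 is a power of two, consecutive block numbers of one relation fall into distinct partitions here. Two backends touching neighbouring blocks would therefore take different locks.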
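The 2006-03-31 commit requires routines that change a page to call MarkBufferDirty while holding the buffer content lock, and to do so before XLogInsert. The page LSN must also be set before the exclusive lock is released. The single-threaded sketch below shows that ordering with hypothetical names (`BufferState`, `mark_buffer_dirty`, `xlog_insert`, `modify_page`). Its final check illustrates the general write-ahead rule: a dirty page may go to disk only after WAL is flushed up to its LSN.

```c
#include <assert.h>
#include <stdint.h>

typedef enum LockMode { UNLOCKED, SHARE, EXCLUSIVE } LockMode;

typedef struct BufferState     /* hypothetical, heavily simplified */
{
    int      pins;
    LockMode content_lock;
    int      dirty;
    uint64_t page_lsn;         /* LSN of the last WAL record touching the page */
} BufferState;

static uint64_t wal_insert_ptr = 0;    /* stand-in for the WAL insert position */

/* The commit's new check: a pin alone is not enough, the caller must also
 * hold the content lock (exclusively, since it is changing the page). */
static void
mark_buffer_dirty(BufferState *buf)
{
    assert(buf->pins > 0 && buf->content_lock == EXCLUSIVE);
    buf->dirty = 1;
}

/* Stand-in for XLogInsert(): appends a record and returns its end LSN. */
static uint64_t
xlog_insert(uint64_t record_len)
{
    wal_insert_ptr += record_len;
    return wal_insert_ptr;
}

/* A page change in the required order: lock, modify, mark dirty, log,
 * stamp the LSN, and only then release the lock. */
static void
modify_page(BufferState *buf, uint64_t record_len)
{
    buf->pins++;
    buf->content_lock = EXCLUSIVE;
    /* ... change the page contents here ... */
    mark_buffer_dirty(buf);
    buf->page_lsn = xlog_insert(record_len);
    buf->content_lock = UNLOCKED;
    buf->pins--;
}

/* Write-ahead rule: the page may be written only once WAL is flushed
 * at least up to the page's LSN. */
static int
can_write_out(const BufferState *buf, uint64_t wal_flushed_upto)
{
    return buf->page_lsn <= wal_flushed_upto;
}
```

Calling `mark_buffer_dirty` on a buffer that is merely pinned trips the assertion. That mirrors the debugging check the commit message says it adds.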