path: root/src/backend/storage/buffer/buf_init.c
Commit message | Author | Age
* Checkpoint sorting and balancing. | Andres Freund | 2016-03-10

  Up to now checkpoints were written in the order they're in the BufferDescriptors. That's nearly random in a lot of cases, which performs badly on rotating media, but even on SSDs it causes slowdowns. To avoid that, sort checkpoints before writing them out. We currently sort by tablespace, relfilenode, fork and block number.

  One of the major reasons that wasn't done previously was fear of imbalance between tablespaces. To address that, balance writes between tablespaces.

  The other prime concern was that the relatively large allocation to sort the buffers in might fail, preventing checkpoints from happening. Thus pre-allocate the required memory in shared memory, at server startup.

  This particularly makes it more efficient to have checkpoint flushing enabled, because that'll often result in a lot of writes that can be coalesced into one flush.

  Discussion: alpine.DEB.2.10.1506011320000.28433@sto
  Author: Fabien Coelho and Andres Freund
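The sort order named in this commit message (tablespace, relfilenode, fork, block number) can be illustrated with a small comparator. This is only a sketch: the `SortItem` struct, its field names, and `sort_checkpoint_items` are hypothetical stand-ins, not the definitions used in the PostgreSQL source, and the caller is assumed to own a preallocated array, mirroring the commit's point about avoiding allocation at checkpoint time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sort key for one dirty buffer; names are illustrative only. */
typedef struct SortItem
{
    uint32_t tablespace;
    uint32_t relfilenode;
    int32_t  fork;
    uint32_t block;
    int      buf_id;        /* which buffer this key describes */
} SortItem;

/* Return from the comparator as soon as one field differs. */
#define CMP_FIELD(a, b, f) \
    do { \
        if ((a)->f < (b)->f) return -1; \
        if ((a)->f > (b)->f) return 1; \
    } while (0)

/* Order by tablespace, then relfilenode, then fork, then block number. */
static int sort_item_cmp(const void *pa, const void *pb)
{
    const SortItem *a = (const SortItem *) pa;
    const SortItem *b = (const SortItem *) pb;

    CMP_FIELD(a, b, tablespace);
    CMP_FIELD(a, b, relfilenode);
    CMP_FIELD(a, b, fork);
    CMP_FIELD(a, b, block);
    return 0;
}

/* Sort in place; the array is assumed to be preallocated by the caller. */
static void sort_checkpoint_items(SortItem *items, size_t n)
{
    qsort(items, n, sizeof(SortItem), sort_item_cmp);
}
```

After sorting, buffers belonging to the same relation fork sit next to each other in ascending block order, which is what lets neighbouring writes be coalesced.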
* Allow to trigger kernel writeback after a configurable number of writes. | Andres Freund | 2016-03-10

  Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed.

  On Linux it is possible to control this by reducing the global dirty limits significantly, reducing the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always want this behavior, e.g. for temporary files it's undesirable.

  Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2), several POSIX systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache.

  Thus the feature is enabled by default only on Linux, but can be enabled on all systems that have any of the above APIs. While desirable and likely possible, this patch does not contain an implementation for Windows.

  With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes is controlled by a separate GUC, checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively; they're separate because the appropriate number of writes before a flush differs for each, and because the performance considerations of controlled flushing for each of these are different.

  A later patch will add checkpoint sorting - after that flushes from the checkpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences.

  This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so has sorting without flush control.

  Discussion: alpine.DEB.2.10.1506011320000.28433@sto
  Author: Fabien Coelho and Andres Freund
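The "flush after a configurable number of writes" idea can be sketched as a simple counter. Everything below is an assumption made for illustration: `WritebackCounter`, `writeback_init`, and `writeback_note_write` are hypothetical names, and the sketch only counts writes; it does not issue any system call. The comment marks the spot where a real implementation might request writeback, for example with sync_file_range(2) on Linux.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one source of writes (illustration only). */
typedef struct WritebackCounter
{
    int  flush_after;   /* request writeback once this many writes are pending; 0 disables */
    int  nr_pending;    /* writes recorded since the last writeback request */
    long nr_flushes;    /* how many writeback requests were made so far */
} WritebackCounter;

static void writeback_init(WritebackCounter *wb, int flush_after)
{
    assert(flush_after >= 0);
    wb->flush_after = flush_after;
    wb->nr_pending = 0;
    wb->nr_flushes = 0;
}

/*
 * Record one write. Returns how many pending writes this call handed off
 * for writeback, or 0 if the threshold was not reached (or flushing is off).
 */
static int writeback_note_write(WritebackCounter *wb)
{
    int flushed;

    if (wb->flush_after == 0)
        return 0;
    if (++wb->nr_pending < wb->flush_after)
        return 0;

    /* A real implementation would ask the kernel to start writeback here. */
    flushed = wb->nr_pending;
    wb->nr_pending = 0;
    wb->nr_flushes++;
    return flushed;
}
```

Keeping one counter per source of writes matches the commit's three separate settings: each source can use its own threshold, or 0 to switch the behavior off.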
* Make builtin lwlock tranche names consistent. | Robert Haas | 2016-02-12

  Previously, we had a mix of styles.

  Amit Kapila
* Update copyright for 2016 | Bruce Momjian | 2016-01-02

  Backpatch certain files through 9.1
* Move buffer I/O and content LWLocks out of the main tranche. | Robert Haas | 2015-12-15

  Move the content lock directly into the BufferDesc, so that locking and pinning a buffer touches only one cache line rather than two. Adjust the definition of BufferDesc slightly so that this doesn't make the BufferDesc any larger than one cache line (at least on platforms where a spinlock is only 1 or 2 bytes).

  We can't fit the I/O locks into the BufferDesc and stay within one cache line, so move those to a completely separate tranche. This leaves a relatively limited number of LWLocks in the main tranche, so increase the padding of those remaining locks to a full cache line, rather than allowing adjacent locks to share a cache line, hopefully reducing false sharing.

  Performance testing shows that these changes make little difference on laptop-class machines, but help significantly on larger servers, especially those with more than 2 sockets.

  Andres Freund, originally based on an earlier patch by Simon Riggs. Review and cosmetic adjustments (including heavy rewriting of the comments) by me.
* pgindent run for 9.5 | Bruce Momjian | 2015-05-23
* Collection of typo fixes. | Heikki Linnakangas | 2015-05-20

  Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep.

  Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/"through" mixup reported by Euler Taveira.

  Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
* Align buffer descriptors to cache line boundaries. | Andres Freund | 2015-01-29

  Benchmarks have shown that aligning the buffer descriptor array to cache lines is important for scalability, especially on bigger, multi-socket machines. Currently the array sometimes already happens to be aligned, depending on how large previous shared memory allocations were. That can lead to wildly varying performance results after minor configuration changes.

  In addition to aligning the start of the descriptor array, also force the size of individual descriptors to be of a common cache line size (64 bytes). That happens to already be the case on 64bit platforms, but this way we can change the struct BufferDesc more easily.

  As the alignment primarily matters in highly concurrent workloads which probably all are 64bit these days, and the space wastage of element alignment would be a bit more noticeable on 32bit systems, we don't force the stride to be cacheline sized on 32bit platforms for now. If somebody does actual performance testing, we can reevaluate that decision by changing the definition of BUFFERDESC_PADDED_SIZE.

  Discussion: 20140202151319.GD32123@awork2.anarazel.de

  Per discussion with Bruce Momjian, Tom Lane, Robert Haas, and Peter Geoghegan.
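The two techniques this commit message describes, rounding the array start up to a cache line boundary and padding each element to a full cache line, can be sketched as follows. The `Desc` struct, the `DescPadded` union, `CACHE_LINE_SIZE`, and `align_up` are hypothetical names chosen for this illustration; only the 64-byte figure comes from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64      /* assumed cache line size, per the message above */

/* Hypothetical descriptor that is smaller than one cache line (32 bytes). */
typedef struct Desc
{
    uint32_t tag[5];
    uint32_t flags;
    int32_t  refcount;
    int32_t  buf_id;
} Desc;

/* Pad each array element to a full cache line so neighbours never share one. */
typedef union DescPadded
{
    Desc desc;
    char pad[CACHE_LINE_SIZE];
} DescPadded;

/* Round addr up to the next multiple of alignment (must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With both pieces in place, element i of the array starts at an aligned base plus i times 64 bytes, so its position relative to cache lines no longer depends on the size of whatever was allocated before it.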
* Update copyright for 2015 | Bruce Momjian | 2015-01-06

  Backpatch certain files through 9.0
* Make backend local tracking of buffer pins memory efficient. | Andres Freund | 2014-08-30

  Since the dawn of time (aka Postgres95) multiple pins of the same buffer by one backend have been optimized not to modify the shared refcount more than once. This optimization has always used an NBuffers-sized array in each backend keeping track of a backend's pins. That array (PrivateRefCount) was one of the biggest per-backend memory allocations, depending on the shared_buffers setting.

  Besides the waste of memory it also has proven to be a performance bottleneck when assertions are enabled, as we make sure that there are no pins left at the end of transactions. Also, on servers with lots of memory and a correspondingly high shared_buffers setting the amount of random memory accesses can lead to poor CPU cache efficiency.

  For these reasons a backend's buffer pins are now kept track of in a small statically sized array that overflows into a hash table when necessary. Benchmarks have shown neutral to positive performance results with considerably lower memory usage.

  Patch by me, review by Robert Haas.

  Discussion: 20140321182231.GA17111@alap3.anarazel.de
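A rough sketch of the "small array that overflows into a hash table" structure follows. It is not the PostgreSQL implementation: `PinTracker`, the slot counts, and all function names are hypothetical, and the overflow table here is a fixed-size linear-probing array in which released entries are kept as zero-count placeholders so that probe chains stay intact. Buffer numbers are assumed to be positive, with 0 marking a free slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAST_ENTRIES   8    /* small array searched linearly (assumed size) */
#define OVERFLOW_SLOTS 64   /* fixed-size stand-in for a real hash table */

typedef struct PinEntry { int buffer; int refcount; } PinEntry;

typedef struct PinTracker
{
    PinEntry fast[FAST_ENTRIES];
    PinEntry overflow[OVERFLOW_SLOTS];
} PinTracker;

static void pin_tracker_init(PinTracker *t) { memset(t, 0, sizeof(*t)); }

/* Linear search of the small array; passing 0 finds a free slot. */
static PinEntry *find_fast(PinTracker *t, int buffer)
{
    for (int i = 0; i < FAST_ENTRIES; i++)
        if (t->fast[i].buffer == buffer)
            return &t->fast[i];
    return NULL;
}

/* Linear probing in the overflow table; optionally claim the first empty slot. */
static PinEntry *find_overflow(PinTracker *t, int buffer, int create)
{
    for (int i = 0; i < OVERFLOW_SLOTS; i++)
    {
        PinEntry *e = &t->overflow[((unsigned) buffer + (unsigned) i) % OVERFLOW_SLOTS];

        if (e->buffer == buffer)
            return e;
        if (e->buffer == 0)
        {
            if (!create)
                return NULL;
            e->buffer = buffer;
            return e;
        }
    }
    return NULL;    /* table full and buffer not present */
}

/* Returns 1 if this is the backend's first pin (shared refcount must be touched). */
static int pin_buffer(PinTracker *t, int buffer)
{
    PinEntry *e;

    assert(buffer > 0);
    e = find_fast(t, buffer);
    if (e == NULL)
        e = find_overflow(t, buffer, 0);
    if (e == NULL)
    {
        e = find_fast(t, 0);                    /* prefer a free fast slot */
        if (e != NULL)
            e->buffer = buffer;
        else
            e = find_overflow(t, buffer, 1);    /* otherwise spill over */
    }
    assert(e != NULL);
    return ++e->refcount == 1;
}

/* Returns 1 if the backend's last pin was released. */
static int unpin_buffer(PinTracker *t, int buffer)
{
    PinEntry *e;
    int in_fast;

    assert(buffer > 0);
    e = find_fast(t, buffer);
    in_fast = (e != NULL);
    if (e == NULL)
        e = find_overflow(t, buffer, 0);
    assert(e != NULL && e->refcount > 0);
    if (--e->refcount > 0)
        return 0;
    if (in_fast)
        e->buffer = 0;      /* free the fast slot for reuse */
    return 1;
}

/* Number of pins this backend currently holds on the buffer. */
static int pin_count(PinTracker *t, int buffer)
{
    PinEntry *e;

    assert(buffer > 0);
    e = find_fast(t, buffer);
    if (e == NULL)
        e = find_overflow(t, buffer, 0);
    return e != NULL ? e->refcount : 0;
}
```

Memory use is fixed by the two slot counts rather than by the number of shared buffers, which is the saving the commit message is after; a production version would need a growable hash table and real deletion.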
* pgindent run for 9.4 | Bruce Momjian | 2014-05-06

  This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.
* Update copyright for 2014 | Bruce Momjian | 2014-01-07

  Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013 | Bruce Momjian | 2013-01-01

  Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Update copyright notices for year 2012. | Bruce Momjian | 2012-01-01
* Stamp copyrights for year 2011. | Bruce Momjian | 2011-01-01
* Remove cvs keywords from all files. | Magnus Hagander | 2010-09-20
* Update copyright for the year 2010. | Bruce Momjian | 2010-01-02
* Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. | Robert Haas | 2009-12-15

  This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information.

  Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
* Update copyright for 2009. | Bruce Momjian | 2009-01-01
* Allow ShowBufferUsage() to report the number of reads/writes that have | Tom Lane | 2008-09-17

  occurred to temporary files. This replaces the unused NDirectFileRead/NDirectFileWrite counters.

  Itagaki Takahiro
* Update copyrights in source tree to 2008. | Bruce Momjian | 2008-01-01
* Update CVS HEAD for 2007 copyright. Back branches are typically not | Bruce Momjian | 2007-01-05

  back-stamped for this.
* Update copyright for 2006. Update scripts. | Bruce Momjian | 2006-03-05
* Standard pgindent run for 8.1. | Bruce Momjian | 2005-10-15
* Convert the arithmetic for shared memory size calculation from 'int' | Tom Lane | 2005-08-20

  to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference on a 32-bit machine, but on a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2GB on a 64-bit machine.

  Original patch from Koichi Suzuki, additional work by moi.
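Overflow-checked size_t arithmetic of the kind this message refers to can be sketched as below. The helper names `add_size_checked` and `mul_size_checked`, and the choice to report overflow through a boolean return value, are assumptions made for this illustration rather than a description of the functions in the PostgreSQL source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store a + b in *out; return false (leaving *out untouched) on overflow. */
static bool add_size_checked(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a)
        return false;
    *out = a + b;
    return true;
}

/* Store a * b in *out; return false (leaving *out untouched) on overflow. */
static bool mul_size_checked(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}
```

Both checks are done before the operation, using only values that cannot themselves overflow, so a request such as "number of buffers times block size" is either computed exactly or rejected.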
* Remove BufferBlockPointers array in favor of a base + (bufnum) * BLCKSZ | Tom Lane | 2005-08-12

  computation. On modern machines this is as fast if not faster, and we don't have to clog the CPU's L2 cache with a tens-of-KB pointer array. If we ever decide to adopt a more dynamic allocation method for shared buffers, we'll probably have to revert this patch, but in the meantime we might as well save a few bytes and nanoseconds.

  Per Qingqing Zhou.
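The base + (bufnum) * BLCKSZ computation named in this commit amounts to plain pointer arithmetic over one contiguous allocation. In the sketch below the function name `block_ptr`, its bounds check, and the 8192-byte value given to `BLCKSZ` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ 8192     /* block size assumed for this sketch */

/*
 * Compute the address of block number bufnum inside one contiguous pool,
 * instead of looking it up in a separate array of pointers.
 */
static char *block_ptr(char *base, int bufnum, int nbuffers)
{
    assert(bufnum >= 0 && bufnum < nbuffers);
    return base + (size_t) bufnum * BLCKSZ;
}
```

The cast to size_t before multiplying keeps the offset from overflowing int when the pool is larger than 2GB.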
* Cause ShutdownPostgres to do a normal transaction abort during backend | Tom Lane | 2005-08-08

  exit, instead of trying to take shortcuts. Introduce some additional shutdown callback routines to eliminate kluges like having ProcKill be responsible for shutting down the buffer manager. Ensure that the order of operations during shutdown is predictable and what you would expect given the module layering.
* Split the shared-memory array of PGPROC pointers out of the sinval | Tom Lane | 2005-05-19

  communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.
* Replace the BufMgrLock with separate locks on the lookup hashtable and | Tom Lane | 2005-03-04

  the freelist, plus per-buffer spinlocks that protect access to individual shared buffer headers. This requires abandoning a global freelist (since the freelist is a global contention point), which shoots down ARC and 2Q as well as plain LRU management. Adopt a clock sweep algorithm instead. Preliminary results show substantial improvement in multi-backend situations.
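A clock sweep replacement policy, the general technique this commit adopts, can be sketched in a few lines. This is a single-threaded illustration with hypothetical names (`Buf`, `ClockSweep`, `clock_sweep_victim`); it leaves out the per-buffer spinlocks the message mentions and is not the code from the PostgreSQL source.

```c
#include <assert.h>

/* Hypothetical per-buffer state for the sketch. */
typedef struct Buf
{
    int refcount;       /* > 0 means pinned, so not evictable */
    int usage_count;    /* bumped on access, decayed by the sweep */
} Buf;

typedef struct ClockSweep
{
    Buf *bufs;
    int  nbufs;
    int  next_victim;   /* the clock hand */
} ClockSweep;

/*
 * Advance the hand until an unpinned buffer with usage_count 0 is found.
 * Unpinned buffers passed on the way lose one usage point each.
 * Returns the victim's index, or -1 if every buffer is pinned.
 */
static int clock_sweep_victim(ClockSweep *cs)
{
    int pinned_in_a_row = 0;

    assert(cs->nbufs > 0);
    for (;;)
    {
        int  i = cs->next_victim;
        Buf *b = &cs->bufs[i];

        cs->next_victim = (i + 1) % cs->nbufs;

        if (b->refcount == 0)
        {
            if (b->usage_count == 0)
                return i;
            b->usage_count--;
            pinned_in_a_row = 0;
        }
        else if (++pinned_in_a_row >= cs->nbufs)
            return -1;      /* a full lap saw only pinned buffers */
    }
}
```

The only shared state the sweep needs is the hand position, which is what makes it a better fit than a global freelist or an ordered LRU list when many backends look for victims at once.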
* Ensure that all details of the ARC algorithm are hidden within freelist.c. | Tom Lane | 2005-02-03

  This refactoring does not change any algorithms or data structures, just removes visibility of the ARC data structures from other source files.
* Tag appropriate files for rc3 | PostgreSQL Daemon | 2004-12-31

  Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
* Remove BufferLocks[] array in favor of a single pointer to the buffer | Tom Lane | 2004-10-16

  (if any) currently waited for by LockBufferForCleanup(), which is all that we were using it for anymore. Saves some space and eliminates proportional-to-NBuffers slowdown in UnlockBuffers().
* Pgindent run for 8.0. | Bruce Momjian | 2004-08-29
* Update copyright to 2004. | Bruce Momjian | 2004-08-29
* Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by | Tom Lane | 2004-05-28

  about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.
* Make LocalRefCount and PrivateRefCount arrays of int32, rather than long. | Neil Conway | 2004-04-22

  This saves a small amount of per-backend memory for LP64 machines.
* Another round of code cleanup on bufmgr. Use BM_VALID flag to keep track | Tom Lane | 2004-04-21

  of whether we have successfully read data into a buffer; this makes the error behavior a bit more transparent (IMHO anyway), and also makes it work correctly for local buffers which don't use Start/TerminateBufferIO. Collapse three separate functions for writing a shared buffer into one. This overlaps a bit with cleanups that Neil proposed a while back, but which seem not to have been committed yet.
* Code review for ARC patch. Eliminate static variables, improve handling | Tom Lane | 2004-04-19

  of VACUUM cases so that VACUUM requests don't affect the ARC state at all, avoid corner case where BufferSync would uselessly rewrite a buffer that no longer contains the page that was to be flushed. Make some minor other cleanups in and around the bufmgr as well, such as moving PinBuffer and UnpinBuffer into bufmgr.c where they really belong.
* Fixed bug where FlushRelationBuffers() did call StrategyInvalidateBuffer() | Jan Wieck | 2004-02-12

  for already empty buffers because their buffer tag was not cleared out when the buffers had been invalidated before.

  Also removed the misnamed BM_FREE bufhdr flag and replaced the checks, which effectively ask if the buffer is unpinned, with checks against the refcount field.

  Jan
* Adjusted calculation of shared memory requirements to new | Jan Wieck | 2004-01-15

  ARC buffer replacement strategy.

  Jan
* This patch is the next step towards (re)allowing fork/exec. | Bruce Momjian | 2003-12-20

  Claudio Natoli
* I posted some bufmgr cleanup a few weeks ago, but it conflicted with | Neil Conway | 2003-12-14

  some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests.

  This patch makes the following changes:

  - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions.
  - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile)
  - make BufferReplace() return a bool, not an int
  - clean up some logic in bufmgr.c; should be functionally equivalent to the previous code, just cleaner now
  - remove the BM_PRIVATE flag as it is unused
  - improve a few comments, etc.
* $Header: -> $PostgreSQL Changes ... | PostgreSQL Daemon | 2003-11-29
* 2nd try for the ARC strategy. | Jan Wieck | 2003-11-13

  I added a couple more Assertions while tracking down the exact cause of the former bug.

  All 93 regression tests pass now.

  Jan
* ARC strategy backed out ... sorry | Jan Wieck | 2003-11-13

  Jan
* Replacement of the buffer replacement strategy with an ARC | Jan Wieck | 2003-11-13

  algorithm adopted for PostgreSQL.

  Jan
* Update copyrights to 2003. | Bruce Momjian | 2003-08-04
* Remove ShutdownBufferPoolAccess exit callback, and do the work in | Tom Lane | 2002-09-25

  ProcKill instead, where we still have a PGPROC with which to wait on LWLocks. This fixes 'can't wait without a PROC structure' failures occasionally seen during backend shutdown (I'm surprised they weren't more frequent, actually). Add an Assert() to LWLockAcquire to help catch any similar mistakes in future. Fix failure to update MyProcPid for standalone backends and pgstat processes.
* pgindent run. | Bruce Momjian | 2002-09-04
* Remove sys/types.h in files that include postgres.h, and hence c.h, | Bruce Momjian | 2002-09-02

  because c.h has sys/types.h.