aboutsummaryrefslogtreecommitdiff
path: root/doc/TODO.detail/pitr
blob: d4d4c4140477f4d7dbcac938f86ca84be51287dd (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
From pgsql-admin-owner+M15281=pgman=candle.pha.pa.us@postgresql.org Thu Oct 21 18:57:36 2004
Return-path: <pgsql-admin-owner+M15281=pgman=candle.pha.pa.us@postgresql.org>
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
	by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9LLvYf17059
	for <pgman@candle.pha.pa.us>; Thu, 21 Oct 2004 17:57:34 -0400 (EDT)
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 79D9132A71A
	for <pgman@candle.pha.pa.us>; Thu, 21 Oct 2004 22:57:29 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 80515-02 for <pgman@candle.pha.pa.us>;
	Thu, 21 Oct 2004 21:57:26 +0000 (GMT)
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
	by svr1.postgresql.org (Postfix) with ESMTP id 1209432A70E
	for <pgman@candle.pha.pa.us>; Thu, 21 Oct 2004 22:57:29 +0100 (BST)
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 4B39932A6C3
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Thu, 21 Oct 2004 22:51:01 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 78125-02
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
	Thu, 21 Oct 2004 21:50:48 +0000 (GMT)
Received: from news.hub.org (news.hub.org [200.46.204.72])
	by svr1.postgresql.org (Postfix) with ESMTP id 27F0632A6C2
	for <pgsql-admin@postgresql.org>; Thu, 21 Oct 2004 22:50:49 +0100 (BST)
Received: from news.hub.org (news.hub.org [200.46.204.72])
	by news.hub.org (8.12.9/8.12.9) with ESMTP id i9LLojJ7079086
	for <pgsql-admin@postgresql.org>; Thu, 21 Oct 2004 21:50:45 GMT
	(envelope-from news@news.hub.org)
Received: (from news@localhost)
	by news.hub.org (8.12.9/8.12.9/Submit) id i9LLnd7p078783
	for pgsql-admin@postgresql.org; Thu, 21 Oct 2004 21:49:39 GMT
From: Gaetano Mendola <mendola@bigfoot.com>
X-Newsgroups: comp.databases.postgresql.admin
Subject: Re: [ADMIN] replication using WAL archives
Date: Thu, 21 Oct 2004 23:49:35 +0200
Organization: PYRENET Midi-pyrenees Provider
Lines: 216
Message-ID: <41782EEF.5040708@bigfoot.com>
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="------------060900090803090101060101"
X-Complaints-To: abuse@pyrenet.fr
cc: iain@mst.co.jp
To: Robert Treat <xzilla@users.sourceforge.net>
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
X-Accept-Language: en-us, en
In-Reply-To: <1098384082.15573.14.camel@camel>
X-Enigmail-Version: 0.86.1.0
X-Enigmail-Supports: pgp-inline, pgp-mime
To: pgsql-admin@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Mailing-List: pgsql-admin
Precedence: bulk
Sender: pgsql-admin-owner@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on 
	candle.pha.pa.us
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham 
	version=2.61
Status: OR

This is a multi-part message in MIME format.
--------------060900090803090101060101
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Robert Treat wrote:
> On Thu, 2004-10-21 at 02:44, Iain wrote:
> 
>>Hi,
>> 
>>I thought I read something about this in relation to v8, but I can't
>>find any reference to it now... is it (or will it be) possible to do
>>master-slave style database replication by transmitting log files to the
>>standby server and having it process them?
>> 
> 
> 
> I'm not certain if this is 8.0, but some folks have created a working
> version against the 8.0 code that will do something like this. Search
> the pgsql-hacker mail list archives for more information on it. 

I sent a post on hackers, I put it here:

=======================================================================
Hi all,
I seen that Eric Kerin did the work suggested by Tom about
how to use the PITR in order to have an hot spare postgres,
writing a C program.

I did the same writing 2 shell scripts, one of them perform
the restore the other one deliver the partial filled wal and
check if the postmaster is alive ( check if the pid process
still exist ).

With these two scripts I'm able to have an hot spare installation,
and the spare one go alive when the first postmaster dies.

How test it:

1) Master node:
	modify postgresql.conf using:

~        archive_command = 'cp %p /mnt/server/archivedir/%f'

~        launch postgres and perform a backup as doc

~        http://developer.postgresql.org/docs/postgres/backup-online.html

	suggest to do

	launch the script:

	partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path>

~        this script will delivery each 10 seconds the "current" wal file,
~        and touch the "alive" file in order to notify the spare node that
~        the master node is up and running


2) Spare node:
	create a recovery.conf with the line:

~        restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir'

~        replace the content of data directory with the backup performed at point 1,
~        remove any file present in the pg_xlog directory ( leaving there the archive_status
~        directory ) and remove the postmaster.pid file ( this is necessary if you are running
~        the spare postgres on the same hw ).

~        launch the postmaster, the restore will continue till the "alive" file present in the
~        /mnt/server/partialdir directory is not updated for 60 seconds ( you can modify this
~        values inside the restore.sh script ).

Be sure that restore.sh and all directories involved are accessible

Let me know.


This is a first step, of course, as Eric Kerin did, is better port these script
in C and make it more robust.

Postgres can help this process, as suggested by Tom creating a pg_current_wal()
or even better having two new GUC parameters: archive_current_wal_command and
archive_current_wal_delay.

I problem I discover during the tests is that if you shut down the spare node
and the restore_command is still waiting for a file then the postmaster will never
exit  :-(
==========================================================================

I hope that is clear.



Regards
Gaetano Mendola



--------------060900090803090101060101
Content-Type: text/plain;
 name="restore.sh"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="restore.sh"

#!/bin/bash


SOURCE=$1
TARGET=$2
PARTIAL=$3

SIZE_EXPECTED=16777216		#bytes     16 MB
DIED_TIME=60                    #seconds

function test_existence
{
    if [ -f ${SOURCE}   ]
    then
       COUNTER=0

       #I have to check if the file is begin copied
       #I assume that it will reach the right
       #size in a few seconds

       while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ]
       do
          sleep 1
          let COUNTER+=1
          if [ 20 -lt $COUNTER ]
          then
             exit 1    # BAILING OUT
          fi
       done

       cp $SOURCE $TARGET
       exit 0
    fi
    echo ${SOURCE}"> not found"
    
    #if is looking for a history file and not exist 
    #I have suddenly exit
    echo $SOURCE | grep history > /dev/null 2>&1 && exit 1
}


while [ 1 ]
do 

   test_existence

   #CHECK IF THE MASTER IS ALIVE
   DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) ))
   if [ $DIED_TIME -lt $DELTA_TIME ]
   then
       echo "Master is dead..."
       # Master is dead
       CURRENT_WAL=$( basename $SOURCE )
       echo "Partial: " ${PARTIAL}
       echo "Current wal: " ${CURRENT_WAL}
       echo "Target: " ${TARGET}
       cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET}  > /dev/null 2>&1 && exit 0
       exit 1
   fi

   sleep 1

done

--------------060900090803090101060101
Content-Type: text/plain;
 name="partial_wal_deliver.sh"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="partial_wal_deliver.sh"

#!/bin/bash

PID=$1
PARTIAL=$2
PGXLOG=$3

function copy_last_wal
{
   FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )

   echo "Last Wal> " $FILE

   cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
   mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
   find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
}


while [ 1 ]
do 
   ps --pid $PID > /dev/null 2>&1
   ALIVE=$?
   
   if [ "${ALIVE}" == "1" ]
   then
       #The process is dead
       echo "Process dead"
       copy_last_wal 
       exit 1
   fi

   #The process still exist
   touch ${PARTIAL}/alive
   copy_last_wal 

   sleep 10
done

--------------060900090803090101060101
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html

--------------060900090803090101060101--

From pgsql-admin-owner+M15295=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 06:32:38 2004
Return-path: <pgsql-admin-owner+M15295=pgman=candle.pha.pa.us@postgresql.org>
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
	by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9M9Waf18397
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 05:32:36 -0400 (EDT)
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 9C9A532AC61
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 10:32:32 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 53654-01 for <pgman@candle.pha.pa.us>;
	Fri, 22 Oct 2004 09:32:26 +0000 (GMT)
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
	by svr1.postgresql.org (Postfix) with ESMTP id 3132D32AC53
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 10:32:32 +0100 (BST)
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id DC46E32A095
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 10:23:07 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 49812-03
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
	Fri, 22 Oct 2004 09:22:52 +0000 (GMT)
Received: from cmailg3.svr.pol.co.uk (cmailg3.svr.pol.co.uk [195.92.195.173])
	by svr1.postgresql.org (Postfix) with ESMTP id 5AEA6329F2F
	for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 10:22:57 +0100 (BST)
Received: from modem-21.monkey.dialup.pol.co.uk ([217.135.208.21] helo=Nightingale)
	by cmailg3.svr.pol.co.uk with smtp (Exim 4.41)
	id 1CKvdM-0005eh-NO; Fri, 22 Oct 2004 10:22:53 +0100
Message-ID: <011a01c4b818$b7370a20$06e887d9@Nightingale>
From: "Simon Riggs" <simon@2ndquadrant.com>
To: "Gaetano Mendola" <mendola@bigfoot.com>,
   "Robert Treat" <xzilla@users.sourceforge.net>, <pgsql-admin@postgresql.org>
cc: <iain@mst.co.jp>
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com>
Subject: Re: [ADMIN] replication using WAL archives
Date: Fri, 22 Oct 2004 10:22:54 +0100
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1409
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409
X-Virus-Scanned: by amavisd-new at hub.org
X-Mailing-List: pgsql-admin
Precedence: bulk
Sender: pgsql-admin-owner@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on 
	candle.pha.pa.us
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham 
	version=2.61
Status: OR

> Gaetano Mendola wrote
> Postgres can help this process, as suggested by Tom creating a
pg_current_wal()
> or even better having two new GUC parameters: archive_current_wal_command
and
> archive_current_wal_delay.

OK, we can modify the archiver to do this as well as the archive-when-full
functionality. I'd already agreed to do something similar for 8.1

PROPOSAL:
By default, archive_max_delay would be 10 seconds.
By default, archive_current_wal_command is not set.
If archive_current_wal_command is not set, the archiver will archive a file
using archive_command only when the file is full.
If archive_current_wal_command is set, the archiver would archive a file
whichever of these occurs first...
- it is full
- the archive_max_delay timeout occurs (default: disabled)
...as you can see I've renamed archive_current_wal_delay to reflect the fact
that there is an interaction between the current mechanism (only when full)
and this additional mechanism (no longer than X secs between log files).
With that design, if the logs are being created quickly enough, then a
partial log file is never created, only full ones.

When an xlog file is archived because it is full, then it is sent to both
archive_current_wal_command and archive_command (in that order). When the
timeout occurs and we have a partial xlog file, it would only be sent to
archive_current_wal_command. It may also be desirable to not use
archive_command at all, only to use archive_current_wal_command. That's not
currently possible because archive_command is the switch by which all of the
archive functioanlity is enabled, so you can't actually turn this off.

There is already a timeout feature designed into archiver for safety...so I
can make that read the GUCs, above and act accordingly.

There is an unresolved resilience issue: if the archiver goes down (or
whatever does the partial_wal copy functionality) then it it is possible
that users will continue writing to the database and creating xlog records.
It would be up to the user to define how to handle records that had been
committed to the first database in the interim before cutover. It would also
be up to the user to shut down the first node from the second - Shoot the
Other Node in the Head, as its known. All of that is up to the second node,
and as Tom says, is "the hard part"....I'm not proposing to do anything
about that at this stage, since it is implementation dependant.

I was thinking perhaps to move to having variable size xlog files, since
their contents are now variable - no padded records at EOF. If we did that,
then the archiver could simply issue a "switch logfile" and then the
archiver would cut in anyway to copy away the xlog. Having said that it is
lots easier just to put a blind timeout in the archiver and copy the file -
though I'm fairly uneasy about the point that we'd be ignoring the fact that
many people are still writing to it. But I propose doing the easy way....

Thoughts?

= - = - =

Gaetano - skim-reading your script, how do you handle the situation when a
new xlog file has been written within 10 seconds? That way the current file
number will have jumped by 2, so when your script looks for the "Last wal"
using head -1 it will find the N+2 and the intermediate file will never be
copied. Looks like a problem to me...

> I problem I discover during the tests is that if you shut down the spare
node
> and the restore_command is still waiting for a file then the postmaster
will never
> exit  :-(

Hmm....Are you reporting this as a bug for 8.0? It's not on the bug list...

Do we consider that to be desirable or not?

Best Regards, Simon Riggs



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-admin-owner+M15302=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 13:56:14 2004
Return-path: <pgsql-admin-owner+M15302=pgman=candle.pha.pa.us@postgresql.org>
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
	by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9MGuCf28637
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 12:56:13 -0400 (EDT)
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 54E77EAEDAA
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 17:55:51 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 86116-09 for <pgman@candle.pha.pa.us>;
	Fri, 22 Oct 2004 16:55:57 +0000 (GMT)
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
	by svr1.postgresql.org (Postfix) with ESMTP id 0EC98EAEDA7
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 17:55:51 +0100 (BST)
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 5DB98EAEDBE
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 17:45:13 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 82473-08
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
	Fri, 22 Oct 2004 16:45:11 +0000 (GMT)
Received: from pns.mm.eutelsat.org (pns.mm.eutelsat.org [194.214.173.227])
	by svr1.postgresql.org (Postfix) with ESMTP id E49F0EAEDB5
	for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 17:45:00 +0100 (BST)
Received: from nts-03.mm.eutelsat.org (localhost [127.0.0.1])
	by pns.mm.eutelsat.org (8.11.6/linuxconf) with ESMTP id i9MGh0U26124;
	Fri, 22 Oct 2004 18:43:01 +0200
Received: from [127.0.0.1] (accesspoint.mm.eutelsat.org [194.214.173.4])
	by nts-03.mm.eutelsat.org (8.11.6/linuxconf) with ESMTP id i9MGj5f09681;
	Fri, 22 Oct 2004 18:45:05 +0200
Message-ID: <4179390B.10700@bigfoot.com>
Date: Fri, 22 Oct 2004 18:44:59 +0200
From: Gaetano Mendola <mendola@bigfoot.com>
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Simon Riggs <simon@2ndquadrant.com>
cc: Robert Treat <xzilla@users.sourceforge.net>, pgsql-admin@postgresql.org,
   iain@mst.co.jp
Subject: Re: [ADMIN] replication using WAL archives
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com> <011a01c4b818$b7370a20$06e887d9@Nightingale>
In-Reply-To: <011a01c4b818$b7370a20$06e887d9@Nightingale>
X-Enigmail-Version: 0.86.1.0
X-Enigmail-Supports: pgp-inline, pgp-mime
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new at hub.org
X-Mailing-List: pgsql-admin
Precedence: bulk
Sender: pgsql-admin-owner@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on 
	candle.pha.pa.us
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham 
	version=2.61
Status: OR

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Simon Riggs wrote:
|>Gaetano Mendola wrote
|>Postgres can help this process, as suggested by Tom creating a
|
| pg_current_wal()
|
|>or even better having two new GUC parameters: archive_current_wal_command
|
| and
|
|>archive_current_wal_delay.
|
|
| OK, we can modify the archiver to do this as well as the archive-when-full
| functionality. I'd already agreed to do something similar for 8.1
|
| PROPOSAL:
| By default, archive_max_delay would be 10 seconds.
| By default, archive_current_wal_command is not set.
| If archive_current_wal_command is not set, the archiver will archive a file
| using archive_command only when the file is full.
| If archive_current_wal_command is set, the archiver would archive a file
| whichever of these occurs first...
| - it is full
| - the archive_max_delay timeout occurs (default: disabled)
| ...as you can see I've renamed archive_current_wal_delay to reflect the fact
| that there is an interaction between the current mechanism (only when full)
| and this additional mechanism (no longer than X secs between log files).
| With that design, if the logs are being created quickly enough, then a
| partial log file is never created, only full ones.
|
| When an xlog file is archived because it is full, then it is sent to both
| archive_current_wal_command and archive_command (in that order). When the
| timeout occurs and we have a partial xlog file, it would only be sent to
| archive_current_wal_command. It may also be desirable to not use
| archive_command at all, only to use archive_current_wal_command. That's not
| currently possible because archive_command is the switch by which all of the
| archive functioanlity is enabled, so you can't actually turn this off.

It seems good to me, the script behind archive command can be a nop if someone
want use the archive_current_wal_command


| = - = - =
|
| Gaetano - skim-reading your script, how do you handle the situation when a
| new xlog file has been written within 10 seconds? That way the current file
| number will have jumped by 2, so when your script looks for the "Last wal"
| using head -1 it will find the N+2 and the intermediate file will never be
| copied. Looks like a problem to me...


Yes, the only window failure I seen ( but I don't know if it's possible )

Master:
~        log N created
	log N filled
	archive log N
	log N+1 created
	log N+1 filled
~        log N+2 created
~                   <---- the master die here before to archive the log N+1
~        archive log N+1


in this case as you underline tha last log archived is the N and the N+2
partial wal is added to archived wal collection and in the recovery fase
the recovery stop after processing the log N.

Is it possible that the postmaster create the N+2 file without finish to archive
the N+1 ? ( I suspect yes :-(  )

The only cure I see here is to look for not archived WAL ( if possible ).


|>I problem I discover during the tests is that if you shut down the spare
|>node and the restore_command is still waiting for a file then the postmaster
|>will never exit  :-(
|
|
| Hmm....Are you reporting this as a bug for 8.0? It's not on the bug list...

For me is a behave to avoid.



Regards
Gaetano Mendola







-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBeTkJ7UpzwH2SGd4RAsMxAKCbV7W+wrGBocf2Ftlthm0egAlIWACgp87L
KU/YusyHuvT7jSFwZVKpP3M=
=rWZx
-----END PGP SIGNATURE-----


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

From pgsql-admin-owner+M15303=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 14:43:36 2004
Return-path: <pgsql-admin-owner+M15303=pgman=candle.pha.pa.us@postgresql.org>
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
	by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9MHhZf06453
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 13:43:35 -0400 (EDT)
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 01DD2EADBB7
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 18:43:13 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 01872-03 for <pgman@candle.pha.pa.us>;
	Fri, 22 Oct 2004 17:43:19 +0000 (GMT)
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
	by svr1.postgresql.org (Postfix) with ESMTP id 9E633EADAD4
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 18:43:12 +0100 (BST)
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id C1133EAED89
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 18:31:20 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 97130-03
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
	Fri, 22 Oct 2004 17:31:17 +0000 (GMT)
Received: from cmailm2.svr.pol.co.uk (cmailm2.svr.pol.co.uk [195.92.193.210])
	by svr1.postgresql.org (Postfix) with ESMTP id 276CAEADBBD
	for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 18:31:07 +0100 (BST)
Received: from modem-558.snake.dialup.pol.co.uk ([62.137.114.46] helo=[192.168.0.102])
	by cmailm2.svr.pol.co.uk with esmtp (Exim 4.41)
	id 1CL3G3-0001Tx-K5; Fri, 22 Oct 2004 18:31:20 +0100
Subject: Re: [ADMIN] replication using WAL archives
From: Simon Riggs <simon@2ndquadrant.com>
To: Gaetano Mendola <mendola@bigfoot.com>
cc: Robert Treat <xzilla@users.sourceforge.net>, pgsql-admin@postgresql.org,
   iain@mst.co.jp
In-Reply-To: <4179390B.10700@bigfoot.com>
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb>
  <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com>
  <011a01c4b818$b7370a20$06e887d9@Nightingale>  <4179390B.10700@bigfoot.com>
Content-Type: text/plain
Organization: 2nd Quadrant
Message-ID: <1098466150.20926.13.camel@localhost.localdomain>
MIME-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) 
Date: Fri, 22 Oct 2004 18:29:10 +0100
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new at hub.org
X-Mailing-List: pgsql-admin
Precedence: bulk
Sender: pgsql-admin-owner@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on 
	candle.pha.pa.us
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham 
	version=2.61
Status: OR

On Fri, 2004-10-22 at 17:44, Gaetano Mendola wrote:
> | Gaetano - skim-reading your script, how do you handle the situation when a
> | new xlog file has been written within 10 seconds? That way the current file
> | number will have jumped by 2, so when your script looks for the "Last wal"
> | using head -1 it will find the N+2 and the intermediate file will never be
> | copied. Looks like a problem to me...
> 
> 
> Yes, the only window failure I seen ( but I don't know if it's possible )
> 
> Master:
> ~        log N created
> 	log N filled
> 	archive log N
> 	log N+1 created
> 	log N+1 filled
> ~        log N+2 created
> ~                   <---- the master die here before to archive the log N+1
> ~        archive log N+1
> 
> 
> in this case as you underline tha last log archived is the N and the N+2
> partial wal is added to archived wal collection and in the recovery fase
> the recovery stop after processing the log N.
> 
> Is it possible that the postmaster create the N+2 file without finish to archive
> the N+1 ? ( I suspect yes :-(  )
> 
> The only cure I see here is to look for not archived WAL ( if possible ).
> 

Hmm...well you aren't looking for archived wal, you're just looking at
wal...which is a different thing...

Situation I thought I saw was:

- copy away current partial filled xlog N
- xlog N fills, N+1 starts
- xlog N+1 fills, N+2 starts
- copy away current partial filled xlog: N+2 (+10 secs later)

i.e. if time to fill xlog (is ever) < time to copy away current xlog,
then you miss one.

So problem: you can miss one and never know you've missed one until the
recovery can't find it, which it never returns from...so it just hangs.

[Just so we're all clear: we're talking about Gaetano's script, not the
PostgreSQL archver. The postgresql archiver doesn't do it that way, so
it never misses one.]

-- 
Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

From pgsql-admin-owner+M15306=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 17:56:07 2004
Return-path: <pgsql-admin-owner+M15306=pgman=candle.pha.pa.us@postgresql.org>
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
	by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9MKu6f05264
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 16:56:06 -0400 (EDT)
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 4F4C2EAE4AE
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 21:55:41 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 62857-05 for <pgman@candle.pha.pa.us>;
	Fri, 22 Oct 2004 20:55:48 +0000 (GMT)
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
	by svr1.postgresql.org (Postfix) with ESMTP id 095CEEAE4AC
	for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 21:55:41 +0100 (BST)
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
Received: from localhost (unknown [200.46.204.144])
	by svr1.postgresql.org (Postfix) with ESMTP id 3FC9BEAE486
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 21:50:48 +0100 (BST)
Received: from svr1.postgresql.org ([200.46.204.71])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 62565-02
	for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
	Fri, 22 Oct 2004 20:50:48 +0000 (GMT)
Received: from news.hub.org (news.hub.org [200.46.204.72])
	by svr1.postgresql.org (Postfix) with ESMTP id 06C49EAE46B
	for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 21:50:40 +0100 (BST)
Received: from news.hub.org (news.hub.org [200.46.204.72])
	by news.hub.org (8.12.9/8.12.9) with ESMTP id i9MKolJB062812
	for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 20:50:48 GMT
	(envelope-from news@news.hub.org)
Received: (from news@localhost)
	by news.hub.org (8.12.9/8.12.9/Submit) id i9MKoRHh062731
	for pgsql-admin@postgresql.org; Fri, 22 Oct 2004 20:50:27 GMT
From: Gaetano Mendola <mendola@bigfoot.com>
X-Newsgroups: comp.databases.postgresql.admin
Subject: Re: [ADMIN] replication using WAL archives
Date: Fri, 22 Oct 2004 22:50:34 +0200
Organization: PYRENET Midi-pyrenees Provider
Lines: 39
Message-ID: <4179729A.5020401@bigfoot.com>
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb>	 <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com>	 <011a01c4b818$b7370a20$06e887d9@Nightingale>  <4179390B.10700@bigfoot.com> <1098466150.20926.13.camel@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@pyrenet.fr
To: Simon Riggs <simon@2ndquadrant.com>
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
X-Accept-Language: en-us, en
In-Reply-To: <1098466150.20926.13.camel@localhost.localdomain>
X-Enigmail-Version: 0.86.1.0
X-Enigmail-Supports: pgp-inline, pgp-mime
To: pgsql-admin@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Mailing-List: pgsql-admin
Precedence: bulk
Sender: pgsql-admin-owner@postgresql.org
X-Virus-Scanned: by amavisd-new at hub.org
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on 
	candle.pha.pa.us
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham 
	version=2.61
Status: OR

Simon Riggs wrote:

 > Situation I thought I saw was:
 >
 > - copy away current partial filled xlog N
 > - xlog N fills, N+1 starts
 > - xlog N+1 fills, N+2 starts
 > - copy away current partial filled xlog: N+2 (+10 secs later)
 >
 > i.e. if time to fill xlog (is ever) < time to copy away current xlog,
 > then you miss one.
 >
 > So problem: you can miss one and never know you've missed one until the
 > recovery can't find it, which it never returns from...so it just hangs.

No. The restore.sh is not smart enough to know the last wal that must be
replayed, the only "smart thing" is to copy the supposed "current wal" in the
archive directory.

The script hang (and is a feature not a bug) if and only if the master is alive
( at least I'm not seeing any other hang ).

In your example in the archived directory will be present the files until logN
and logN+2 ( the current wal ) is in the partial directory, if the master die,
the restore.sh will copy logN+2 in the archived directory, the spare node will
execute restore.sh with file logN+1 as argument and if is not found then the
restore.sh will exit.


Regards
Gaetano Mendola









---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend