Skip to content

Commit 7d68f22

Browse files
committed
Make PostgresNode.pm check server status more carefully.
PostgresNode blithely ignored the exit status of pg_ctl, and in general made no effort to be sure that the server was running when it should be. This caused it to miss server crashes, which is a serious shortcoming in a test scaffold. Make it complain if pg_ctl fails, and modify the start and stop logic to complain if the server doesn't start, or doesn't stop, when expected. Also, have it turn off the "restart_after_crash" configuration parameter in created clusters, as bitter experience has shown that leaving that on can mask crashes too. We might at some point need variant functions that allow for, eg, server start failure to be expected. But no existing test case appears to want that, and it surely shouldn't be the default behavior. Note that this *will* break the buildfarm, as it will expose known bugs that the previous testing failed to. I'm committing it despite that, to verify that we get the expected failures in the buildfarm not just in manual testing. Back-patch into 9.6 where PostgresNode was introduced. (The 9.6 branch is not expected to show any failures.) Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
1 parent 8a19c1a commit 7d68f22

File tree

1 file changed

+25
-16
lines changed

1 file changed

+25
-16
lines changed

src/test/perl/PostgresNode.pm

+25-16
Original file line numberDiff line numberDiff line change
@@ -402,6 +402,7 @@ sub init
402402
open my $conf, '>>', "$pgdata/postgresql.conf";
403403
print $conf "\n# Added by PostgresNode.pm\n";
404404
print $conf "fsync = off\n";
405+
print $conf "restart_after_crash = off\n";
405406
print $conf "log_line_prefix = '%m [%p] %q%a '\n";
406407
print $conf "log_statement = all\n";
407408
print $conf "port = $port\n";
@@ -644,18 +645,19 @@ sub start
644645
my $port = $self->port;
645646
my $pgdata = $self->data_dir;
646647
my $name = $self->name;
648+
BAIL_OUT("node \"$name\" is already running") if defined $self->{_pid};
647649
print("### Starting node \"$name\"\n");
648650
my $ret = TestLib::system_log('pg_ctl', '-D', $self->data_dir, '-l',
649651
$self->logfile, 'start');
650652

651653
if ($ret != 0)
652654
{
653-
print "# pg_ctl failed; logfile:\n";
655+
print "# pg_ctl start failed; logfile:\n";
654656
print TestLib::slurp_file($self->logfile);
655-
BAIL_OUT("pg_ctl failed");
657+
BAIL_OUT("pg_ctl start failed");
656658
}
657659

658-
$self->_update_pid;
660+
$self->_update_pid(1);
659661
}
660662

661663
=pod
@@ -664,6 +666,10 @@ sub start
664666
665667
Stop the node using pg_ctl -m $mode and wait for it to stop.
666668
669+
Note: if the node is already known stopped, this does nothing.
670+
However, if we think it's running and it's not, it's important for
671+
this to fail. Otherwise, tests might fail to detect server crashes.
672+
667673
=cut
668674

669675
sub stop
@@ -675,9 +681,8 @@ sub stop
675681
$mode = 'fast' unless defined $mode;
676682
return unless defined $self->{_pid};
677683
print "### Stopping node \"$name\" using mode $mode\n";
678-
TestLib::system_log('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
679-
$self->{_pid} = undef;
680-
$self->_update_pid;
684+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
685+
$self->_update_pid(0);
681686
}
682687

683688
=pod
@@ -695,7 +700,7 @@ sub reload
695700
my $pgdata = $self->data_dir;
696701
my $name = $self->name;
697702
print "### Reloading node \"$name\"\n";
698-
TestLib::system_log('pg_ctl', '-D', $pgdata, 'reload');
703+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, 'reload');
699704
}
700705

701706
=pod
@@ -714,9 +719,9 @@ sub restart
714719
my $logfile = $self->logfile;
715720
my $name = $self->name;
716721
print "### Restarting node \"$name\"\n";
717-
TestLib::system_log('pg_ctl', '-D', $pgdata, '-l', $logfile,
718-
'restart');
719-
$self->_update_pid;
722+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-l', $logfile,
723+
'restart');
724+
$self->_update_pid(1);
720725
}
721726

722727
=pod
@@ -735,8 +740,8 @@ sub promote
735740
my $logfile = $self->logfile;
736741
my $name = $self->name;
737742
print "### Promoting node \"$name\"\n";
738-
TestLib::system_log('pg_ctl', '-D', $pgdata, '-l', $logfile,
739-
'promote');
743+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-l', $logfile,
744+
'promote');
740745
}
741746

742747
# Internal routine to enable streaming replication on a standby node.
@@ -814,22 +819,26 @@ archive_command = '$copy_command'
814819
# Internal method
815820
sub _update_pid
816821
{
817-
my $self = shift;
822+
my ($self, $is_running) = @_;
818823
my $name = $self->name;
819824

820825
# If we can open the PID file, read its first line and that's the PID we
821-
# want. If the file cannot be opened, presumably the server is not
822-
# running; don't be noisy in that case.
826+
# want.
823827
if (open my $pidfile, '<', $self->data_dir . "/postmaster.pid")
824828
{
825829
chomp($self->{_pid} = <$pidfile>);
826830
print "# Postmaster PID for node \"$name\" is $self->{_pid}\n";
827831
close $pidfile;
832+
833+
# If we found a pidfile when there shouldn't be one, complain.
834+
BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running;
828835
return;
829836
}
830837

831838
$self->{_pid} = undef;
832-
print "# No postmaster PID\n";
839+
print "# No postmaster PID for node \"$name\"\n";
840+
# Complain if we expected to find a pidfile.
841+
BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running;
833842
}
834843

835844
=pod

0 commit comments

Comments
 (0)