
Greenplum Segment Failure Diagnosis and Recovery Procedures

This article shows how to simulate and diagnose segment failures in a Greenplum cluster: distinguishing master, segment, and tablespace issues; generating a recovery configuration file; and using the gprecoverseg and gpstate utilities to restore segment roles and confirm that all nodes are operational.

Aikesheng Open Source Community

Greenplum clusters consist of master and segment servers, and failures can be categorized as master, segment, or data anomalies. This article focuses on diagnosing and resolving segment failures.

Local fault simulation

Two scenarios are demonstrated: (1) segment failure and (2) tablespace failure. The following commands are used to inspect the cluster state.

[gpadmin@master ~]$ gpstate
20221127:22:39:00:022659 gpstate:master:gpadmin-[INFO]:-Starting gpstate with args:
... (output truncated for brevity) ...
[gpadmin@master ~]$ gpstate -m
20221127:22:44:55:023196 gpstate:master:gpadmin-[INFO]:-Starting gpstate with args: -m
... (output truncated for brevity) ...

To simulate the tablespace fault, the tablespace directory on the affected mirror segment is deleted:

[gpadmin@data05 ~]$ cd /greenplum/gpdata/mirror/gpseg10
[gpadmin@data05 gpseg10]$ ls
... (directory listing) ...
[gpadmin@data05 gpseg10]$ rm -rf pg_tblspc/

After reproducing the failures, recovery proceeds in three steps: generate a configuration file with gprecoverseg -o, apply it with gprecoverseg -i <file> -a, then verify the cluster status with gpstate -e and psql queries.

[gpadmin@master ~]$ gprecoverseg -o ./recover1
20221127:22:48:41:023405 gprecoverseg:master:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./recover1
... (output truncated) ...
[gpadmin@master ~]$ more recover1
data05|55000|/greenplum/gpdata/primary/gpseg12
data05|55001|/greenplum/gpdata/primary/gpseg13
data05|55002|/greenplum/gpdata/primary/gpseg14
data05|55003|/greenplum/gpdata/primary/gpseg19
[gpadmin@master ~]$ gprecoverseg -i ./recover1 -a
[gpadmin@master ~]$ gpstate -e
20221127:22:56:57:024771 gpstate:master:gpadmin-[INFO]:-All segments are running normally
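Each line of the recovery file is a hostname|port|data_directory triple. Before applying a file, especially a hand-written one as in the tablespace case later, a quick format check can catch typos before gprecoverseg -i rejects them. This is an illustrative sketch, not part of the Greenplum tooling; the function name and the regex are assumptions:

```shell
#!/usr/bin/env bash
# Illustrative sketch: validate that every line of a gprecoverseg -i
# input file looks like hostname|port|absolute_data_directory.
# Not part of Greenplum; the function name and regex are assumptions.
check_recover_file() {
    local file="$1" line re n=0
    re='^[A-Za-z0-9._-]+[|][0-9]+[|]/.+$'
    while IFS= read -r line; do
        n=$((n + 1))
        if ! [[ "$line" =~ $re ]]; then
            echo "line $n malformed: $line" >&2
            return 1
        fi
    done < "$file"
    echo "ok: $n segment entries"
}
```

For example, check_recover_file ./recover1 could be run before gprecoverseg -i ./recover1 -a.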

The segment mirroring status report shows all segments up, though some may be running in a role other than their preferred one (a former mirror still acting as primary, and vice versa). Running gprecoverseg -r rebalances segments back to their preferred roles, and a final status check confirms the result.

[gpadmin@master ~]$ gprecoverseg -r
[gpadmin@master ~]$ gpstate -e
... (final status output confirming all segments up) ...
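Which segments gprecoverseg -r will act on can be read from gp_segment_configuration: any row whose role differs from preferred_role is running swapped. As a sketch, the snippet below applies that filter with awk to a sample pipe-delimited dump; the rows are invented for illustration, and the real data would come from something like psql -At -c "select dbid, content, role, preferred_role from gp_segment_configuration":

```shell
#!/usr/bin/env bash
# Illustrative sketch: count segments running outside their preferred
# role. Columns: dbid|content|role|preferred_role ('p' = primary,
# 'm' = mirror). The sample rows below are invented.
swapped=$(awk -F'|' '$3 != $4 { n++ } END { print n + 0 }' <<'EOF'
2|0|p|p
3|0|m|m
4|1|m|p
5|1|p|m
EOF
)
echo "segments to rebalance with gprecoverseg -r: $swapped"
```

A count of 0 means every segment is already in its preferred role and the rebalance pass is a no-op.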

For the tablespace issue, a manual recovery file can be created and applied similarly:

[gpadmin@master ~]$ vi recover2
data05|56001|/greenplum/gpdata/mirror/gpseg10
[gpadmin@master ~]$ gprecoverseg -i ./recover2 -a
... (recovery output) ...

Final checks confirm that all segments are running normally and data is consistent across nodes.

[gpadmin@master ~]$ psql -c "select * from gp_segment_configuration order by content asc,dbid;"
... (configuration table output) ...
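The same query output lends itself to a scripted health check: every row should show status u (up) and mode s (synchronized). The sketch below counts rows that violate this, over an invented sample whose column order (dbid|content|role|preferred_role|mode|status) is an assumption about the catalog layout:

```shell
#!/usr/bin/env bash
# Illustrative sketch: flag rows where status is not 'u' (up) or
# mode is not 's' (synchronized). Sample rows are invented.
unhealthy=$(awk -F'|' '$6 != "u" || $5 != "s" { n++ } END { print n + 0 }' <<'EOF'
2|0|p|p|s|u
3|0|m|m|s|u
4|1|p|p|s|u
5|1|m|m|s|u
EOF
)
echo "unhealthy segments: $unhealthy"
```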
Tags: Database Recovery, Greenplum, gprecoverseg, gpstate, Segment Failure
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise-grade open-source tools and services for MySQL, and releases and maintains a premium open-source component each year on 1024 (Programmers' Day).
