How to Kill session or Cancel SQL query on Oracle , MySQL, PostgreSQL(数据库比较系列八)
数据库维护过程中难免会遇到一些不正常的SQL或会话进程正在占用系统大量资源,临时需要终止查询或kill会话,在Oracle, MySQL, Postgresql数据库中不同的操作。
Oracle
KILL会话的基本语法。
SQL> ALTER SYSTEM KILL SESSION 'sid,serial#';
在 RAC 环境中,您可以选择指定INST_ID。这允许您终止不同 RAC 节点上的会话。
SQL> ALTER SYSTEM KILL SESSION 'sid,serial#,@inst_id';
除了上述语法之外,您还可以添加IMMEDIATE子句。
SQL> ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;
这不会影响命令执行的工作,但它会立即将控制权返回给当前会话,而不是等待确认终止。
该ALTER SYSTEM DISCONNECT SESSION语法是杀死 Oracle 会话的另一种方法。与KILL SESSION要求会话杀死自己的命令不同,该DISCONNECT SESSION命令杀死专用服务器进程
相关会话的SID和SERIAL#值可以替换为以下语句之一。
SQL> ALTER SYSTEM DISCONNECT SESSION 'sid,serial#' POST_TRANSACTION; SQL> ALTER SYSTEM DISCONNECT SESSION 'sid,serial#' IMMEDIATE;
该POST_TRANSACTION子句在断开会话之前等待正在进行的事务完成,而该IMMEDIATE子句断开会话并且正在进行的事务立即回滚。
操作系统层
SQL> select spid from v$process where addr in( select paddr from v$session where sid=1); SPID ------------------------ 7110 $ kill -9 7110
How to release still “killed“ status session in v$session? (释放killed的session) (二)记录”killed” 状态有时需要creator_addr字段 select spid from v$process where addr in( select CREATOR_ADDR from v$session where sid=1); ps -ef|grep xx kill -9 要在 Windows 操作系统上终止会话,首先识别会话,然后将相关的值替换为SID
从命令行SPID
发出的以下命令。
C:\> orakill ORACLE_SID spid
终止查询
该ALTER SYSTEM CANCEL SQL
命令是在 Oracle Database 18c 中引入的,用于取消会话中的 SQL 语句,而不用终止会话连接。该ALTER SYSTEM CANCEL SQL
语句的基本语法如下所示:
ALTER SYSTEM CANCEL SQL 'SID, SERIAL[, @INST_ID][, SQL_ID]';
# session a SQL> conn / as sysdba Connected. USERNAME INST_NAME HOST_NAME I# SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR -------------------- -------------------- ------------------------- --- ----- -------- ---------- -------- ---------- ----- --------------- ---------------- ---------------- SYS CDB$ROOT-anbob19c oel7db1 1 1 51226 19.0.0.0.0 20220707 20124 45 7109 0000000078481028 0000000079107FC8 SQL> select count(*) from dba_objects,dba_objects,dba_objects; select count(*) from dba_objects,dba_objects,dba_objects * ERROR at line 1: ORA-01013: user requested cancel of current operation SQL> select sid from v$mystat where rownum=1; SID ---------- 1 # session b SQL> @usid 1 USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID OPID CPID SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME ----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ --------------- ----------- ---------- -------- ---------------- ---------------- ---------------- ------------------- SYS '1,51226' 4294967295 oracle oel7db1 (TNS V1-V3) 20124 45 7109 f5kskn9df2h2p 1524711509 48 ACTIVE 0000000078481028 0000000079107FC8 2022-07-07 14:50:53 SQL> alter system cancel sql '1,51226,f5kskn9df2h2p'; System altered.
MySQL
查询mySQL进程
To see what thread it is then kill it:
KILL [CONNECTION | QUERY] thread_id;
也可以查询information_schema.processlist,与show processlist等同
mysql> select * from information_schema.processlist; +----+-----------------+-----------+--------------------+---------+------+------------------------+----------------------------------------------+ | ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO | +----+-----------------+-----------+--------------------+---------+------+------------------------+----------------------------------------------+ | 8 | root | localhost | NULL | Query | 381 | User sleep | select sleep(1000) | | 9 | root | localhost | performance_schema | Query | 0 | executing | select * from information_schema.processlist | | 5 | event_scheduler | localhost | NULL | Daemon | 401 | Waiting on empty queue | NULL | +----+-----------------+-----------+--------------------+---------+------+------------------------+----------------------------------------------+ 3 rows in set (0.00 sec) mysql> show processlist; +----+-----------------+-----------+--------------------+---------+------+------------------------+--------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-----------------+-----------+--------------------+---------+------+------------------------+--------------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 434 | Waiting on empty queue | NULL | | 8 | root | localhost | NULL | Query | 414 | User sleep | select sleep(1000) | | 9 | root | localhost | performance_schema | Query | 0 | starting | show processlist | +----+-----------------+-----------+--------------------+---------+------+------------------------+--------------------+ 3 rows in set (0.00 sec) mysql> SELECT CONNECTION_ID(); +-----------------+ | CONNECTION_ID() | +-----------------+ | 9 | +-----------------+ 1 row in set (0.00 sec)
终止连接
mysql> kill 8; Query OK, 0 rows affected (0.00 sec) mysql> select sleep(1000); ERROR 2013 (HY000): Lost connection to MySQL server during query
终止查询
mysql> kill query 8; Query OK, 0 rows affected (0.00 sec) mysql> select sleep(1000); +-------------+ | sleep(1000) | +-------------+ | 1 | +-------------+ 1 row in set (9.27 sec)
MYSQL PROCESSLISTID, MYSQL_OS_THREAD, MYSQL THREAD关系
[root@oel7db1 ~]# ps -ef|grep mysql root 1940 1 0 20:38 ? 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/usr/local/mysql/data --pid-file=/usr/local/mysql/data/oel7db1.pid mysql 2027 1940 1 20:38 ? 00:00:08 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=oel7db1.err --pid-file=/usr/local/mysql/data/oel7db1.pid root 2089 1897 0 20:38 pts/1 00:00:00 mysql -uroot -px xxxxxxxxxxx -hlocalhost root 2138 2114 0 20:39 pts/2 00:00:00 mysql -uroot -px xxxxxxxxxxx -hlocalhost [root@oel7db1 ~]# ps -elT|head -n 1|cat -n ;ps -elT|grep 2027|cat -n 1 F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD 1 4 S 997 2027 2027 1940 0 80 0 - 323127 poll_s ? 00:00:01 mysqld 2 1 S 997 2027 2036 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 3 1 S 997 2027 2037 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 4 1 S 997 2027 2038 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 5 1 S 997 2027 2039 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 6 1 S 997 2027 2040 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 7 1 S 997 2027 2041 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 8 1 S 997 2027 2042 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 9 1 S 997 2027 2043 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 10 1 S 997 2027 2044 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 11 1 S 997 2027 2045 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld 12 1 S 997 2027 2046 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 13 1 S 997 2027 2049 1940 0 80 0 - 323127 futex_ ? 00:00:06 mysqld 14 1 S 997 2027 2050 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 15 1 S 997 2027 2051 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 16 1 S 997 2027 2052 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 17 1 S 997 2027 2053 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 18 1 S 997 2027 2054 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 19 1 S 997 2027 2056 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 20 1 S 997 2027 2057 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 21 1 S 997 2027 2058 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 22 1 S 997 2027 2061 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 23 1 S 997 2027 2062 1940 0 80 0 - 323127 hrtime ? 00:00:00 mysqld 24 1 S 997 2027 2063 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 25 1 S 997 2027 2064 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 26 1 S 997 2027 2066 1940 0 80 0 - 323127 futex_ ? 00:00:00 xpl_worker1 27 1 S 997 2027 2067 1940 0 80 0 - 323127 futex_ ? 00:00:00 xpl_worker0 28 1 S 997 2027 2068 1940 0 80 0 - 323127 ep_pol ? 00:00:00 mysqld 29 1 S 997 2027 2072 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 30 1 S 997 2027 2073 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 31 1 S 997 2027 2074 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 32 1 S 997 2027 2075 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 33 1 S 997 2027 2076 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 34 1 S 997 2027 2077 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 35 1 S 997 2027 2079 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 36 1 S 997 2027 2080 1940 0 80 0 - 323127 do_sig ? 00:00:00 mysqld 37 1 S 997 2027 2081 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 38 1 S 997 2027 2083 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 39 1 S 997 2027 2090 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld 40 1 S 997 2027 2139 1940 0 80 0 - 323127 poll_s ? 00:00:00 mysqld
mysql会为每个connection创建一个对应mysql thread,连接关闭后,mysql thread生命周期也终止。这个mysql thread可以在processlist、threads表中查看 每个mysql threard将与一个os thread关联在一起,mysql thread销毁后,os thread不会被销毁,可以继续给其他mysql thread使用 如果所有os thread都被mysql thread用光了,下一个connection请求时将会创建新的os thread
mysql> select thread_id,thread_os_id,name,type,PROCESSLIST_ID from performance_schema.threads where PROCESSLIST_ID is not null; +-----------+--------------+--------------------------------+------------+----------------+ | thread_id | thread_os_id | name | type | PROCESSLIST_ID | +-----------+--------------+--------------------------------+------------+----------------+ | 43 | 2079 | thread/sql/event_scheduler | FOREGROUND | 5 | | 45 | 2083 | thread/sql/compress_gtid_table | FOREGROUND | 7 | | 48 | 2090 | thread/sql/one_connection | FOREGROUND | 8 | | 49 | 2139 | thread/sql/one_connection | FOREGROUND | 9 | | 50 | 3309 | thread/sql/one_connection | FOREGROUND | 10 | +-----------+--------------+--------------------------------+------------+----------------+ 5 rows in set (0.00 sec) [root@oel7db1 ~]# gdb attach 2027 GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7 Copyright (C) 2013 Free Software Foundation, Inc. (gdb) info threads Id Target Id Frame 41 Thread 0x7fe1fe5d9700 (LWP 2036) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 40 Thread 0x7fe1fd3cc700 (LWP 2037) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 39 Thread 0x7fe1fcbcb700 (LWP 2038) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 38 Thread 0x7fe1f7fff700 (LWP 2039) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 37 Thread 0x7fe1f77fe700 (LWP 2040) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 36 Thread 0x7fe1f6ffd700 (LWP 2041) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 35 Thread 0x7fe1f67fc700 (LWP 2042) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 34 Thread 0x7fe1f5ffb700 (LWP 2043) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 33 Thread 0x7fe1f57fa700 (LWP 2044) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 32 Thread 0x7fe1f4ff9700 (LWP 2045) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1 31 Thread 0x7fe1e7cef700 (LWP 2046) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 30 Thread 0x7fe1e54ec700 (LWP 2049) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 ---Type to continue, or q to quit--- 29 Thread 0x7fe1e4ceb700 (LWP 2050) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 28 Thread 0x7fe1e44ea700 (LWP 2051) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 27 Thread 0x7fe1e3ce9700 (LWP 2052) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 26 Thread 0x7fe1e34e8700 (LWP 2053) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 25 Thread 0x7fe1e2ce7700 (LWP 2054) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 24 Thread 0x7fe1e24e6700 (LWP 2056) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 23 Thread 0x7fe1e1ce5700 (LWP 2057) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 22 Thread 0x7fe1e14e4700 (LWP 2058) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 21 Thread 0x7fe1e0ce3700 (LWP 2061) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 20 Thread 0x7fe1e04e2700 (LWP 2062) "mysqld" 0x00007fe20d5abe9d in nanosleep () from /lib64/libpthread.so.0 19 Thread 0x7fe1dfce1700 (LWP 2063) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 18 Thread 0x7fe1df4e0700 (LWP 2064) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 17 Thread 0x7fe1fdb74700 (LWP 2066) "xpl_worker1" 0x00007fe20d5a8de2 in pthr---Type to continue, or q to quit--- ead_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 16 Thread 0x7fe1fdb2d700 (LWP 2067) "xpl_worker0" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 15 Thread 0x7fe1fdae6700 (LWP 2068) "mysqld" 0x00007fe20b909be9 in syscall () from /lib64/libc.so.6 14 Thread 0x7fe1ff8ca700 (LWP 2072) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 13 Thread 0x7fe1decdf700 (LWP 2073) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 12 Thread 0x7fe1de4de700 (LWP 2074) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 11 Thread 0x7fe1ddcdd700 (LWP 2075) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 10 Thread 0x7fe1dd4dc700 (LWP 2076) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 9 Thread 0x7fe1dccdb700 (LWP 2077) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 8 Thread 0x7fe1fda9f700 (LWP 2079) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 7 Thread 0x7fe1fda57700 (LWP 2080) "mysqld" 0x00007fe20b84857a in sigwaitinfo () from /lib64/libc.so.6 6 Thread 0x7fe1fda0f700 (LWP 2081) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 5 Thread 0x7fe1c7fff700 (LWP 2083) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 ---Type to continue, or q to quit--- 4 Thread 0x7fe1fc3a9700 (LWP 2090) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 Thread 0x7fe1fc361700 (LWP 2139) "mysqld" 0x00007fe20b904cef in ppoll () from /lib64/libc.so.6 2 Thread 0x7fe1fc219700 (LWP 3309) "mysqld" 0x00007fe20b904cef in ppoll () from /lib64/libc.so.6 * 1 Thread 0x7fe20d96a840 (LWP 2027) "mysqld" 0x00007fe20b904c2d in poll () from /lib64/libc.so.6 (gdb)
因为MYSQL是线程模式,用户进程是mysql进程中的一个线程,线程是进程不可分割的一部分,不能在进程之外被杀死。有pthread_kill函数但它仅适用于线程本身的上下文。请注意, pthread_kill() 仅导致在给定线程的上下文中处理信号;信号动作(终止或停止)会影响整个进程。如果kill 线程会导致mysql server crash重启。
(gdb) call pthread_kill(6307)
PostgreSQL
PostgreSQL pg_terminate_backend 和 pg_cancel_backend 它们用于终止正在运行的查询或会话。PostgreSQL 维护任务期间需要这个脚本,我们需要关闭所有连接和会话。
通过指定数据库名称终止所有正在运行的连接的脚本:
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE datname = 'datbase_name' AND pid <> pg_backend_pid();
终止当前数据库的所有正在运行的连接的脚本:
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();
pg_cancel_backend():
如果你想杀死那些长时间运行的查询,可使用 pg_cancel_backend() 来杀死它。取消一个查询而不破坏连接,停止其他查询。
pg_terminate_backend():
它终止整个进程和数据库连接。
最佳实践是:
首先找到长时间运行的查询及其进程 id (pid),使用 pg_cancel_backend 取消这些查询,如果它没有释放,你应该使用 pg_terminate_backend。
也可以使用自带的pg_ctl工具:
# pg_ctl kill SIGNALNAME PID
Allowed signal names for kill:
ABRT HUP INT KILL QUIT TERM USR1 USR2
如
# pg_ctl kill TERM 1234
不要是使用kill -9
The PostgreSQL architecture works like this: when you start PostgreSQL you are starting a process called postmaster. Whenever a new connection comes in, this postmaster forks and creates a so-called backend process (BE). This process is in charge of handling exactly one connection. In a working system, you might see hundreds of processes serving hundreds of users. The important thing here is that all of those processes are synchronized through some common chunk of memory (traditionally, shared memory, and in the more recent versions, mapped memory), and all of them have access to this chunk. What might happen if a database connection or any other process in the PostgreSQL infrastructure is killed with kill -9? A process modifying this common chunk of memory might die while making a change. The process killed cannot defend itself against the onslaught, so who can guarantee that the shared memory is not corrupted due to the interruption?
This is exactly when the postmaster steps in. It ensures that one of these backend processes has died unexpectedly. To prevent the potential corruption from spreading, it kills every other database connection, goes into recovery mode, and fixes the database instance. Then new database connections are allowed again.
While this makes a lot of sense, it can be quite disturbing to those users who are connected to the database system. Therefore, it is highly recommended not to use kill -9. A normal kill will be fine.
# check OS pid anbob=# select datname,pid,usename,state from pg_stat_activity; [root@oel7db1 shm]# ps -ef|grep local postgres 2627 1870 0 15:13 ? 00:00:00 postgres: postgres anbob [local] idle postgres 2634 1870 0 15:13 ? 00:00:00 postgres: postgres anbob [local] idle root 3148 2035 0 15:23 pts/2 00:00:00 grep --color=auto local [root@oel7db1 shm]# [root@oel7db1 shm]# kill -9 2627 [root@oel7db1 shm]# # on session anbob=# select * from tt; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. anbob=# ## another session anbob=# select * from tt; WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. anbob=# [root@oel7db1 shm]# ps -ef|grep local postgres 3172 1870 0 15:23 ? 00:00:00 postgres: postgres anbob [local] idle postgres 3176 1870 0 15:24 ? 00:00:00 postgres: postgres anbob [local] idle root 3446 2035 0 15:28 pts/2 00:00:00 grep --color=auto local [root@oel7db1 shm]# kill 3172 [root@oel7db1 shm]# ## one session anbob=# select * from tt; FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. anbob=# ## another session anbob=# select * from tt; id | name | language ----+------+---------- | a | | b | (2 rows)
在数据库PostgreSQL中,一个客户端到服务器连接实际上是一个tcp socket连接,tcp连接是虚连接,一方非正常退出(如断电),另一方会继续维持这个连接。
最近遇到一个案例1个posgresql 正在跑的一个存储过程进程无法kill, pg_terminate_backend()无法终止,使用pstack 查看进程在调用plugin_debugger,strace -tt -p pid 发现一直在loop 执行recvform(39, xxxx),等待接受文件句柄fd 39, ls -l /proc/pid/fd/39 发现fd 39指向一个 socket连接
查看进程对应用tcp socket
进程连接处理CLOSE_WAIT,我们知道TCP建立一个连接需要三次握手,而终止一个连接要经过四次挥手, 通常来讲,CLOSE_WAIT状态的持续时间应该很短,为被动关闭连接,主要原因是某种情况下对方关闭了socket链接,但是我方忙与读或者写,没有关闭连接。
注意因为postgresql没有Pmon进程(or SMON)没有办法在kill进程后释放资源,所以即使PG是进程模式,也不建议通过OS kill进程, 防止PG SERVER crash. 解决这个问题有一个小技巧,可以再不kill进程的前提下,释放SOCKET。
1,找到进程,或netstat根据port找进程
2, lsof 根据进程找socket连接,找到FD文件描述符
3,使用gdb –p PID 连接到进程
4,关闭 Socket 连接
(gdb) call close(#fd)
这样就结束了案例中recvform的循环,会话报错后退出。
对不起,这篇文章暂时关闭评论。