MC/SG TOC system due to heavy CPU loading on directory scan.
NFSv3 server nfsktcpd (multi-threaded nfs/tcp server kernel daemon)
takes much CPU times and following MC/SG TOC system due to safety timer
timedout.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
|
System crash dump analysis report
=================================
Symptom
=======
NFSv3 server nfsktcpd (multi-threaded nfs/tcp server kernel daemon)
takes much CPU times and following MC/SG TOC system due to safety timer
timedout.
Root cause
==========
Cmcld is starved for cpu, but cpu spends too much time in vx_dirscan()
Action plan
===========
Directory rearrangement (to reduce number of files in each directories)
are suggested. (/dc_home/[mntpt]/ft/summary/fleft/summary)
Detail analysis
===============
=======================
= General Information =
=======================
Dump time Sat May 17 12:33:17 2008 UTC-8
System has been up 1 day, 15 hours, 46 minutes.
Node Name : localhost
Model : server rx2620
BIOS revision : 04.27
HP-UX version : B.11.23
Kernel whatstring : @(#) $Revision: vmunix:
B11.23_LR FLAVOR=perf Fri Aug 29 22:35:38 PDT 2003 $
Number of CPU's : 2
Disabled CPU's : 0
CPU type : IA64 (1.6 Ghz)
CPU Architecture : IA-64 0
Load average : 36.88 21.37 10.69
================
= Crash Events =
================
This is a Service Guard INIT !
Stack Trace for crash event 0
=============================
============== EVENT ============================
= Event #0 is CT_INIT on CPU #0;
= p crash_event_t 0xe000000100007000
= p rpb_t 0xe000000100fb8090
= Process at 0xe00000012d752000, pid 10, "nfsktcpd"
============== EVENT ============================
RR0=0x00ffff31 RR1=0x00ffff31 RR2=0x00ffff31 RR3=0x00ffff31
RR4=0x07ff0031 RR5=0x00ffff31 RR6=0x00ffff31 RR7=0x00dead31
BSP SP IP
0x9fffffff7f729490 0x9fffffff7f7274d0 0xe00000000068d211 vx_dirscan+0x4d1
0x9fffffff7f7293b0 0x9fffffff7f727510 0xe000000000664e80 vx_dirlook+0x360
0x9fffffff7f729300 0x9fffffff7f727530 0xe000000000633c90 vx_lookup+0x310
0x9fffffff7f729270 0x9fffffff7f727550 0xe0000000012e5050 rfs3_lookup+0x190
0x9fffffff7f729198 0x9fffffff7f727690 0xe0000000012f1a00 rfsexp_dispatch+0x460
0x9fffffff7f729100 0x9fffffff7f727980 0xe000000001428130 tcp_svc_getreq+0x1f0
0x9fffffff7f729010 0x9fffffff7f727a30 0xe000000000733c80 tcp_svc_run+0x3e0
0x9fffffff7f729000 0x9fffffff7f727bf0 0xe0000000013c6f50 kthread_daemon_startup+0x40
------------------------------------------------------
tid = 279095 kt_stat = TSRUN
local operation on vnodep = 0xe00000016f6705c0
(/dc_home/[mntpt]/ft/summary/fleft/summary)
local vnode belongs to vfsp = 0xe00000012a8ddac0 mnt_pt = '/dc_home'
local component being looked up =
'FLEX04_SC74899ADW_Hot_FT_BRX_SC74899ADW_W_HOT_XG632AQU00_FINAL_TEST1_95'C_Aug19_21_33_SUM.txt.Z'
SVCXPRT used = 0xe000000163ff2040
NFS filehandle corresponds to inode = 14534 of filesystem '/dc_home'
vfsp = 0xe00000012a8ddac0
--------------------------------------------------------
Processor #1
============== EVENT ============================
= Event #1 is CT_INIT on CPU #1;
= p crash_event_t 0xe000000100007030
= p rpb_t 0xe0000001015fb4c0
= Process at 0xe00000012d752000, pid 10, "nfsktcpd"
============== EVENT ============================
RR0=0x00ffff31 RR1=0x00ffff31 RR2=0x00ffff31 RR3=0x00ffff31
RR4=0x07ff0031 RR5=0x00ffff31 RR6=0x00ffff31 RR7=0x00dead31
BSP SP IP
0xe0000001017c47e8 0xe0000001017d7ae0 0xe000000001465df0 send_INIT_monarch+0x70
0xe0000001017c47d0 0xe0000001017d7ae0 0xe000000001465e40 Send_Monarch_TOC+0x40
0xe0000001017c4760 0xe0000001017d7ae0 0xe000000001062b20 safety_time_check+0x280
0xe0000001017c4740 0xe0000001017d7b10 0xe0000000006acb80 perform_clock_ints+0x60
0xe0000001017c4630 0xe0000001017d7b10 0xe0000000006acd90 per_spu_hardclock+0x1d0
0xe0000001017c4500 0xe0000001017d7b60 0xe0000000006aeae0 clock_int+0xd20
0xe0000001017c43e8 0xe0000001017d7bc0 0xe000000000800610 external_interrupt+0x3b0
0xe0000001017c43c0 0xe0000001017d7bf0 0xe000000000c2c780 bubbleup+0x880
+------------- TRAP ----------------------------
| External Interrupt in KERNEL mode
| IIP=0xe0000000016aa0b0:0
| p struct save_state 0xdead31.0xe0000001017d7c00
+------------- TRAP ----------------------------
BSP SP IP
0x9fffffff7f7c9af0 0x9fffffff7f7c7310 0xe0000000016aa0b0 vx_bread_typed+0x80
0x9fffffff7f7c9978 0x9fffffff7f7c7310 0xe00000000077dda0 vx_do_bmap_typed+0xaa0
0x9fffffff7f7c9808 0x9fffffff7f7c7340 0xe00000000077dd30 vx_do_bmap_typed+0xa30
0x9fffffff7f7c97c0 0x9fffffff7f7c7370 0xe00000000079e1e0 vx_bmap_typed+0xc0
0x9fffffff7f7c9738 0x9fffffff7f7c7410 0xe000000000629120 vx_bmap_lockheld2+0x120
0x9fffffff7f7c96b8 0x9fffffff7f7c7410 0xe00000000062e0c0 vx_bmap+0x140
0x9fffffff7f7c95d8 0x9fffffff7f7c7430 0xe0000000006894a0 vx_dirbread+0xb0
0x9fffffff7f7c9490 0x9fffffff7f7c74d0 0xe00000000068d800 vx_dirscan+0xac0
0x9fffffff7f7c93b0 0x9fffffff7f7c7510 0xe000000000664e80 vx_dirlook+0x360
0x9fffffff7f7c9300 0x9fffffff7f7c7530 0xe000000000633c90 vx_lookup+0x310
0x9fffffff7f7c9270 0x9fffffff7f7c7550 0xe0000000012e5050 rfs3_lookup+0x190
0x9fffffff7f7c9198 0x9fffffff7f7c7690 0xe0000000012f1a00 rfsexp_dispatch+0x460
0x9fffffff7f7c9100 0x9fffffff7f7c7980 0xe000000001428130 tcp_svc_getreq+0x1f0
0x9fffffff7f7c9010 0x9fffffff7f7c7a30 0xe000000000733c80 tcp_svc_run+0x3e0
0x9fffffff7f7c9000 0x9fffffff7f7c7bf0 0xe0000000013c6f50 kthread_daemon_startup+0x40
--------------------------------------------------------
tid = 278662 kt_stat = TSRUN
local operation on vnodep = 0xe00000016f6705c0
(/dc_home/[mntpt]/ft/summary/fleft/summary)
local vnode belongs to vfsp = 0xe00000012a8ddac0 mnt_pt = '/dc_home'
local component being looked up =
'FLEX07-CSR_1_bc4_pugwash_f_3v06a_x8_LCHS1-F_25C_RT0_GP5530XAAM_FT_GP5530.1_FT_Jan10_09_02.summary.Z'
SVCXPRT used = 0xe000000163ff2040
NFS filehandle corresponds to inode = 14534 of filesystem '/dc_home'
vfsp = 0xe00000012a8ddac0
--------------------------------------------------------
=================
= Load Averages =
=================
avenrun
=======
36.88 21.37 10.69
mp_avenrun
==========
cpu0 : 37.311754 21.748574 10.940800
cpu1 : 40.747521 22.928449 11.411689
Running Threads (TSRUNPROC)
===========================
TICKS TICKS I TICKS
SINCE SINCE C SINCE NREADY
TID PID PPID RUN IDLE PRI SPU STATE S MIGR FR LO AL COMMAND
------ ----- ----- ----- ----- ---- --- ----- - ----- -- -- -- -------
279095 10 0 4415 4509 143 0 SYS N 4509 43 4 0 nfsktcpd
278662 10 0 5187 16415 138 1 SYS Y 16416 44 2 0 nfsktcpd
Note:
FR: free to run on any processor (candidate for thread migration).
LO: locked (via processor affinity/mpctl) to this processor).
AL: Alpha semaphores misses (special scheduling when miss a sema).
Threads waiting on cpu (TSRUN) - sorted by cpu/pri/ticks-since-run
==================================================================
TID PID PPID TICKS PRI SPU BND GRP SYSCALL COMMAND
------ ----- ----- -------- ---- --- ---- --- -------------- -------
45 29 0 5192 -32 0 N -1 n/a ipmid
3395 3053 3052 4415 -27 0 N -1 sigwait cmcld
3398 3053 3052 4515 -27 0 N -1 select cmcld
1590 1476 1 4415 -16 0 LDOM -1 ki_call midaemon
3397 3057 3056 4415 -14 0 N -1 select cmlogd
3399 3053 3052 4515 -14 0 N -1 select cmcld
57 40 0 5194 100 0 N -1 n/a smpsched
1442 1328 1 4415 120 0 N -1 sigsuspend xntpd
0 0 0 4415 127 0 N -1 n/a swapper
724 641 640 4510 127 0 N -1 sigtimedwait netfmt
1641 1515 1 5190 127 0 N -1 write scopeux
3 3 0 4415 128 0 N -1 n/a statdaemon
4 4 0 4415 128 0 N -1 n/a unhashdaemon
63 37 0 4510 128 0 N -1 n/a schedcpu
64 37 0 4510 128 0 N -1 n/a schedcpu
483 19 0 4415 133 0 CPU 0 n/a ksyncer_daemon
36 20 0 5193 133 0 N -1 n/a lvmdevd
79 51 0 4415 134 0 N -1 n/a vxfsd
84 51 0 4415 134 0 N -1 n/a vxfsd
278661 10 0 4509 134 0 N -1 n/a nfsktcpd
35 19 0 5193 134 0 N -1 n/a ksyncer_daemon
279967 10321 10320 5191 137 0 N -1 ksleep java
75 51 0 4415 138 0 N -1 n/a vxfsd
280253 51 0 16415 138 0 N -1 n/a vxfsd
280256 51 0 22515 138 0 N -1 n/a vxfsd
280258 51 0 22515 138 0 N -1 n/a vxfsd
14 12 0 4510 147 0 N -1 n/a lvmkd
15 13 0 4510 147 0 N -1 n/a lvmkd
16 14 0 4510 147 0 N -1 n/a lvmkd
17 15 0 4510 147 0 N -1 n/a lvmkd
18 16 0 4510 147 0 N -1 n/a lvmkd
19 17 0 4510 147 0 N -1 n/a lvmkd
20 18 0 4510 148 0 N -1 n/a lvmschedd
12921 7440 0 4515 148 0 N -1 n/a lvmdevd
10 10 0 4515 152 0 N -1 n/a nfsktcpd
55 39 0 4515 152 0 N -1 n/a cmcd
502 421 1 4511 154 0 N 0 ksleep utmpd
1632 1507 1 4512 154 0 N -1 select spagent
2202 1870 1863 4512 154 0 N -1 ksleep agent
1513 1399 1 4513 154 0 N -1 select diagmond
1395 1303 1 5193 154 0 N -1 ksleep dced
2637 2307 1485 5193 154 0 N -1 ksleep rep_server
2670 2362 1485 5193 154 0 N -1 ksleep agdbserver
2402 2129 1399 4515 158 0 N -1 read diaglogd
19987 10321 10320 4509 168 0 N -1 nanosleep java
20346 10387 10386 4509 168 0 N -1 nanosleep java
20359 10387 10386 4509 168 0 N -1 nanosleep java
500 421 1 4512 168 0 N 0 sigtimedwait utmpd
2061 1881 1 4514 168 0 N -1 sigtimedwait cimserverd
653 572 1 4515 168 0 N -1 sigtimedwait ipmon
1276 1192 1 4515 168 0 N -1 select sendmail
2040 1870 1863 4515 168 0 N -1 sigtimedwait agent
65 38 0 4510 255* 0 CPU 0 n/a pagezerod
43 27 0 5195 -32 1 CPU 1 n/a progressdaemon
1592 1476 1 5195 -16 1 LDOM -1 nanosleep midaemon
484 19 0 5194 133 1 CPU 1 n/a ksyncer_daemon
280957 10321 10320 5190 137 1 N -1 stat64 java
279840 10321 10320 5191 137 1 N -1 ksleep java
281123 25535 1871 5191 138 1 N -1 unlink ls
281086 25516 908 5192 148 1 N -1 execve inetd
2850 2548 1 4509 152 1 N -1 ioctl ia64_corehw
281184 25553 1384 4512 152 1 N -1 fork cron
281182 25551 7475 4514 152 1 N -1 fork nfs.mon
281183 25552 7516 4514 152 1 N -1 fork samba.mon
281178 10321 10320 14321923 152 1 N -1 lwp_detached_e java
281185 10321 10320 14321923 152 1 N -1 lwp_detached_e java
281186 10321 10320 14321923 152 1 N -1 lwp_detached_e java
281187 10321 10320 14321923 152 1 N -1 lwp_detached_e java
19975 10321 10320 4509 154 1 N -1 read java
281156 10321 10320 4509 154 1 N -1 read java
281157 10321 10320 4509 154 1 N -1 read java
274336 10321 10320 4510 154 1 N -1 poll java
2204 1870 1863 4512 154 1 N -1 read agent
682 601 1 4514 154 1 N -1 select syslogd
1420 1313 1 4515 154 1 N -1 select cimserver
1479 1365 1 4515 154 1 N -1 poll pwgrd
2081 1650 1 4515 154 1 N -1 poll vxsvc
2628 2307 1485 5188 154 1 N -1 ksleep rep_server
503 421 1 5189 154 1 N -1 ksleep utmpd
280950 10321 10320 5189 154 1 N -1 ksleep java
1311 1224 1 5190 154 1 N -1 select ipv6agt
1320 1233 1 5190 154 1 N -1 select mib2agt
2403 2130 1399 5192 154 1 N -1 select psmctd
18475 9744 7510 5192 154 1 N -1 select smbd
281125 10321 10320 5192 154 1 N -1 read java
497 421 1 5193 154 1 N -1 ksleep utmpd
1658 1528 1 5193 154 1 N -1 ksleep swagentd
2041 1871 1 5193 154 1 N -1 select prngd
2646 1485 1 5193 154 1 N -1 ksleep perflbd
2687 2379 2362 5193 154 1 N -1 ksleep alarmgen
13033 7508 1 5193 154 1 N -1 select nmbd
281001 10321 10320 5193 154 1 N -1 poll java
19980 10321 10320 5194 154 1 N -1 ksleep java
280952 10321 10320 10503 154 1 N -1 ksleep java
2671 2362 1485 32899 154 1 N -1 select agdbserver
20345 10387 10386 38407 154 1 N -1 ksleep java
482 417 1 5191 168 1 N -1 sigtimedwait syncer
1976 1650 1 5191 168 1 N -1 nanosleep vxsvc
2079 1650 1 5191 168 1 N -1 nanosleep vxsvc
2985 1303 1 5188 178 1 N -1 sched* dced
54 38 0 16452 255* 1 CPU 1 n/a pagezerod
Top 10 threads sorted by %cpu
=============================
tid pid ticks pri cpu state %cpu command
=== === ===== === === ===== ==== =======
278662 10 5187 138 1 RUN 87.50 nfsktcpd
278663 10 5193 138 0 ZOMB 60.35 nfsktcpd
278661 10 4509 134 0 RUN 56.00 nfsktcpd
279095 10 4415 143 0 RUN 45.59 nfsktcpd
280703 10387 10502 154 0 SLEEP 13.26 java
2637 2307 5193 154 0 RUN 3.70 rep_server
3393 3053 5194 -27 1 SLEEP 3.29 cmcld
281063 10387 22119 154 0 SLEEP 1.49 java
281075 10387 22119 154 0 SLEEP 1.41 java
274134 10321 4510 154 0 SLEEP 1.32 java
==============================
= NFS Server side statistics =
==============================
Server rpc:
Connection oriented:
calls badcalls nullrecv
5167284 0 0
badlen xdrcall dupchecks
0 0 78
dupreqs
0
Connectionless oriented:
calls badcalls nullrecv
9713 0 0
badlen xdrcall dupchecks
0 0 0
dupreqs
0
Server nfs:
calls badcalls
5176998 0
Version 2: (9713 calls)
null getattr setattr
9713 100% 0 0% 0 0%
root lookup readlink
0 0% 0 0% 0 0%
read wrcache write
0 0% 0 0% 0 0%
create remove rename
0 0% 0 0% 0 0%
link symlink mkdir
0 0% 0 0% 0 0%
rmdir readdir statfs
0 0% 0 0% 0 0%
Version 3: (5167288 calls)
null getattr setattr
0 0% 71921 1% 0 0%
lookup access readlink
5072559 98% 81 0% 0 0%
read write create
54 0% 78 0% 0 0%
mkdir symlink mknod
0 0% 0 0% 0 0%
remove rmdir rename
0 0% 0 0% 0 0%
link readdir readdir+
0 0% 0 0% 21926 0%
fsstat fsinfo pathconf
669 0% 0 0% 0 0%
commit
0 0%
|