Contents

System panic with Panic string : Spinlock deadlock!

System panic with Panic string : Spinlock deadlock!

clock interrupts held off for approx 120 seconds.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
System crash dump analysis report
=================================
Symptom
-------
  System panic with Panic string : Spinlock deadlock!
  lock name : process lock
  Stack trace of lock holder (cpu 1)

Root Cause
----------
  Processor 1 appears to have had clock interrupts held off for 
  approx 120 seconds. Current SPL = 0xc400000000000000 (SPL6).
  which lead the processor 1 hold the lock too long.
 
  Site environment get over temp frequenty.
  After system reboot, system run well.

Action Plan
-----------
  1. Keep monitor for one month
  2. Site environment improvement as recommendation.

Detail Analysis
---------------
			=======================
			= General Information =
			=======================

Dump time Wed Jun 23 15:29:43 2004 UTC-8
System has been up 5 days, 3 hours, 2 minutes.

System Name      : HP-UX
Node Name        : localhost
Model            : 9000/800/A500-7X
HP-UX version    : B.11.11 (64-bit Kernel)
Number of CPU's  : 2
Disabled CPU's   : 0
CPU type         : PCXW+ (750 Mhz)
CPU Architecture : PA-RISC 2.0
Load average     : 0.32  0.16  0.11  

			================
			= Crash Events =
			================


Note: Crash event 0 was a PANIC !
Panic string : Spinlock deadlock!

Lock structure
--------------

0x57849500 lock_t {
0x57849500   uintptr_t sl_lock;                    0x00000000_00000000
0x57849508   uintptr_t sl_owner;                   0x00000000_0131d3c8
0x57849510   char    *sl_name_ptr;                 0x00000000_00944c70
0x57849518   uint16_t sl_flag;                     0x0001
0x5784951a   uint16_t sl_next_cpu;                 0x0000
0x5784951c   uint32_t sl_arena_alloc_flag;         0x9bbc4111
0x57849520 };
lock name : process lock


Stack trace of lock holder (cpu 1)
-----------------------------------

Processor #1
==============  EVENT  ============================
= Event #2 is TOC on CPU #1
= p crash_event_t 0xa81060
= p rpb_t 0x1319370
= Using pc from pim.wide.rp_pcoq_head_hi = 0xc3c6446f
= USER mode!
==============  EVENT  ============================
Can't unwind

Stack trace of lock holder (cpu 1) with all args
-------------------------------------------------

Processor #1
==============  EVENT  ============================
= Event #2 is TOC on CPU #1
= p crash_event_t 0xa81060
= p rpb_t 0x1319370
= Using pc from pim.wide.rp_pcoq_head_hi = 0xc3c6446f
= USER mode!
==============  EVENT  ============================
Can't unwind

Stack Traces for all other Crash events 
=======================================

==============  EVENT  ============================
= Event #0 is PANIC on CPU #0
= p crash_event_t 0xa81000
= p rpb_t 0xa7a100
= Using pc from pim.wide.rp_rp_hi = 0x239d84
==============  EVENT  ============================
SR5=0x0bc46000
                SP         RP Return Name
0x400003ffffff1198 0x00239d84 panic+0x6c
0x400003ffffff10f8 0x001bc9c8 too_much_time+0x2e8
0x400003ffffff1048 0x001bcd84 wait_for_lock+0x23c
0x400003ffffff0ed8 0x00034bfc sl_retry+0x1c
0x400003ffffff0e58 0x0014ff2c syscall+0x2a4
0x400003ffffff0c78 0x00033f5c syscallinit+0x554


==============  EVENT  ============================
= Event #1 is PANIC on CPU #0
= p crash_event_t 0xa81030
= p rpb_t 0xa7a470
= Using pc from pim.wide.rp_rp_hi = 0x239d84
==============  EVENT  ============================
SR4=0x00000000
                SP         RP Return Name
0x0000000000a33bc0 0x00239d84 panic+0x6c
0x0000000000a33b20 0x001bc9c8 too_much_time+0x2e8
0x0000000000a33a70 0x001bcd84 wait_for_lock+0x23c
0x0000000000a33900 0x00034bfc sl_retry+0x1c
0x0000000000a33880 0x000b3b70 sigtimedwait_expire+0x20
0x0000000000a337f0 0x008bde3c invoke_callouts_for_self+0x9c
0x0000000000a33730 0x0003fa88 sw_service+0x100
0x0000000000a33640 0x0016c250 mp_ext_interrupt+0x170
0x0000000000a33520 0x0016aa5c ihandler+0x90c
+-------------  TRAP  ----------------------------
|  Trap type 4 in KERNEL mode at 0x23d7a4 (panic_boot+0x13c)
|  p struct save_state 0.0xa33050
+-------------  TRAP  ----------------------------
SR5=0x0bc46000
                SP         RP Return Name
0x400003ffffff13c8 0x0023d7a4 panic_boot+0x13c
0x400003ffffff1318 0x0023e0b8 boot+0xe0
0x400003ffffff1198 0x00239e6c panic+0x154
0x400003ffffff10f8 0x001bc9c8 too_much_time+0x2e8
0x400003ffffff1048 0x001bcd84 wait_for_lock+0x23c
0x400003ffffff0ed8 0x00034bfc sl_retry+0x1c
0x400003ffffff0e58 0x0014ff2c syscall+0x2a4
0x400003ffffff0c78 0x00033f5c syscallinit+0x554


==============  EVENT  ============================
= Event #2 is TOC on CPU #1
= p crash_event_t 0xa81060
= p rpb_t 0x1319370
= Using pc from pim.wide.rp_pcoq_head_hi = 0xc3c6446f
= USER mode!
==============  EVENT  ============================
Can't unwind


			==================
			= Message Buffer =
			==================

.
.
.
Memory Information:
    physical page size = 4096 bytes, logical page size = 4096 bytes

Spinlock timeout failure:  
 The spinlock code has NOT failed! Instead, some spinlock
 using code has failed to release a spinlock soon enough.
 Address: 0x0000000057849500X ;  owner 0x000000000131d3c8X ;  
 lock 0x0000000000000000X ;  flag 0x1  
 next_cpu 0x0  
 Milliseconds spent spinning =60001
 Millseconds/sec = 1000

		=======================================
		= Crash Event / Processor Information =
		=======================================

Number of processors = 2


        s
        t
        a       spin reg eiem/spl         eirr             ipsw            
evt cpu t type  dpth src cr15             cr23             cr22            
--- --- - ----- ---- --- ---------------- ---------------- ----------------
0   0   E PANIC 0    rpb c400000000000000 0800000000002a20 0800001f WRQPDI
1   0   E PANIC 0    rpb c400000000000000 0800000800002a20 0800001b WRQDI
          4          s_s fffffff0ffffffff N/A              0804000b WCQDI
2   1   E TOC   2    rpb c400000000000000 0800040900001548 08000000 W
                     mpi fffffff0ffffffff

Trap Types :
Trap type 4 = External Interrupt (I_EXT_INTP)
Note:  PSW_I bit on processor 1 is off ! (Interrupts are disabled)

WARNING: Processor(s) with PSW_Q bit off !
A processor with the PSW_Q bit off is likely to be running in real mode,
either executing in PDC code or in a low-level trap handler (eg. Data TLB
miss handler). The table below gives either a PDC address, or the Interrupt
Parameter Registers last collected (with the most likely trap type).


cpu trap pcsq.pcoq                     iir        isr.ior                      
--- ---- ----------------------------- ---------- -----------------------------
1   15   0x042f7400.0x00000000c3c6446f 0x43570000 0x00000000.0x0000000000000428

Virtual address information:


VA 0x0.0x428 does not have a translation.
  Hashing details:
  Primary:
    pdirhash=0x02000000=htbl[0]
                              vtag=0x00000000 0x00000000
             hpde2_0_t  pde_next    pde_space  pde_vpage
             0x02000000 0x00000000 0xffffffff 0x00000000


SPL/EIEM values:

0xfffffffeffffffff = SPLPREEMPTOK - Default user mode SPL level.
0xfffffff0ffffffff = SPLNOPREEMPT - Disable kernel preemption (scheduling interrupt off).
0xffffff00ffffffff = SPL2 - Disable software interrupt (software triggers off).
0xed00080000000000 = SPL5 - Disable IO modules.
0xc500000000000000 = SPL6+CLOCK_RESYNC - Disable hardclock+enable clock-resync.
0xc400000000000000 = SPL6 - Disable hardclock.
0x0000000700000000 = SPL7/PSW_I=0 - Disable the world.


			========================
			= Processor Clock Info =
			========================

hardclock_late = 11
itick_per_tick = 7500000
lbolt          = 44301785 (0x2a3fdd9)

    event mpi                rpb                delta      clk eiem eirr PSW
cpu type  timeinval          interval timer     secs:ticks od  0,4  0,4  I
--- ----- ------------------ ------------------ ---------- --- ---- ---- ---
0   PANIC 0x12e51f5b8ea6c    0x12e51eed9f478         0:15  11  1 0  0 1  1 
1   TOC   0x12e4770372475    0x12e5c7617184d      -120:-39 22  1 0  0 1  0 

Note: For the PANIC crash event type it is normal to see a positive delta ticks.
This is due to the processor executing dumpsys() code after creating it's
crash event and rpb.


WARNING: Processor 1 appears to have had clock interrupts held off for 
approx 120 seconds. Current SPL = 0xc400000000000000 (SPL6).


			=================
			= Load Averages =
			=================


avenrun
=======
0.32  0.16  0.11  

mp_avenrun
==========
cpu0 : 0.319385  0.193175  0.159010  
cpu1 : 0.313927  0.118198  0.070324  

			======================
			= Thread Information =
			======================

0 Threads ran in the last second
0 Threads ran in the last 5 seconds
0 Threads ran in the last 10 seconds
0 Threads ran in the last minute
119 Threads ran in the last hour

statdaemon ran 6044 ticks ago

Note:  statdaemon didn't run in the last second !
The statdaemon should normally run every second.
If it has not run in the past second it may be an indication that
it is being starved by higher priority or looping process(es).


Most Common Wait Channels
=========================
                                                          ticks since run:
Wait Channel                                       count  longest    shortest
------------                                       -----  ---------- ----------
per_processor_selects                              29     44289021   6032      
per_processor_selects+0x40                         22     44288975   6026      
vx_worklist_thread_sv                              22     6199       6098      
lvmkd_q                                            6      24421      24421     
streams_blk_sync                                   2      44297143   44297143  
*uptr+0                                            2      17633103   6511      
streams_mp_sync                                    2      6933       6925      
ticks_since_boot                                   2      6044       6044      


Most Common Sleep Callers
=========================
                                                          ticks since run:
Sleep Caller                                       count  longest    shortest
------------                                       -----  ---------- ----------
ksleep()                                           101    44288493   6025      
select()                                           41     44289021   6053      
vx_worklist_thread()                               22     6199       6098      
poll()                                             11     44286863   6026      
wait1()                                            9      44286382   17623252  
pm_sigwait()                                       8      44288494   6045      
lvmkd_daemon()                                     6      24421      24421     
hpstreams_read_int()                               5      17628344   17508     
real_nanosleep()                                   5      49815      6025      
thread_wait()                                      3      44288483   44287151  
_lwp_sema_wait()                                   3      17633074   17632996  
fifo_rdwr()                                        2      44287876   43040981  
sosleep()                                          2      44287724   44287176  
hpstreams_read_tty()                               2      43271193   42634845  
sync_buffer_cache()                                2      8054       6421      


Idle Globals
============

candidate_idle_spu = -1
migration_cycles = 1500000

Note:  There are 56 threads with schedpolicy = SCHED_NOAGE!
Look for "*" after the PRI in the thread listings.



Running Threads (TSRUNPROC) and idle Processors
===============================================


                    TICKS      TICKS                     I TICKS
                    SINCE      SINCE                     C SINCE      NREADY
TID     PID   PPID  RUN        IDLE       PRI  SPU STATE S MIGR       FR LO AL COMMAND
------- ----- ----- ---------- ---------- ---- --- ----- - ---------  -- -- -- -------
14730   11232 11196 6025       6025       223* 0   SYS   Y 6032       0  0  0  java 
14727   11232 11196 6025       6026       223* 1   SYS   N 6028       1  0  0  java 

Note: 
FR: free to run on any processor (candidate for thread migration).
LO: locked (via processor affinity/mpctl) to this processor).
AL: Alpha semaphores misses (special scheduling when miss a sema).


Threads waiting on cpu (TSRUN) - sorted by cpu/pri/ticks-since-run
==================================================================

TID     PID   PPID  TICKS      PRI  SPU BND GRP SYSCALL        COMMAND
------- ----- ----- ---------- ---- --- --- --- -------------- -------
12      12    0     44297142   -32  0   N   -1  n/a            ttisr          
556     519   1     11862      154  1   N   -1  select         syslogd        

Note:  There is 1 thread in TSZOMB stat !