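This is a real case of a complex server crash that 羽千葉 ran into at work. 羽千葉 says the 奔跑吧 series on crashes and black screens was a great inspiration and a big help in solving crash problems.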
OS: CentOS 7.7
Kernel: 3.10.0-1062.7.1
[482661.362612] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[482661.367822] IP: [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.368337] PGD 0
[482661.368522] Oops: 0000 [#1] SMP
[482661.368806] Modules linked in: veth vxlan ip6_udp_tunnel udp_tunnel ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables nf_conntrack_netlink xt_conntrack
[482661.376311] CPU: 1 PID: 29670 Comm: runc:[2:INIT] Kdump: loaded Tainted: G------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[482661.377199] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[482661.377637] task: ffff96808d678000 ti: ffff968091738000 task.ti: ffff968091738000
[482661.378196] RIP: 0010:[<ffffffffba2e5b49>] [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.378863] RSP: 0018:ffff96809173be30 EFLAGS: 00010006
[482661.379270] RAX: 0000000000000002 RBX: ffff96809025af00 RCX: 0000000000000000
[482661.379806] RDX: 0000000000000002 RSI: ffff9680406f9070 RDI: ffff96809fc9ad00
[482661.380340] RBP: ffff96809173be68 R08: ffffffffbaa1e3c0 R09: 0000000000000000
[482661.380875] R10: 000000000000b8ff R11: f448000000000000 R12: 0000000000000000
[482661.381411] R13: ffff96808d678000 R14: ffff96809fc9ac80 R15: 0000000000000001
[482661.381946] FS: 00007f9ca6c4d740(0000) GS:ffff96809fc80000(0000) knlGS:0000000000000000
[482661.382550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[482661.382983] CR2: 0000000000000070 CR3: 000000020f69c000 CR4: 00000000003606e0
[482661.383522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[482661.384054] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[482661.384594] Call Trace:
[482661.384806] [<ffffffffba2d7782>] check_preempt_curr+0x92/0xa0
[482661.385250] [<ffffffffba2dad24>] wake_up_new_task+0x104/0x1a0
[482661.385700] [<ffffffffba29a9f1>] do_fork+0xf1/0x330
[482661.386095] [<ffffffffba988a26>] ? trace_do_page_fault+0x56/0x150
[482661.386565] [<ffffffffba29acb6>] SyS_clone+0x16/0x20
[482661.386950] [<ffffffffba98e2b4>] stub_clone+0x44/0x70
[482661.387343] [<ffffffffba98dede>] ? system_call_fastpath+0x25/0x2a
[482661.387791] Code: 00 00 83 e8 01 48 8b 5b 68 39 d0 75 f5 49 8b 7c 24 70 48 3b 7b 70 74 1e 66 2e 0f 1f 84 00 00 00 00 00 4d 8b 64 24 68 48 8b 5b 68 <49> 8b 7c 24 70 48 3b 7b 70 75 ec 48 85 ff 74 e7 89 4d d0 e8 9f
[482661.389736] RIP [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.390208] RSP <ffff96809173be30>
[482661.390475] CR2: 0000000000000070
The faulting location is reported on the second line (IP): offset 0xe9 into check_preempt_wakeup.
In the Code line, <49> marks the first byte of the faulting instruction.
crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
crash> rd -8 0xffffffffba2e5b49 5
ffffffffba2e5b49: 49 8b 7c 24 70
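To see why these five bytes mean mov 0x70(%r12),%rdi, here is a minimal C sketch that decodes only this one instruction form: a REX.W prefix, opcode 8B (MOV r64, r/m64), a ModRM byte with mod=01 (8-bit displacement) and a SIB byte without an index register. The helpers decode_mov_disp8_sib and decodes_to are hypothetical, written for illustration; this is not a general x86 decoder and any other encoding is rejected.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* AT&T register names, indexed by the 4-bit number (REX bit + 3-bit field). */
static const char *const reg64[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
};

/*
 * Hypothetical helper: decode "REX.W 8B /r" with mod=01 (disp8) and a SIB
 * byte that has no index. Writes AT&T syntax to out and returns 0, or
 * returns -1 for any other encoding.
 */
static int decode_mov_disp8_sib(const uint8_t b[5], char *out, size_t n)
{
    uint8_t rex = b[0];
    if ((rex & 0xF0) != 0x40 || !(rex & 0x08))        /* REX prefix with W=1 */
        return -1;
    if (b[1] != 0x8B)                                  /* MOV r64, r/m64 */
        return -1;

    uint8_t modrm = b[2];
    int mod = modrm >> 6;
    int reg = ((modrm >> 3) & 7) | ((rex & 0x04) << 1); /* REX.R extends reg */
    int rm  = modrm & 7;
    if (mod != 1 || rm != 4)                           /* disp8 + SIB only */
        return -1;

    uint8_t sib = b[3];
    int index = ((sib >> 3) & 7) | ((rex & 0x02) << 2); /* REX.X extends index */
    int base  = (sib & 7) | ((rex & 0x01) << 3);        /* REX.B extends base */
    if (index != 4)                                    /* 100b with X=0: no index */
        return -1;

    int disp = (int8_t)b[4];                           /* sign-extended disp8 */
    snprintf(out, n, "mov %s0x%x(%%%s),%%%s", disp < 0 ? "-" : "",
             (unsigned)(disp < 0 ? -disp : disp), reg64[base], reg64[reg]);
    return 0;
}

/* Convenience check: do these bytes decode to the expected AT&T text? */
static int decodes_to(const uint8_t b[5], const char *expect)
{
    char buf[64];
    return decode_mov_disp8_sib(b, buf, sizeof buf) == 0 && strcmp(buf, expect) == 0;
}
```

For 49 8b 7c 24 70: REX 0x49 sets W and B, ModRM 0x7c selects reg=7 (%rdi) with a SIB byte, and SIB 0x24 selects base 4, which REX.B turns into 12 (%r12); the last byte is the displacement 0x70.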
Download the debuginfo packages kernel-debuginfo-3.10.0-1062.7.1.el7.x86_64.rpm and kernel-debuginfo-common-x86_64-3.10.0-1062.7.1.el7.x86_64.rpm.
Extract the kernel-debuginfo package:
rpm2cpio kernel-debuginfo-3.10.0-1062.7.1.el7.x86_64.rpm | cpio -div
Then open the vmcore in crash with the extracted vmlinux:
crash ./usr/lib/debug/lib/modules/3.10.0-1062.7.1.el7.x86_64/vmlinux vmcore
crash> bt
PID: 29670 TASK: ffff96808d678000 CPU: 1 COMMAND: 'runc:[2:INIT]'
#0 [ffff96809173ba90] machine_kexec at ffffffffba265b24
#1 [ffff96809173baf0] __crash_kexec at ffffffffba322422
#2 [ffff96809173bbc0] crash_kexec at ffffffffba322510
#3 [ffff96809173bbd8] oops_end at ffffffffba985798
#4 [ffff96809173bc00] no_context at ffffffffba275bb4
#5 [ffff96809173bc50] __bad_area_nosemaphore at ffffffffba275e82
#6 [ffff96809173bca0] bad_area_nosemaphore at ffffffffba275fa4
#7 [ffff96809173bcb0] __do_page_fault at ffffffffba988750
#8 [ffff96809173bd20] trace_do_page_fault at ffffffffba988a26
#9 [ffff96809173bd60] do_async_page_fault at ffffffffba987fa2
#10 [ffff96809173bd80] async_page_fault at ffffffffba9847a8
[exception RIP: check_preempt_wakeup+233]
RIP: ffffffffba2e5b49 RSP: ffff96809173be30 RFLAGS: 00010006
RAX: 0000000000000002 RBX: ffff96809025af00 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffff9680406f9070 RDI: ffff96809fc9ad00
RBP: ffff96809173be68 R8: ffffffffbaa1e3c0 R9: 0000000000000000
R10: 000000000000b8ff R11: f448000000000000 R12: 0000000000000000
R13: ffff96808d678000 R14: ffff96809fc9ac80 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#11 [ffff96809173be70] check_preempt_curr at ffffffffba2d7782
#12 [ffff96809173be88] wake_up_new_task at ffffffffba2dad24
#13 [ffff96809173bec0] do_fork at ffffffffba29a9f1
#14 [ffff96809173bf38] sys_clone at ffffffffba29acb6
#15 [ffff96809173bf48] stub_clone at ffffffffba98e2b4
#16 [ffff96809173bf50] system_call_fastpath at ffffffffba98dede
RIP: 00007f9ca630e851 RSP: 00007ffe97b8c2e8 RFLAGS: 00000202
RAX: 0000000000000038 RBX: 00007f9ca2ffc700 RCX: ffffffffffffffff
RDX: 00007f9ca2ffc9d0 RSI: 00007f9ca2ffbfb0 RDI: 00000000003d0f00
RBP: 00007ffe97b8c410 R8: 00007f9ca2ffc700 R9: 00007f9ca2ffc700
R10: 00007f9ca2ffc9d0 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000801000 R14: 0000000000000000 R15: 00007f9ca2ffc700
ORIG_RAX: 0000000000000038 CS: 0033 SS: 002b
crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
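This instruction loads the value at address %r12+0x70 into %rdi. At the time of the fault %r12 was 0, so the CPU accessed address 0x70 (the value in CR2). That address is invalid, which is why the kernel crashed.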
crash> dis check_preempt_wakeup+233 -l
/usr/src/debug/kernel-3.10.0-1062.7.1.el7/linux-3.10.0-1062.7.1.el7.x86_64/kernel/sched/fair.c: 343
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
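crash maps this address to line 343 of fair.c, but that is not really the faulting line: the location obtained this way is inaccurate. Below, two methods are used to find the exact faulting location.
The call chain is: do_fork---->wake_up_new_task---->check_preempt_curr---->check_preempt_wakeup
wake_up_new_task calls check_preempt_curr as follows: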
void wake_up_new_task(struct task_struct *p)
{
……
check_preempt_curr(rq, p, WF_FORK);
……
}
void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
……
rq->curr->sched_class->check_preempt_curr(rq, p, flags);
……
}
For a CFS task, the hook rq->curr->sched_class->check_preempt_curr is check_preempt_wakeup.
check_preempt_wakeup therefore takes three parameters, which check_preempt_curr passes through the call rq->curr->sched_class->check_preempt_curr(rq, p, flags).
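The indirect call can be pictured with a stripped-down mock. The struct layouts, the stub body of check_preempt_wakeup and the value of WF_FORK below are assumptions made for illustration (the real struct sched_class has many more hooks, and the real check_preempt_curr also handles tasks of different classes); the sketch only shows that the arguments reach check_preempt_wakeup unchanged through the function pointer.

```c
#define WF_FORK 0x02  /* value assumed for illustration */

struct rq;
struct task_struct;

/* Hypothetical, heavily reduced version of struct sched_class. */
struct sched_class {
    void (*check_preempt_curr)(struct rq *rq, struct task_struct *p, int flags);
};

struct task_struct { const struct sched_class *sched_class; };
struct rq { struct task_struct *curr; };

/* Record what the CFS hook received, so the dispatch can be checked. */
static struct rq *seen_rq;
static struct task_struct *seen_p;
static int seen_flags;

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
    seen_rq = rq;
    seen_p = p;
    seen_flags = wake_flags;
}

/* CFS wires its check_preempt_curr hook to check_preempt_wakeup. */
static const struct sched_class fair_sched_class = {
    .check_preempt_curr = check_preempt_wakeup,
};

static void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
    /* Same-class case only: dispatch through the current task's class. */
    rq->curr->sched_class->check_preempt_curr(rq, p, flags);
}
```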
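On x86-64 (System V ABI), integer and pointer arguments are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8 and %r9 carry the 1st through 6th arguments; any further arguments go on the stack.
From the call chain above, the third parameter is wake_flags=WF_FORK.
Next, derive the first and second parameters from the assembly: rq is passed in %rdi and p in %rsi.
If the values of these two registers at function entry can be recovered, the parameter values are known.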
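The register convention fits in a small lookup table. arg_location and arg_in are hypothetical helpers for annotating disassembly; they cover only integer and pointer arguments (floating-point arguments use %xmm registers), and for check_preempt_wakeup they give rq in %rdi, p in %rsi and wake_flags in %rdx (its low half, %edx, since it is an int).

```c
#include <string.h>

/* System V AMD64 ABI: registers for the first six integer/pointer arguments. */
static const char *const sysv_int_arg_regs[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9",
};

/* Hypothetical helper: where the n-th (1-based) integer argument lives at entry. */
static const char *arg_location(int n)
{
    if (n < 1)
        return "invalid";
    if (n <= 6)
        return sysv_int_arg_regs[n - 1];
    return "stack";
}

/* Convenience check: is argument n passed in the named location? */
static int arg_in(int n, const char *where)
{
    return strcmp(arg_location(n), where) == 0;
}
```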
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
……
}
The assembly of check_preempt_wakeup begins as follows:
0xffffffffba2e5a60 <check_preempt_wakeup>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffba2e5a65 <check_preempt_wakeup+5>: push %rbp
0xffffffffba2e5a66 <check_preempt_wakeup+6>: mov %rsp,%rbp
0xffffffffba2e5a69 <check_preempt_wakeup+9>: push %r15
0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14
0xffffffffba2e5a70 <check_preempt_wakeup+16>: push %r13
0xffffffffba2e5a72 <check_preempt_wakeup+18>: push %r12
0xffffffffba2e5a74 <check_preempt_wakeup+20>: push %rbx
0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx
0xffffffffba2e5a79 <check_preempt_wakeup+25>: sub $0x10,%rsp
……
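The first parameter arrives in %rdi. At offset 13 of the assembly, %rdi is copied into %r14,
and from that point until the crash %r14 is never modified,
so %r14 still holds the first parameter passed at function entry:
crash> dis check_preempt_wakeup | grep r14 //filter the instructions that mention r14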
0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14
0xffffffffba2e5b9e <check_preempt_wakeup+318>: mov %r14,%rdi
0xffffffffba2e5bb0 <check_preempt_wakeup+336>: cmp 0x8b0(%r14),%r13
0xffffffffba2e5c09 <check_preempt_wakeup+425>: pop %r14
0xffffffffba2e5c27 <check_preempt_wakeup+455>: pop %r14
From the registers printed at crash time, %r14=ffff96809fc9ac80, so the first parameter is rq=ffff96809fc9ac80.
The second parameter arrives in %rsi. Filtering the function for instructions that mention %rsi shows that, from entry into check_preempt_wakeup until the crash,
%rsi is never modified, so at crash time %rsi still holds the second parameter. %rsi=ffff9680406f9070, so the second parameter is
p=ffff9680406f9070.
crash> dis check_preempt_wakeup | grep rsi
0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx
0xffffffffba2e5aa1 <check_preempt_wakeup+65>: mov 0xd8(%rsi),%rdi
0xffffffffba2e5adb <check_preempt_wakeup+123>: mov 0x188(%rsi),%r9d
0xffffffffba2e5af7 <check_preempt_wakeup+151>: mov 0x110(%rsi),%eax ----->this and the preceding instructions run before the fault; none of them modifies rsi
0xffffffffba2e5c47 <check_preempt_wakeup+487>: mov %rsi,-0x30(%rbp) ----->from here on, the instructions are not on the path executed before the fault
0xffffffffba2e5c55 <check_preempt_wakeup+501>: mov -0x30(%rbp),%rsi
0xffffffffba2e5c6b <check_preempt_wakeup+523>: cmpl $0x5,0x188(%rsi)
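The manual check above (is the register written anywhere between entry and the fault?) can be sketched as a scan over the instructions on the executed path. insn_writes_reg and path_writes_reg are hypothetical helpers with deliberately simple rules: in AT&T syntax the destination is the last operand; cmp, test, push, jumps and nops do not write it; pop writes its single operand; a call is treated as clobbering every caller-saved register; writes to sub-registers such as %r15d or %edi are not detected. The caller must supply the path actually executed, which is not just the instructions before +233 in address order: because of the jmpq at +72, the code at +432..+440 also runs.

```c
#include <string.h>

/* Caller-saved registers in the System V AMD64 ABI: a call may overwrite them. */
static const char *const caller_saved[] = {
    "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
};

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Does one AT&T instruction (e.g. "mov %rdi,%r14") write 64-bit register reg? */
static int insn_writes_reg(const char *insn, const char *reg)
{
    if (starts_with(insn, "call")) {
        for (size_t i = 0; i < sizeof caller_saved / sizeof caller_saved[0]; i++)
            if (strcmp(caller_saved[i], reg) == 0)
                return 1;
        return 0;
    }
    /* Treated as read-only or control flow in this sketch. */
    if (starts_with(insn, "cmp") || starts_with(insn, "test") ||
        starts_with(insn, "push") || starts_with(insn, "j") ||
        starts_with(insn, "nop"))
        return 0;

    const char *ops = strchr(insn, ' ');
    if (ops == NULL)
        return 0;                         /* no operands, e.g. "retq" */
    const char *comma = strrchr(ops, ',');
    if (comma == NULL && !starts_with(insn, "pop"))
        return 0;                         /* other one-operand forms ignored */
    const char *dst = comma ? comma + 1 : ops;
    while (*dst == ' ')
        dst++;
    return dst[0] == '%' && strcmp(dst + 1, reg) == 0;
}

/* Returns 1 if any instruction on the executed path writes reg. */
static int path_writes_reg(const char *const *path, size_t n, const char *reg)
{
    for (size_t i = 0; i < n; i++)
        if (insn_writes_reg(path[i], reg))
            return 1;
    return 0;
}
```

Fed with the instructions executed before the fault, the scan reports that %rsi is never written and that %r14 is written only by mov %rdi,%r14 at +13, which matches the conclusions above.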
From the derivation above, the three parameters of check_preempt_wakeup are:
struct rq *rq=ffff96809fc9ac80;
struct task_struct *p=ffff9680406f9070;
int wake_flags=WF_FORK;
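If the parameter values cannot be recovered inside check_preempt_wakeup itself, they can be inferred further up the stack, from the caller's code.
With rq=ffff96809fc9ac80, p=ffff9680406f9070 and wake_flags=WF_FORK, walk through the source, checking each value in crash: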
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
struct task_struct *curr = rq->curr; //crash> struct rq.curr ffff96809fc9ac80
//------>curr = 0xffff96808d678000
struct sched_entity *se = &curr->se, *pse = &p->se; //crash> struct task_struct -x -o | grep sched_entity
//[0x68] struct sched_entity se;
//------>se=0xffff96808d678068,pse=0xffff9680406f90d8
struct cfs_rq *cfs_rq = task_cfs_rq(curr); // curr->se.cfs_rq
//crash> struct task_struct.se.cfs_rq 0xffff96808d678000
//se.cfs_rq = 0xffff967f41400400
//------>cfs_rq=0xffff967f41400400
int scale = cfs_rq->nr_running >= sched_nr_latency; //crash> p sched_nr_latency crash> struct cfs_rq.nr_running 0xffff967f41400400
//sched_nr_latency = $1 = 2 nr_running = 2
//------>scale=(2>=2)=1;
int next_buddy_marked = 0;
if (unlikely(se == pse)) //se=0xffff96808d678068, pse=0xffff9680406f90d8: not equal, so no return
return;
if (unlikely(throttled_hierarchy(cfs_rq_of(pse)))) //cfs_rq_of(pse):pse->cfs_rq
return; //crash> struct sched_entity.cfs_rq 0xffff9680406f90d8 crash> struct cfs_rq.throttle_count 0xffff967f41400400
//cfs_rq = 0xffff967f41400400 throttle_count = 0
//so this condition is false as well; no return
if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK)) { //wake_flags=WF_FORK, so !(wake_flags & WF_FORK) is false and the body is skipped
set_next_buddy(pse);
next_buddy_marked = 1;
}
if (test_tsk_need_resched(curr)) //crash> struct task_struct.stack 0xffff96808d678000 crash> struct thread_info.flags 0xffff968091738000
return; //stack = 0xffff968091738000 flags = 128
//TIF_NEED_RESCHED is not set in thread_info->flags (128 = 0x80), so the condition is false; no return
if (unlikely(curr->policy == SCHED_IDLE) && //crash> struct task_struct.policy 0xffff96808d678000
likely(p->policy != SCHED_IDLE)) //policy = 0
goto preempt; //curr->policy=0, not SCHED_IDLE=5, so the condition is false; no jump
if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION)) //this condition is also false; no return
return;
find_matching_se(&se, &pse); //------------>continue into find_matching_se
update_curr(cfs_rq_of(se));
BUG_ON(!pse);
if (wakeup_preempt_entity(se, pse) == 1) {
if (!next_buddy_marked)
set_next_buddy(pse);
goto preempt;
}
return;
preempt:
resched_curr(rq);
if (unlikely(!se->on_rq || curr == rq->idle))
return;
if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
set_last_buddy(se);
}
static void find_matching_se(struct sched_entity **se, struct sched_entity **pse)
{
int se_depth, pse_depth;
se_depth = (*se)->depth; //crash> struct sched_entity.depth 0xffff96808d678068
//depth = 3
//------>se_depth=3
pse_depth = (*pse)->depth; //crash> struct sched_entity.depth 0xffff9680406f90d8
//depth = 2
//------>pse_depth=2
while (se_depth > pse_depth) { //se_depth=3, pse_depth=2: the loop body runs once
se_depth--; //se_depth=3-1=2
*se = parent_entity(*se); //crash> struct sched_entity.parent 0xffff96808d678068
} //parent = 0xffff968095227900
//------->*se=0xffff968095227900
while (pse_depth > se_depth) { //pse_depth=2, se_depth=2: equal, so the loop body is not executed
pse_depth--;
*pse = parent_entity(*pse);
}
/*
1. Round 1: *se=0xffff968095227900, *pse=0xffff9680406f90d8
crash> struct sched_entity.cfs_rq 0xffff968095227900 crash> struct sched_entity.cfs_rq 0xffff9680406f90d8
cfs_rq = 0xffff96804d2cfc00 cfs_rq = 0xffff967f41400400
(*se)->cfs_rq=0xffff96804d2cfc00, (*pse)->cfs_rq=0xffff967f41400400: not equal, so both move up to sched_entity->parent
crash> struct sched_entity.parent 0xffff968095227900 crash> struct sched_entity.parent 0xffff9680406f90d8
parent = 0xffff967f2e069300 parent = 0xffff968095227900
*se=0xffff967f2e069300 *pse=0xffff968095227900
2. Round 2: *se=0xffff967f2e069300, *pse=0xffff968095227900
crash> struct sched_entity.cfs_rq 0xffff967f2e069300 crash> struct sched_entity.cfs_rq 0xffff968095227900
cfs_rq = 0xffff967f540ee600 cfs_rq = 0xffff96804d2cfc00
(*se)->cfs_rq=0xffff967f540ee600, (*pse)->cfs_rq=0xffff96804d2cfc00: not equal, so both move up to sched_entity->parent
crash> struct sched_entity.parent 0xffff967f2e069300 crash> struct sched_entity.parent 0xffff968095227900
parent = 0xffff96809025af00 parent = 0xffff967f2e069300
*se=0xffff96809025af00 *pse=0xffff967f2e069300
3. Round 3: *se=0xffff96809025af00, *pse=0xffff967f2e069300
crash> struct sched_entity.cfs_rq 0xffff96809025af00 crash> struct sched_entity.cfs_rq 0xffff967f2e069300
cfs_rq = 0xffff96809fc9ad00 cfs_rq = 0xffff967f540ee600
(*se)->cfs_rq=0xffff96809fc9ad00, (*pse)->cfs_rq=0xffff967f540ee600: not equal, so both move up to sched_entity->parent
crash> struct sched_entity.parent 0xffff96809025af00 crash> struct sched_entity.parent 0xffff967f2e069300
parent = 0x0 parent = 0xffff96809025af00
*se=0x0 *pse=0xffff96809025af00
4. Round 4: *se=0x0, *pse=0xffff96809025af00
Since *se=0, evaluating (*se)->cfs_rq faults, so the crash happens in is_same_group.
*/
while (!is_same_group(*se, *pse)) { //checks whether (*se)->cfs_rq equals (*pse)->cfs_rq
*se = parent_entity(*se);
*pse = parent_entity(*pse);
}
}
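The four rounds can be replayed in user space. The struct below keeps only the three sched_entity fields the walk uses, stores the cfs_rq pointers as plain numbers, and fills them with the values read from the vmcore above. The NULL check in the loop and the helper names find_matching_se_checked and replay are additions for the simulation (the kernel has no such check and takes a page fault instead). With the child's depth as recorded (2) the walk runs off the top of the hierarchy in round 4; with the depth it should have had (3), it stops at once because both entities share cfs_rq 0xffff967f41400400.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the fields find_matching_se() touches; cfs_rq kept as a raw address. */
struct sched_entity {
    int depth;
    struct sched_entity *parent;
    uint64_t cfs_rq;
};

/* Hierarchy read from the vmcore; the entity's own address is in the comment. */
static struct sched_entity se_D = { 0, NULL,  0xffff96809fc9ad00ULL };      /* 0xffff96809025af00 */
static struct sched_entity se_C = { 1, &se_D, 0xffff967f540ee600ULL };      /* 0xffff967f2e069300 */
static struct sched_entity se_B = { 2, &se_C, 0xffff96804d2cfc00ULL };      /* 0xffff968095227900 */
static struct sched_entity curr_se  = { 3, &se_B, 0xffff967f41400400ULL };  /* 0xffff96808d678068 */
static struct sched_entity child_se = { 2, &se_B, 0xffff967f41400400ULL };  /* 0xffff9680406f90d8 */

/*
 * Same walk as find_matching_se(), but returns instead of dereferencing NULL:
 * 0 if a common cfs_rq was found, otherwise the 1-based round in which
 * is_same_group() would have dereferenced a NULL entity.
 */
static int find_matching_se_checked(struct sched_entity **se, struct sched_entity **pse)
{
    int se_depth = (*se)->depth, pse_depth = (*pse)->depth;

    while (se_depth > pse_depth) { se_depth--; *se = (*se)->parent; }
    while (pse_depth > se_depth) { pse_depth--; *pse = (*pse)->parent; }

    for (int round = 1; ; round++) {
        if (*se == NULL || *pse == NULL)
            return round;                      /* kernel: NULL->cfs_rq, fault at 0x70 */
        if ((*se)->cfs_rq == (*pse)->cfs_rq)   /* is_same_group() */
            return 0;
        *se = (*se)->parent;
        *pse = (*pse)->parent;
    }
}

/* Replay the wakeup with a given depth for the child's sched_entity. */
static int replay(int child_depth)
{
    struct sched_entity *se = &curr_se, *pse = &child_se;
    child_se.depth = child_depth;
    return find_matching_se_checked(&se, &pse);
}
```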
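In sched_entity, cfs_rq sits at offset 0x70, and the faulting instruction loads from offset 0x70 relative to %r12, i.e. it reads sched_entity->cfs_rq. So the source analysis above does pinpoint the faulting location.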
crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
crash> struct sched_entity -x -o | grep cfs_rq
[0x70] struct cfs_rq *cfs_rq;
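The offsets used throughout this analysis add up, and a NULL base pointer turns the member offset into the fault address. The mock layouts below are hypothetical and assume a 64-bit build: padding arrays only place fields at the offsets seen in this vmcore (se at 0x68 in task_struct; parent at 0x68 and cfs_rq at 0x70 in sched_entity; depth at 0xa8, derived as 0x110 - 0x68 from the mov 0x110(%r13),%edx load). They are not the real RHEL 7 definitions.

```c
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(void *) == 8, "the offsets below assume 64-bit pointers");

/* Hypothetical mock: only the field offsets match the vmcore. */
struct mock_sched_entity {
    char pad0[0x68];                    /* fields before parent, not modelled */
    struct mock_sched_entity *parent;   /* 0x68 */
    void *cfs_rq;                       /* 0x70 */
    char pad1[0xa8 - 0x78];             /* not modelled */
    int depth;                          /* 0xa8 */
};

struct mock_task_struct {
    char pad0[0x68];                    /* fields before se, not modelled */
    struct mock_sched_entity se;        /* 0x68 */
};

/* Address that "mov 0x70(%r12),%rdi" reads, for a given %r12. */
static uintptr_t cfs_rq_load_address(uintptr_t se_addr)
{
    return se_addr + offsetof(struct mock_sched_entity, cfs_rq);
}
```

With %r12 = 0, cfs_rq_load_address(0) is 0x70, the value reported in CR2; offsetof over the nested members also reproduces the 0xd8 (se.cfs_rq) and 0x110 (se.depth) offsets that appear in the assembly.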
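Root cause: the depth values in the parent's and the child's sched_entity hierarchies do not match. Mainline has a related patch, and applying it fixes the problem: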
[upstream eeb61e53ea19be0c4015b00b2e8b3b2185436f2b]
Parent process (curr): depth = 0; parent = 0;
depth = 1; parent = 0xffff96809025af00;
depth = 2; parent = 0xffff967f2e069300;
depth = 3; parent = 0xffff968095227900;
Child process (p): depth = 0; parent = 0;
depth = 1; parent = 0xffff96809025af00;
depth = 2; parent = 0xffff967f2e069300;
depth = 2; parent = 0xffff968095227900; <- the child's own se: its parent 0xffff968095227900 is at depth 2, so this should be depth = 3
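The mismatch can be expressed as an invariant: a sched_entity with a parent should have depth == parent->depth + 1, and a top-level one depth 0. find_bad_depth below is a hypothetical diagnostic written for this sketch (not part of the kernel or of the upstream fix); it walks a parent chain and returns the first entity that breaks the invariant. Fed with the addresses and depths listed above, it accepts the parent's chain and flags the child's own entity.

```c
#include <stddef.h>
#include <stdint.h>

struct sched_entity {
    int depth;
    struct sched_entity *parent;
    uint64_t addr;  /* address in the vmcore, for reporting only */
};

/* Returns the first entity on the chain whose depth is inconsistent, or NULL. */
static const struct sched_entity *find_bad_depth(const struct sched_entity *se)
{
    for (; se != NULL; se = se->parent) {
        int expected = se->parent ? se->parent->depth + 1 : 0;
        if (se->depth != expected)
            return se;
    }
    return NULL;
}

/* Levels shared by the parent (curr) and the child (p), from the listing above. */
static struct sched_entity lvl0 = { 0, NULL,  0xffff96809025af00ULL };
static struct sched_entity lvl1 = { 1, &lvl0, 0xffff967f2e069300ULL };
static struct sched_entity lvl2 = { 2, &lvl1, 0xffff968095227900ULL };
static struct sched_entity curr_se  = { 3, &lvl2, 0xffff96808d678068ULL };
static struct sched_entity child_se = { 2, &lvl2, 0xffff9680406f90d8ULL };
```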
crash> dis check_preempt_wakeup
0xffffffffba2e5a60 <check_preempt_wakeup>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffba2e5a65 <check_preempt_wakeup+5>: push %rbp
0xffffffffba2e5a66 <check_preempt_wakeup+6>: mov %rsp,%rbp
0xffffffffba2e5a69 <check_preempt_wakeup+9>: push %r15
0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14 #save the first argument (rq) in %r14
0xffffffffba2e5a70 <check_preempt_wakeup+16>: push %r13
0xffffffffba2e5a72 <check_preempt_wakeup+18>: push %r12
0xffffffffba2e5a74 <check_preempt_wakeup+20>: push %rbx
0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx #compute the address %rsi+0x68 (no memory read) and store it in %rbx
#%rsi is the second argument, struct task_struct *p, and offset 0x68 in task_struct is se
#so this instruction corresponds to the source: pse = &p->se
#%rbx = %rsi + 0x68 = 0xffff9680406f9070 + 0x68 = 0xffff9680406f90d8;
0xffffffffba2e5a79 <check_preempt_wakeup+25>: sub $0x10,%rsp
0xffffffffba2e5a7d <check_preempt_wakeup+29>: mov 0x8a8(%rdi),%r13 #%rdi is the first argument, struct rq; offset 0x8a8 in struct rq is struct task_struct *curr
#so this loads rq->curr into %r13;
#%r13 = [0xffff96809fc9ac80 + 0x8a8] = 0xffff96808d678000
0xffffffffba2e5a84 <check_preempt_wakeup+36>: mov 0xd8(%r13),%rax #load the value at %r13+0xd8 into %rax
#%r13 holds rq->curr, so this reads offset 0xd8 in struct task_struct; the raw offset alone does not say which member it is
#offset 0x68 in task_struct is the sched_entity se, and offset 0x70 in sched_entity is cfs_rq; 0xd8=0x68+0x70
#so this reads the cfs_rq field of the se embedded in task_struct, i.e. the source struct cfs_rq *cfs_rq = task_cfs_rq(curr);
#%rax now holds curr->se.cfs_rq;
#%rax = [0xffff96808d678000 + 0xd8] = 0xffff967f41400400
0xffffffffba2e5a8b <check_preempt_wakeup+43>: lea 0x68(%r13),%r12 #%r13 holds rq->curr; %r12 = %r13 + 0x68, i.e. the source se = &curr->se
#%r12=%r13+0x68=0xffff96808d678000+0x68=0xffff96808d678068
0xffffffffba2e5a8f <check_preempt_wakeup+47>: cmp %rbx,%r12 #%rbx=0xffff9680406f90d8 ---> pse = &p->se; %r12=0xffff96808d678068 ----> se = &curr->se; not equal
#source: if (unlikely(se == pse))
0xffffffffba2e5a92 <check_preempt_wakeup+50>: mov 0x10(%rax),%ecx #%rax holds curr->se.cfs_rq; offset 0x10 in cfs_rq is nr_running, loaded into %ecx
#source: cfs_rq->nr_running=2
#%ecx=2
0xffffffffba2e5a95 <check_preempt_wakeup+53>: mov 0xb76b09(%rip),%eax #RIP-relative: %rip is the address of the next instruction, 0xffffffffba2e5a9b; the 8 bytes at 0xffffffffba2e5a9b+0xb76b09 are 0x0032dcd500000002
#%eax takes the low 4 bytes: %eax=2 (sched_nr_latency)
0xffffffffba2e5a9b <check_preempt_wakeup+59>: je 0xffffffffba2e5c00 <check_preempt_wakeup+416> #uses the flags from cmp %rbx,%r12 (mov does not change flags); %rbx != %r12, so no jump
0xffffffffba2e5aa1 <check_preempt_wakeup+65>: mov 0xd8(%rsi),%rdi #%rdi=[%rsi+0xd8]=[%rsi+0x68+0x70]; %rsi is the task_struct base and offset 0x68 is the sched_entity member;
#offset 0x70 in sched_entity is cfs_rq, so this loads the cfs_rq of p's sched_entity into %rdi
#source: cfs_rq_of(pse) ----> pse->cfs_rq
#%rdi=0xffff967f41400400
0xffffffffba2e5aa8 <check_preempt_wakeup+72>: jmpq 0xffffffffba2e5c10 <check_preempt_wakeup+432> #jump to 0xffffffffba2e5c10 ------------->@@@@@@@@@@ 1 @@@@@@@@@
0xffffffffba2e5aad <check_preempt_wakeup+77>: xor %r15d,%r15d #------->@@@@@@@@@@ 2 @@@@@@@@@
#%r15d=0
0xffffffffba2e5ab0 <check_preempt_wakeup+80>: cmp %eax,%ecx #compare %ecx with %eax (computes %ecx - %eax)
0xffffffffba2e5ab2 <check_preempt_wakeup+82>: setae %r15b #set %r15b to 1 if %ecx >= %eax (unsigned); here %eax=2 and %ecx=2, so %r15b=1
#source: int scale = cfs_rq->nr_running >= sched_nr_latency;
0xffffffffba2e5ab6 <check_preempt_wakeup+86>: nopl 0x0(%rax,%rax,1)
0xffffffffba2e5abb <check_preempt_wakeup+91>: xor %ecx,%ecx #%ecx=0
0xffffffffba2e5abd <check_preempt_wakeup+93>: mov 0x8(%r13),%rax #%rax=[%r13+0x8]; %r13 is rq->curr and offset 0x8 is the stack field, so %rax=curr->stack=ffff968091738000
0xffffffffba2e5ac1 <check_preempt_wakeup+97>: mov 0x10(%rax),%rax #task_struct->stack is also the thread_info address; offset 0x10 in thread_info is flags, so this loads the flags
#source: test_tsk_need_resched(curr)
#%rax=[%rax+0x10]=[ffff968091738000+0x10]=0000000000000080, i.e. thread_info->flags=0x80
0xffffffffba2e5ac5 <check_preempt_wakeup+101>: test $0x8,%al #test bit 3 of thread_info->flags, i.e. whether TIF_NEED_RESCHED is set
#source: test_tsk_need_resched(curr)
0xffffffffba2e5ac7 <check_preempt_wakeup+103>: jne 0xffffffffba2e5c00 <check_preempt_wakeup+416> #0x80 & 0x8 = 0, so no jump
0xffffffffba2e5acd <check_preempt_wakeup+109>: cmpl $0x5,0x188(%r13) #%r13 is rq->curr and offset 0x188 in task_struct is policy; compare rq->curr->policy with 5
#source: unlikely(curr->policy == SCHED_IDLE)
#the 8 bytes at [%r13+0x188]=[0xffff96808d678000+0x188] are 0000000200000000;
#curr->policy is an int (4 bytes), so only the low 4 bytes count: curr->policy=0
0xffffffffba2e5ad5 <check_preempt_wakeup+117>: je 0xffffffffba2e5c6b <check_preempt_wakeup+523> #curr->policy == SCHED_IDLE is false, so no jump
#source: if (unlikely(curr->policy == SCHED_IDLE) &&
0xffffffffba2e5adb <check_preempt_wakeup+123>: mov 0x188(%rsi),%r9d #%rsi is the second argument, struct task_struct *p; this loads p->policy; the 8 bytes at [%rsi+0x188]=[0xffff9680406f9070+0x188] are 0x0000000200000000
#p->policy is 4 bytes wide, so p->policy=0
#%r9d=0
0xffffffffba2e5ae2 <check_preempt_wakeup+130>: test %r9d,%r9d #test whether %r9d is 0; here it is
0xffffffffba2e5ae5 <check_preempt_wakeup+133>: jne 0xffffffffba2e5c00 <check_preempt_wakeup+416> #jump if non-zero; no jump here
#source: if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
0xffffffffba2e5aeb <check_preempt_wakeup+139>: nopl 0x0(%rax,%rax,1)
0xffffffffba2e5af0 <check_preempt_wakeup+144>: mov 0x110(%r13),%edx #%r13 holds rq->curr; offset 0x110 is hard to map directly, but 0x110=0x68+0xa8 and offset 0x68 in task_struct is the sched_entity;
#offset 0xa8 in sched_entity is depth, so offset 0x110 from the task_struct base is se.depth
#%edx=[%r13+0x110]=3 ----> se_depth=3
#source: se_depth = (*se)->depth;
0xffffffffba2e5af7 <check_preempt_wakeup+151>: mov 0x110(%rsi),%eax #%rsi holds the second argument, struct task_struct *p; this loads p->se.depth
#%eax=[%rsi+0x110]=2 ----> pse_depth=2
#source: pse_depth = (*pse)->depth;
0xffffffffba2e5afd <check_preempt_wakeup+157>: cmp %eax,%edx #compare %edx with %eax
0xffffffffba2e5aff <check_preempt_wakeup+159>: jle 0xffffffffba2e5b16 <check_preempt_wakeup+182> #jump if %edx <= %eax; %edx=3, %eax=2, so no jump
#source: while (se_depth > pse_depth)
0xffffffffba2e5b01 <check_preempt_wakeup+161>: nopl 0x0(%rax)
0xffffffffba2e5b08 <check_preempt_wakeup+168>: sub $0x1,%edx #%edx=%edx - 1 = 2;
#se_depth=2
#source: se_depth--;
0xffffffffba2e5b0b <check_preempt_wakeup+171>: mov 0x68(%r12),%r12 #%r12 holds the address of rq->curr->se; offset 0x68 in sched_entity is parent, so this loads se->parent into %r12
#%r12=0xffff968095227900
#source: *se = parent_entity(*se);
0xffffffffba2e5b10 <check_preempt_wakeup+176>: cmp %eax,%edx #now %eax=2 and %edx=2: equal
0xffffffffba2e5b12 <check_preempt_wakeup+178>: jne 0xffffffffba2e5b08 <check_preempt_wakeup+168> #jump back if not equal; they are equal, so no jump
0xffffffffba2e5b14 <check_preempt_wakeup+180>: mov %eax,%edx #%edx=%eax=2
0xffffffffba2e5b16 <check_preempt_wakeup+182>: cmp %eax,%edx #compare %edx with %eax
0xffffffffba2e5b18 <check_preempt_wakeup+184>: jge 0xffffffffba2e5b49 <check_preempt_wakeup+233> #jump if %edx >= %eax; they are equal, so jump to 0xffffffffba2e5b49------>@@@@@@@@@@ 3 @@@@@@@@@
0xffffffffba2e5b1a <check_preempt_wakeup+186>: nopw 0x0(%rax,%rax,1)
0xffffffffba2e5b20 <check_preempt_wakeup+192>: sub $0x1,%eax
0xffffffffba2e5b23 <check_preempt_wakeup+195>: mov 0x68(%rbx),%rbx
0xffffffffba2e5b27 <check_preempt_wakeup+199>: cmp %edx,%eax
0xffffffffba2e5b29 <check_preempt_wakeup+201>: jne 0xffffffffba2e5b20 <check_preempt_wakeup+192>
0xffffffffba2e5b2b <check_preempt_wakeup+203>: mov 0x70(%r12),%rdi
0xffffffffba2e5b30 <check_preempt_wakeup+208>: cmp 0x70(%rbx),%rdi
0xffffffffba2e5b34 <check_preempt_wakeup+212>: je 0xffffffffba2e5b54 <check_preempt_wakeup+244>
0xffffffffba2e5b36 <check_preempt_wakeup+214>: nopw %cs:0x0(%rax,%rax,1)
0xffffffffba2e5b40 <check_preempt_wakeup+224>: mov 0x68(%r12),%r12
0xffffffffba2e5b45 <check_preempt_wakeup+229>: mov 0x68(%rbx),%rbx
#------>@@@@@@@@@@ 3 @@@@@@@@@
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi #the fault actually happens here. Reading straight down once, %r12 is not 0 at this point, yet at crash time %r12=0, which is misleading;
#the reason is the while loop: %r12 is reassigned on every iteration until it finally becomes 0;
#%r12 holds the current se (starting from rq->curr's sched_entity and moving up through parents); offset 0x70 in sched_entity is cfs_rq, so this loads se->cfs_rq
#source: if (se->cfs_rq == pse->cfs_rq)
0xffffffffba2e5b4e <check_preempt_wakeup+238>: cmp 0x70(%rbx),%rdi #compare with pse->cfs_rq
0xffffffffba2e5b52 <check_preempt_wakeup+242>: jne 0xffffffffba2e5b40 <check_preempt_wakeup+224> #se->cfs_rq differs from pse->cfs_rq in every round, so this jumps back to 0xffffffffba2e5b40
0xffffffffba2e5b54 <check_preempt_wakeup+244>: test %rdi,%rdi
0xffffffffba2e5b57 <check_preempt_wakeup+247>: je 0xffffffffba2e5b40 <check_preempt_wakeup+224>
0xffffffffba2e5b59 <check_preempt_wakeup+249>: mov %ecx,-0x30(%rbp)
0xffffffffba2e5b5c <check_preempt_wakeup+252>: callq 0xffffffffba2e4900 <update_curr>
0xffffffffba2e5b61 <check_preempt_wakeup+257>: test %rbx,%rbx
0xffffffffba2e5b64 <check_preempt_wakeup+260>: mov -0x30(%rbp),%ecx
0xffffffffba2e5b67 <check_preempt_wakeup+263>: je 0xffffffffba2e5c7d <check_preempt_wakeup+541>
0xffffffffba2e5b6d <check_preempt_wakeup+269>: mov 0x50(%r12),%rdx
0xffffffffba2e5b72 <check_preempt_wakeup+274>: sub 0x50(%rbx),%rdx
……
#---------->@@@@@@@@@@ 1 @@@@@@@@@
0xffffffffba2e5c10 <check_preempt_wakeup+432>: mov 0xfc(%rdi),%edi #%rdi=0xffff967f41400400 is pse->cfs_rq; offset 0xfc in cfs_rq is throttle_count
#the 8 bytes at [%rdi + 0xfc]=[0xffff967f414004fc] are 91bf264000000000;
#%edi takes the low 4 bytes: %edi=0
0xffffffffba2e5c16 <check_preempt_wakeup+438>: test %edi,%edi #test whether %edi is 0
0xffffffffba2e5c18 <check_preempt_wakeup+440>: je 0xffffffffba2e5aad <check_preempt_wakeup+77> #it is 0, so jump to 0xffffffffba2e5aad------->@@@@@@@@@@ 2 @@@@@@@@@
#source: if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
……
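Three small calculations recur in the annotations above: taking the low 4 bytes of an 8-byte value printed by crash (x86-64 is little-endian, so a 4-byte field is the low half of the 64-bit word read at its address), testing the TIF_NEED_RESCHED bit, and resolving a RIP-relative operand. A minimal sketch, assuming TIF_NEED_RESCHED is bit 3 as the test $0x8,%al instruction implies; the helper names are hypothetical.

```c
#include <stdint.h>

#define TIF_NEED_RESCHED 3   /* bit tested by "test $0x8,%al" */

/* A 4-byte field at an address is the low half of the little-endian word there. */
static uint32_t low32(uint64_t word)
{
    return (uint32_t)word;
}

static int need_resched(uint64_t ti_flags)
{
    return (int)((ti_flags >> TIF_NEED_RESCHED) & 1);
}

/* disp(%rip): the displacement is relative to the address of the next instruction. */
static uint64_t rip_relative(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With these, curr->policy and p->policy come out as 0, throttle_count as 0 and sched_nr_latency as 2, flags 0x80 does not have TIF_NEED_RESCHED set, and the operand of the load at +53 resolves to 0xffffffffbae5c5a4.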