Menu â–¾ â–´

#54 ipmi_cmd (GET_SEL_ENTRY) returns response with incorrect id.

ipmiutil-3.1.x
wont-fix
None
2
2023-01-09
2022-07-21
Naveen
No

Hello Team,

I and my team have been using ipmiutil getevt for a while now. As per our understanding ipmiutil getevt seems to be using function called ipmi_cmd(GET_SEL_ENTRY, ....) to get the response that contains, current_record_id, next_record_id and the record. We have noticed that sometimes an older log id is present in place of next_record_id. Could you please provide your insights on this?

I have attached few debug logs from /var/log/ipmiutil-evt.log, when this happened.
(when recid (currentrecordid) matches newid (current_record_id seen in response)

get_sel(dff) rv=0 cc=0 id=dff next=7
sel ok, id=dff next=7
get_sel(7) rv=0 cc=0 id=7 next=8
recid=7 newid=7 next=8
get_event ret = 0
got event id 0007, sensor_type = 04
event data: 07 00 02 0e 75 60 5e 20 00 04 04 40 08 01 ff ff
0007 03/05/20 03:42:06 INF BMC  Fan #40 Fan 1 Present  Present 08 [01 ff ff]
sync: recid=7 time=5e60750e
run(/usr/local/bin/bmcevent.sh $1), ret = 0
Waiting 0 seconds for an event ...
get_sel(7) rv=0 cc=0 id=7 next=8
sel ok, id=7 next=8
get_sel(8) rv=0 cc=0 id=8 next=9
recid=8 newid=8 next=9

(when recid (currentrecordid) does not match newid (current_record_id seen in response)

get_sel(0) rv=0 cc=0 id=b4 next=2200
sel ok, id=0 next=2200
get_sel MISMATCH: recid=2200 newid=0 next=144
get_sel(2200) rv=0 cc=0 id=0 next=144
recid=2200 newid=0 next=144
get_event ret = 0
got event id 2200, sensor_type = 10
event data: 00 00 28 23 27 23 62 20 00 04 10 07 6f 02 ff ff
0000 Type28 INF 23 27 23 62 20 00 04 10 07 6f 02 ff ff
sync: recid=2200 time=62232723
run(/usr/local/bin/bmcevent.sh $1), ret = 0
Waiting 0 seconds for an event ...
get_sel MISMATCH: recid=2200 newid=0 next=144
get_sel(2200) rv=0 cc=0 id=0 next=144
sel ok, id=2200 next=144
get_sel MISMATCH: recid=144 newid=0 next=144
get_sel(144) rv=0 cc=0 id=0 next=144
recid=144 newid=0 next=144
get_event ret = 0
got event id 0144, sensor_type = 00
event data: 00 00 28 23 25 23 00 00 00 00 00 00 00 00 64 0e
0000 Type28 INF 23 25 23 00 00 00 00 00 00 00 00 64 0e
sync: recid=144 time=232523

Discussion

  • Andy Cress

    Andy Cress - 2022-07-21

    OK, this seems to be occurring when the recid requested is a large number (e.g. 2200) but in the interim, the SEL has been cleared, so the current last recid is smaller (e.g. 0144). It sounds like we need a method for getevt to notice that the SEL has been cleared and reset its last recid.
    If the getevt service were restarted, it would get back to normal, but it needs to self-correct.

     
  • Naveen

    Naveen - 2022-07-22

    Thank you for your response Andy,

    But, No, that was not the case, if you observe below log closely

    get_sel(0) rv=0 cc=0 id=b4 next=2200
    sel ok, id=0 next=2200
    

    we are requesting recid= 0, usual response should be 1 or ffff, but why 2200? ( as there was no SEL log identified by 2200, when this log happened)

    and in the below log

    get_sel(dff) rv=0 cc=0 id=dff next=7
    sel ok, id=dff next=7
    

    recid is dff, the next record is supposed to be e00 or ffff (LAST_REC), but why is it going back to 7 and relogging everything till dff.

     

    Last edit: Naveen 2022-07-22
  • Andy Cress

    Andy Cress - 2022-07-27

    Could you provide the 'ipmiutil sel' output (without debug) from this system, so that I can see the record ids in order?

     
  • Andy Cress

    Andy Cress - 2022-07-27
    • status: open --> accepted
     
  • Naveen

    Naveen - 2022-07-28

    please find attached.

     
  • Naveen

    Naveen - 2022-07-28

    Apologies Andy,
    The above file may not be helpful in your debugging. I had to clear old-sel logs as free records were zero. I have attached the ipmiutil_event.log , that was captured earlier. See if it helps.

     

    Last edit: Naveen 2022-07-28
  • Andy Cress

    Andy Cress - 2023-01-09

    Sorry for the delay. I just realized that I didn't respond to your latest ipmiutil_evt.log. It includes records from 0122 through 03f5 but doesn't show the 0dff records or beyond (before the SEL was cleared). It looks like we don't have enough evidence to find out what went wrong in this instance.
    My best guess is that one of the earlier records had a malformed event that caused it to point to 2200.

     

    Last edit: Andy Cress 2023-01-09
  • Andy Cress

    Andy Cress - 2023-01-09
    • status: accepted --> wont-fix
     

Log in to post a comment.