
13.11.6 ‘Null’ Terminator Character (00h)

14. Event Messages

Event Messages are special messages that are sent by management controllers when they detect significant or critical system management events. This includes messages for events such as ‘temperature threshold exceeded’, ‘voltage threshold exceeded’, ‘power fault’, etc. The Event Message generator (the device generating an Event Message) notifies the system of the event by sending an “Event Request Message” to the Event Receiver Device.

When the Event Receiver gets a valid Event Message, it sends a response message to the generator of the Event Message. It then typically transfers the message to the System Event Log. The Event Receiver does not interpret the Event Messages it receives. Thus, new Event Message types can be added into the system without impacting the Event Receiver implementation.

In some systems, the Event Receiver will need to interrupt the system to notify it that there is an Event Message to be logged. It is desirable for the implementation to have verified and buffered Event Messages in their entirety before issuing such an interrupt. This way, the interrupt handler will not need to wait for the Event Message transmission to complete first.

14.1 Critical Events and System Event Log Restrictions

The platform’s System Event Log is typically of limited size (~3 to ~8 KB, depending on implementation).

Therefore, it is important to refrain from filling the System Event Log with non-critical ‘clutter’.

The System Event Log is primarily intended for capturing Critical Events. These include events that require immediate logging to guarantee that they’re available for ‘post-mortem’ analysis, and events that may require quick system responses, such as system power off, or shutdown.

Critical events include out-of-range temperature and voltage events, hardware failures such as power supply or fan failures, and interrupts and signals that affect system operation, such as NMIs and PCI PERR (parity error) and SERR (system error). Critical Events also include events that impact system data integrity, such as uncorrectable ECC errors, or system security, such as ‘chassis intrusion’.

In addition to events that indicate ‘failure’ conditions, events that indicate impending failures are also considered to be critical events. This includes events for reaching ‘warning levels’ for things such as system temperature or error counts. The assertion of ‘Predictive Fault’ information is also considered critical, particularly if the monitored device does not have a direct ‘failure’ indication.

Non-critical events, such as the return to an ‘OK’ state from a ‘Warning’ state, should not be sent as critical events.

Non-critical system information is normally obtained by System Management Software polling sensors and management controllers for their status.

Figure 14-1, Event Message Reception

[Block diagram. Labels: BIOS Events, SMS Events, IPMB Events, PCI Mgmt. Bus events, BMC Internal Events, External Event Messages, System Interface, IPMB Interface, PCI Mgmt. Bus, Event Msg. Buffer, Event Receiver, SEL Mgr., PEF, SEL Data, NV Storage I/F, NV Storage.]

The preceding figure presents a conceptual illustration of the manner in which Event Messages can be handled by a Baseboard Management Controller device that uses an external non-volatile storage device to hold the System Event Log.

The figure shows a BMC with a shared system messaging interface, through which Event Messages can be delivered from BIOS, SMS (system management software / OS), or an SMI Handler, and an IPMB interface, through which it can receive Event Messages from the Intelligent Platform Management Bus. The BMC can also generate ‘internal’ Event Messages.

When the BMC receives a message via the system or IPMB interfaces, a ‘Message Handler’ function recognizes the message as being for the ‘Event’ functionality in the BMC and passes the message information on to the ‘Event Receiver’ function. The Event Receiver function then takes the message content and issues a request to a ‘SEL Mgr.’ function that formats the message as an SEL Entry and calls the FLASH Interface to have the data stored.

The Event Receiver function is also responsible for driving the response message back through the messaging system. This way, message acknowledgment or error reporting can be provided.
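The flow described above can be sketched as follows. This is a minimal illustration, not a BMC implementation: the class names, the message dictionary layout, the tiny log capacity, and the error completion code value are all assumptions made for the example. Only the ‘normal completion’ value of 00h comes from the text.

```python
from dataclasses import dataclass, field

COMPLETION_OK = 0x00            # 'normal completion', as used in this chapter
COMPLETION_OUT_OF_SPACE = 0xC4  # assumed error code value, for illustration only


@dataclass
class SelManager:
    """Hypothetical stand-in for the 'SEL Mgr.' function."""
    capacity: int = 4                         # assumed, deliberately tiny
    entries: list = field(default_factory=list)

    def store(self, source: int, event_data: bytes) -> bool:
        if len(self.entries) >= self.capacity:
            return False                      # log full, nothing stored
        # Format the message as an SEL entry (illustrative layout only),
        # then "write" it to non-volatile storage (here, a Python list).
        self.entries.append({"record_id": len(self.entries) + 1,
                             "source": source,
                             "data": bytes(event_data)})
        return True


class EventReceiver:
    """Passes event content to the SEL manager without interpreting it."""

    def __init__(self, sel: SelManager):
        self.sel = sel

    def handle(self, source: int, event_data: bytes) -> int:
        stored = self.sel.store(source, event_data)
        # The receiver drives the response, so errors can be reported back.
        return COMPLETION_OK if stored else COMPLETION_OUT_OF_SPACE


def message_handler(msg: dict, receiver: EventReceiver):
    """Route event messages to the Event Receiver; ignore everything else."""
    if msg.get("kind") != "event":            # 'kind' is a hypothetical field
        return None
    return receiver.handle(msg["source"], msg["data"])
```

Because the receiver never inspects `event_data`, a new Event Message type needs no change to `EventReceiver`, which mirrors the design point made earlier in this chapter.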

14.2 Event Receiver Handling of Event Messages

This section presents some implementation advice for the Event Receiver device. Please refer to the Intelligent Platform Management Bus Communications Protocol Specification for additional information on Event Message handling.

Since retries of Event Messages are part of the IPMB protocol, there is the potential for the Critical Event Handler to receive more than one Event Message for the same event. The Seq field allows repeated Event Messages to be discriminated from new Event Messages. Event Messages from an Event Generator that match an earlier Event Message can be ignored.

The option to disable SEL Logging only affects events that are received from the IPMB and PCI Management Bus interfaces. Devices on the IPMB and PCI Management Bus are more likely to generate events ‘automatically’, while the other interfaces are primarily driven by either local or remote software, which is assumed to have more control over whether it generates events or not.

It is recommended that the Event Receiver keep a table or queue of the Event Messages it has received. Any new Event Message from the same source and of the same type, but with a different sequence number, would replace the previous entry.

There are many ways to implement such a table or queue. Any implementation should provide enough tracking support to handle previously received Event Messages for all the ‘known’ Event Generators in the basic system.

For example, a system that has four management controllers on the IPMB that can generate Event Messages should track the previously received Event Messages from those devices.

It is desirable for the Event Receiver to be able to track at least six more Event Generators, to cover Event Generators that are added to the system. (One common add-on would be an emergency management card. Other possible ‘add-on’ event generators would be other systems and peripheral boxes in a “managed cluster” arrangement.)

The Event Receiver implementation should account for the possibility that there can be more Event Generators than there are slots in the table. This can be managed by implementing the table with an ‘LRU’ deletion algorithm, where the oldest tracked Event Messages are deleted if a new Event Message comes in and the table or queue is full. It can be assumed that there will rarely be more than two Event Messages in the state where they are to be re-transmitted because of a lost acknowledge.

With this type of design, the most anomalous behavior would be the multiple recording of the same event. This would only be seen under artificially generated ‘stress’ testing and would only be able to occur if there were more event message sources than table slots.
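One possible shape for such a table is sketched below. It is an assumption-laden illustration rather than a recommended design: the class and method names are hypothetical, the table is keyed by source address only (a simplification of "same source and same type"), and the default of ten slots simply adds the four known generators from the example above to the six suggested spares.

```python
from collections import OrderedDict


class EventTracker:
    """Fixed-size table holding the last message key seen per Event Generator."""

    def __init__(self, slots: int = 10):
        self.slots = slots
        self.table = OrderedDict()   # source address -> last message key

    def is_duplicate(self, source: int, key) -> bool:
        """Return True if `key` repeats the tracked message from `source`.

        Otherwise record `key` as the newest message for `source`,
        evicting the least recently used entry if the table is full.
        """
        if source in self.table and self.table[source] == key:
            self.table.move_to_end(source)       # mark as recently used
            return True
        if source not in self.table and len(self.table) >= self.slots:
            self.table.popitem(last=False)       # drop the oldest entry
        self.table[source] = key                 # new or replacing entry
        self.table.move_to_end(source)
        return False
```

With two slots and three active sources, a retry from an evicted source is treated as new, which is the "multiple recording of the same event" behavior described above:

```python
tracker = EventTracker(slots=2)
tracker.is_duplicate(0x20, 2)   # recorded
tracker.is_duplicate(0x22, 7)   # recorded
tracker.is_duplicate(0x24, 9)   # recorded, evicts 0x20
tracker.is_duplicate(0x20, 2)   # retry from 0x20 is no longer recognized
```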

It is also recommended that the Event Receiver implement the ‘Seq Timeout’ as specified in the IPMB Communications Protocol specification.

14.3 IPMB Seq Field use in Event Messages

This section presents a review of the IPMB Seq field and the manner in which it is used when Event Messages are delivered via the IPMB.

The Event Receiver uses the Seq field to reject retried (duplicate) Event Request Messages that it may receive.

The Event Generator will re-send an Event Request Message if it does not receive the Event Response Message. It is possible that the response could get corrupted, causing the Event Generator to re-send the original request even though the Event Receiver had already successfully received it. This is one way that an Event Receiver could get more than one Event Request Message for the same event. When the Event Generator re-sends the Event Request Message, it does so with the same Seq value that it used for the original try. The Event Generator will increment the Seq value the next time it has a new Event Request Message to send.
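The generator side of this behavior can be sketched as below. The `send` callback, the retry limit, the starting Seq value, and the 6-bit wrap of the Seq counter are assumptions for the example; the source only states that retries reuse the Seq value and that a new Event Request Message gets an incremented one.

```python
class EventGenerator:
    """Sends Event Request Messages, retrying with an unchanged Seq value."""

    SEQ_MODULUS = 64  # assumed 6-bit Seq field

    def __init__(self, send, retry_limit: int = 3):
        # `send(seq, data)` is a hypothetical transport hook that returns
        # True when a valid Event Response Message was received.
        self.send = send
        self.retry_limit = retry_limit
        self.seq = 0

    def report(self, data: bytes) -> bool:
        """Deliver one new event; return False if all attempts fail."""
        # A new Event Request Message gets the next Seq value...
        self.seq = (self.seq + 1) % self.SEQ_MODULUS
        # ...and the original try plus every retry reuse that same value.
        for _ in range(1 + self.retry_limit):
            if self.send(self.seq, data):
                return True
        return False
```

Note that `report` does not return until the message has been acknowledged or the retries are exhausted, so a generator built this way never has two events outstanding at once.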

When Event Messages are delivered via the IPMB, the IPMB message’s Seq field is used to allow the Event Receiver to discriminate whether the Event Message is for a new occurrence of a given event, or is a re-transmission of a previous Event Message for that event. The IPMB Seq field should not be confused with a sequence number for tracking multi-message transfers, as might be its use in other serial protocols.

If the Event Receiver receives an Event Message where the Cmd, NetFn, LUN, and Seq fields match the previous Event Message from the same Requester, it can assume that the latter message is a re-transmission and return a ‘normal completion’ (00h) as a response to valid, duplicated requests. The Event Receiver does not log duplicate events.

If the Event Receiver does not return a response, the Event Generator retries up to its retry limit count and then concludes that the Event Request failed. Event Generator devices on the IPMB do not send new Event Messages until they’ve finished sending the previous Event Message (including retries). This eliminates the need for the Event Receiver to maintain status for multiple Seq numbers from a single Event Generator.
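The comparison rule can be illustrated with a short sketch. The type and field names are hypothetical, and the NetFn and Cmd values in the usage lines are placeholders. The logic follows the text: only Cmd, NetFn, LUN, and Seq are compared, one entry is kept per Requester, and a duplicate still receives a ‘normal completion’ response.

```python
from typing import NamedTuple


class EventRequest(NamedTuple):
    rq_addr: int   # Requester (Event Generator) address
    net_fn: int
    lun: int
    cmd: int
    seq: int
    data: bytes    # event data, deliberately excluded from the comparison


class SeqFilter:
    """Logs new Event Request Messages and silently accepts retries."""

    NORMAL_COMPLETION = 0x00

    def __init__(self):
        self.last = {}     # rq_addr -> (cmd, net_fn, lun, seq)
        self.logged = []   # stand-in for the System Event Log

    def receive(self, req: EventRequest) -> int:
        key = (req.cmd, req.net_fn, req.lun, req.seq)
        if self.last.get(req.rq_addr) != key:
            # New event from this Requester: remember it and log it.
            self.last[req.rq_addr] = key
            self.logged.append(req)
        # Valid duplicates are acknowledged but not logged again.
        return self.NORMAL_COMPLETION
```

```python
f = SeqFilter()
f.receive(EventRequest(0x52, 0x04, 0, 0x02, 5, b"\x01"))   # logged
f.receive(EventRequest(0x52, 0x04, 0, 0x02, 5, b"\x01"))   # retry, not logged
f.receive(EventRequest(0x52, 0x04, 0, 0x02, 6, b"\x01"))   # new Seq, logged
```

A single stored tuple per Requester is sufficient here only because a generator finishes one Event Message, including retries, before starting the next.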

The data fields for the Event Request Message are not included in the comparison. This is because the Event

Refer to the Intelligent Platform Management Bus v1.0 Communications Protocol Specification for more information on the Seq field.

14.4 Event Status, Event Conditions, and Present State

A sensor tracks present state and Event Conditions. An Event Condition is that set of comparisons applied to the present state and previous state that produces a given Event Status.

A management controller typically polls for Event Conditions. When it sees a condition become active, it updates the Event Status for the sensor. The process of updating the present state Event Status is referred to as Scanning or Sensor Scanning.

The Event Status consists of the bits that are reported by the Get Sensor Event Status command. As long as scanning is enabled, the Event Status bits are updated as Event Conditions become active. This is independent of whether Event Messages are generated on a given event. That is, turning off Event Message Generation for a particular state does not turn off scanning or updates of the Event Status.
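The separation between scanning and Event Message generation can be shown with a small sketch. Everything here is hypothetical: a single upper threshold stands in for the Event Condition, the flag names are invented, and the status simply latches once set.

```python
class ThresholdSensor:
    """One-threshold sensor with separate scanning and event-generation flags."""

    def __init__(self, upper_critical: float, emit):
        self.upper_critical = upper_critical
        self.emit = emit                      # hypothetical Event Message hook
        self.scanning_enabled = True
        self.event_generation_enabled = True
        self.event_status = False             # the bit software would read back

    def scan(self, reading: float) -> None:
        if not self.scanning_enabled:
            return                            # no scanning: status is not updated
        condition_active = reading >= self.upper_critical
        if condition_active and not self.event_status:
            # Scanning always updates the Event Status...
            self.event_status = True
            # ...but an Event Message is sent only if generation is enabled.
            if self.event_generation_enabled:
                self.emit("upper critical going high", reading)
```

```python
sent = []
sensor = ThresholdSensor(80, lambda name, value: sent.append((name, value)))
sensor.event_generation_enabled = False
sensor.scan(85)
# sensor.event_status is now True, while `sent` is still empty
```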

The Get Sensor Reading command returns State Bits reflecting the present state of the sensor. If the sensor is an ‘auto re-arm’ sensor, these bits can also represent the Event Status if hysteresis is factored in. Thus, the Get Sensor Event Status command is optional for auto re-arm sensors. An application uses the masks in the SDR to determine which bits reflect both current state and event status, and which bits reflect current state only.

The condition that causes an Event Message to be sent is referred to as the 'Event Trigger'. The classification of a sensor indicates whether the corresponding event was discrete, or threshold-based. The sensor classification is part of the Event/Reading Type Code (see section 36.1, Event/Reading Type Codes).

14.5 System Software use of Sensor Scanning bits & Entity Info

System software must ignore any sensor whose sensor scanning bit is disabled, unless system software itself disabled the sensor. This provides an alternate mechanism that allows the management controller to automatically adjust the sensor population without requiring a corresponding change to the sensor data records. For example, suppose the management controller has a way of automatically knowing that a particular temperature sensor will be absent in a given system configuration if a given processor is also absent. The management controller could elect to automatically disable scanning for that temperature sensor. System management software would ignore that sensor even if it was reported in the SDRs.

Note that this is an alternate mechanism that may be useful in some circumstances. The primary mechanism is to use the Entity ID information in the SDRs, and combine that information with presence detection for the entity.

If there is a presence detection sensor for a given entity, and it reports that the entity is absent, then system management software should ignore all other sensors associated with that entity. Some sensors have intrinsic support for this. For example, a sensor-specific Processor sensor has a ‘Processor Presence’ bit. If that bit is implemented, and the processor is absent, any other sensors and non-presence related bits associated with that processor can be ignored. If the sensor type doesn’t have an intrinsic presence capability, you can implement an ‘Entity Presence’ sensor. This sensor solely reports whether a given Entity is present or not.
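The two mechanisms in this section can be combined into one filtering step, sketched below from the point of view of system management software. The function name, the dictionary layout for sensors, and the way presence is supplied are assumptions for the example. In practice this information would come from the SDRs and from sensor readings.

```python
def sensor_population(sensors, entity_present, disabled_by_software=frozenset()):
    """Return the names of sensors that software should treat as present.

    sensors: list of dicts with 'name', 'entity', and 'scanning_enabled'.
    entity_present: maps an entity to True/False as reported by presence
        detection; entities without a presence sensor are simply omitted.
    disabled_by_software: names of sensors that software itself disabled.
    """
    population = []
    for sensor in sensors:
        name = sensor["name"]
        # Primary mechanism: entity association plus presence detection.
        if entity_present.get(sensor["entity"], True) is False:
            continue
        # Alternate mechanism: scanning disabled by the management controller.
        if not sensor["scanning_enabled"] and name not in disabled_by_software:
            continue
        population.append(name)
    return population
```

```python
sensors = [
    {"name": "CPU1 Temp", "entity": "cpu1", "scanning_enabled": True},
    {"name": "CPU2 Temp", "entity": "cpu2", "scanning_enabled": True},
    {"name": "Riser Temp", "entity": "riser", "scanning_enabled": False},
    {"name": "Fan1", "entity": "fan1", "scanning_enabled": False},
]
presence = {"cpu1": True, "cpu2": False}
sensor_population(sensors, presence, disabled_by_software={"Fan1"})
# CPU2 Temp is dropped (entity absent) and Riser Temp is dropped
# (scanning disabled by the controller); Fan1 stays because software
# disabled it itself.
```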

14.6 Re-arming

Re-arm refers to resetting the internal device state that tracks that an event has occurred on the sensor. After a sensor is re-armed, the device will re-check the event condition and re-generate the event if the event condition exists.

If the event condition already exists at the time that the re-arm is initiated, then it is possible that the event will be regenerated immediately following the conclusion of the re-arm. The delay from the re-arming of a sensor to the regeneration of the event is device implementation dependent. An ‘initial update in progress’ bit is provided with the Get Sensor Reading and Get Sensor Event Status commands to help software avoid getting incorrect event status due to a re-arm.
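A minimal model of re-arming is sketched below. The class, the `condition` callback, and the `emit` hook are hypothetical, and the re-check happens immediately inside `rearm`, which is only one of the delays a real device might have. The ‘initial update in progress’ bit is not modeled.

```python
class LatchedSensor:
    """Sensor that remembers an event has occurred until it is re-armed."""

    def __init__(self, condition, emit):
        self.condition = condition    # hypothetical: callable returning bool
        self.emit = emit              # hypothetical Event Message hook
        self.event_latched = False    # internal "event has occurred" state

    def poll(self) -> None:
        # Generate the event once, then stay quiet while the latch is set.
        if self.condition() and not self.event_latched:
            self.event_latched = True
            self.emit()

    def rearm(self) -> None:
        # Reset the internal state, then re-check the event condition.
        self.event_latched = False
        self.poll()                   # regenerates the event if still active
```

```python
events = []
sensor = LatchedSensor(condition=lambda: True, emit=lambda: events.append("event"))
sensor.poll()    # one event generated
sensor.poll()    # latch set, nothing new
sensor.rearm()   # condition still exists, so the event is regenerated
```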