Run-time Hardware and Software Co-Monitoring

Project Overview:

While significant advances have been made over the last few decades in software and hardware development practice, the state of practice in driver development still suffers from two major deficiencies. First, driver/hardware interfaces lack rigorous specification: specifications, when available, are written in ambiguous English and are not amenable to mechanized analysis. Second, although transient hardware failures are known to occur occasionally, the driver and hardware are tested and evaluated as one entity, so it is not easy to separate driver bugs from hardware bugs. To facilitate device and driver testing and to troubleshoot driver and hardware bugs more precisely, we propose to monitor hardware device and driver states simultaneously for consistency and property checking.

Run-time monitoring essentially provides run-time verification and protection for device drivers. Compared to static approaches, run-time monitoring provides multiple benefits to users and system developers. First, state and property checking runs in a real environment, whereas static approaches can only test software either by enumerating a wide range of possible inputs and system events or through symbolic execution. Second, run-time monitoring can provide a detailed execution trace for each problem it finds, including the observed state transition history of the device and the driver. The trace information is useful for analyzing the problem and can potentially be used to differentiate driver problems from hardware failures. Run-time verification also has the advantage of being able to control the execution path of the driver software and to prevent a device or driver problem from affecting the whole system.
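The idea of checking driver and device states together while recording a trace can be illustrated with a minimal sketch. This is not the project's implementation: the class name CoMonitor, the state names, and the single agreement property are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical representation: states are plain strings, and a property is
# a predicate over the (driver state, device state) pair.
State = str
Property = Callable[[State, State], bool]


@dataclass
class CoMonitor:
    """Records every observed state pair and checks properties on each one."""
    properties: List[Tuple[str, Property]]
    trace: List[Tuple[str, State, State]] = field(default_factory=list)
    violations: List[Tuple[str, int]] = field(default_factory=list)

    def observe(self, event: str, driver_state: State, device_state: State) -> bool:
        # Append to the trace first so a violation can point at its position.
        self.trace.append((event, driver_state, device_state))
        ok = True
        for name, prop in self.properties:
            if not prop(driver_state, device_state):
                self.violations.append((name, len(self.trace) - 1))
                ok = False
        return ok


def states_agree(driver_state: State, device_state: State) -> bool:
    # Simplest possible consistency property: both sides report the same state.
    return driver_state == device_state


monitor = CoMonitor(properties=[("driver/device state agreement", states_agree)])
monitor.observe("init", "idle", "idle")
monitor.observe("start_tx", "transmitting", "transmitting")
monitor.observe("tx_done", "idle", "error")  # driver believes idle, device reports error
```

Each entry in monitor.violations holds the name of the violated property and the index of the offending step in monitor.trace, so the state transition history leading up to the problem is available for analysis.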

This page summarizes our initial effort to design, develop, and evaluate run-time co-monitoring of driver and hardware.


Newly Found Bugs:

This bug occurs after e100 card initialization or a reset. The driver ignores the status of the device and goes ahead to issue a command. This can cause a state mismatch between hardware and driver, with consequences ranging from input errors to driver crashes. The bug was reported to and confirmed by Microsoft (Jan 2010).
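The kind of property violated by this bug can be sketched as a simple run-time check: a command issued by the driver is flagged if the device has not yet reported that it is ready. The names below (DeviceStatus, CommandGateMonitor, the command string) are hypothetical and do not reflect actual e100 register semantics or the project's monitor.

```python
from enum import Enum


class DeviceStatus(Enum):
    # Hypothetical device states, for illustration only.
    RESETTING = "resetting"
    READY = "ready"


class CommandGateMonitor:
    """Flags driver commands issued while the device has not reported READY."""

    def __init__(self) -> None:
        # Assume the device starts out in a reset or initialization phase.
        self.device_status = DeviceStatus.RESETTING
        self.history = []

    def on_device_status(self, status: DeviceStatus) -> None:
        self.device_status = status
        self.history.append(("status", status.value))

    def on_driver_command(self, command: str) -> bool:
        # Record the command, then report whether the property held.
        self.history.append(("command", command))
        return self.device_status is DeviceStatus.READY


gate = CommandGateMonitor()
premature = gate.on_driver_command("start_rx")  # issued before the device is ready
gate.on_device_status(DeviceStatus.READY)
allowed = gate.on_driver_command("start_rx")    # issued after the device is ready
```

Here premature is False and allowed is True, and gate.history keeps the interleaving of device status changes and driver commands that led to the violation.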

This bug occurs when the device is in an error state but the driver proceeds as if the device were functioning properly. The bug was reported to and confirmed by Microsoft (Jan 2010).

This bug occurs during the USB device discovery stage, when the USB host controller makes the Get Descriptor request. It only occurs when the client USB device acts in composite mode with multiple endpoints and with a long (256-byte) USB device descriptor. The bug was reported to and confirmed by the Linux USB-Devel list (Sep 2009).

Previously Known Bugs:

We also experiment with previously known bugs in device drivers to test and validate our work. To explore the benefit of run-time bug detection, we focus on bugs that are tightly related to hardware status and can potentially be mistaken for transient hardware failures. We expect that such bugs can be detected by monitoring both hardware and software status. We mined previously found bugs in Linux device drivers that are potentially difficult to separate from transient hardware failures. Our initial bug-mining effort is based on updates to the well-known Linux drivers for Intel PCI Ethernet cards. The bugs used in our study are listed here:

1. IO Control Hub (ICH) bug at half-duplex mode

2. EEPROM byte-order bug

3. driver bug with pointers

4. violation of hardware action sequence with write actions to MDI control register

5. driver bug that fails to clear the transmission queue

6. violation of hardware action sequence with DMA sync and memory compare

7. driver bug with write flush and lock sequence