I have a scenario in mind and was wondering if anyone had any suggestions on an approach.
Lets say I have a dataset where I have captured various attributes about blade servers in a rack. I have internal temperature, fan speed, disk reads, cpu temperature and failure events. Each of these are continuous variables except for the failure event, which has only two states, 'true' or 'false' which indicates whether a failure occurred at that point in time.
The table looks like this:
MachineID | Timestamp | Temperature | FanSpeed | Reads | CPUTemp | Failure |
At the time of failure, only the MachineID, Timestamp and Failure values will be populated in the table. The failures at this time are not categorized or discriminated from each other in any way although they will be in the future. We're looking to use a mining algorithm to determine which variable is the best predictor of failure, or combination of variables. What do you think is the right approach How might this approach change once the failures do become categorized and differentiated
Thanks,
Frank