**Warranty Analytics & Preventive Maintenance**

Here we illustrate Warranty Analytics & Preventive maintenance with an example. Plus we show how you can use the computational engine Wolfram Alpha and the programming language R to solve such problems easily.

In preventive maintenance (PM) there are two related concepts: the **mean time to failure (MTTF)** and the **failure rate**. One is calculated directly from the other. The manufacturer knows the MTTF of its product, prints it on the product label or spec sheet, and bases the product warranty on it. It picks a warranty period over which the probability of the device breaking down is not so high as to expose it to excess repair costs, yet which satisfies the customer by guaranteeing the product will last a certain amount of time. That is typically 1, 2, or 5 years.
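The link between the two concepts can be sketched in a few lines. For an exponentially distributed lifetime (the model used later in this article), the failure rate λ is simply 1/MTTF, and the probability of surviving a warranty period of length t is e^(-λt). A minimal Python sketch, where the MTTF and warranty length are made-up illustrative numbers:

```python
import math

mttf_days = 10000.0              # hypothetical MTTF from the spec sheet
failure_rate = 1.0 / mttf_days   # lambda = 1/MTTF for an exponential lifetime

# Probability the device fails within a 2-year (730-day) warranty period
warranty_days = 730
p_fail = 1 - math.exp(-failure_rate * warranty_days)
print(round(p_fail, 3))          # about a 7% chance of a warranty claim
```

The manufacturer would tune the warranty length until this probability sits below whatever repair-cost exposure it can tolerate.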

Factories and people with lots of machines to maintain, or a data center with lots of servers and disks, also do this kind of analysis. They weigh the cost of replacing a device or its components against the cost of it breaking down within a certain period of time.

**PM: the Classic Analytics Problem and Major IoT Application**

Preventive maintenance is the classic analytics problem. It is the first, and still the dominant, IoT (Internet of Things) application, besides, perhaps, health care monitoring. For example, sensors in truck tires and brakes use cellular networks to phone in temperature and air-pressure data so that the trucking company knows when it is time to replace the brakes or tires.

The problem with analytics and PM is knowing which statistical model to use. Pick the wrong probability distribution and the results will be horribly wrong. That is why you need a data scientist on your staff. You should not blindly trust your analytics software without understanding its assumptions and how it works.

**Example: Predicting the Failure of an Electrical Device**

If a machine component fails at a frequency that is independent of any other event, then its failure rate follows an **exponential distribution**. This often applies to predicting the failure of electronic components, which, having no moving parts, do not wear out over time. (Actually, silicon flash memory dies suffer what is called wear fatigue and wear out after about 100,000 write/erase cycles. The disk controller knows that and uses wear leveling to prolong the life of the solid-state drive.) Brake failure does not follow the exponential distribution, because brake deterioration is a linear function: its graph is a straight line when plotted against time, not a curve, since brakes do not wear out at an increasing rate over time.

Let’s expand on this example, taken from the internet, and assume we are looking at an electrical device and want to predict when it might break down. The device is made up of electrical components whose failure rates are entirely independent of one another. These properties allow us to use the exponential distribution to model when the whole device might break down.

Suppose the device has 1,000 components and the failure rate is 0.1, i.e., 10% of the surviving components fail each day. These are not realistic numbers; they are abnormally large. But in analytics it is normal to scale large or small numbers (called **normalization**) into numbers that are easy to graph and work with. For example, if you try to graph something like 0.0000001 over time, the graph will be so scrunched up you cannot easily read it.

Since the failure rate is 0.1 and we have 1,000 components, on the first day we would expect to see 0.1 × 1,000 = 100 failures.

On average, the device falls apart at this exponential rate:

| Day | Failures | Remaining |
| --- | --- | --- |
| 0 | 0 | 1,000 |
| 1 | 0.1 × 1,000 = 100 | 1,000 − 100 = 900 |
| 2 | 0.1 × 900 = 90 | 900 − 90 = 810 |
| 3 | 0.1 × 810 = 81 | 810 − 81 = 729 |

and so on.
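The decay in the table can be reproduced with a short loop. Here is a minimal Python sketch using the starting count and failure rate from the example:

```python
failure_rate = 0.1   # 10% of surviving components fail each day
remaining = 1000.0   # components working at day 0

for day in (1, 2, 3):
    failures = failure_rate * remaining   # expected failures that day
    remaining -= failures                 # survivors carried into the next day
    print(day, failures, remaining)       # 1 100.0 900.0 / 2 90.0 810.0 / 3 81.0 729.0
```

Each day removes a fixed fraction of whatever is left, which is exactly what produces the exponential curve discussed next.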

The exponential probability density function is

f(t) = λe^(-λt)

where:

λ is the failure rate

e is the constant e, approximately 2.718

t is time in days

Its graph starts at λ when t = 0 and decays toward zero as t grows.

**Wolfram Alpha**

If you do not know what Wolfram Alpha is, now is the time to see this powerful computational engine in action. You can generate a plot in Wolfram Alpha and solve easy or complicated functions. Enter the expression below at wolframalpha.com and it will draw the graph.

Of course, with statistics we are not interested in time t < 0, so you can ignore anything to the left of t = 0.

To plot that function in Wolfram Alpha you use this syntax, which is similar to what you would type into Microsoft Excel:

(0.1)exp(-t/10)
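The same density can be evaluated in any language. Here is a quick Python sketch of f(t) = 0.1·e^(-t/10), just to sanity-check the Wolfram Alpha expression:

```python
import math

def pdf(t, rate=0.1):
    """Exponential probability density f(t) = rate * e^(-rate * t)."""
    return rate * math.exp(-rate * t)

print(pdf(0))    # 0.1 -- the curve starts at the failure rate
print(pdf(10))   # ~0.0368 -- and decays toward zero
```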

**Probability Density Distribution**

Here is another view of the probability density function, drawn as a graph.

The area under the curve from time t = 0 to any other time, say t = k, is the **cumulative probability**: the probability that the device fails at some time less than or equal to k.

Because this curve flattens as time moves toward 30, 40, and 50 days, the cumulative probability at those points approaches 99%, i.e., near certainty.

If you remember your calculus, the area under that curve is the definite integral of the density from 0 to t, which evaluates to

F(t) = 1 - e^(-λt)

and which you can also solve with Wolfram Alpha.

The definite integral is the indefinite integral evaluated at two points, in this case t = 0 and whatever t you want it to be.

Given that explanation, the probability that the device will fail in t = 10 days or less is 1 - e^(-0.1 × 10) = 1 - e^(-1), or about 63%.
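You can confirm that arithmetic numerically: the closed-form CDF 1 - e^(-λt) and a crude midpoint Riemann sum under the density should agree. A sketch with λ = 0.1 and t = 10:

```python
import math

rate, t = 0.1, 10.0

# Closed-form cumulative probability of the exponential distribution
closed_form = 1 - math.exp(-rate * t)

# Numerical check: sum thin rectangles under the density from 0 to t
n = 100_000
dt = t / n
riemann = sum(rate * math.exp(-rate * (i + 0.5) * dt) * dt for i in range(n))

print(round(closed_form, 4))   # 0.6321
print(round(riemann, 4))       # 0.6321 -- the two agree
```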

If you want another explanation of what a probability distribution is, watch this short video from Khan Academy.

**The R Programming Language**

Anyone working with analytics should learn the R programming language. It is the language of choice for data scientists, and it and its APIs are used in much, if not most, analytics software. Plus, you can use it like a calculator. Some programmers use Python, but R is superior for analytics because of its rich function set and short, albeit cryptic, notation.

R is filled with analytics models, one of which is the exponential distribution. Finding the probability that the device will break down on or before any given day is as simple as typing:

pexp(day, rate)

into its command line interpreter. Using that, here are some cumulative percentages for different days.

| Day | Probability of failure on that day or earlier |
| --- | --- |
| 0 | pexp(0, 0.1) = 0 (you would not expect it to break down as soon as you turn it on) |
| 1 | pexp(1, 0.1) = 9.5% |
| 10 | pexp(10, 0.1) = 63% |
| 20 | 86% |
| 30 | 95% |
| 50 | 99% |
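If you do not have R handy, these cumulative percentages are easy to reproduce, since for the exponential distribution pexp is just the closed-form CDF. A Python sketch (the function name deliberately mirrors R's, but this is not R itself):

```python
import math

def pexp(q, rate):
    """Cumulative probability P(failure time <= q), the same quantity
    R's pexp(q, rate) returns for the exponential distribution."""
    return 1 - math.exp(-rate * q)

for day in (0, 1, 10, 20, 30, 50):
    print(day, round(100 * pexp(day, 0.1), 1))  # percentages matching the table
```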

**Wrapping Up**

So the goal of the PM program should be to send technicians out into the field to replace a device before its probability of failure crosses some threshold. That threshold is a cost-versus-benefit decision: the cost of letting the machine break down versus the cost of going out to fix it. The cost of letting it break down includes the cost of idle time at the plant.