Deep Time-to-Failure: predicting failures, churns and customer lifetime with RNN

“Survival analysis in evolution”

On 20 September 2018, at Spirit De Milan, Data Science Milan organized an event as part of IBM #Party Cloud: Deep Time to Failure.

Machinery and customers are assets for companies, and both are subject to failure: breakdown for machinery and churn for customers.


“Traditional Survival Analysis”, by Gianmario Spacagna, Chief Scientist at Cubeyou

Predicting failure requires a survival study, and in the first part of his talk Gianmario explained the traditional methods for survival analysis.

Survival analysis is used to analyse data in which the time until an event is of interest. The response is often referred to as failure time, survival time or event time.

The survival function S(t) gives the probability that a subject will survive past time t and has the following properties:

-It is monotonically non-increasing;


-The probability of surviving past time 0 is 1; as time goes to infinity, the survival curve goes to 0.

In theory, the survival function is smooth. In practice, we observe events on a discrete time scale (days, weeks, etc.).

The survival model can also be described by the hazard function h(t), the instantaneous rate at which events occur given no previous event, or by the cumulative hazard function H(t), which describes the accumulated risk up to time t.

Given any one of the functions S(t), H(t) and h(t), it is possible to derive the other two, and from them the time-to-failure, namely the remaining working time of a device or other product.
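The links between the three functions are H(t) = -log S(t), S(t) = exp(-H(t)) and h(t) = dH(t)/dt. The following is a minimal numerical sketch of these conversions, not code from the talk: the function names, the time grid and the constant-hazard example are all assumptions chosen for illustration.

```python
import numpy as np

def cumulative_hazard_from_survival(S):
    """H(t) = -log S(t)."""
    return -np.log(S)

def survival_from_cumulative_hazard(H):
    """S(t) = exp(-H(t))."""
    return np.exp(-H)

def hazard_from_cumulative_hazard(H, t):
    """h(t) = dH/dt, approximated by finite differences on the grid t."""
    return np.gradient(H, t)

def expected_remaining_time(S, t, i):
    """E[T - t_i | T > t_i]: area under S from t_i onward, divided by S(t_i).

    Uses the trapezoidal rule; whatever lies beyond the last grid point is ignored.
    """
    tail = S[i:]
    dt = np.diff(t[i:])
    area = np.sum(0.5 * (tail[1:] + tail[:-1]) * dt)
    return area / S[i]

# Illustrative example: constant hazard of 0.5, so S(t) = exp(-0.5 * t).
t = np.linspace(0.0, 40.0, 4001)
S = np.exp(-0.5 * t)
H = cumulative_hazard_from_survival(S)
h = hazard_from_cumulative_hazard(H, t)
remaining = expected_remaining_time(S, t, 0)
```

With a constant hazard the recovered h is flat, and the expected remaining time is the same at every age, which is the memoryless behaviour of the exponential distribution.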

With incomplete raw data (truncated or censored), naive empirical estimators do not produce good results. In this scenario two techniques are available: the Kaplan-Meier product-limit estimator, which estimates the survival function, and the Nelson-Aalen estimator, which estimates the cumulative hazard function.
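Both estimators can be written in a few lines. The sketch below is an illustrative NumPy implementation for right-censored data only, not the code used in the talk; the function name and the toy durations are assumptions. At each observed event time it counts the subjects still at risk and the events that occur, then updates the survival estimate by a product and the cumulative hazard by a sum.

```python
import numpy as np

def kaplan_meier_nelson_aalen(durations, observed):
    """Return event times, Kaplan-Meier S(t) and Nelson-Aalen H(t).

    durations: time at which each subject failed or was censored.
    observed:  1/True if the failure was observed, 0/False if censored.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])

    survival, cum_hazard = [], []
    s, H = 1.0, 0.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                # still under observation just before t
        events = np.sum((durations == t) & observed)    # failures exactly at t
        s *= 1.0 - events / at_risk                     # Kaplan-Meier product-limit step
        H += events / at_risk                           # Nelson-Aalen increment
        survival.append(s)
        cum_hazard.append(H)
    return event_times, np.array(survival), np.array(cum_hazard)

# Toy data (made up for illustration): 7 subjects, 3 of them censored.
durations = [2, 3, 3, 5, 8, 8, 9]
observed = [1, 1, 0, 1, 1, 0, 0]
times, S_hat, H_hat = kaplan_meier_nelson_aalen(durations, observed)
```

Censored subjects never trigger a step of their own, but they do count in the at-risk set until the time they leave the study, which is how the estimators use the partial information they carry.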

The survival distribution can also be estimated by making parametric assumptions: for this task the Weibull distribution, which is applied in many real-world use cases, was used.

These methods are examples of univariate analysis and are useful when the predictor variable is categorical.

An alternative method is Cox proportional hazards regression, which works for both quantitative and categorical predictor variables. Furthermore, the Cox regression model can assess the effect of several risk factors on survival time simultaneously. The idea behind the Cox model is to separate the estimation of the heterogeneity parameters from that of the baseline hazard function. When the proportional hazards assumption is not satisfied, it is possible to turn to Aalen’s additive model, where coefficients can be parametric, semiparametric or nonparametric.



“Time-to-failure using Weibull and Recurrent Neural Network (RNN)”, by Gianmario Spacagna, Chief Scientist at Cubeyou

In the second part of the talk Gianmario went deeper into the wtte-rnn application (Weibull time-to-event RNN).

In time-to-failure modelling, the Weibull distribution describes a failure rate that is proportional to a power of time. It is flexible and is defined by two parameters, α and β: the first is the scale parameter of the distribution and the second is the shape parameter.

-β<1 indicates that the failure rate decreases over time;

-β=1 indicates that the failure rate is constant over time and the shape is an exponential distribution;

-β>1 indicates that the failure rate increases with time;

-β=2 the distribution is a Rayleigh distribution, with a failure rate that increases linearly with time;

-3.5<β<4 the shape approximates a Gaussian distribution.
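The effect of β listed above follows from the Weibull hazard h(t) = (β/α)(t/α)^(β-1) and survival function S(t) = exp(-(t/α)^β). A small sketch to check the three regimes numerically; the function names, the value α=2 and the time points are arbitrary choices for illustration.

```python
import numpy as np

def weibull_hazard(t, alpha, beta):
    """Failure rate h(t) = (beta / alpha) * (t / alpha) ** (beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)

def weibull_survival(t, alpha, beta):
    """Survival function S(t) = exp(-(t / alpha) ** beta)."""
    return np.exp(-(t / alpha) ** beta)

t = np.array([0.5, 1.0, 2.0, 4.0])
for beta in (0.5, 1.0, 2.0):
    # beta < 1: hazard falls with t; beta = 1: constant; beta > 1: rises with t.
    print(beta, weibull_hazard(t, alpha=2.0, beta=beta))
```

Whatever the value of β, S(α) = exp(-1), so about 63% of units have failed by time α; this is why α is read as a characteristic lifetime.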

The task is to estimate α and β by Recurrent Neural Networks.
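To train a network to output α and β, a loss is needed that also copes with censored observations. One natural choice, assumed here rather than quoted from the talk, is the censored log-likelihood of the continuous Weibull distribution: an observed failure at time t contributes log h(t) - H(t), while a censored observation contributes only -H(t). The function names and the indicator u (1 for an observed event, 0 for censoring) are illustrative.

```python
import numpy as np

def weibull_log_likelihood(t, u, alpha, beta):
    """Per-sample censored Weibull log-likelihood.

    t: time to event or to censoring (must be > 0).
    u: 1 if the event was observed, 0 if the sample is censored.
    """
    t = np.asarray(t, dtype=float)
    u = np.asarray(u, dtype=float)
    scaled = t / alpha
    log_hazard = np.log(beta) - np.log(alpha) + (beta - 1.0) * np.log(scaled)
    cum_hazard = scaled ** beta
    return u * log_hazard - cum_hazard

def negative_log_likelihood(t, u, alpha, beta):
    """Mean negative log-likelihood, usable as a training loss to minimise."""
    return -np.mean(weibull_log_likelihood(t, u, alpha, beta))

# Two samples at t = 3: the first failed, the second was censored.
loss = negative_log_likelihood([3.0, 3.0], [1, 0], alpha=2.0, beta=1.0)
```

In a real model α and β would be arrays predicted per sample and per time step, and the same expression would be written with the tensor operations of the chosen deep learning framework.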

Recurrent neural networks are a kind of neural network in which outputs from previous time steps are taken as inputs for the current time step; this feedback creates a cycle in the network.

RNNs are fitted, and make predictions, over many time steps.

Consider multiple time steps of input (X(t), X(t+1), …), of internal state (u(t), u(t+1), …) and of output (y(t), y(t+1), …). Unfolding the network over these steps removes the cycle: the outputs (y(t) and u(t)) of one time step are passed into the network as inputs for processing the next time step, and the network itself doesn’t change between the unfolded time steps. The same weights are used at every time step; only the outputs and the internal states differ.
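The unfolding can be sketched as a plain loop in NumPy. This is a simplified illustration, not the wtte-rnn implementation: it uses a basic tanh recurrent cell that feeds back only the internal state u(t), and the weight names, layer sizes and the exp/softplus output activations (chosen so that α and β stay positive) are assumptions.

```python
import numpy as np

def softplus(x):
    """Smooth positive activation: log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

def rnn_forward(X, W_x, W_u, b, W_out, b_out):
    """Unfold a simple RNN over the rows of X, reusing the same weights each step.

    Returns the internal states u(t) and, per step, a pair (alpha, beta).
    """
    u = np.zeros(W_u.shape[0])                   # initial internal state
    states, outputs = [], []
    for x_t in X:                                # one iteration per time step
        u = np.tanh(W_x @ x_t + W_u @ u + b)     # new state from input and previous state
        raw = W_out @ u + b_out
        alpha = np.exp(raw[0])                   # scale parameter, kept positive
        beta = softplus(raw[1])                  # shape parameter, kept positive
        states.append(u)
        outputs.append((alpha, beta))
    return np.array(states), np.array(outputs)

# Random weights and inputs, only to show the shapes involved.
rng = np.random.default_rng(0)
n_steps, n_features, n_hidden = 5, 3, 4
X = rng.normal(size=(n_steps, n_features))
W_x = rng.normal(scale=0.1, size=(n_hidden, n_features))
W_u = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
W_out = rng.normal(scale=0.1, size=(2, n_hidden))
b_out = np.zeros(2)

states, params = rnn_forward(X, W_x, W_u, b, W_out, b_out)
```

The loop body never changes between iterations: only x_t and u differ from step to step, which is the weight sharing described above.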

Gianmario showed how wtte-rnn works and walked through a practical application: a NASA dataset of jet engines.

Read and apply the code from the tutorial

Author: Claudio Giancaterino

Actuary & Data Science Enthusiast

Follow up