The Hazard of Non-proportional Hazards in Time to Event Analysis

Open Archive. Published: August 04, 2021. DOI: https://doi.org/10.1016/j.ejvs.2021.05.036

      This Edutorial illustrates the relevance of statistical method reporting and testing of model assumptions using the example of the Cox proportional hazards model, a frequently used statistical method to compare time to event data among treatment groups.
      In medical research, time to event analyses are of paramount interest. Their main advantage is that they not only compare a binary outcome but also reflect the length of time until the event occurs or the subject leaves the study. The latter applies when no event occurs during follow up, in which case subjects are censored at the last known time point. Time to event analyses assess the risk of an event occurring at any given time during the study (i.e., the hazard rate) for each treatment arm. The hazard rates of two groups can then be compared by calculating the hazard ratio (HR). The HR represents the instantaneous risk of an event in one group relative to the other at any time during follow up. Cox proportional hazard models are very attractive since they allow adjustment for potential confounders. However, as with any other statistical method, model assumptions must be checked, and the results obtained are only valid if those assumptions are met.
      One distinctive concept of Cox models is the proportional hazard assumption (PHA, Fig. 1). It requires that the hazard rates of the compared study groups be proportional over time, i.e., that the HR is constant. In other words: the risk of an event occurring at any time during the study period in one group is a constant multiple of the risk in the comparison group. Only then is the HR a valid estimator of the overall treatment effect. Figure 2A shows the replicated results of a recently published study (Bleyer et al., 2019). The HR of 0.66 (95% confidence interval 0.43 – 1.00) implies that the risk of an event was one third smaller for the patients in the treatment group at any time throughout the entire follow up compared with the control group.
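      The relation between hazard rates and the HR can be illustrated with a minimal simulation sketch. It assumes constant (exponential) hazards, under which each hazard rate is estimated as events divided by person-time; this is a simplification and not the Cox model itself. The function names, sample sizes, and the follow up length are hypothetical, and the data are simulated rather than taken from the example study.

```python
import math
import random

def simulate_group(n, hazard, follow_up, rng):
    """Draw exponential event times; subjects still event free at follow_up are censored."""
    events, person_time = 0, 0.0
    for _ in range(n):
        t = rng.expovariate(hazard)
        if t <= follow_up:
            events += 1
            person_time += t
        else:
            person_time += follow_up   # censored: contributes time but no event
    return events, person_time

def hazard_ratio(events_trt, time_trt, events_ctl, time_ctl):
    """Ratio of event rates with a Wald 95% confidence interval on the log scale."""
    hr = (events_trt / time_trt) / (events_ctl / time_ctl)
    se = math.sqrt(1 / events_trt + 1 / events_ctl)
    return hr, hr * math.exp(-1.96 * se), hr * math.exp(1.96 * se)

rng = random.Random(1)
TRUE_HR = 0.66                      # treatment hazard is a constant multiple of control
ctl = simulate_group(20000, 1.0, 2.0, rng)
trt = simulate_group(20000, TRUE_HR * 1.0, 2.0, rng)
hr, lower, upper = hazard_ratio(*trt, *ctl)
print(f"estimated HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

      Because the simulated hazards are proportional by construction, a single HR summarises the whole follow up here; the remainder of the text deals with the case where this does not hold.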
      Figure 1. The proportional hazard assumption. Cox proportional hazard models assess the risk of an event occurring in time to event analyses (i.e., the hazard rate) and allow for multivariable adjustments. Data were simulated using the R runif() and rnorm() commands with n = 1 000. HR = hazard ratio.
      Figure 2. Testing the proportional hazard assumption with an example study for (A) replication of time to event analysis and (B) test of proportionality of hazards in the group variable. The data were replicated from Bleyer et al. (2019) using digitising methods. (A) Kaplan–Meier estimator with Cox model. The hazard ratio (HR) of 0.66 (95% confidence interval [CI] 0.43 – 1.00) indicates a longer event free follow up in the treatment group. (B) Testing of the proportional hazard assumption. The analysis of the scaled Schoenfeld residuals (scaledsch) for the treatment group variable indicates a non-zero slope. The black line represents the smoothed size of the residuals with the corresponding 95% CI (grey area). The average slope is non-zero, indicating that the proportional hazard assumption is violated.
      The PHA can be assessed informally by visual inspection of the Kaplan–Meier estimator. Disproportional courses of the survival curves can indicate changes in the HR over time (Fig. 1). Yet, if the study groups are small, disproportional curves might not amount to a detectable violation of the PHA, and if study groups are very large, non-proportional hazards might not be detectable by visual inspection alone. In Figure 2A, the survival curves were similar at the beginning of the study. After approximately 100 days, the hazard was higher in the control group, whereas after 180 days the curves seem to be parallel again, indicating similar hazard rates in both groups. However, testing of the PHA was not reported in this study.
      The statistical concept for testing the PHA is similar to the analysis of residuals in simple linear regression. At each event time, the Schoenfeld residual is the difference between the covariate value observed in the participant who had the event and the value expected under the Cox model among all participants still at risk; these residuals are plotted against time and assessed for slope (Fig. 2B). Since the effect of each variable is assumed to be constant over time, the residuals are expected to show no trend over time. Figure 2B shows the analysis of the residuals for the treatment group variable of the example study in Figure 2A. The residuals were in fact not constant over time: the smoothed line is not constant in slope, and the average slope is negative (p = .016 for non-zero slope).
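      The residual check described above can be sketched for a single binary treatment variable. This is a simplified illustration under stated assumptions, not the analysis behind Figure 2B: the data are simulated, event times are assumed to be untied, the residuals are left unscaled, and the trend is summarised by a plain Pearson correlation with event time instead of a smoothed curve with a formal test. All function names and simulation parameters are hypothetical.

```python
import math
import random

def risk_set_pass(order, times, events, x, beta):
    """One pass over the time-ordered data: score, information, Schoenfeld residuals."""
    n1 = sum(x)              # subjects at risk in the treatment group (x = 1)
    n0 = len(x) - n1         # subjects at risk in the control group (x = 0)
    score, info, residuals = 0.0, 0.0, []
    for i in order:
        if events[i]:
            w = n1 * math.exp(beta)
            expected = w / (w + n0)          # expected x among those still at risk
            residuals.append((times[i], x[i] - expected))
            score += x[i] - expected
            info += expected * (1.0 - expected)
        if x[i]:                             # subject leaves the risk set
            n1 -= 1
        else:
            n0 -= 1
    return score, info, residuals

def fit_cox_binary(times, events, x, iterations=25):
    """Newton-Raphson on the Cox partial likelihood for one binary covariate."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    beta = 0.0
    for _ in range(iterations):
        score, info, _ = risk_set_pass(order, times, events, x, beta)
        beta += score / info
    return beta, risk_set_pass(order, times, events, x, beta)[2]

def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def simulate(n, late_hazard, rng, follow_up=2.0):
    """Control hazard 1.0; treatment hazard 0.3 before t = 0.5 and late_hazard afterwards."""
    times, events, x = [], [], []
    for group in (0, 1):
        for _ in range(n):
            e = rng.expovariate(1.0)         # cumulative hazard at which the event occurs
            if group == 0:
                t = e
            elif e < 0.15:                   # 0.3 * 0.5 = cumulative hazard at t = 0.5
                t = e / 0.3
            else:
                t = 0.5 + (e - 0.15) / late_hazard
            times.append(min(t, follow_up))
            events.append(t <= follow_up)    # administrative censoring at follow_up
            x.append(group)
    return times, events, x

def residual_trend(late_hazard, seed):
    """Fit the model and correlate the Schoenfeld residuals with event time."""
    times, events, x = simulate(2000, late_hazard, random.Random(seed))
    beta, residuals = fit_cox_binary(times, events, x)
    t, r = zip(*residuals)
    return beta, correlation(t, r)

beta_ph, trend_ph = residual_trend(0.3, seed=1)     # HR stays at 0.3 throughout
beta_nph, trend_nph = residual_trend(3.0, seed=1)   # HR changes to 3.0 after t = 0.5
print(f"proportional:     log HR {beta_ph:.2f}, residual trend {trend_ph:.2f}")
print(f"non-proportional: log HR {beta_nph:.2f}, residual trend {trend_nph:.2f}")
```

      With proportional hazards the residuals should show essentially no trend over time, whereas a treatment effect that changes direction produces a clear trend, which is the pattern the test in Figure 2B looks for.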
      If non-proportional hazards are present, reporting of the overall HR may be misleading. Additionally, the statistical tests to assess an overall difference between the compared groups lose power.
      Throughout the medical literature, reporting of the statistical methods is often only rudimentary (Rulli et al., 2018; Trinquart et al., 2016; Kuemmerli et al., 2021).
      A well known example of PHA violation but excellent reporting is the Endovascular Aneurysm Repair (EVAR) Trial 1 (Patel et al., 2016). Due to non-proportional hazards during the 12.7 year follow up period, the investigators decided to split the follow up into four time periods, each of which met the PHA. The underlying reason for non-proportionality was the varying hazard for aneurysm related mortality. There was a short term benefit in the EVAR group with a lower aneurysm related mortality during the first six months (HR 0.47, 95% CI 0.23 – 0.93), similar hazard rates from then on to eight years, but a higher aneurysm related mortality in the EVAR group thereafter (HR 5.82, 95% CI 1.64 – 20.65). In this situation, the overall HR does not sufficiently summarise the data, and splitting the follow up time is one possible solution to address the issue of non-proportional hazards. This trial also demonstrates the importance of adequate follow up times. If the follow up had been stopped after 12 years, the full pattern of the treatment effect would not have been detected.
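      Splitting the follow up into periods can be sketched as follows. The example tabulates events and person-time per period and compares the event rates of two arms within each period; this rate ratio is a simplification assuming a constant hazard within each period, whereas a trial analysis would typically fit a Cox model per period. The data are simulated (they are not the EVAR Trial 1 data), and all names, cut points, and hazards are hypothetical.

```python
import random

def period_table(times, events, cuts):
    """Events and person-time per period (0, c1], (c1, c2], ..., (c_last, inf)."""
    edges = [0.0] + list(cuts) + [float("inf")]
    table = [[0, 0.0] for _ in range(len(edges) - 1)]
    for t, d in zip(times, events):
        for k in range(len(edges) - 1):
            lo, hi = edges[k], edges[k + 1]
            if t <= lo:
                break                         # subject never entered this period
            table[k][1] += min(t, hi) - lo    # time at risk within this period
            if d and t <= hi:
                table[k][0] += 1              # event falls into this period
    return table

def period_hazard_ratios(trt, ctl, cuts):
    """Treatment versus control event rate ratio within each period."""
    rows = zip(period_table(*trt, cuts), period_table(*ctl, cuts))
    return [(d1 / t1) / (d0 / t0) for (d1, t1), (d0, t0) in rows]

def simulate_arm(n, early, late, rng, change=0.5, follow_up=2.0):
    """Hazard `early` before `change` and `late` afterwards; censoring at follow_up."""
    times, events = [], []
    for _ in range(n):
        e = rng.expovariate(1.0)              # cumulative hazard at which the event occurs
        if e < early * change:
            t = e / early
        else:
            t = change + (e - early * change) / late
        times.append(min(t, follow_up))
        events.append(t <= follow_up)
    return times, events

rng = random.Random(7)
ctl = simulate_arm(20000, 1.0, 1.0, rng)      # constant control hazard
trt = simulate_arm(20000, 0.5, 2.0, rng)      # early benefit, late harm
hr_early, hr_late = period_hazard_ratios(trt, ctl, cuts=[0.5])
print(f"HR up to t = 0.5: {hr_early:.2f}, HR after t = 0.5: {hr_late:.2f}")
```

      In this simulated setting the period-specific estimates point in opposite directions, which a single overall HR would average away.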
      There are multiple other options at one’s disposal if the PHA is violated. Cox models can be extended by time varying covariates, which can be helpful if, for example, a substantial proportion of patients stop smoking during follow up. The analysis can also be stratified by the violating variable if one can be identified; an example would be stratification by a cardiovascular risk factor. Alternatively, there are non-parametric methods that make no assumption about the underlying distribution of the data and allow treatment effects to be non-proportional, such as the restricted mean survival time (RMST) or Kaplan–Meier estimates with inverse probability weighting (Rulli et al., 2018; Cole and Hernan, 2004).
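      For the time varying covariate extension mentioned above, the data are commonly rearranged so that each participant contributes one row per interval during which the covariate is constant. The following minimal sketch shows this rearrangement for the smoking example; it assumes every participant smokes at baseline and may quit once, and the function name and row layout are hypothetical rather than those of a particular statistics package.

```python
def split_at_change(subject_id, follow_up, event, quit_time=None):
    """Return (id, start, stop, smoker, event) rows for one participant.

    The event indicator is attached only to the last interval, because the
    event (if any) happens at the end of follow up.
    """
    if quit_time is None or quit_time >= follow_up:
        # Never quit during follow up: a single interval as a smoker.
        return [(subject_id, 0.0, follow_up, 1, event)]
    if quit_time <= 0:
        # Quit at or before baseline: a single interval as a non-smoker.
        return [(subject_id, 0.0, follow_up, 0, event)]
    return [
        (subject_id, 0.0, quit_time, 1, 0),          # smoking period, no event yet
        (subject_id, quit_time, follow_up, 0, event) # period after quitting
    ]

rows = split_at_change("A", follow_up=5.0, event=1, quit_time=2.0)
for row in rows:
    print(row)
```

      A model fitted on such rows compares, at each event time, participants by their smoking status at that time rather than by their status at baseline.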
      Figure 3 shows the RMST for the example study. The RMST indicates the mean survival time of all participants, restricted to the last available follow up time (Rulli et al., 2018). The RMST was similar in both groups and the ratio between the two RMSTs was 0.91 (95% CI 0.81 – 1.02), thus indicating a non-significant difference, p = .11. Interpretation of the RMST ratio is straightforward: the average event free follow up time in the control group was 91% of the event free follow up time in the treatment group. The RMST is especially useful if almost all patients experience the event, so that one is interested in the analysis of the event free follow up time rather than the comparison of rare events.
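      The RMST corresponds to the area under the Kaplan–Meier curve up to the restriction time. The sketch below computes it for two small, made-up groups and forms the ratio; the data, the restriction time, and the function names are hypothetical and unrelated to the example study, and no confidence interval is computed.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events coded 1, censoring 0)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:   # handle subjects sharing time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (tau - last_t)

# Made-up toy data: follow up times with event (1) or censoring (0) indicators.
treatment = ([1, 2, 3, 4], [1, 1, 1, 1])
control = ([1, 2, 3], [1, 0, 1])
tau = 4
rmst_trt = rmst(*treatment, tau)
rmst_ctl = rmst(*control, tau)
print(f"RMST treatment {rmst_trt:.2f}, control {rmst_ctl:.2f}, "
      f"ratio {rmst_ctl / rmst_trt:.2f}")
```

      The ratio is read in the same way as in the text: the control group's mean event free time up to the restriction time, expressed as a fraction of that in the treatment group.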
      Figure 3. Time to event analysis by the restricted mean survival time (RMST) for (A) treatment and (B) control. The data were replicated from Bleyer et al. (2019) using digitising methods. The mean event free follow up time was restricted to the longest follow up time, t = 410 days. The RMST was 30 days longer for the treatment group. This difference was not statistically significant, p = .11. CI = confidence interval.
      Apart from these statistical obstacles that can impair time to event analysis, researchers must also address the concept of non-informative censoring. Censoring of participants must not be causally related to the treatment provided in their study group. An example of such informative censoring would be dropouts in one study group due to side effects and subsequent discontinuation of the assigned treatment or follow up visits. Therefore, researchers should aim for maximum follow up information on all study participants and report completeness of follow up using standardised measures, e.g., the Follow up Index.
      In conclusion, model assumptions should be tested and reported in sufficient detail to ensure valid study results and allow critical appraisal of the analysis. As shown in this Edutorial, non-proportional hazards in Cox models can have a dramatic impact and even lead to a change in the outcome direction. The following steps are proposed as part of standardised statistical reporting:
      • (1) Declare all conducted statistical tests and identify the underlying model assumptions.
      • (2) Verify all model assumptions and report the results concisely.
      • (3) Adapt the analysis if assumptions are not met.
      As a further action to enhance reporting quality, statistical consulting should be a part of the peer reviewing process, a concept that has already been implemented successfully in the European Journal of Vascular and Endovascular Surgery.

      References

        • Bleyer A.J., Scavo V.A., Wilson S.E., Browne B.J., Ferris B.L., Ozaki C.K., et al. A randomized trial of vonapanitase (PATENCY-1) to promote radiocephalic fistula patency and use for hemodialysis. J Vasc Surg. 2019; 69: 507-515
        • Rulli E., Ghilotti F., Biagioli E., Porcu L., Marabese M., D'Incalci M., et al. Assessment of proportional hazard assumption in aggregate data: a systematic review on statistical methodology in clinical trials using time-to-event endpoint. Br J Cancer. 2018; 119: 1456-1463
        • Trinquart L., Jacot J., Conner S.C., Porcher R. Comparison of treatment effects measured by the hazard ratio and by the ratio of restricted mean survival times in oncology randomized controlled trials. J Clin Oncol. 2016; 34: 1813-1819
        • Kuemmerli C., Sparn M., Birrer D.L., Müller P.C., Meuli L. Prevalence and consequences of non-proportional hazards in surgical randomized controlled trials. Br J Surg. 2021; [Epub ahead of print]. https://doi.org/10.1093/bjs/znab110
        • Patel R., Sweeting M.J., Powell J.T., Greenhalgh R.M. Endovascular versus open repair of abdominal aortic aneurysm in 15-years' follow-up of the UK endovascular aneurysm repair trial 1 (EVAR trial 1): a randomised controlled trial. Lancet. 2016; 388: 2366-2374
        • Cole S.R., Hernan M.A. Adjusted survival curves with inverse probability weights. Comput Methods Programs Biomed. 2004; 75: 45-49
