New Measure Offers Better Way to Estimate Benefit of Therapies in Clinical Trials

Written by: Rob Levy

When researchers report the survival effect of a cancer drug in a clinical trial, they almost always use a measure called hazard ratio. For all its ubiquity, however, hazard ratio has a hazard of its own: it’s a bit tricky to understand. 

A 2019 study found that 47% of researchers misinterpreted hazard ratio (HR) when it was used as a yardstick to compare cancer treatments in clinical trials. Forty percent of the researchers thought the term indicated a reduction in patients’ absolute risk of dying after a specific amount of time, but it actually shows the relative benefit of one treatment over another. 

In a new paper, Dana-Farber’s Hajime Uno, PhD, offers an alternative that is less susceptible to confusion. His proposed measure, known as average hazard with survival weight, is just as statistically powerful as HR but easier to use. 

“The interpretation of hazard ratio can be very difficult, even for experienced researchers,” says Uno, senior author of the paper published in Statistics in Medicine. “It’s important that clinical trial results be interpreted correctly to ensure that treatment decisions are based on solid statistical evidence.” 

Dana-Farber's Hajime Uno, PhD

Hazard ratio is an estimate of the intensity with which a particular event will happen in one group vs. another over time. The hazard ratio for broken bones, for example, is quite high if the comparison is between snowboarders and chess players. 

In cancer research, HR has been used for decades in nearly all clinical trials of potential therapies. It compares the hazard of death among patients receiving a novel therapy over a specific period with the hazard for patients receiving a placebo or other treatment. A hazard ratio of one means there is no difference in survival between the two groups. A hazard ratio of 0.8 would mean the new therapy has reduced the hazard of death by 20%.

While that is clear enough, the numbers may be less informative than they appear, Uno explains. “HR doesn’t give an indication of baseline hazard — the intensity of mortality of the control group that is given a placebo. If the baseline intensity is very high, an HR of 0.8 can be quite significant. If it’s low, 0.8 might not mean much clinically.” 

He cites two cases as an example. In the first case, 50% of patients receiving a placebo die within five years and the HR for the treatment is 0.8. In the second case, only 1% of placebo recipients die within five years and the HR is also 0.8. Although the HRs are the same, the first improvement is obviously more meaningful than the second. That isn’t evident, however, from the HR itself. 
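To put rough numbers on the two cases, here is a minimal sketch in Python. It assumes the hazards are exactly proportional over the five years, in which case survival under treatment equals control survival raised to the power of the HR. That assumption and the function names are illustrative and do not come from Uno's paper.

```python
def treated_survival(control_survival: float, hazard_ratio: float) -> float:
    """Survival under treatment, ASSUMING proportional hazards: S_t = S_c ** HR."""
    return control_survival ** hazard_ratio


def absolute_risk_reduction(control_mortality: float, hazard_ratio: float) -> float:
    """Drop in the probability of dying by the end of follow-up (control minus treated)."""
    control_survival = 1.0 - control_mortality
    treated_mortality = 1.0 - treated_survival(control_survival, hazard_ratio)
    return control_mortality - treated_mortality


# The two cases from the text: 50% and 1% five-year mortality on placebo, HR = 0.8.
for control_mortality in (0.50, 0.01):
    arr = absolute_risk_reduction(control_mortality, hazard_ratio=0.8)
    print(f"placebo mortality {control_mortality:.0%}: "
          f"absolute reduction {arr * 100:.1f} percentage points")
```

Under that assumption, the same HR of 0.8 corresponds to roughly 7.4 fewer deaths per 100 patients in the first case and about 0.2 fewer per 100 in the second.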

A further problem with HR is known as the proportional hazards assumption: the premise that the ratio of the hazards in two groups stays constant over time. Often, that assumption doesn’t hold. Different therapies operate on different timetables: the benefits of some immunotherapies, for example, can take longer to appear than those of many chemotherapies. By failing to account for such differences, hazard ratio can give a distorted picture of the relative benefits of different approaches to treatment.
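A toy illustration of a delayed treatment effect shows how the assumption can fail. The hazard values and the one-year delay below are invented for the sketch and are not drawn from any trial.

```python
def control_hazard(t_years: float) -> float:
    """Hypothetical control arm: constant hazard of 0.10 deaths per person-year."""
    return 0.10


def treatment_hazard(t_years: float, delay_years: float = 1.0) -> float:
    """Hypothetical treatment arm: no benefit until the delay has passed, then the hazard halves."""
    return 0.10 if t_years < delay_years else 0.05


# The ratio of the two hazards depends on when you look.
for t in (0.5, 2.0):
    ratio = treatment_hazard(t) / control_hazard(t)
    print(f"year {t}: hazard ratio = {ratio:.2f}")
```

In this sketch the ratio is 1.0 during the first year and 0.5 afterward, so no single hazard ratio describes the whole follow-up period.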

Average hazard with survival weight (AH-SW) avoids these shortcomings. It’s based on a statistical measure called person-time incidence rate, which factors in the period that patients are potentially tracked in a clinical trial.   

Uno explains: “If you follow 100 patients for a year — which would be 100 years of person-time — we divide that into the number of times we would observe an event such as a patient’s death or cancer recurrence over that period. We can calculate this for each group of patients in the trial with the effect of lost-to-follow-up being removed. 

“Since AH-SW is not affected by study-specific proportions of lost-to-follow-up subjects, we can obtain an unbiased estimate of the effectiveness of different treatments by comparing the results for each group,” he continues. “It can increase the likelihood that results from clinical studies are correctly interpreted and generalized to future populations.”
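The person-time incidence rate that Uno describes can be sketched in a few lines of Python. This is only the plain rate, events divided by total follow-up time. It is not AH-SW itself, which according to the article also removes the effect of loss to follow-up. The patient records and the function name are hypothetical.

```python
def person_time_rate(records: list[tuple[float, bool]]) -> float:
    """Events per unit of person-time.

    Each record is (years_followed, had_event) for one patient.
    """
    total_person_time = sum(years for years, _ in records)
    total_events = sum(1 for _, had_event in records if had_event)
    return total_events / total_person_time


# Hypothetical arm of four patients: two followed for a full year without an event,
# two who had the event partway through the year.
arm = [(1.0, False), (0.5, True), (1.0, False), (0.25, True)]
rate = person_time_rate(arm)
print(f"{rate:.3f} events per person-year")
```

Here two events over 2.75 person-years give about 0.727 events per person-year. Computing the same quantity for each arm of a trial yields two rates that can be compared as a ratio or as a difference.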