Describe the bug
There is an issue in the way pack years is calculated for the PUMF databases that results in a negative value for certain individuals.
This bug specifically affects occasional smokers who are in the 50-54 age range and who started smoking when they were 50 years or older.
The bug is in this line of the pack years function pasted below,
ifelse2(SMKDSTY_A == 3, (pmax((SMK_05B * SMK_05C / 30), 1) / 20) *
(DHHGAGE_cont - SMKG01C_cont)
specifically in the expression DHHGAGE_cont - SMKG01C_cont. In the study I'm running, there is an individual whose DHHGAGE_cont value is 52 and whose SMKG01C_cont value is 55, so the difference is -3 and the resulting pack years value is negative.
The crux of the issue is how we convert the categorical age and age-started-smoking variables in each cycle into their continuous harmonized equivalents.
For DHHGAGE_cont, we set the continuous age to the midpoint of the lower and upper bounds of each category. This individual's categorical age was 50-54, which becomes 52 in the continuous age variable.
For SMKG01C_cont, the individual has a value of 55, which is derived from the "50 years or more" category of the categorical age-started-smoking variable.
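To make the arithmetic concrete, here is a minimal R sketch that reproduces the negative value using only the occasional-smoker expression quoted above. The values of SMK_05B and SMK_05C are made up for illustration, and their interpretation in the comments is an assumption; only the 52 and 55 come from the case described in this issue.

```r
# Hypothetical respondent: age category 50-54, started smoking at "50 years or more".
DHHGAGE_cont <- 52   # midpoint of the 50-54 age category
SMKG01C_cont <- 55   # value currently assigned to "50 years or more"

# Made-up illustration values (assumed meaning: cigarettes per smoking day
# and number of days smoked in the past month).
SMK_05B <- 2
SMK_05C <- 10

# Years smoked is negative because the start age exceeds the current age.
years_smoked <- DHHGAGE_cont - SMKG01C_cont   # -3

# Same expression as the occasional-smoker branch quoted above.
pack_years <- (pmax(SMK_05B * SMK_05C / 30, 1) / 20) * years_smoked
print(pack_years)   # negative, about -0.15
```

Because pmax() floors the daily cigarette term at 1, the sign of the result depends only on DHHGAGE_cont - SMKG01C_cont, so any occasional smoker with this category combination gets a negative value.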
Possible solution
A possible fix is to change the value of SMKG01C_cont so that it is set to 51 for the last category ("50 years or more"). That would always make it lower than the value of DHHGAGE_cont for age categories of 50 and above, since the lowest such value is 52.
Pros:
- Fixes this issue
- SMKG01C_cont is only used for pack years so changing this would not affect other variables
- Affects a very small portion of the study population as shown below:
  - cchs2001: 0.11%
  - cchs2003: 0.094%
  - cchs2005: 0.094%
  - cchs2007-2008: 0.090%
  - cchs2009-2010: 0.078%
  - cchs2010: 0.069%
  - cchs2011-2012: 0.067%
  - cchs2012: 0.063%
  - cchs2013-2014: 0.087%
  - cchs2014: 0.081%
Cons:
- Could result in overestimating pack years for this sub-population.
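As a rough check of the proposed fix and its trade-off, the sketch below compares years smoked under the current top-category value (55) and the proposed one (51). The age midpoints other than 52 are assumptions that follow the midpoint rule described above for hypothetical 55-59 and 60-64 categories; they are not taken from the actual recoding tables.

```r
# Illustrative continuous ages: 52 is from this issue, 57 and 62 are assumed
# midpoints for hypothetical 55-59 and 60-64 categories.
age_midpoints <- c(52, 57, 62)

current_start_age  <- 55   # current value for "50 years or more"
proposed_start_age <- 51   # proposed value for "50 years or more"

# Years smoked under each choice of top-category value.
years_current  <- age_midpoints - current_start_age    # -3  2  7
years_proposed <- age_midpoints - proposed_start_age   #  1  6 11

print(data.frame(age = age_midpoints,
                 years_current = years_current,
                 years_proposed = years_proposed))
```

With 51, years smoked is positive for every age shown, which removes the negative pack years. It also adds four years of smoking for everyone in the top category, which is the overestimation noted under Cons.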