Implementing Cause-of-Death Data Checks with codeditr

The World Health Organization’s CoDEdit electronic tool is intended to help producers of cause-of-death statistics strengthen their capacity to perform routine checks on their data. The codeditr package ports the original tool, which was built in Microsoft Access, to R. The aim is to expose the functionality of the original tool through an application programming interface (API) that can be used to build more general applications or programmatic scientific workflows for routine, automated, and large-scale monitoring of cause-of-death data.

Workflows for cause-of-death data processing and data quality checks using codeditr

Perform checks on existing input data for the CoDEdit tool

The icd10_example dataset is already formatted into the structure required by the CoDEdit tool. We can check this dataset for possible issues in its formatting and structure before using it with the CoDEdit tool.

cod_check_codedit_input(icd10_example)
#> # A tibble: 3,613 × 8
#>    sex_check sex_check_note  age_check age_check_note code_check code_check_note
#>        <int> <fct>               <int> <fct>               <int> <chr>          
#>  1         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  2         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  3         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  4         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  5         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  6         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  7         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  8         0 No issues with…         0 No issues wit…          0 Cause of death…
#>  9         0 No issues with…         0 No issues wit…          0 Cause of death…
#> 10         0 No issues with…         0 No issues wit…          0 Cause of death…
#> # ℹ 3,603 more rows
#> # ℹ 2 more variables: dod_check <int>, dod_check_note <fct>

The result is a data frame (a tibble) whose columns are the check codes and check notes for each of the four types of check performed on the data.

  1. Check input sex

The CoDEdit tool requires sex to be coded as 1 for males and 2 for females. If the input value for sex does not follow this coding, the check outputs a note saying that the sex value is missing.

  2. Check input age

The CoDEdit tool requires age to be recorded as two values: age value and age type. Age value is the integer age expressed in the unit given by age type, which can be days (D), months (M), or years (Y). The accepted combinations are:

Age value    Age type
0 - 27       D (days)
1 - 11       M (months)
1 - 125      Y (years)

The check uses these ranges to determine whether each age value and age type combination in the input data is appropriate for input into CoDEdit.

  3. Check cause-of-death code

A low-level check is performed on the cause-of-death code: it only tests whether the value is missing.

  4. Check date of death

A low-level check is performed on the date of death: it only tests whether the value is missing.
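
The four input checks described above can be sketched in base R as follows. This is a minimal illustration of the stated rules, not the package's implementation: the function name check_input_sketch, its argument names, and the use of 1 as the flag for a failed check are all assumptions made for this example.

```r
# Illustrative only: a base R sketch of the four input checks.
# `check_input_sketch` is a hypothetical helper, not part of codeditr.
check_input_sketch <- function(sex, age_value, age_type, code, dod) {
  # Sex must be coded 1 (male) or 2 (female); anything else is flagged
  sex_ok <- !is.na(sex) & sex %in% c(1, 2)

  # Age value must fall inside the range allowed for its age type
  age_ok <- !is.na(age_value) & !is.na(age_type) & (
    (age_type == "D" & age_value >= 0 & age_value <= 27) |
      (age_type == "M" & age_value >= 1 & age_value <= 11) |
      (age_type == "Y" & age_value >= 1 & age_value <= 125)
  )

  # Code and date of death are only checked for missingness
  code_ok <- !is.na(code) & nzchar(trimws(as.character(code)))
  dod_ok  <- !is.na(dod) & nzchar(trimws(as.character(dod)))

  # Assumption: 0 means no issue, 1 means the check failed
  data.frame(
    sex_check  = as.integer(!sex_ok),
    age_check  = as.integer(!age_ok),
    code_check = as.integer(!code_ok),
    dod_check  = as.integer(!dod_ok)
  )
}

# Invented toy records covering passing and failing cases
toy <- check_input_sketch(
  sex       = c(1, 2, 3, NA),
  age_value = c(30, 5, 40, 12),
  age_type  = c("D", "M", "Y", "M"),
  code      = c("BA01", NA, "1G41", ""),
  dod       = c("2023", "2023", NA, "2023")
)
toy
```

Returning one integer column per check mirrors the shape of the output shown earlier, which makes it easy to compare a hand-rolled check against the result of cod_check_codedit_input().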

Structure raw cause-of-death data for input into the CoDEdit tool

Given a raw cause-of-death dataset that contains information on sex, date of birth, date of death, and cause-of-death code, we can format it into the structure required by the CoDEdit tool.

cod_structure_input(
  df = cod_data_raw_example, 
  sex = "sex", dob = "dob", dod = "dod", code = "code", id = "id"
)
#> # A tibble: 20 × 6
#>    FreeId   Sex `Age Value` `Age Type` Code          `Death Date`
#>     <int> <int>       <int> <chr>      <chr>         <chr>       
#>  1   4136     1        1318 Y          NE84&XA6KU8   2023        
#>  2   4137     2        1318 Y          2B6D&XS9R     2023        
#>  3   4138     1        1318 Y          2C82&XS9R     2023        
#>  4   4139     1        1318 Y          CA40.Z&XK9J   2023        
#>  5   4140     2        1318 Y          6C40.3&XS25   2023        
#>  6   4141     1        1318 Y          6C40.3&XS25   2023        
#>  7   4142     1        1318 Y          DB94.1&XT8W   2023        
#>  8   4143     2        1318 Y          BD40.Z        2023        
#>  9   4144     2        1318 Y          2C76.Z&XA8QA8 2023        
#> 10   4145     1        1318 Y          6C40.3&XS25   2023        
#> 11   4146     2        1318 Y          8B11.5Z       2023        
#> 12   4147     1        1318 Y          2B90.Y&XH74S1 2023        
#> 13   4148     1        1318 Y          BD10&XT5R     2023        
#> 14   4149     1        1318 Y          1G41          2023        
#> 15   4150     1        1318 Y          BD10&XT5R     2023        
#> 16   4151     2        1318 Y          CA40.Z&XB25   2023        
#> 17   4152     2        1318 Y          BA01          2023        
#> 18   4153     1        1318 Y          1G41          2023        
#> 19   4154     2        1318 Y          BB40          2023        
#> 20   4155     1        1318 Y          1B91          2023

This output can then be saved as an .xlsx file and uploaded into the CoDEdit tool.
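
To make the structuring step more concrete, the sketch below shows one way an age value and age type could be derived from a date of birth and a date of death in base R. It is not how cod_structure_input() works internally. The helper name age_value_type_sketch is hypothetical, and the rule of reporting days up to 27, months up to 11, and years otherwise is an assumption based on the age ranges listed earlier. Ages from 28 days up to one completed month are reported as 1 month here, which is also an assumption.

```r
# Illustrative only: derive an age value and age type from two dates.
# `age_value_type_sketch` is a hypothetical helper, not part of codeditr.
age_value_type_sketch <- function(dob, dod) {
  dob <- as.Date(dob)
  dod <- as.Date(dod)

  # Completed days between birth and death
  days <- as.integer(as.numeric(dod) - as.numeric(dob))

  # Completed calendar months, then completed years
  b <- as.POSIXlt(dob)
  d <- as.POSIXlt(dod)
  months <- (d$year - b$year) * 12L + (d$mon - b$mon) -
    as.integer(d$mday < b$mday)
  years <- months %/% 12L

  # Assumed rule: days up to 27, months up to 11, years otherwise.
  # Ages of 28 days up to one completed month are reported as 1 month.
  age_type <- ifelse(days <= 27L, "D", ifelse(months <= 11L, "M", "Y"))
  age_value <- ifelse(
    days <= 27L, days,
    ifelse(months <= 11L, pmax(months, 1L), years)
  )

  data.frame(age_value = as.integer(age_value), age_type = age_type)
}

# Invented dates: a neonate, an infant, and an adult
ages <- age_value_type_sketch(
  dob = c("2023-01-01", "2022-10-15", "1950-06-30"),
  dod = c("2023-01-11", "2023-01-20", "2023-06-29")
)
ages
```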

Perform all checks on cause-of-death data

The cod_check_code() function performs all the checks implemented by the CoDEdit tool.

cod_check_code(
  cod_data_raw_example$code, version = "icd11", 
  sex = cod_data_raw_example$sex, age = cod_data_raw_example$age
)
#> # A tibble: 20 × 12
#>    cod_check_structure cod_check_note_structure    cod_check_ill_defined
#>                  <int> <fct>                                       <int>
#>  1                   0 No issues found in CoD code                     0
#>  2                   0 No issues found in CoD code                     0
#>  3                   0 No issues found in CoD code                     0
#>  4                   0 No issues found in CoD code                     0
#>  5                   0 No issues found in CoD code                     0
#>  6                   0 No issues found in CoD code                     0
#>  7                   0 No issues found in CoD code                     0
#>  8                   0 No issues found in CoD code                     0
#>  9                   0 No issues found in CoD code                     0
#> 10                   0 No issues found in CoD code                     0
#> 11                   0 No issues found in CoD code                     0
#> 12                   0 No issues found in CoD code                     0
#> 13                   0 No issues found in CoD code                     0
#> 14                   0 No issues found in CoD code                     0
#> 15                   0 No issues found in CoD code                     0
#> 16                   0 No issues found in CoD code                     0
#> 17                   0 No issues found in CoD code                     0
#> 18                   0 No issues found in CoD code                     0
#> 19                   0 No issues found in CoD code                     0
#> 20                   0 No issues found in CoD code                     0
#> # ℹ 9 more variables: cod_check_note_ill_defined <fct>,
#> #   cod_check_unlikely <int>, cod_check_note_unlikely <fct>,
#> #   cod_check_sex <int>, cod_check_note_sex <fct>, cod_check_age <int>,
#> #   cod_check_note_age <fct>, cod_check_code <dbl>, cod_check_code_note <fct>
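
Because every check column in the output above uses 0 alongside a "No issues" note, records that need review can be isolated by looking for non-zero check codes. The sketch below uses a small invented data frame whose column names are copied from the output above. Treating any non-zero code as an issue is an assumption.

```r
# Toy stand-in for the per-row output; column names are copied from the
# output above and the values are invented for illustration.
checks <- data.frame(
  cod_check_structure   = c(0L, 1L, 0L, 0L),
  cod_check_ill_defined = c(0L, 0L, 0L, 1L),
  cod_check_unlikely    = c(0L, 0L, 0L, 0L),
  cod_check_sex         = c(0L, 0L, 0L, 0L),
  cod_check_age         = c(0L, 0L, 0L, 1L)
)

# Keep only the numeric check-code columns (note columns are factors)
num_cols <- vapply(checks, is.numeric, logical(1))

# Assumption: any non-zero check code signals an issue
has_issue <- rowSums(checks[num_cols] != 0) > 0

# Row positions that need review
flagged_rows <- which(has_issue)
flagged_rows
```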

Results of the per-row cause-of-death checks can also be summarised to give a count of each type of issue found in the dataset.

cod_check_code(
  cod_data_raw_example$code, version = "icd11", 
  sex = cod_data_raw_example$sex, age = cod_data_raw_example$age
) |>
  cod_check_code_summary()
#> $`Code structure`
#> # A tibble: 65 × 2
#>    cod_check_note                                                              n
#>    <fct>                                                                   <int>
#>  1 No issues found in CoD code                                                20
#>  2 CoD code has a period (`.`) character in the wrong place                    0
#>  3 CoD code starts with `O` or `I`                                             0
#>  4 CoD code has a period (`.`) character in the wrong place; CoD code sta…     0
#>  5 CoD code has a number as its second value                                   0
#>  6 CoD code has a period (`.`) character in the wrong place; CoD code has…     0
#>  7 CoD code starts with `O` or `I`; CoD code has a number as its second v…     0
#>  8 CoD code has a period (`.`) character in the wrong place; CoD code sta…     0
#>  9 CoD code has `O` or `I` as its second value                                 0
#> 10 CoD code has a period (`.`) character in the wrong place; CoD code has…     0
#> # ℹ 55 more rows
#> 
#> $`Ill-defined code`
#> # A tibble: 2 × 2
#>   cod_check_note                      n
#>   <fct>                           <int>
#> 1 No issues found in CoD code        20
#> 2 CoD code is an ill-defined code     0
#> 
#> $`Unlikely cause-of-death code`
#> # A tibble: 2 × 2
#>   cod_check_note                             n
#>   <fct>                                  <int>
#> 1 No issues found in CoD code               20
#> 2 CoD code is an unlikely cause-of-death     0
#> 
#> $`Code not appropriate for sex`
#> # A tibble: 2 × 2
#>   cod_check_note                                   n
#>   <fct>                                        <int>
#> 1 No issues found in CoD code                     20
#> 2 CoD code is not appropriate for person's sex     0
#> 
#> $`Code not appropriate for age`
#> # A tibble: 2 × 2
#>   cod_check_note                                   n
#>   <fct>                                        <int>
#> 1 No issues found in CoD code                     20
#> 2 CoD code is not appropriate for person's age     0
#> 
#> $Overall
#> # A tibble: 2 × 2
#>   cod_check_note                  n
#>   <fct>                       <int>
#> 1 No issues found in CoD code    20
#> 2 Issues found in CoD code        0
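
The summary tables above list notes with a count of 0, which is the behaviour one gets when counting a factor whose levels include notes that never occur in the data. The base R sketch below illustrates that idea with invented values. It is not the implementation of cod_check_code_summary().

```r
# Invented check notes stored as a factor with one unused level
notes <- factor(
  c("No issues found in CoD code", "No issues found in CoD code"),
  levels = c("No issues found in CoD code", "CoD code is an ill-defined code")
)

# table() counts every level, so unused levels appear with a count of 0
counts <- as.data.frame(table(cod_check_note = notes), responseName = "n")
counts
```

Keeping the zero-count rows gives every summary the same set of rows, which makes summaries from different datasets directly comparable.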

Perform specific check types on cause-of-death data

The family of cod_check_code_* functions can be used to perform specific check types on the cause-of-death data.

  1. Check code structure

### Perform code structure check on cause-of-death data ----
cod_check_code_structure_icd11(cod_data_raw_example$code)
#> # A tibble: 20 × 2
#>    cod_check cod_check_note             
#>        <int> <fct>                      
#>  1         0 No issues found in CoD code
#>  2         0 No issues found in CoD code
#>  3         0 No issues found in CoD code
#>  4         0 No issues found in CoD code
#>  5         0 No issues found in CoD code
#>  6         0 No issues found in CoD code
#>  7         0 No issues found in CoD code
#>  8         0 No issues found in CoD code
#>  9         0 No issues found in CoD code
#> 10         0 No issues found in CoD code
#> 11         0 No issues found in CoD code
#> 12         0 No issues found in CoD code
#> 13         0 No issues found in CoD code
#> 14         0 No issues found in CoD code
#> 15         0 No issues found in CoD code
#> 16         0 No issues found in CoD code
#> 17         0 No issues found in CoD code
#> 18         0 No issues found in CoD code
#> 19         0 No issues found in CoD code
#> 20         0 No issues found in CoD code

  2. Check for ill-defined codes

### Perform check for ill-defined codes on cause-of-death data ----
cod_check_code_ill_defined_icd11(cod_data_raw_example$code)
#> # A tibble: 20 × 2
#>    cod_check cod_check_note             
#>        <int> <fct>                      
#>  1         0 No issues found in CoD code
#>  2         0 No issues found in CoD code
#>  3         0 No issues found in CoD code
#>  4         0 No issues found in CoD code
#>  5         0 No issues found in CoD code
#>  6         0 No issues found in CoD code
#>  7         0 No issues found in CoD code
#>  8         0 No issues found in CoD code
#>  9         0 No issues found in CoD code
#> 10         0 No issues found in CoD code
#> 11         0 No issues found in CoD code
#> 12         0 No issues found in CoD code
#> 13         0 No issues found in CoD code
#> 14         0 No issues found in CoD code
#> 15         0 No issues found in CoD code
#> 16         0 No issues found in CoD code
#> 17         0 No issues found in CoD code
#> 18         0 No issues found in CoD code
#> 19         0 No issues found in CoD code
#> 20         0 No issues found in CoD code

  3. Check for unlikely cause-of-death codes

### Perform check for unlikely cause-of-death codes ----
cod_check_code_unlikely_icd11(cod_data_raw_example$code)
#> # A tibble: 20 × 2
#>    cod_check cod_check_note             
#>        <int> <fct>                      
#>  1         0 No issues found in CoD code
#>  2         0 No issues found in CoD code
#>  3         0 No issues found in CoD code
#>  4         0 No issues found in CoD code
#>  5         0 No issues found in CoD code
#>  6         0 No issues found in CoD code
#>  7         0 No issues found in CoD code
#>  8         0 No issues found in CoD code
#>  9         0 No issues found in CoD code
#> 10         0 No issues found in CoD code
#> 11         0 No issues found in CoD code
#> 12         0 No issues found in CoD code
#> 13         0 No issues found in CoD code
#> 14         0 No issues found in CoD code
#> 15         0 No issues found in CoD code
#> 16         0 No issues found in CoD code
#> 17         0 No issues found in CoD code
#> 18         0 No issues found in CoD code
#> 19         0 No issues found in CoD code
#> 20         0 No issues found in CoD code

  4. Check for cause-of-death codes inappropriate for given sex

### Perform check for cause-of-death codes inappropriate for specific sex ----
cod_check_code_sex_icd11(cod_data_raw_example$code, cod_data_raw_example$sex)
#> # A tibble: 20 × 2
#>    cod_check cod_check_note             
#>        <int> <fct>                      
#>  1         0 No issues found in CoD code
#>  2         0 No issues found in CoD code
#>  3         0 No issues found in CoD code
#>  4         0 No issues found in CoD code
#>  5         0 No issues found in CoD code
#>  6         0 No issues found in CoD code
#>  7         0 No issues found in CoD code
#>  8         0 No issues found in CoD code
#>  9         0 No issues found in CoD code
#> 10         0 No issues found in CoD code
#> 11         0 No issues found in CoD code
#> 12         0 No issues found in CoD code
#> 13         0 No issues found in CoD code
#> 14         0 No issues found in CoD code
#> 15         0 No issues found in CoD code
#> 16         0 No issues found in CoD code
#> 17         0 No issues found in CoD code
#> 18         0 No issues found in CoD code
#> 19         0 No issues found in CoD code
#> 20         0 No issues found in CoD code

  5. Check for cause-of-death codes inappropriate for given age

### Perform check for cause-of-death codes inappropriate for specific age ----
cod_check_code_age_icd11(cod_data_raw_example$code, cod_data_raw_example$age)
#> # A tibble: 20 × 2
#>    cod_check cod_check_note             
#>        <int> <fct>                      
#>  1         0 No issues found in CoD code
#>  2         0 No issues found in CoD code
#>  3         0 No issues found in CoD code
#>  4         0 No issues found in CoD code
#>  5         0 No issues found in CoD code
#>  6         0 No issues found in CoD code
#>  7         0 No issues found in CoD code
#>  8         0 No issues found in CoD code
#>  9         0 No issues found in CoD code
#> 10         0 No issues found in CoD code
#> 11         0 No issues found in CoD code
#> 12         0 No issues found in CoD code
#> 13         0 No issues found in CoD code
#> 14         0 No issues found in CoD code
#> 15         0 No issues found in CoD code
#> 16         0 No issues found in CoD code
#> 17         0 No issues found in CoD code
#> 18         0 No issues found in CoD code
#> 19         0 No issues found in CoD code
#> 20         0 No issues found in CoD code
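
The code structure check reports notes such as a period in the wrong place, a code starting with `O` or `I`, or a number as the second value. A rough base R sketch of rules of that kind is shown below. It is not the logic used by cod_check_code_structure_icd11(). The helper name structure_notes_sketch is hypothetical, and both the handling of post-coordinated codes (keeping only the part before `&` or `/`) and the expectation that a period sits in the fifth position are assumptions.

```r
# Illustrative only: simplified structure rules for ICD-11-style codes.
# `structure_notes_sketch` is a hypothetical helper, not part of codeditr.
structure_notes_sketch <- function(code) {
  # Assumption: keep only the stem code, dropping anything after `&` or `/`
  stem <- sub("[&/].*$", "", toupper(trimws(code)))

  first  <- substr(stem, 1, 1)
  second <- substr(stem, 2, 2)

  # Position of the first period, or -1 when there is none
  period_pos <- as.integer(regexpr(".", stem, fixed = TRUE))

  data.frame(
    code               = code,
    starts_with_o_or_i = first %in% c("O", "I"),
    second_is_number   = grepl("^[0-9]$", second),
    second_is_o_or_i   = second %in% c("O", "I"),
    # Assumption: a period, when present, belongs in the fifth position
    period_misplaced   = period_pos != -1L & period_pos != 5L
  )
}

# Two codes from the example data plus invented malformed codes
res <- structure_notes_sketch(
  c("CA40.Z&XK9J", "1G41", "OA40", "C140", "CA4.0Z", "8B11.5Z")
)
res
```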

This vignette gives a more detailed discussion of all the checks performed by the codeditr package.