R: pivot_wider fails to collapse rows

Solved. John asked 55 years ago • 0 answers

The title should be self-explanatory. Here is the dataset I'm starting with.

a <- data.frame(outcome=c('asd', NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                          "adhd",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
                exposure=c('x susceptibility', NA,NA,NA,
                           'hospitalised x', NA,NA,NA,
                           'severe x', NA,NA,NA,
                           'x susceptibility', NA,NA,NA,
                           'hospitalised x', NA,NA,NA,
                           'severe x', NA,NA,NA),
                method=rep(c('a','b','c','d'),6),
                or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                loci_or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                upci_or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                p_value=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                egger_int=c(NA,0.00004,NA,NA,NA,0.00009,NA,NA,NA,0.00003,NA,NA,
                            NA,0.00001,NA,NA,NA,0.00002,NA,NA,NA,0.00007,NA,NA),
                egger_int_p=c(NA,0.00004,NA,NA,NA,0.00009,NA,NA,NA,0.00003,NA,NA,
                              NA,0.00001,NA,NA,NA,0.00002,NA,NA,NA,0.00007,NA,NA))

A visual representation of the above looks like this. Right now, each outcome-exposure pair has as many rows as there are methods. I want to use tidyr::pivot_wider (or an equivalent) so that there is one row per outcome-exposure pair. The NAs in the outcome column let me see instantly which outcome is being used.

In other words, make the data look like this:

b <- data.frame(outcome=c('asd', NA,NA,
                  "adhd",NA,NA),
            exposure=c('x susceptibility','hospitalised x','severe x',
                       'x susceptibility','hospitalised x','severe x'),
            a_or=rnorm(6,0,0.004),
            a_loci_or=rnorm(6,0,0.004),
            a_upci_or=rnorm(6,0,0.004),
            a_p_value=rnorm(6,0,0.004),
            b_or=rnorm(6,0,0.004),
            b_loci_or=rnorm(6,0,0.004),
            b_upci_or=rnorm(6,0,0.004),
            b_p_value=rnorm(6,0,0.004),
            c_or=rnorm(6,0,0.004),
            c_loci_or=rnorm(6,0,0.004),
            c_upci_or=rnorm(6,0,0.004),
            c_p_value=rnorm(6,0,0.004),
            egger_int=c(0.00004,0.00009,0.00003,0.00001,0.00002,0.00007),
            egger_int_p=c(0.00004,0.00009,0.00003,0.00001,0.00002,0.00007))

This is what I have done so far:

tidy_dev <- a %>%
            # fill the blank outcome/exposure cells downward from the
            # last non-missing entry (tidyr::fill defaults to "down")
            tidyr::fill(outcome,exposure) %>%
            # changing from long format to wide format
            tidyr::pivot_wider(names_from = method,
                               values_from = or:p_value,
                               # naming scheme: value1_name1, value2_name1 etc
                               names_vary = 'slowest',
                               # how you want to format column names
                               names_glue = '{method}_{.value}') %>%
            # moving Egger intercept and its p-value to the last column
            dplyr::relocate(c(egger_int,
                              egger_int_p),
                            .after = last_col())

However, what I get is two rows for the same outcome-exposure pair. The values of the egger_int, egger_int_p, egger_or, egger_loci_or, and egger_uci_or columns are in one row, and the values of the other {method}_{.value} columns are in another row. So, effectively, I have 12 rows when I want 6.

For reference, the data I get after my attempt looks like this.

Any help is appreciated!

John, Jan 01 '70 08:01
Accepted

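The issue is the egger_ columns: for each pair of outcome and exposure they contain two distinct values, i.e. a number and an NA. Because pivot_wider treats every column not listed in names_from or values_from as an identifier column, each pair is split in two, and you end up with two rows.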
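To see the mechanism in isolation, here is a minimal sketch on a made-up toy data frame (the names `toy`, `id`, and `extra` are hypothetical and do not come from the question). It assumes only that tidyr is installed. A column that mixes a value and an NA within one group acts as part of the row key, so the group yields two rows until that column is made constant.

```r
library(tidyr)

# Hypothetical toy data: one group, two methods, and an extra column
# that holds a value on only one of the two rows.
toy <- data.frame(
  id     = c("g1", "g1"),
  method = c("a", "b"),
  or     = c(0.1, 0.2),
  extra  = c(NA, 5)
)

# `extra` is not in names_from/values_from, so it is used as an id column:
# (g1, NA) and (g1, 5) are distinct keys, which gives two rows.
split_rows <- pivot_wider(toy, names_from = method, values_from = or)
nrow(split_rows)
#> [1] 2

# Once `extra` is constant within the group, the rows collapse into one.
toy_filled <- fill(toy, extra, .direction = "downup")
one_row <- pivot_wider(toy_filled, names_from = method, values_from = or)
nrow(one_row)
#> [1] 1
```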

One option to fix this would be to use a second fill to get rid of the NAs:

library(tidyr)
library(dplyr, warn.conflicts = FALSE)

tidy_dev <- a %>%
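  # fill the blank outcome/exposure cells downward from the last
  # non-missing entry (tidyr::fill defaults to .direction = "down")
  tidyr::fill(outcome, exposure) %>%
  # within each outcome-exposure pair, copy the single Egger value onto
  # every row so the egger_* columns no longer split the pair in two
  group_by(outcome, exposure) %>%
  tidyr::fill(starts_with("egger"), .direction = "downup") %>%
  ungroup() %>%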
  # changing from long format to wide format
  tidyr::pivot_wider(
    names_from = method,
    values_from = or:p_value,
    # naming scheme: value1_name1, value2_name1 etc
    names_vary = "slowest",
    # how you want to format column names
    names_glue = "{method}_{.value}"
  ) %>%
  # moving Egger intercept and its p-value to the last column
  dplyr::relocate(
    c(
      egger_int,
      egger_int_p
    ),
    .after = last_col()
  )

tidy_dev
#> # A tibble: 6 × 20
#>   outcome exposure         a_or a_loci_or a_upci_or a_p_value     b_or b_loci_or
#>   <chr>   <chr>           <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
#> 1 asd     x susceptib… -5.26e-3  -0.00262  -0.00427   0.00884 -3.18e-3  0.00598 
#> 2 asd     hospitalise…  3.62e-5   0.00144  -0.00840  -0.00632 -4.75e-3  0.00309 
#> 3 asd     severe x      1.95e-3   0.00268  -0.00250  -0.00438  1.04e-3 -0.000874
#> 4 adhd    x susceptib…  1.55e-3  -0.00782  -0.00452   0.00317 -5.73e-4  0.00455 
#> 5 adhd    hospitalise…  1.10e-3   0.00600  -0.00342  -0.00334 -2.32e-3  0.00676 
#> 6 adhd    severe x      2.88e-3   0.00130   0.00352  -0.00317 -8.45e-4 -0.00134 
#> # ℹ 12 more variables: b_upci_or <dbl>, b_p_value <dbl>, c_or <dbl>,
#> #   c_loci_or <dbl>, c_upci_or <dbl>, c_p_value <dbl>, d_or <dbl>,
#> #   d_loci_or <dbl>, d_upci_or <dbl>, d_p_value <dbl>, egger_int <dbl>,
#> #   egger_int_p <dbl>
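
A possible alternative to the second fill, sketched here under assumptions: with tidyr 1.2.0 or later, `pivot_wider()` accepts `id_cols` together with `unused_fn`, which summarises the columns that are neither ids, names, nor values. The data frame `a_small` is a cut-down stand-in for the question's data and `first_non_na` is a hypothetical helper name; neither appears in the original post.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Cut-down stand-in for the question's data: 2 exposures x 2 methods,
# with the Egger columns filled on only one row per pair.
a_small <- data.frame(
  outcome     = c("asd", NA, NA, NA),
  exposure    = c("x susceptibility", NA, "severe x", NA),
  method      = rep(c("a", "b"), 2),
  or          = c(0.001, 0.002, 0.003, 0.004),
  p_value     = c(0.01, 0.02, 0.03, 0.04),
  egger_int   = c(NA, 0.00004, NA, 0.00009),
  egger_int_p = c(NA, 0.5, NA, 0.6)
)

# Hypothetical helper: first non-missing value of a vector (NA if none).
first_non_na <- function(x) x[!is.na(x)][1]

wide <- a_small %>%
  # carry outcome/exposure down onto the blank rows
  fill(outcome, exposure) %>%
  pivot_wider(
    # only these two columns identify a row
    id_cols     = c(outcome, exposure),
    names_from  = method,
    values_from = or:p_value,
    names_vary  = "slowest",
    names_glue  = "{method}_{.value}",
    # collapse the leftover egger_* columns to one value per row
    unused_fn   = first_non_na
  )

nrow(wide)
```

Declaring the id columns explicitly keeps the Egger columns out of the row key, so no grouped fill is needed; the trade-off is that `unused_fn` silently keeps only the first non-missing value if a pair ever holds more than one.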
stefan, Feb 16 '2024 10:02