R: pivot_wider fails to collapse rows
The title should be self-explanatory. Here is the dataset I am starting with.
a <- data.frame(outcome=c('asd', NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                          'adhd',NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
                exposure=c('x susceptibility', NA,NA,NA,
                           'hospitalised x', NA,NA,NA,
                           'severe x', NA,NA,NA,
                           'x susceptibility', NA,NA,NA,
                           'hospitalised x', NA,NA,NA,
                           'severe x', NA,NA,NA),
                method=rep(c('a','b','c','d'),6),
                or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                loci_or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                upci_or=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                p_value=c(rnorm(3,0,0.004),NA,rnorm(11,0,0.004),NA,rnorm(8,0,0.004)),
                egger_int=c(NA,0.00004,NA,NA,NA,0.00009,NA,NA,NA,0.00003,NA,NA,
                            NA,0.00001,NA,NA,NA,0.00002,NA,NA,NA,0.00007,NA,NA),
                egger_int_p=c(NA,0.00004,NA,NA,NA,0.00009,NA,NA,NA,0.00003,NA,NA,
                              NA,0.00001,NA,NA,NA,0.00002,NA,NA,NA,0.00007,NA,NA))
A visual representation of the above looks like this: [image]. Right now, there are as many rows as there are methods. I want to use tidyr::pivot_wider (or an equivalent) so that there is one row per outcome-exposure pair. The NAs in the outcome column let me see instantly and visually which outcome is being used.
In other words, make the data look like this:
b <- data.frame(outcome=c('asd', NA,NA,
                          'adhd',NA,NA),
                exposure=c('x susceptibility','hospitalised x','severe x',
                           'x susceptibility','hospitalised x','severe x'),
                a_or=rnorm(6,0,0.004),
                a_loci_or=rnorm(6,0,0.004),
                a_upci_or=rnorm(6,0,0.004),
                a_p_value=rnorm(6,0,0.004),
                b_or=rnorm(6,0,0.004),
                b_loci_or=rnorm(6,0,0.004),
                b_upci_or=rnorm(6,0,0.004),
                b_p_value=rnorm(6,0,0.004),
                c_or=rnorm(6,0,0.004),
                c_loci_or=rnorm(6,0,0.004),
                c_upci_or=rnorm(6,0,0.004),
                c_p_value=rnorm(6,0,0.004),
                egger_int=c(0.00004,0.00009,0.00003,0.00001,0.00002,0.00007),
                egger_int_p=c(0.00004,0.00009,0.00003,0.00001,0.00002,0.00007))
This is what I have done so far:
library(dplyr, warn.conflicts = FALSE)  # provides %>% and last_col()

tidy_dev <- a %>%
  # fill missing values in these columns from the previous non-missing entry,
  # because the values are not repeated on every row
  tidyr::fill(outcome, exposure) %>%
  # changing from long format to wide format
  tidyr::pivot_wider(names_from = method,
                     values_from = or:p_value,
                     # naming scheme: value1_name1, value2_name1 etc
                     names_vary = 'slowest',
                     # how you want to format column names
                     names_glue = '{method}_{.value}') %>%
  # moving Egger intercept and its p-value to the last column
  dplyr::relocate(c(egger_int,
                    egger_int_p),
                  .after = last_col())
However, what I get is two rows for the same outcome-exposure pair. The values of the columns egger_int, egger_int_p, and egger_or, egger_loci_or, egger_uci_or (named by {method}_{.value}) are in one row, and the values of the other columns are in another row. So I effectively have 12 rows when I want 6.
The data I get after my attempt looks like this, for reference: [image]
Any help is appreciated!
The issue is the egger_ columns, which for each pair of outcome and exposure contain two categories, i.e. a value and an NA. Because pivot_wider() by default treats every column not listed in names_from or values_from as an identifier, you end up with two rows.
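The effect can be reproduced on a much smaller table. The following is a minimal sketch, assuming a hypothetical two-row data frame `toy` (not the asker's `a`) with a single outcome-exposure pair and an Egger intercept on only one of its rows:

```r
library(tidyr)

# Hypothetical toy data (not the asker's `a`): one outcome-exposure pair,
# two methods, and an Egger intercept present on only one of the rows.
toy <- data.frame(
  outcome   = c("asd", "asd"),
  exposure  = c("severe x", "severe x"),
  method    = c("a", "b"),
  or        = c(1.01, 0.99),
  egger_int = c(NA, 0.00003)
)

# egger_int is in neither names_from nor values_from, so it is used as an
# identifier alongside outcome and exposure.
wide <- pivot_wider(toy, names_from = method, values_from = or)

# Number of rows in the result: one per distinct
# (outcome, exposure, egger_int) combination.
nrow(wide)
```

Here the pair (asd, severe x) occurs once with egger_int = NA and once with egger_int = 0.00003, and pivot_wider() sees these as two different identifier combinations.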
One option to fix this would be to use another fill to get rid of the NAs:
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

tidy_dev <- a %>%
  # fill missing values in these columns from the previous non-missing entry,
  # because the values are not repeated on every row
  tidyr::fill(outcome, exposure) %>%
  # within each outcome-exposure pair, copy the single Egger value to all rows
  group_by(outcome, exposure) %>%
  tidyr::fill(starts_with("egger"), .direction = "downup") %>%
  ungroup() %>%
  # changing from long format to wide format
  tidyr::pivot_wider(
    names_from = method,
    values_from = or:p_value,
    # naming scheme: value1_name1, value2_name1 etc
    names_vary = "slowest",
    # how you want to format column names
    names_glue = "{method}_{.value}"
  ) %>%
  # moving Egger intercept and its p-value to the last column
  dplyr::relocate(
    c(
      egger_int,
      egger_int_p
    ),
    .after = last_col()
  )

tidy_dev
#> # A tibble: 6 × 20
#>   outcome exposure         a_or a_loci_or a_upci_or a_p_value     b_or b_loci_or
#>   <chr>   <chr>           <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
#> 1 asd     x susceptib… -5.26e-3  -0.00262  -0.00427   0.00884 -3.18e-3   0.00598
#> 2 asd     hospitalise…  3.62e-5   0.00144  -0.00840  -0.00632 -4.75e-3   0.00309
#> 3 asd     severe x      1.95e-3   0.00268  -0.00250  -0.00438  1.04e-3 -0.000874
#> 4 adhd    x susceptib…  1.55e-3  -0.00782  -0.00452   0.00317 -5.73e-4   0.00455
#> 5 adhd    hospitalise…  1.10e-3   0.00600  -0.00342  -0.00334 -2.32e-3   0.00676
#> 6 adhd    severe x      2.88e-3   0.00130   0.00352  -0.00317 -8.45e-4  -0.00134
#> # ℹ 12 more variables: b_upci_or <dbl>, b_p_value <dbl>, c_or <dbl>,
#> #   c_loci_or <dbl>, c_upci_or <dbl>, c_p_value <dbl>, d_or <dbl>,
#> #   d_loci_or <dbl>, d_upci_or <dbl>, d_p_value <dbl>, egger_int <dbl>,
#> #   egger_int_p <dbl>
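Another possible route, sketched here as an assumption rather than as part of the answer above, is to name the identifier columns explicitly with id_cols and let pivot_wider() summarise the leftover egger_ columns through its unused_fn argument (available in tidyr 1.2.0 and later). The data frame `toy` and the helper `first_non_na` are hypothetical names used only for illustration:

```r
library(tidyr)

# Hypothetical toy data: one outcome-exposure pair, two methods, and an
# Egger intercept that is only reported on one of the two rows.
toy <- data.frame(
  outcome   = c("asd", "asd"),
  exposure  = c("severe x", "severe x"),
  method    = c("a", "b"),
  or        = c(1.01, 0.99),
  egger_int = c(NA, 0.00003)
)

# Hypothetical helper: first non-missing value of a vector (NA if none).
first_non_na <- function(x) x[!is.na(x)][1]

wide <- pivot_wider(
  toy,
  id_cols     = c(outcome, exposure),  # only these columns identify a row
  names_from  = method,
  values_from = or,
  unused_fn   = first_non_na           # summarise the leftover egger_int column
)

nrow(wide)
wide$egger_int
```

Compared with the grouped fill, this keeps the collapsing logic inside the pivot_wider() call, at the cost of requiring a tidyr version that supports unused_fn.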