How do I split dates by year in R?

Aug 17 2020

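I am studying sick leave using register data. From the register I only have each individual's sick-leave start date and end date, and the dates are not broken down by year. For example, for person A I only have a start date (1 May 2016) and an end date (14 February 2018).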

So I would like to know how to split the dates by year in R (for example, 01/05/16 to 14/02/18 becomes 01/05/16 to 31/12/16, 01/01/17 to 31/12/17, and 01/01/18 to 14/02/18), so that I can calculate the total number of sick-leave days in each year.
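The per-year count comes down to clipping the period to each calendar year and counting the days inclusively. A minimal sketch for person A, assuming lubridate is installed (the helper name `days_in_year` is hypothetical, not from the question):

```r
library(lubridate)

# Inclusive number of days that the period [from, to] falls in calendar year `yr`;
# `yr` may be a vector, and years without overlap give 0
days_in_year <- function(from, to, yr) {
  start <- pmax(from, make_date(yr, 1, 1))   # later of leave start and 1 Jan
  end   <- pmin(to, make_date(yr, 12, 31))   # earlier of leave end and 31 Dec
  pmax(as.numeric(end - start) + 1, 0)
}

from  <- dmy("01/05/2016")
to    <- dmy("14/02/2018")
years <- year(from):year(to)
days  <- days_in_year(from, to, years)
days  # 245 365 45
```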

Here is some example data for the question:

library(dplyr)     # tribble(), mutate(), %>%
library(lubridate) # dmy()

sick_leave <- tribble(
      ~id,        ~from,          ~to, 
        1, "01/01/2018", "03/10/2020",
        2, "01/01/2016", "01/01/2021", 
        3, "02/01/2018", "02/06/2018",
        3, "02/07/2018", "31/12/2018",
        4, "02/10/2018", "02/02/2019",
        4, "31/12/2019", "01/01/2021",
        5, "02/10/2017", "20/05/2018",
        6, "02/03/2021", "31/12/2021",
        7, "01/01/2016", "05/06/2016"
    ) %>% mutate(from = dmy(from),to = dmy(to))
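The example dates are day-first (dd/mm/yyyy), which is why `dmy()` is used. A quick check of how one ambiguous string is parsed:

```r
library(lubridate)

# dmy() reads the day first, then the month, then the year
d <- dmy("02/10/2018")
d         # "2018-10-02": 2 October, not 10 February
month(d)  # 10
```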

The desired output is:

id  year  from        to          wanted
 1  2018  2018-01-01  2018-12-31  365
 1  2019  2019-01-01  2019-12-31  365
 1  2020  2020-01-01  2020-10-03  277
 2  2016  2016-01-01  2016-12-31  366
 2  2017  2017-01-01  2017-12-31  365
 2  2018  2018-01-01  2018-12-31  365
 2  2019  2019-01-01  2019-12-31  365
 2  2020  2020-01-01  2020-12-31  366
 2  2021  2021-01-01  2021-01-01    1
 3  2018  2018-01-02  2018-06-02  152
 3  2018  2018-07-02  2018-12-31  183
 4  2018  2018-10-02  2018-12-31   91
 4  2019  2019-01-01  2019-02-02   33
 4  2019  2019-12-31  2019-12-31    1
 4  2020  2020-01-01  2020-12-31  366
 4  2021  2021-01-01  2021-01-01    1
 5  2017  2017-10-02  2017-12-31   91
 5  2018  2018-01-01  2018-05-20  140
 6  2021  2021-03-02  2021-12-31  305
 7  2016  2016-01-01  2016-06-05  157

Answers

1 Edo Aug 17 2020 at 15:40

This solution splits the dates by creating new rows, as requested.

The function split_by_year is applied row by row.

I've left a few comments in the code.

# necessary libraries
library(dplyr)
library(lubridate)

split_by_year <- function(from, to){
    
    year_from <- year(from)
    year_to   <- year(to)
    
    # get sequence of years
    years <- seq(year_from, year_to)
    
    # create start and end date for each year
    starts <- make_date(years)  
    ends   <- make_date(years, 12, 31)
    
    # combine starts and ends, then replace the outer limits with from and to
    dates <- sort(c(starts, ends))
    dates[c(1, length(dates))] <- c(from, to)
    
    # recreate dataframe with columns from and to
    m <- matrix(dates, ncol = 2, byrow = TRUE)
    colnames(m) <- c("from", "to")
    mutate_all(as_tibble(m), as_date)
    
}

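sick_leave %>%
    rowwise() %>% # evaluate the next step one row at a time
    # dplyr >= 1.1.0 deprecates returning several rows per group from
    # summarise() and suggests reframe() instead
    summarise(id = id, split_by_year(from, to)) %>% 
    mutate(sick_days = as.numeric(to - from + 1))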

Output:

# A tibble: 20 x 4
      id from       to         sick_days
   <dbl> <date>     <date>         <dbl>
 1     1 2018-01-01 2018-12-31       365
 2     1 2019-01-01 2019-12-31       365
 3     1 2020-01-01 2020-10-03       277
 4     2 2016-01-01 2016-12-31       366
 5     2 2017-01-01 2017-12-31       365
 6     2 2018-01-01 2018-12-31       365
 7     2 2019-01-01 2019-12-31       365
 8     2 2020-01-01 2020-12-31       366
 9     2 2021-01-01 2021-01-01         1
10     3 2018-01-02 2018-06-02       152
11     3 2018-07-02 2018-12-31       183
12     4 2018-10-02 2018-12-31        91
13     4 2019-01-01 2019-02-02        33
14     4 2019-12-31 2019-12-31         1
15     4 2020-01-01 2020-12-31       366
16     4 2021-01-01 2021-01-01         1
17     5 2017-10-02 2017-12-31        91
18     5 2018-01-01 2018-05-20       140
19     6 2021-03-02 2021-12-31       305
20     7 2016-01-01 2016-06-05       157
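As a vectorised alternative to rowwise() (a sketch, not part of the original answer), each period can be expanded into the calendar years it touches with Map() and tidyr::unnest(), then clipped to that year with pmax()/pmin(). This assumes tidyr >= 1.0 and also keeps the year column from the desired output; the name `by_year` is illustrative.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(lubridate)

sick_leave <- tribble(
  ~id,        ~from,          ~to,
    1, "01/01/2018", "03/10/2020",
    2, "01/01/2016", "01/01/2021",
    3, "02/01/2018", "02/06/2018",
    3, "02/07/2018", "31/12/2018",
    4, "02/10/2018", "02/02/2019",
    4, "31/12/2019", "01/01/2021",
    5, "02/10/2017", "20/05/2018",
    6, "02/03/2021", "31/12/2021",
    7, "01/01/2016", "05/06/2016"
) %>% mutate(from = dmy(from), to = dmy(to))

by_year <- sick_leave %>%
  # list column holding every calendar year each period touches
  mutate(year = Map(seq, year(from), year(to))) %>%
  # one row per (period, year)
  unnest(year) %>%
  # clip the period to its year, then count days inclusively
  mutate(
    from   = pmax(from, make_date(year, 1, 1)),
    to     = pmin(to, make_date(year, 12, 31)),
    wanted = as.numeric(to - from) + 1
  ) %>%
  select(id, year, from, to, wanted)

by_year
```

Because unnest() keeps the original row order and the years ascend within each period, the rows come out in the same order as the desired output.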

1 Wimpel Aug 17 2020 at 15:46

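Your question sounds like an XY problem.
So I skipped creating the per-year intervals and went straight to the answer you actually want: the number of sick-leave days per ID per year.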
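The code for this answer is not preserved here. Purely as an illustration of the idea (not the answerer's original code), one way to count sick-leave days per id and year without building intervals is to expand each period into individual days and count them by calendar year. The name `totals` is hypothetical.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(lubridate)

sick_leave <- tribble(
  ~id,        ~from,          ~to,
    1, "01/01/2018", "03/10/2020",
    2, "01/01/2016", "01/01/2021",
    3, "02/01/2018", "02/06/2018",
    3, "02/07/2018", "31/12/2018",
    4, "02/10/2018", "02/02/2019",
    4, "31/12/2019", "01/01/2021",
    5, "02/10/2017", "20/05/2018",
    6, "02/03/2021", "31/12/2021",
    7, "01/01/2016", "05/06/2016"
) %>% mutate(from = dmy(from), to = dmy(to))

totals <- sick_leave %>%
  # one Date per sick day in each period (both ends included)
  mutate(day = Map(seq, from, to, MoreArgs = list(by = "day"))) %>%
  unnest(day) %>%
  # number of sick days per person and calendar year
  count(id, year = year(day), name = "sick_days")

totals
```

Each period contributes one row per day, so if two periods for the same id could overlap, adding distinct(id, day) before count() would stop those days from being counted twice.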


Update to match the desired output