An R Package for Change Point Detection
São Paulo, September 24, 2019
1 Introduction
Castro et al. (2018) describe a method for segmenting a data set over a finite alphabet into blocks of independent variables. This work extends that method by minimizing a cost function rather than maximizing a likelihood function; the two formulations are equivalent when the cost function returns the negative of the likelihood function. We show that this generalization supports other use cases, e.g. segments with homogeneous values (see Chapter 4.2) and segments that share the same linear regression trend (see Chapter 5).
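The equivalence between the two formulations can be written out explicitly. The notation below is an assumption made for illustration and is not taken from the original text: X denotes the data set with m columns, 0 = c_0 < c_1 < ... < c_k = m are the segment boundaries, L is a per-segment likelihood score that is summed over segments, and K is the per-segment cost.

```latex
% Illustrative notation only (assumed, not from the source).
% Requires amsmath for \operatorname*.
% X: data set with m columns; 0 = c_0 < c_1 < \dots < c_k = m: segment boundaries;
% L: per-segment likelihood score; K: per-segment cost.
\[
  K(\cdot) = -L(\cdot)
  \quad\Longrightarrow\quad
  \operatorname*{arg\,max}_{k,\; c_1 < \dots < c_{k-1}}
    \sum_{j=1}^{k} L\bigl(X_{(c_{j-1}+1):c_j}\bigr)
  \;=\;
  \operatorname*{arg\,min}_{k,\; c_1 < \dots < c_{k-1}}
    \sum_{j=1}^{k} K\bigl(X_{(c_{j-1}+1):c_j}\bigr)
\]
```

Under this reading, any per-segment function K can take the place of -L, which is what lets the same search procedure handle costs that are not derived from a likelihood.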
This work produced a package for the R programming language (R Core Team 2018).
The code was packaged as redistributable software following the instructions
in Wickham (2015), and the package can be installed from CRAN
(Mello and Leonardi 2019), with the goal of making it as easy as
possible for researchers and R programmers to use. All the source code
is open source and available on GitHub (Mello 2019).
Finally, a specialized version of the multivariate likelihood function is
implemented in native code using the Rcpp
package (Eddelbuettel and François 2011), which allows faster execution
for the use case described in Castro et al. (2018).
References
Castro, Bruno M., Renan B. Lemes, Jonatas Cesar, Tábita Hünemeier, and Florencia Leonardi. 2018. “A Model Selection Approach for Multiple Sequence Segmentation and Dimensionality Reduction.” Journal of Multivariate Analysis 167: 319–30. https://doi.org/10.1016/j.jmva.2018.05.006.
Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.
Maidstone, Robert, Toby Hocking, Guillem Rigaill, and Paul Fearnhead. 2017. “On Optimal Multiple Changepoint Algorithms for Large Data.” Statistics and Computing 27 (2): 519–33. https://doi.org/10.1007/s11222-016-9636-3.
Mello, Thales. 2019. “Segmentr: An R Package to Segment Data with Minimal Total Cost (Aka Change Points).” GitHub Repository. https://github.com/thalesmello/segmentr.
Mello, Thales, and Florencia Leonardi. 2019. Segmentr: Segment Data Minimizing a Cost Function. https://CRAN.R-project.org/package=segmentr.
R Core Team. 2018. “R: A Language and Environment for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley. 2015. R Packages. 1st ed. O’Reilly Media, Inc. http://r-pkgs.had.co.nz/.