Preprocessing Operators

Data preprocessing can be performed using the "mlrCPO" ("Composable Preprocessing Operators") addon package for mlr. mlrCPO makes it easy to use a variety of preprocessing operations, to chain different operations, to integrate preprocessing with mlr Learners, and to define custom preprocessing operations.

mlrCPO provides the %>>%-operator, which is used as a piping operator: it chains different operations, it applies an operation to a dataset, and it attaches an operation to a Learner to create an integrated preprocessing and model fitting pipeline. This way, it is possible to quickly create natural-looking pipelines that are very flexible and can even be tuned over.

This tutorial covers the basics of using mlrCPO for preprocessing in combination with mlr Learners. For a more in-depth introduction, look at the mlrCPO vignette using

vignette("a_1_getting_started", package = "mlrCPO")

The following requires the mlrCPO package to be loaded:

library("mlrCPO")

CPO Objects

Different preprocessing operations are provided in the form of CPO Constructors, which can be called like functions to create CPO objects. These CPO objects are then used to apply the operation to a data set.

cpoAddCols  # a cpo constructor
#> <<CPO new.cols(..., .make.factors = TRUE)>>

# create a CPO object that adds a new column
cpo = cpoAddCols(Sepal.Area = Sepal.Length * Sepal.Width)

CPO objects are central to mlrCPO, and they are very flexible. They can be applied to a data.frame or a Task:

head(iris %>>% cpo)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Area
#> 1          5.1         3.5          1.4         0.2  setosa      17.85
#> 2          4.9         3.0          1.4         0.2  setosa      14.70
#> 3          4.7         3.2          1.3         0.2  setosa      15.04
#> 4          4.6         3.1          1.5         0.2  setosa      14.26
#> 5          5.0         3.6          1.4         0.2  setosa      18.00
#> 6          5.4         3.9          1.7         0.4  setosa      21.06

head(getTaskData(iris.task %>>% cpo))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Area Species
#> 1          5.1         3.5          1.4         0.2      17.85  setosa
#> 2          4.9         3.0          1.4         0.2      14.70  setosa
#> 3          4.7         3.2          1.3         0.2      15.04  setosa
#> 4          4.6         3.1          1.5         0.2      14.26  setosa
#> 5          5.0         3.6          1.4         0.2      18.00  setosa
#> 6          5.4         3.9          1.7         0.4      21.06  setosa

CPOs can be concatenated to create new operations. The following example adds the Sepal.Area column and then scales and centers all numeric columns:

cpo %>>% cpoScale()
#> (new.cols >> scale)(scale.center = TRUE, scale.scale = TRUE)

CPOs can be fused with a Learner to create a machine learning pipeline that performs preprocessing on the training data and also pre-processes the data that is fed to the resulting model for prediction.

lrn = cpo %>>% makeLearner("classif.randomForest")
model = train(lrn, iris.task)
getFeatureImportance(model$learner.model$next.model)
#> FeatureImportance:
#> Task: iris_example
#>
#> Learner: classif.randomForest
#> Measure: NA
#> Contrast: NA
#> Aggregation: function (x)  x
#> Replace: NA
#> Number of Monte-Carlo iterations: NA
#> Local: FALSE
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Area
#> 1     11.23503    4.260543     39.65265    40.19068   3.966734

A list of all internal CPOs can be retrieved using listCPO(), which returns a data.frame of names, categories, and descriptions.

listCPO()
#>                      name               cponame category
#> 11       cpoDropConstants             dropconst     data
#> 36          cpoFixFactors            fixfactors     data
#> 10        cpoCollapseFact         collapse.fact     data
#> 4            cpoAsNumeric            as.numeric     data
#> 15         cpoDummyEncode           dummyencode     data
#> 13 cpoImpactEncodeClassif impact.encode.classif     data
#>                  subcategory
#> 11                   cleanup
#> 36                   cleanup
#> 10 factor data preprocessing
#> 4         feature conversion
#> 15        feature conversion
#> 13        feature conversion
#> ... (#rows: 69, #cols: 4)

Hyperparameters

CPO objects have hyperparameters that can be adjusted at creation, or later using setHyperPars(). They are shown by the CPO Constructor representation when printed, and can be given as parameters during construction.

cpoScale
#> <<CPO scale(center = TRUE, scale = TRUE)>>

do.center = cpoScale(scale = FALSE, center = TRUE)

The ParamSet of a CPO can be inspected using getParamSet(), but it is also shown when verbosely printing a CPO using !.

!do.center  # note the 'scale.' prefix
#> Trafo chain of 1 cpos:
#> scale(center = TRUE, scale = FALSE)
#> Operating: feature
#> ParamSet:
#>                 Type len  Def Constr Req Tunable Trafo
#> scale.center logical   - TRUE      -   -    TRUE     -
#> scale.scale  logical   - TRUE      -   -    TRUE     -

do.scale = setHyperPars(do.center,
  scale.scale = TRUE, scale.center = FALSE)
do.scale
#> scale(center = FALSE, scale = TRUE)

These hyperparameters even survive CPO composition and attachment to Learners:

cpo = cpoScale() %>>% cpoPca()
lrn = cpo %>>% makeLearner("classif.logreg")
print(lrn)
#> Learner classif.logreg.pca.scale from package stats
#> Type: classif
#> Name: ; Short name:
#> Class: CPOLearner
#> Properties: numerics,factors,twoclass,prob
#> Predict-Type: response
#> Hyperparameters: model=FALSE

When composing many CPOs, the ParamSet of the combined CPO can become quite cluttered. To prevent name clashes, it is possible to change the prefix of the hyperparameters of a given CPO using the ID. It can be set during construction, or by using setCPOId().

combined = cpoScale(scale = TRUE, center = FALSE, id = "scale") %>>%
  cpoScale(scale = FALSE, center = TRUE, id = "center")
getParamSet(combined)
#>                  Type len  Def Constr Req Tunable Trafo
#> scale.center  logical   - TRUE      -   -    TRUE     -
#> scale.scale   logical   - TRUE      -   -    TRUE     -
#> center.center logical   - TRUE      -   -    TRUE     -
#> center.scale  logical   - TRUE      -   -    TRUE     -

Another possibility is to change which parameters are "exported" by the CPO. A parameter that is not exported cannot be changed after construction. The export parameter given during construction can be a character vector of the parameters to export.

center = cpoScale(scale = FALSE, center = TRUE, export = "center")
!center
#> Trafo chain of 1 cpos:
#> scale(center = TRUE)[not exp'd: scale = FALSE]
#> Operating: feature
#> ParamSet:
#>                 Type len  Def Constr Req Tunable Trafo
#> scale.center logical   - TRUE      -   -    TRUE     -

Affecting Only Some Features

It is possible to set up a CPO so that it only affects certain columns of a given dataset. This is done with a few parameters during construction that begin with the prefix "affect.". The following example only scales and centers columns that begin with "Sepal".

cpo = cpoScale(affect.pattern = "^Sepal")
head(iris %>>% cpo)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1   -0.8976739  1.01560199          1.4         0.2  setosa
#> 2   -1.1392005 -0.13153881          1.4         0.2  setosa
#> 3   -1.3807271  0.32731751          1.3         0.2  setosa
#> 4   -1.5014904  0.09788935          1.5         0.2  setosa
#> 5   -1.0184372  1.24503015          1.4         0.2  setosa
#> 6   -0.5353840  1.93331463          1.7         0.4  setosa
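
As a rough mental model, and not a description of how mlrCPO is implemented, the effect of affect.pattern = "^Sepal" on cpoScale can be reproduced in base R by selecting the matching columns with grepl() and scaling only those. The helper name scaleMatching is made up for this sketch.

```r
# Sketch only: mimic cpoScale(affect.pattern = "^Sepal") with base R.
# 'scaleMatching' is a hypothetical helper, not part of mlrCPO.
scaleMatching = function(data, pattern) {
  affected = grepl(pattern, names(data))  # columns the operation applies to
  data[affected] = lapply(data[affected], function(x) {
    (x - mean(x)) / sd(x)  # center and scale a single column
  })
  data
}

scaled = scaleMatching(iris, "^Sepal")
head(scaled)
```

The Petal columns and Species pass through unchanged, which is the same pattern as in the mlrCPO output above.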

CPOTrained: Retrafo and Inverter

Manipulating data for preprocessing itself is relatively easy. A challenge arises when one wants to integrate preprocessing into a machine-learning pipeline: the same preprocessing steps that are performed on the training data need to be performed on the new prediction data. However, the transformation performed for prediction often needs information from the training step. For example, if training entails performing PCA, then for prediction, the data must not undergo another PCA; instead, it needs to be rotated by the rotation matrix found by the training PCA. The process of obtaining the rotation matrix is called "training" the CPO, and the object that contains the trained information is a CPOTrained object; it can be accessed using the retrafo() function on the transformed data.

When a CPO has an effect on the target columns of a Task, two CPOTrained objects are generated: one, as before, is used on new prediction data before doing prediction with a model. The other is used on predictions made with that model, to map the prediction back to the space of the original target column. This inverting CPOTrained can be accessed using inverter() on transformed data.

The process of using CPOTrained correctly can be a bit involved, but mlrCPO automates it when a CPO is attached to a Learner object; see the following section. The CPOTrained objects are explained in more detail in the mlrCPO vignette.
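
The two roles of a CPOTrained can be illustrated without mlrCPO. The following base R sketch is an illustration under stated assumptions, not how mlrCPO works internally: it uses prcomp() as the feature operation and a log transform of the target as the target operation, and all object names (control, retrafoSketch, inverterSketch) are invented for this example.

```r
# Regress Sepal.Length on the other three numeric iris columns.
train.idx = seq(1, nrow(iris), by = 2)
train.data = iris[train.idx, 1:4]
new.data = iris[-train.idx, 1:4]
feats = c("Sepal.Width", "Petal.Length", "Petal.Width")

# "Training" the preprocessing: PCA on the training features only.
pca = prcomp(train.data[feats], center = TRUE, scale. = FALSE)
control = list(center = pca$center, rotation = pca$rotation)

# Retrafo role: re-apply the stored center and rotation to any data.
retrafoSketch = function(data, control) {
  centered = sweep(as.matrix(data), 2, control$center)
  as.data.frame(centered %*% control$rotation)
}

# Inverter role: map predictions on the log scale back to the original scale.
inverterSketch = function(prediction) exp(prediction)

# Fit the model on transformed features and transformed target.
train.trafo = retrafoSketch(train.data[feats], control)
train.trafo$log.target = log(train.data$Sepal.Length)
model = lm(log.target ~ ., data = train.trafo)

# Prediction: retrafo on the new features, then invert the prediction.
new.trafo = retrafoSketch(new.data[feats], control)
pred = inverterSketch(predict(model, newdata = new.trafo))
head(cbind(truth = new.data$Sepal.Length, prediction = pred))
```

No second PCA is run on new.data: only the center and rotation stored during training are re-used, and the exponential undoes the log transform that was applied to the target.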

CPO Learner

When attaching a CPO to a Learner using the %>>%-operator, the complete preprocessing pipeline is integrated by mlrCPO, so there is no need to worry about keeping CPOTrained objects. The resulting CPOLearner inherits the hyperparameters both from the CPO and the Learner. This way, the function of a CPO can be tuned together with parameters of a Learner itself.

When a CPOLearner is trained on some data, it is possible to get information about the effect of an attached CPO by inspecting the CPOTrained object created during training. It can be retrieved from a model using retrafo() and inspected using getCPOTrainedState(). The following example retrieves the PCA rotation matrix trained when fitting a CPOLearner to iris.task.

lrn = cpoPca() %>>% makeLearner("classif.randomForest")
model = train(lrn, iris.task)

retr = retrafo(model)
state = getCPOTrainedState(retr)
state$control$rotation
#>                      PC1         PC2         PC3        PC4
#> Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
#> Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
#> Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
#> Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

Tuning

Tuning CPO hyperparameters works exactly like tuning Learner hyperparameters, since the CPO's parameters are attached naturally to a Learner's parameters when a CPOLearner is formed.

(clrn = cpoFilterFeatures(export = c("method", "abs")) %>>% makeLearner("classif.knn"))
#> Learner classif.knn.filterFeatures from package class
#> Type: classif
#> Name: ; Short name:
#> Class: CPOLearner
#> Properties: numerics,twoclass,multiclass
#> Predict-Type: response
#> Hyperparameters: filterFeatures.method=randomForest...,filterFeatures.abs=<NULL>

getParamIds(getParamSet(clrn))
#> [1] "filterFeatures.method" "filterFeatures.abs"    "k"
#> [4] "l"                     "prob"                  "use.all"

ps = makeParamSet(
    makeDiscreteParam(
        "filterFeatures.method",
        values = list("anova.test", "variance", "chi.squared")),
    makeIntegerParam(
        "filterFeatures.abs",
        lower = 1, upper = 8),
    makeIntegerParam(
        "k",
        lower = 1, upper = 10))

tuneParams(clrn, pid.task, cv5, par.set = ps,
           control = makeTuneControlRandom(budget = 10),
           show.info = FALSE)
#> Tune result:
#> Op. pars: filterFeatures.method=variance; filterFeatures.abs=8; k=10
#> mmce.test.mean=0.2527120

Special CPOs

NULLCPO

Under certain circumstances it can be useful to represent the operation of no preprocessing. This is done using the NULLCPO object. If it is applied to data, attached to a Learner, or composed with another CPO, the data, Learner, or CPO is returned unchanged.

identical(iris %>>% NULLCPO, iris)
#> [1] TRUE
identical(cpoPca() %>>% NULLCPO, cpoPca())
#> [1] TRUE
identical(NULLCPO %>>% makeLearner("classif.logreg"), makeLearner("classif.logreg"))
#> [1] TRUE

CPO Multiplexer

The multiplexer makes it possible to combine many CPOs into one, with an extra selected.cpo parameter that chooses between them.

cpm = cpoMultiplex(list(cpoScale, cpoPca))
!cpm
#> Trafo chain of 1 cpos:
#> multiplex(selected.cpo = scale, scale.center = TRUE, scale.scale = TRUE, pca.center = TRUE, pca.scale = FALSE)
#> Operating: feature
#> ParamSet:
#>                  Type len   Def    Constr Req Tunable Trafo
#> selected.cpo discrete   - scale scale,pca   -    TRUE     -
#> scale.center  logical   -  TRUE         -   Y    TRUE     -
#> scale.scale   logical   -  TRUE         -   Y    TRUE     -
#> pca.center    logical   -  TRUE         -   Y    TRUE     -
#> pca.scale     logical   - FALSE         -   Y    TRUE     -

head(iris %>>% setHyperPars(cpm, selected.cpo = "scale"))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1   -0.8976739  1.01560199    -1.335752   -1.311052  setosa
#> 2   -1.1392005 -0.13153881    -1.335752   -1.311052  setosa
#> 3   -1.3807271  0.32731751    -1.392399   -1.311052  setosa
#> 4   -1.5014904  0.09788935    -1.279104   -1.311052  setosa
#> 5   -1.0184372  1.24503015    -1.335752   -1.311052  setosa
#> 6   -0.5353840  1.93331463    -1.165809   -1.048667  setosa

head(iris %>>% setHyperPars(cpm, selected.cpo = "pca"))
#>   Species       PC1        PC2         PC3          PC4
#> 1  setosa -2.684126 -0.3193972  0.02791483  0.002262437
#> 2  setosa -2.714142  0.1770012  0.21046427  0.099026550
#> 3  setosa -2.888991  0.1449494 -0.01790026  0.019968390
#> 4  setosa -2.745343  0.3182990 -0.03155937 -0.075575817
#> 5  setosa -2.728717 -0.3267545 -0.09007924 -0.061258593
#> 6  setosa -2.280860 -0.7413304 -0.16867766 -0.024200858

The hyperparameters of every multiplexed CPO are exported:

head(iris %>>% setHyperPars(cpm, selected.cpo = "scale", scale.center = FALSE))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1    0.8613268   1.1296201    0.3362663    0.140405  setosa
#> 2    0.8275493   0.9682458    0.3362663    0.140405  setosa
#> 3    0.7937718   1.0327956    0.3122473    0.140405  setosa
#> 4    0.7768830   1.0005207    0.3602853    0.140405  setosa
#> 5    0.8444380   1.1618950    0.3362663    0.140405  setosa
#> 6    0.9119931   1.2587196    0.4083234    0.280810  setosa

This makes it possible to tune over many different CPO configurations at once.
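
Conceptually, a multiplexer behaves like one function with a selector argument. The base R sketch below illustrates that idea only; multiplexSketch and its selected argument are hypothetical names and do not reflect how cpoMultiplex is implemented.

```r
# Hypothetical stand-in for a multiplexer: one function, one selector argument.
multiplexSketch = function(data, selected = c("scale", "pca")) {
  selected = match.arg(selected)
  numeric.cols = vapply(data, is.numeric, logical(1))
  switch(selected,
    # scale: replace numeric columns by their centered and scaled versions
    scale = {
      data[numeric.cols] = lapply(data[numeric.cols], function(x) {
        (x - mean(x)) / sd(x)
      })
      data
    },
    # pca: replace numeric columns by their principal components
    pca = cbind(data[!numeric.cols], prcomp(data[numeric.cols])$x))
}

# Looping over the selector mirrors what a tuner does with 'selected.cpo'.
for (choice in c("scale", "pca")) {
  print(head(multiplexSketch(iris, selected = choice), 3))
}
```

Because the choice of operation is an ordinary discrete parameter, it can be searched over like any other hyperparameter.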

CBind CPO

cpoCbind applies cbind to the results of multiple CPOs. It makes it possible to build CPOs that perform different operations on the data and paste the results next to each other.

cbnd = cpoCbind(scaled = cpoScale(), pca = cpoPca())
head(iris %>>% cbnd)
#>   scaled.Sepal.Length scaled.Sepal.Width scaled.Petal.Length
#> 1          -0.8976739         1.01560199           -1.335752
#> 2          -1.1392005        -0.13153881           -1.335752
#> 3          -1.3807271         0.32731751           -1.392399
#> 4          -1.5014904         0.09788935           -1.279104
#> 5          -1.0184372         1.24503015           -1.335752
#> 6          -0.5353840         1.93331463           -1.165809
#>   scaled.Petal.Width scaled.Species pca.Species   pca.PC1    pca.PC2
#> 1          -1.311052         setosa      setosa -2.684126 -0.3193972
#> 2          -1.311052         setosa      setosa -2.714142  0.1770012
#> 3          -1.311052         setosa      setosa -2.888991  0.1449494
#> 4          -1.311052         setosa      setosa -2.745343  0.3182990
#> 5          -1.311052         setosa      setosa -2.728717 -0.3267545
#> 6          -1.048667         setosa      setosa -2.280860 -0.7413304
#>       pca.PC3      pca.PC4
#> 1  0.02791483  0.002262437
#> 2  0.21046427  0.099026550
#> 3 -0.01790026  0.019968390
#> 4 -0.03155937 -0.075575817
#> 5 -0.09007924 -0.061258593
#> 6 -0.16867766 -0.024200858

It is even possible to build complex DAGs of preprocessing operators. In the following example, cpoCbind recognizes that cpoFilterVariance comes before both cpoScale and cpoPca and performs filtering only once. The original data is pasted next to the scaled and PCA'd data by having a NULLCPO slot which does not change any data.

flt = cpoFilterVariance(abs = 2, export = "abs")
cbnd = cpoCbind(scale = flt %>>% cpoScale(), pca = flt %>>% cpoPca(), NULLCPO)
head(getTaskData(iris.task %>>% cbnd))
#>   Species scale.Sepal.Length scale.Petal.Length   pca.PC1     pca.PC2
#> 1  setosa         -0.8976739          -1.335752 -2.460241 -0.24479165
#> 2  setosa         -1.1392005          -1.335752 -2.538962 -0.06093579
#> 3  setosa         -1.3807271          -1.392399 -2.709611  0.08355948
#> 4  setosa         -1.5014904          -1.279104 -2.565116  0.25420858
#> 5  setosa         -1.0184372          -1.335752 -2.499602 -0.15286372
#> 6  setosa         -0.5353840          -1.165809 -2.066375 -0.40249369
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
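
The data flow of this DAG can be written out by hand in base R, which may help when reading the column names in the output above. This is a sketch of the structure only: the variance filter is approximated by keeping the two highest-variance columns, the target column is left out, and none of the variable names come from mlrCPO.

```r
features = iris[, 1:4]

# Shared first step, computed once: keep the two features with highest variance
# (selected via rank() so that the original column order is preserved).
variances = vapply(features, var, numeric(1))
keep = names(features)[rank(-variances) <= 2]
filtered = features[keep]

# Two branches that both start from the same filtered data.
scaled = as.data.frame(scale(filtered))
pca = as.data.frame(prcomp(filtered)$x)

# Paste the two branches and the untouched original features side by side.
# Named arguments become column name prefixes, unnamed ones keep their names.
combined = cbind(scale = scaled, pca = pca, features)
head(combined, 3)
```

The unnamed third argument plays the part of the NULLCPO slot: it contributes the original columns without a prefix.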

The order of operations can be inspected in a crude ASCII graph when looking at the verbose printout of cbnd. The output of variance is fed into both pca and scale.

!cbnd
#> Trafo chain of 1 cpos:
#> cbind(variance.abs = 2, scale.center = TRUE, scale.scale = TRUE, pca.center = TRUE, pca.scale = FALSE)
#> Operating: feature
#> ParamSet:
#>                 Type len    Def   Constr Req Tunable Trafo
#> variance.abs integer   - <NULL> 0 to Inf   -    TRUE     -
#> scale.center logical   -   TRUE        -   -    TRUE     -
#> scale.scale  logical   -   TRUE        -   -    TRUE     -
#> pca.center   logical   -   TRUE        -   -    TRUE     -
#> pca.scale    logical   -  FALSE        -   -    TRUE     -
#> O>+   variance(abs = 2)[not exp'd: perc = <NULL>, threshold = <NULL>]
#> | |
#> O |   pca(center = TRUE, scale = FALSE)[not exp'd: tol = <NULL>, rank =
#> | |  <NULL>]
#> | |
#> +<O   scale(center = TRUE, scale = TRUE)
#> |
#> O   CBIND[scale,pca,]
#>

The parameters of the internal CPOs are exported and can be manipulated and tuned.

getParamSet(cbnd)
#>                 Type len    Def   Constr Req Tunable Trafo
#> variance.abs integer   - <NULL> 0 to Inf   -    TRUE     -
#> scale.center logical   -   TRUE        -   -    TRUE     -
#> scale.scale  logical   -   TRUE        -   -    TRUE     -
#> pca.center   logical   -   TRUE        -   -    TRUE     -
#> pca.scale    logical   -  FALSE        -   -    TRUE     -

Custom CPOs

Even though CPOs are very flexible and can be combined in many ways, it may be necessary to create completely custom CPOs. Custom CPOs can be created using the makeCPO() function (and similar related functions). Its most important arguments are cpo.train and cpo.retrafo, both of which are functions. In principle, a CPO needs a function that "trains" a control object depending on the data (cpo.train), and another function that uses this control object, and new data, to perform the preprocessing operation (cpo.retrafo). The cpo.train function must return a "control" object which contains all information about how to transform a given dataset. cpo.retrafo takes a (potentially new!) dataset and the "control" object returned by cpo.train, and transforms the new data according to plan. See the mlrCPO vignettes or help(makeCPO) for a more thorough description of how to create custom CPOs.

names(formals(makeCPO))  # see help(makeCPO) for explanation of arguments
#>  [1] "cpo.name"                       "par.set"
#>  [3] "par.vals"                       "dataformat"
#>  [5] "dataformat.factor.with.ordered" "export.params"
#>  [7] "fix.factors"                    "properties.data"
#>  [9] "properties.adding"              "properties.needed"
#> [11] "properties.target"              "packages"
#> [13] "cpo.train"                      "cpo.retrafo"

constFeatRem = makeCPO("constFeatRem",
  dataformat = "df.features",
  cpo.train = function(data, target) {
    names(Filter(function(x) {  # names of columns to keep
      length(unique(x)) > 1
    }, data))
  },
  cpo.retrafo = function(data, control) {
    data[control]
  })

!constFeatRem
#> <<CPO constFeatRem()>>
#>
#> cpo.trafo:
#> function(data, target) {
#>     names(Filter(function(x) {  # names of columns to keep
#>       length(unique(x)) > 1
#>     }, data))
#>   }
#> <environment: 0x55f52aa2a488>
#>
#> cpo.retrafo:
#> function(data, control) {
#>     data[control]
#>   }
#> <environment: 0x55f52aa2a488>

This CPO can be used on the head() of the iris dataset. Since the "Species" entry for the first six rows of iris is constant, it is removed by this CPO.

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

head(iris) %>>% constFeatRem()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
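
The division of labour between cpo.train and cpo.retrafo can be tried out in plain R by calling the two functions from the example by hand. This is only a sketch of the idea: mlrCPO does additional checking and bookkeeping that is not shown, and passing target = NULL is a simplification made here because the argument is unused.

```r
# The two functions from the constFeatRem example, as plain R functions.
cpo.train = function(data, target) {
  names(Filter(function(x) {  # names of columns to keep
    length(unique(x)) > 1
  }, data))
}
cpo.retrafo = function(data, control) {
  data[control]
}

# "Training" on head(iris): Species is constant there, so it is not kept.
control = cpo.train(head(iris), target = NULL)
control

# Re-application to new rows uses the stored 'control', not the new data:
# Species varies in these rows but is still dropped.
new.rows = iris[c(1, 51, 101), ]
cpo.retrafo(new.rows, control)
```

The decision about which columns to keep is made once during training and then re-used, which is the behaviour a CPO needs when it is later applied to prediction data.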

Complete code listing

The code from this tutorial, without the output, is given below. It differs from the snippets above in a few details: it also prints !cpoAddCols, it uses listCPO()$name and !combined for inspection, and it tunes with makeTuneControlGrid() instead of makeTuneControlRandom().

## vignette("a_1_getting_started", package = "mlrCPO")
library("mlrCPO")
cpoAddCols  # a cpo constructor
!cpoAddCols  # more information

# create a CPO object that adds a new column
cpo = cpoAddCols(Sepal.Area = Sepal.Length * Sepal.Width)
head(iris %>>% cpo)
head(getTaskData(iris.task %>>% cpo))
cpo %>>% cpoScale()
lrn = cpo %>>% makeLearner("classif.randomForest")
model = train(lrn, iris.task)
getFeatureImportance(model$learner.model$next.model)
listCPO()$name
cpoScale
do.center = cpoScale(scale = FALSE, center = TRUE)
!do.center  # note the 'scale.' prefix
do.scale = setHyperPars(do.center,
  scale.scale = TRUE, scale.center = FALSE)
cpo = cpoScale() %>>% cpoPca()
lrn = cpo %>>% makeLearner("classif.logreg")
print(lrn)
combined = cpoScale(scale = TRUE, center = FALSE, id = "scale") %>>%
  cpoScale(scale = FALSE, center = TRUE, id = "center")
!combined
center = cpoScale(scale = FALSE, center = TRUE, export = "center")
!center
cpo = cpoScale(affect.pattern = "^Sepal")
head(iris %>>% cpo)
lrn = cpoPca() %>>% makeLearner("classif.randomForest")
model = train(lrn, iris.task)

retr = retrafo(model)
state = getCPOTrainedState(retr)
state$control$rotation
(clrn = cpoFilterFeatures(export = c("method", "abs")) %>>% makeLearner("classif.knn"))
getParamIds(getParamSet(clrn))
ps = makeParamSet(
    makeDiscreteParam(
        "filterFeatures.method",
        values = list("anova.test", "variance", "chi.squared")),
    makeIntegerParam(
        "filterFeatures.abs",
        lower = 1, upper = 8),
    makeIntegerParam(
        "k",
        lower = 1, upper = 10))

tuneParams(clrn, pid.task, cv5, par.set = ps,
           control = makeTuneControlGrid(),
           show.info = FALSE)
identical(iris %>>% NULLCPO, iris)
identical(cpoPca() %>>% NULLCPO, cpoPca())
identical(NULLCPO %>>% makeLearner("classif.logreg"), makeLearner("classif.logreg"))
cpm = cpoMultiplex(list(cpoScale, cpoPca))
!cpm
head(iris %>>% setHyperPars(cpm, selected.cpo = "scale"))
head(iris %>>% setHyperPars(cpm, selected.cpo = "scale", scale.center = FALSE))
head(iris %>>% setHyperPars(cpm, selected.cpo = "pca"))
cbnd = cpoCbind(scaled = cpoScale(), pca = cpoPca())
head(iris %>>% cbnd)
flt = cpoFilterVariance(abs = 2, export = "abs")
cbnd = cpoCbind(scale = flt %>>% cpoScale(), pca = flt %>>% cpoPca(), NULLCPO)
head(getTaskData(iris.task %>>% cbnd))
!cbnd
getParamSet(cbnd)
names(formals(makeCPO))  # see help(makeCPO) for explanation of arguments
constFeatRem = makeCPO("constFeatRem",
  dataformat = "df.features",
  cpo.train = function(data, target) {
    names(Filter(function(x) {  # names of columns to keep
      length(unique(x)) > 1
    }, data))
  },
  cpo.retrafo = function(data, control) {
    data[control]
  })
!constFeatRem
head(iris)
head(iris) %>>% constFeatRem()