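When working with purrr, the data to be computed or processed is passed in as a list or vector, and it usually has a fairly complex nested structure. To make ETL easier, that structure often has to be reshaped inside the pipeline, which is where purrr's vector transformation functions come in. They fall into seven main groups:
Functions | Purpose |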
---|---|
accumulate, accumulate_right | accumulate recursive folds across a list |
cross, cross2, cross3, cross_df | produce all combinations of list elements |
flatten, flatten_lgl, flatten_int, flatten_dbl, flatten_chr, flatten_dfr, flatten_dfc | flatten a list of lists into a simple vector |
list_modify, update_list | modify a list |
reduce, reduce_right, reduce2, reduce2_right | reduce a list to a single value by iteratively applying a binary function |
splice | splice objects and lists of objects into a list |
transpose | transpose a list |
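
To make a few rows of the table concrete, here is a minimal sketch. The `records` list is made-up illustration data, and the calls assume a purrr version in which `transpose()` and `flatten_dbl()` are still available (newer releases mark them as superseded but keep them working).

```r
library(purrr)

# Hypothetical nested input: one small record per sample
records <- list(
  list(id = "a", score = 1),
  list(id = "b", score = 2),
  list(id = "c", score = 3)
)

# transpose(): turn a list of records into a list of fields
by_field <- transpose(records)

# flatten_dbl(): collapse a list of single numbers into a double vector
scores <- flatten_dbl(by_field$score)

# reduce(): fold the vector down to a single value with a binary function
total <- reduce(scores, `+`)

# accumulate(): same fold, but keep every intermediate result
running <- accumulate(scores, `+`)
```

`reduce()` and `accumulate()` apply the same binary function; the only difference is whether the intermediate results are kept.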
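Of these seven groups, the flatten family, the reduce family, the list_modify family (a newer addition I only noticed while putting this together, and it looks very handy), splice, and transpose see the most use in ETL work, while the cross and accumulate families are a big help for statistical computation.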
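
As a rough idea of what an ETL step could look like, the sketch below flattens paged results and tags each record. The `pages` data and the `source` field are hypothetical, invented only for this example, and it assumes `flatten()` is available in the installed purrr version.

```r
library(purrr)

# Hypothetical paged input: a list of pages, each page a list of records
pages <- list(
  list(list(name = "x", value = 10), list(name = "y", value = 20)),
  list(list(name = "z", value = 30))
)

# flatten(): remove one level of nesting so all records sit in a single list
rows <- flatten(pages)

# list_modify(): add (or overwrite) a field by name in every record
rows <- map(rows, list_modify, source = "demo")
```

After these two steps `rows` is a flat list of three records, each carrying the extra `source` field, which is usually a much easier shape to pass further down the pipeline.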