Install R & RStudio locally.
Install package
To install a package: install.packages("DBI")
To install a specif version of a package: install_version("lubridate", version = "1.6.0")
Export to CSV
write.csv(, file=".csv")
SELECT DISTINCT
unique(<data table>$<column>)
Find duplicates
temp <- as.data.frame(table(<data table>$<column>)) temp[temp$Freq > 1,]
Data frame
Number of rows: nrow(<dataframe>)
Top n rows: head(<dataframe>, n)
Filter: <dataframe>[which(column1=="value1" & column2 > value2), ]<dataframe>[which(column1=="value1" & column2 > value2), c("col1", "col2")]
Unpivot
melt
SQL with sqldf
Easy syntax, but overhead to load the table in the database engine. library(sqldf) defaults to SQLite.
To use variables, add the prefix “fn$” (requires package gsubfn)fn$sqldf("select * from iris where \"Sepal.Length\" > $minSL and species = '$species' limit $limit")
References:
- Package documentation (pdf)
- Example taken from here (StackOverflow)
Benchmarking
SET STATISTICS TIME ON
system.time(<statement>)
Save & Load
Provide variable and file name (this example saves in the working directory)
save(<data table>, file ="<file name>.RData")load("<data table>.RData") #load into the eponymous variable