|  | 
|  | 1 | +--- | 
|  | 2 | +description: 'R language and document formats (R, Rmd, Quarto): coding standards and Copilot guidance for idiomatic, safe, and consistent code generation.' | 
|  | 3 | +applyTo: '**/*.R, **/*.r, **/*.Rmd, **/*.rmd, **/*.qmd' | 
|  | 4 | +--- | 
|  | 5 | + | 
|  | 6 | +# R Programming Language Instructions | 
|  | 7 | + | 
|  | 8 | +## Purpose | 
|  | 9 | + | 
|  | 10 | +Help GitHub Copilot generate idiomatic, safe, and maintainable R code across projects. | 
|  | 11 | + | 
|  | 12 | +## Core Conventions | 
|  | 13 | + | 
|  | 14 | +- **Match the project’s style.** If the file shows a preference (tidyverse vs. base R, `%>%` vs. `|>`), follow it. | 
|  | 15 | +- **Prefer clear, vectorized code.** Keep functions small and avoid hidden side effects. | 
|  | 16 | +- **Qualify non-base functions in examples/snippets**, e.g., `dplyr::mutate()`, `stringr::str_detect()`. In project code, using `library()` is acceptable when that’s the repo norm. | 
|  | 17 | +- **Naming:** `lower_snake_case` for objects/files; avoid dots in names. | 
|  | 18 | +- **Side effects:** Never call `setwd()`; prefer project-relative paths (e.g., `here::here()`). | 
|  | 19 | +- **Reproducibility:** Set seeds locally around stochastic operations using `withr::with_seed()`. | 
|  | 20 | +- **Validation:** Validate and constrain user inputs; use typed checks and allowlists where possible. | 
|  | 21 | +- **Safety:** Avoid `eval(parse())`, unvalidated shell calls, and unparameterized SQL. | 
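The validation and safety bullets above can be sketched in base R as follows. This is a minimal illustration, not a prescribed implementation: `summarise_column()` and its arguments are hypothetical names introduced here. It assumes an allowlist expressed through `match.arg()` and a fixed `switch()` lookup in place of `eval(parse())`.

```r
# Hypothetical helper: validates a summary request before doing any work.
summarise_column <- function(data, column, stat = c("mean", "median", "max")) {
  # match.arg() restricts `stat` to the allowlist declared in the signature.
  stat <- match.arg(stat)
  # Typed checks: fail early, before touching the data.
  stopifnot(
    is.data.frame(data),
    is.character(column), length(column) == 1L,
    column %in% names(data),
    is.numeric(data[[column]])
  )
  # Dispatch through a fixed lookup instead of evaluating user-supplied text.
  stat_fun <- switch(stat, mean = mean, median = stats::median, max = max)
  stat_fun(data[[column]], na.rm = TRUE)
}

summarise_column(data.frame(x = c(1, 2, 6)), "x", "median")
```

A value outside the allowlist (for example `stat = "sd"`) stops with an error from `match.arg()` before any computation runs.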
|  | 22 | + | 
|  | 23 | +### Pipe Operators | 
|  | 24 | + | 
|  | 25 | +- **Native pipe `|>` (R ≥ 4.1.0):** Prefer it when the project targets R ≥ 4.1; it adds no extra dependency. | 
|  | 26 | +- **Magrittr pipe `%>%`:** Continue using it in projects already committed to magrittr or when you need features such as the `.` placeholder, `%T>%`, or `%$%`. | 
|  | 27 | +- **Be consistent:** Don't mix `|>` and `%>%` within the same script unless there's a clear technical reason. | 
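As a small sketch of the native pipe in base R (assuming R ≥ 4.1; the variable names are illustrative only): the left-hand side is passed as the first argument, and an anonymous function can stand in where magrittr code would use the `.` placeholder.

```r
values <- c(4, 9, 16)

# Native pipe: the left-hand side becomes the first argument of each call.
roots_total <- values |> sqrt() |> sum()

# To pass the value somewhere other than the first argument, wrap the call
# in an anonymous function. magrittr code would use `.` here instead.
labels <- values |> (\(v) paste("n =", v))()
```

This is one reason to keep a single pipe per script: the two operators handle non-first arguments differently, so mixing them makes pipelines harder to read.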
|  | 28 | + | 
|  | 29 | +## Performance Considerations | 
|  | 30 | + | 
|  | 31 | +- **Large datasets:** consider `data.table`; benchmark with your workload. | 
|  | 32 | +- **dplyr compatibility:** Use `dtplyr` (wrap the data with `dtplyr::lazy_dt()`) to write dplyr syntax that is translated to data.table operations; benchmark to confirm the gain on your workload. | 
|  | 33 | +- **Profiling:** Use `profvis::profvis()` to identify performance bottlenecks in your code. Profile before optimizing. | 
|  | 34 | +- **Caching:** Use `memoise::memoise()` to cache expensive function results. Particularly useful for repeated API calls or complex computations. | 
|  | 35 | +- **Vectorization:** Prefer vectorized operations over loops. Use `purrr::map_*()` family or `apply()` family for remaining iteration needs. | 
|  | 36 | + | 
|  | 37 | +## Tooling & Quality | 
|  | 38 | + | 
|  | 39 | +- **Formatting:** `styler` (tidyverse style), two-space indents, ~100-char lines. | 
|  | 40 | +- **Linting:** `lintr` configured via `.lintr`. | 
|  | 41 | +- **Pre-commit:** consider `precommit` hooks to lint/format automatically. | 
|  | 42 | +- **Docs:** roxygen2 for exported functions (`@param`, `@return`, `@examples`). | 
|  | 43 | +- **Tests:** prefer small, pure, composable functions that are easy to unit test. | 
|  | 44 | +- **Dependencies:** manage with `renv`; snapshot after adding packages. | 
|  | 45 | +- **Paths:** prefer `fs` and `here` for portability. | 
|  | 46 | + | 
|  | 47 | +## Data Wrangling & I/O | 
|  | 48 | + | 
|  | 49 | +- **Data frames:** prefer tibbles in tidyverse-heavy files; otherwise base `data.frame()` is fine. | 
|  | 50 | +- **Iteration:** use `purrr` in tidyverse code. In base-style code, prefer type-stable, vectorized patterns such as `vapply()` | 
|  | 51 | +   (for atomic outputs) or `Map()` (for elementwise operations) instead of explicit `for` loops when they improve clarity or performance. | 
|  | 52 | +- **Strings & Dates:** use `stringr`/`lubridate` where already present; otherwise use clear base helpers (e.g., `nchar()`, `substr()`, `as.Date()` with explicit format). | 
|  | 53 | +- **I/O:** prefer explicit, typed readers (e.g., `readr::read_csv()`); make parsing assumptions explicit. | 
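For base-style files, the iteration and date bullets above might look like the sketch below. The data and variable names are made up for illustration.

```r
words <- c("alpha", "beta", "gamma")

# vapply() declares the result type up front, so a callback that returns
# anything other than a single integer fails immediately.
word_lengths <- vapply(words, nchar, integer(1), USE.NAMES = FALSE)

# Map() walks several vectors in parallel and always returns a list.
ranges <- Map(function(lo, hi) seq(lo, hi), c(1, 10), c(3, 12))

# Dates: state the format explicitly rather than relying on guessing.
release <- as.Date("03/02/2024", format = "%d/%m/%Y")
```

Here `vapply()` is chosen over `sapply()` because its return type does not change with the input, which keeps downstream code predictable.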
|  | 54 | + | 
|  | 55 | +## Plotting | 
|  | 56 | + | 
|  | 57 | +- Prefer `ggplot2` for publication-quality plots. Keep layers readable and label axes and units. | 
|  | 58 | + | 
|  | 59 | +## Error Handling | 
|  | 60 | + | 
|  | 61 | +- In tidyverse contexts, use `rlang::abort()` / `rlang::warn()` for structured conditions; in base-only code, use `stop()` / `warning()`. | 
|  | 62 | +- For recoverable operations: | 
|  | 63 | +  - Use `purrr::possibly()` when a fallback value of the same type as the normal result is enough (the simpler option). | 
|  | 64 | +  - Use `purrr::safely()` when you need to capture both results and errors for later inspection or logging. | 
|  | 65 | +  - Use `tryCatch()` in base R for fine-grained control or compatibility with non-tidyverse code. | 
|  | 66 | +- Prefer consistent return structures—typed outputs for normal flows, structured lists only when error details are required. | 
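The two recovery styles described above can be approximated in base R with `tryCatch()`. This is a sketch under stated assumptions: `parse_count()`, `count_or_na()`, and `count_safely()` are hypothetical helpers, and the wrappers only mimic the spirit of `purrr::possibly()` and `purrr::safely()`, not their exact behavior.

```r
# Hypothetical parser used only for illustration.
parse_count <- function(x) {
  n <- suppressWarnings(as.integer(x))
  if (is.na(n)) stop("not an integer: ", x, call. = FALSE)
  n
}

# Typed fallback: always returns a single integer (possibly NA).
count_or_na <- function(x) {
  tryCatch(parse_count(x), error = function(e) NA_integer_)
}

# Result plus error: returns a list so the caller can inspect the failure.
count_safely <- function(x) {
  tryCatch(
    list(result = parse_count(x), error = NULL),
    error = function(e) list(result = NULL, error = e)
  )
}

counts <- vapply(c("3", "x", "7"), count_or_na, integer(1), USE.NAMES = FALSE)
failure <- count_safely("x")
```

The typed fallback keeps `counts` an integer vector, while `failure$error` retains the condition object for logging.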
|  | 67 | + | 
|  | 68 | +## Security Best Practices | 
|  | 69 | + | 
|  | 70 | +- **Command execution:** Prefer `processx::run()` or `sys::exec_wait()` over `system()`; validate and sanitize all arguments. | 
|  | 71 | +- **Database queries:** Use parameterized `DBI` queries to prevent SQL injection. | 
|  | 72 | +- **File paths:** Normalize and sanitize user-provided paths (e.g., `fs::path_sanitize()`), and validate against allowlists. | 
|  | 73 | +- **Credentials:** Never hardcode secrets. Use env vars (`Sys.getenv()`), config outside VCS, or `keyring`. | 
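A base R sketch of the credential and file-path bullets above. The environment variable name, file names, and `reports` directory are assumptions made for this example, and both helpers are hypothetical.

```r
# Read a secret from the environment and fail loudly if it is missing.
# "MY_SERVICE_TOKEN" is a hypothetical variable name.
get_api_token <- function(var = "MY_SERVICE_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  token
}

# Accept only file names from a fixed allowlist, then build the path.
# The file names and the "reports" root are hypothetical.
resolve_report <- function(name,
                           allowed = c("summary.csv", "detail.csv"),
                           root = "reports") {
  stopifnot(is.character(name), length(name) == 1L)
  if (!name %in% allowed) {
    stop("Unknown report: ", name, call. = FALSE)
  }
  file.path(root, name)
}
```

Checking against an allowlist first means inputs such as `"../secrets.csv"` are rejected before any path is constructed.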
|  | 74 | + | 
|  | 75 | +## Shiny | 
|  | 76 | + | 
|  | 77 | +- Modularize UI and server logic for non-trivial apps. Use `eventReactive()` / `observeEvent()` for explicit dependencies. | 
|  | 78 | +- Validate inputs with `req()` and clear, user-friendly messages. | 
|  | 79 | +- Use connection pooling (`pool`) for databases; avoid long-lived global objects. | 
|  | 80 | +- Isolate expensive computations and prefer `reactiveVal()` / `reactiveValues()` for small state. | 
|  | 81 | + | 
|  | 82 | +## R Markdown / Quarto | 
|  | 83 | + | 
|  | 84 | +- Keep chunks focused; prefer explicit chunk options (`echo`, `message`, `warning`). | 
|  | 85 | +- Avoid global state; prefer local helpers. Use `withr::with_seed()` for deterministic chunks. | 
|  | 86 | + | 
|  | 87 | +## Copilot-Specific Guidance | 
|  | 88 | + | 
|  | 89 | +- If the current file uses tidyverse, **suggest tidyverse-first patterns** (e.g., `dplyr::across()` instead of superseded verbs). If base-R style is present, **use base idioms**. | 
|  | 90 | +- Qualify non-base calls in suggestions (e.g., `dplyr::mutate()`). | 
|  | 91 | +- Suggest vectorized or tidy solutions over loops when idiomatic. | 
|  | 92 | +- Prefer small helper functions over long pipelines. | 
|  | 93 | +- When multiple approaches are equivalent, prefer readability and type stability and explain the trade-offs. | 
|  | 94 | + | 
|  | 95 | +--- | 
|  | 96 | + | 
|  | 97 | +## Minimal Examples | 
|  | 98 | + | 
|  | 99 | +```r | 
|  | 100 | +# Base R variant | 
|  | 101 | +scores <- data.frame(id = 1:5, x = c(1, 3, 2, 5, 4)) | 
|  | 102 | +safe_log <- function(x) tryCatch(log(x), error = function(e) NA_real_) | 
|  | 103 | +scores$z <- vapply(scores$x, safe_log, numeric(1)) | 
|  | 104 | + | 
|  | 105 | +# Tidyverse variant (if this file uses tidyverse) | 
|  | 106 | +result <- tibble::tibble(id = 1:5, x = c(1, 3, 2, 5, 4)) |> | 
|  | 107 | +  dplyr::mutate(z = purrr::map_dbl(x, purrr::possibly(log, otherwise = NA_real_))) |> | 
|  | 108 | +  dplyr::filter(z > 0) | 
|  | 109 | + | 
|  | 110 | +# Example reusable helper with roxygen2 doc | 
|  | 111 | +#' Compute the z-score of a numeric vector | 
|  | 112 | +#' @param x A numeric vector | 
|  | 113 | +#' @return Numeric vector of z-scores | 
|  | 114 | +#' @examples z_score(c(1, 2, 3)) | 
|  | 115 | +z_score <- function(x) (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE) | 
|  | 116 | +``` | 