
Data Conventions

Below is a list of data standardization conventions for data curated by the Technology and Data Management (TDM) Core. See the codebook for more information.

Note

Only data released via the Technology and Data Management Core in or after December 2022 has these standardizations applied.

Variable Names

  • Format
    • Always lowercase
    • No special characters
    • No spaces; use an underscore as the delimiter
    • i.e., snake_case
  • Prefixes and suffixes
    • Where reasonable, apply prefixes (e.g., promis29_fatigue) or suffixes (e.g., cog_raw, cog_primary_score) that make programmatic variable search easier
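The naming rules above can be checked or applied programmatically. The sketch below is a minimal illustration, assuming raw column names arrive in mixed styles (spaces, camelCase, punctuation); `to_snake_case` and `is_valid_name` are hypothetical helpers, not part of any TDM Core tooling. The validator is also an assumption in that it is slightly stricter than the stated rules: it rejects leading, trailing, or doubled underscores.

```python
import re

# Assumed pattern: lowercase letters/digits, with single underscores between words
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def to_snake_case(name: str) -> str:
    """Normalize a raw column name to lowercase snake_case (hypothetical helper)."""
    # Insert an underscore at lowercase/digit -> uppercase boundaries (cogRaw -> cog_Raw)
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse each run of spaces or special characters into a single underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Drop leading/trailing underscores, then lowercase everything
    return name.strip("_").lower()


def is_valid_name(name: str) -> bool:
    """Return True if the name already follows the snake_case convention."""
    return SNAKE_CASE.fullmatch(name) is not None
```

For example, `to_snake_case("Cog Primary-Score (%)")` yields `cog_primary_score`, matching the suffix style shown above.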

File Formats

  • CSV is the preferred data format for interoperability
  • If you need data in another format, please contact Nelson Roque

Missing Data

  • Each file is provided in two versions that differ only in how missing data are coded: NA (for R users) and -999 (for SAS users).
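When working outside R or SAS, both missing-data codes can be mapped to a single null value on read. The sketch below uses only the standard library and assumes the codes appear verbatim as the strings `NA` and `-999`; `read_rows` is a hypothetical helper. It also assumes `-999` is never a legitimate observed value, which is worth confirming against the codebook for each variable.

```python
import csv
import io

# The two missing-data codes described above (R-oriented and SAS-oriented files)
MISSING_CODES = {"NA", "-999"}


def read_rows(text: str) -> list[dict]:
    """Parse CSV text, replacing either missing-data code with None (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value.strip() in MISSING_CODES else value) for key, value in row.items()}
        for row in reader
    ]
```

pandas users can get a similar effect with `pd.read_csv(path, na_values=[-999])`, since `"NA"` is already among pandas' default missing-value markers.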

Filenames

  • Format: descriptive (verbose) filenames, with the date and time saved as a suffix before the file extension.
    • Example: tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv
      • tidy = processed via the TDM Core
      • eas = study shortname (EAS)
      • ema_all_surveys = data type
      • session_level = granularity of the data
      • 2023_01_05_09_40_02 = date and time saved (in EST)
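The filename pattern can be assembled from its parts and the save time recovered from the end of a name. This is a minimal sketch; `build_filename` and `parse_saved_at` are hypothetical helpers, and the caller is assumed to pass a time that is already in Eastern time (no timezone conversion is done). Because the study shortname, data type, and granularity may themselves contain underscores, the middle fields cannot be split apart unambiguously, so the parser only recovers the six timestamp fields at the end.

```python
from datetime import datetime

# Year_month_day_hour_minute_second, as in 2023_01_05_09_40_02
TIMESTAMP_FORMAT = "%Y_%m_%d_%H_%M_%S"


def build_filename(study: str, data_type: str, granularity: str, saved_at: datetime) -> str:
    """Assemble a filename following the convention above (hypothetical helper)."""
    # saved_at is assumed to already be in Eastern time
    stamp = saved_at.strftime(TIMESTAMP_FORMAT)
    return f"tidy_{study}_{data_type}_{granularity}_{stamp}.csv"


def parse_saved_at(filename: str) -> datetime:
    """Recover the save time from the last six underscore-separated fields."""
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    stamp = "_".join(stem.split("_")[-6:])
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)
```

For instance, `build_filename("eas", "ema_all_surveys", "session_level", datetime(2023, 1, 5, 9, 40, 2))` reproduces the example filename above.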