
Data Conventions

Below is a list of data standardization conventions for data curated by the Technology and Data Management Core:

Warning

Please note that these standardizations apply to data released via the Technology and Data Management Core in or after December 2022.

Variable Names

  • Format
    • Always lowercase
    • No special characters
    • No spaces; use an underscore as the delimiter
    • In other words, snake_case
  • Prefixes and suffixes
    • Where reasonable, apply prefixes (e.g., promis29_fatigue) or suffixes (e.g., cog_raw, cog_primary_score) that make programmatic variable searches easier
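The variable name rules above can be checked and applied programmatically. The sketch below is illustrative only, not a TDM Core tool: it assumes "no special characters" means only lowercase ASCII letters, digits, and single underscores are allowed, and that names start with a letter. The helper names (is_valid_variable_name, to_snake_case, find_by_prefix, find_by_suffix) are hypothetical.

```python
import re

# Assumption: a valid name starts with a lowercase letter and contains only
# lowercase letters, digits, and single underscores between words.
_VALID_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_variable_name(name: str) -> bool:
    """Return True if the name follows the lowercase snake_case convention."""
    return _VALID_NAME.fullmatch(name) is not None


def to_snake_case(name: str) -> str:
    """Convert a raw column name to lowercase snake_case."""
    # Insert an underscore at camelCase boundaries, e.g. "cogRaw" -> "cog_Raw".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace each run of spaces or special characters with one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def find_by_prefix(columns: list[str], prefix: str) -> list[str]:
    """Return columns whose name begins with the given prefix (e.g. "promis29")."""
    return [c for c in columns if c.startswith(prefix + "_")]


def find_by_suffix(columns: list[str], suffix: str) -> list[str]:
    """Return columns whose name ends with the given suffix (e.g. "raw")."""
    return [c for c in columns if c.endswith("_" + suffix)]
```

For example, to_snake_case("PROMIS29 Fatigue") yields "promis29_fatigue", and find_by_prefix(columns, "promis29") collects every PROMIS-29 item in a dataset, which is the kind of search consistent prefixes are meant to support.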

File Formats

  • CSV is the preferred data format for interoperability
  • If you need the data in other formats, please communicate this to Nelson Roque

Missing Data

  • Two versions of each file will be provided to denote missing data: one with missing values coded as NA (for R users) and one with missing values coded as -999 (for SAS users).
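Because the two versions differ only in their missing-value code, a script can treat both codes the same way or convert one version into the other. The sketch below uses only Python's standard csv module; the function names are hypothetical, and it assumes no column legitimately contains the literal values "NA" or "-999" as real data.

```python
import csv
import io

# Missing-value codes from the convention: NA (R version) and -999 (SAS version).
MISSING_CODES = {"NA", "-999"}


def read_with_missing(text: str) -> list[dict]:
    """Parse CSV text into row dicts, turning either missing-value code into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value in MISSING_CODES else value) for key, value in row.items()}
        for row in reader
    ]


def recode_missing(text: str, old: str = "NA", new: str = "-999") -> str:
    """Rewrite CSV text so that cells exactly equal to `old` become `new`."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(reader):
        # Leave the header row untouched; recode only data cells.
        writer.writerow(row if i == 0 else [new if cell == old else cell for cell in row])
    return out.getvalue()
```

Calling recode_missing(text) on an NA-coded file produces the -999-coded equivalent; swapping the old and new arguments converts in the other direction.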

Filenames

  • Format: descriptive (verbose) filenames, with the date and time the file was saved appended as a suffix before the file extension.
    • Example: tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv
      • tidy = processed via the TDM Core
      • eas = study shortname (EAS)
      • ema_all_surveys = data type
      • session_level = granularity of the data
      • 2023_01_05_09_40_02 = date and time saved (in EST)
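Since the filename components are separated by underscores and some components (such as ema_all_surveys) contain underscores themselves, parsing them needs a few assumptions. The sketch below assumes the study shortname has no underscores and that the granularity field always ends in "_level", as in the example; the regular expression and helper names are hypothetical, not part of the TDM Core tooling. The timestamp is kept as a naive datetime, with EST implied by the convention.

```python
import re
from datetime import datetime

# Hypothetical pattern for: tidy_<study>_<data_type>_<granularity>_<timestamp>.<ext>
FILENAME_RE = re.compile(
    r"(?P<processing>tidy)_"
    r"(?P<study>[a-z0-9]+)_"                 # assumed: no underscores in the study shortname
    r"(?P<data_type>[a-z0-9_]+?)_"           # shortest match, so the fields after it can match
    r"(?P<granularity>[a-z0-9]+_level)_"     # assumed: always ends in "_level"
    r"(?P<saved>\d{4}(?:_\d{2}){5})"         # YYYY_MM_DD_HH_MM_SS
    r"\.(?P<ext>[a-z0-9]+)"
)
TIMESTAMP_FORMAT = "%Y_%m_%d_%H_%M_%S"


def parse_filename(name: str) -> dict:
    """Split a conventional filename into its components."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {name}")
    parts = match.groupdict()
    # Naive datetime; the convention states the time is in EST.
    parts["saved"] = datetime.strptime(parts["saved"], TIMESTAMP_FORMAT)
    return parts


def build_filename(study: str, data_type: str, granularity: str,
                   saved: datetime, ext: str = "csv") -> str:
    """Assemble a filename following the convention."""
    return f"tidy_{study}_{data_type}_{granularity}_{saved.strftime(TIMESTAMP_FORMAT)}.{ext}"
```

Applied to the example above, parse_filename returns study "eas", data type "ema_all_surveys", granularity "session_level", and a saved time of 2023-01-05 09:40:02; build_filename reverses the process.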