
Data Conventions

Below is a list of data standardization conventions for data curated by the Technology and Data Management Core. See the codebook for more information.

Note

These standardizations apply only to data released in or after December 2022 via the Technology and Data Management Core.

Variable Names

Format
- Always lowercase
- No special characters
- No spaces; use an underscore as the delimiter
- i.e., snake_case
Prefixes and Suffixes
- Where reasonable, apply prefixes (e.g., promis29_fatigue) or suffixes (e.g., cog_raw, cog_primary_score) that make programmatic variable search easier
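
The naming rules above lend themselves to simple programmatic checks. The following is a minimal sketch, not an official TDM Core utility: the helper names `is_snake_case` and `find_variables` are hypothetical, and the regular expression reflects one reading of the rules (lowercase letters, digits, and underscores only).

```python
import re

# Assumed reading of the format rules: lowercase letters and digits,
# separated by single underscores, with no spaces or special characters.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_snake_case(name):
    """Return True if the variable name matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


def find_variables(names, prefix="", suffix=""):
    """Return the names that start with `prefix` and end with `suffix`."""
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


# Example variable names taken from this page.
columns = ["promis29_fatigue", "cog_raw", "cog_primary_score"]

promis_vars = find_variables(columns, prefix="promis29_")
raw_vars = find_variables(columns, suffix="_raw")
```

Because related variables share a prefix or suffix, a single string match is enough to select a whole instrument or score type.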

File formats

CSV is the preferred data format for interoperability.
If you need other data formats, please communicate this to Nelson Roque.

Missing Data

Missing data are denoted by NA in the file provided for R users and by -999 in the file provided for SAS users.
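
For readers working outside R or SAS, the missing-value codes can be mapped to a native null when loading a file. This is a minimal sketch using Python's standard `csv` module. The helper name `read_rows` and the sample column names are hypothetical, and treating both codes as missing in a single reader is an assumption made for convenience.

```python
import csv
import io

# Missing-value codes named in this section.
MISSING_CODES = {"NA", "-999"}


def read_rows(handle):
    """Read CSV rows as dicts, replacing missing-value codes with None.

    Assumes every row has the same number of fields as the header.
    All other values are left as strings.
    """
    reader = csv.DictReader(handle)
    return [
        {key: (None if value in MISSING_CODES else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical three-row sample; the data are illustrative only.
sample = io.StringIO("participant_id,promis29_fatigue\n1,NA\n2,-999\n3,12\n")
rows = read_rows(sample)
```

Note that -999 is only safe to treat as missing for variables where it cannot occur as a real value.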

Filenames

Format: verbose filenames, with the date and time saved as a suffix before the file extension.
- Example: tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv
  - tidy = processed via the TDM Core
  - eas = study shortname (EAS)
  - ema_all_surveys = data type
  - session_level = granularity of the data
  - 2023_01_05_09_40_02 = date and time saved (in EST)
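
The filename convention above can be parsed programmatically. The sketch below is one possible approach, not an official parser, and `parse_filename` is a hypothetical helper. It assumes the first two underscore-separated tokens are the processing tag and study shortname, and that the filename ends in a `YYYY_MM_DD_HH_MM_SS` timestamp. The data type and granularity are returned together as one descriptor, since the boundary between them cannot be inferred from underscores alone.

```python
import re
from datetime import datetime

# Timestamp suffix (YYYY_MM_DD_HH_MM_SS) immediately before the file extension.
FILENAME = re.compile(r"(?P<stem>.+)_(?P<stamp>\d{4}(?:_\d{2}){5})\.(?P<ext>\w+)")


def parse_filename(filename):
    """Split a filename into processing tag, study, descriptor, and timestamp."""
    match = FILENAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename}")
    # Assumption: the first two tokens are the processing tag and study shortname.
    processing, study, descriptor = match["stem"].split("_", 2)
    return {
        "processing": processing,
        "study": study,
        "descriptor": descriptor,  # data type and granularity, left unsplit
        # Naive datetime; per the convention above, the time is in EST.
        "saved_at": datetime.strptime(match["stamp"], "%Y_%m_%d_%H_%M_%S"),
        "extension": match["ext"],
    }


info = parse_filename("tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv")
```

Parsing the timestamp into a `datetime` makes it straightforward to sort files and select the most recent release.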
