Data Conventions
Below is a list of data standardization conventions for data curated by the Technology and Data Management Core:
Warning: these standardizations apply to data released via the Technology and Data Management Core in or after December 2022.
Variable Names
- Format
- Always lowercase
- No special characters
- No spaces; use underscores as the delimiter
- i.e., snake_case
- Prefixes, Suffixes
- Where reasonable, apply prefixes (e.g., promis29_fatigue) or suffixes (e.g., cog_raw, cog_primary_score) that make programmatic variable search easier
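The naming rules above can be sketched in a few lines of Python. This is only an illustration, not the Core's actual tooling: the helper names `to_snake_case` and `find_variables` are hypothetical, and the treatment of camelCase input is an assumption that goes beyond the stated rules.

```python
import re


def to_snake_case(name: str) -> str:
    """Normalize a raw variable name to lowercase snake_case (hypothetical helper)."""
    # Assumption: split camelCase boundaries, e.g. "cogRaw" -> "cog_Raw".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace each run of spaces or special characters with one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing underscores and lowercase everything.
    return name.strip("_").lower()


def find_variables(columns, prefix="", suffix=""):
    """Return the column names that start with `prefix` and end with `suffix`."""
    return [c for c in columns if c.startswith(prefix) and c.endswith(suffix)]


columns = [to_snake_case(c) for c in ["PROMIS29 Fatigue", "cogRaw", "Cog Primary-Score!"]]
promis_columns = find_variables(columns, prefix="promis29_")
cog_columns = find_variables(columns, prefix="cog_")
```

With consistent prefixes and suffixes, selecting a family of variables reduces to a simple string match, which is the practical benefit of the convention.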
File formats
- CSV is the preferred data format from the perspective of interoperability
- If other data formats are needed, please communicate this to Nelson Roque
Missing Data
- To denote missing data, files coded with NA (for R users) and with -999 (for SAS users) will be provided.
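For readers working outside R or SAS, the sketch below shows one way to treat both missing-data codes as missing when loading a CSV with Python's standard library. The helper name `read_rows` and the sample column names and values are made up for illustration. It also assumes that -999 never occurs as a legitimate value in any column, which should be checked for each dataset.

```python
import csv
import io

# Codes described above: NA (R convention) and -999 (SAS convention).
MISSING_CODES = {"NA", "-999"}


def read_rows(handle):
    """Read CSV rows as dicts, mapping missing-data codes to None (hypothetical helper)."""
    reader = csv.DictReader(handle)
    return [
        {key: (None if value in MISSING_CODES else value) for key, value in row.items()}
        for row in reader
    ]


# Illustrative in-memory data; real files would be opened with open(path, newline="").
sample = io.StringIO("participant_id,cog_raw\n1,42\n2,NA\n3,-999\n")
rows = read_rows(sample)
```

Values are left as strings here; numeric conversion would be a separate step once missing codes have been mapped out.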
Filenames
- Format: verbose filenames, with the date and time saved as a suffix before the file extension.
- Example:
tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv
- tidy = processed via the TDM Core
- eas = study shortname (EAS)
- ema_all_surveys = data type
- session_level = granularity of the data
- 2023_01_05_09_40_02 = date and time saved (in EST)
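The filename convention can be handled programmatically along the following lines. This is a sketch under stated assumptions: the names `FILENAME_PATTERN`, `parse_filename`, and `build_filename` are hypothetical, and only the trailing timestamp is parsed, because the study, data type, and granularity components themselves contain underscores and cannot be split unambiguously without a list of known values.

```python
import re
from datetime import datetime

# Descriptive stem, then YYYY_MM_DD_HH_MM_SS, then the file extension.
FILENAME_PATTERN = re.compile(
    r"^(?P<stem>.+)_(?P<stamp>\d{4}(?:_\d{2}){5})\.(?P<ext>\w+)$"
)
STAMP_FORMAT = "%Y_%m_%d_%H_%M_%S"


def parse_filename(filename):
    """Split a filename into its stem, saved timestamp, and extension."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Filename does not follow the convention: {filename!r}")
    # The result is a naive datetime; per the convention it represents EST.
    saved_at = datetime.strptime(match["stamp"], STAMP_FORMAT)
    return {"stem": match["stem"], "saved_at": saved_at, "extension": match["ext"]}


def build_filename(stem, saved_at, extension="csv"):
    """Compose a filename with the timestamp suffix before the extension."""
    return f"{stem}_{saved_at.strftime(STAMP_FORMAT)}.{extension}"


example = "tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv"
parts = parse_filename(example)
rebuilt = build_filename(parts["stem"], parts["saved_at"], parts["extension"])
```

Because the timestamp fields are zero-padded and ordered from year to second, filenames that share a stem also sort chronologically as plain strings.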
- Example: