Data Conventions
Below is a list of data standardization conventions for data curated by the Technology and Data Management Core. See the codebook for more information.
Note
Only data released via the Technology and Data Management Core in or after December 2022 have these standardizations applied.
Variable Names
- Format
- Always lowercase
- No special characters
- No spaces; use an underscore as the delimiter
- i.e., snake_case
- Prefixes, Suffixes
- Where reasonable, apply prefixes (e.g., promis29_fatigue) or suffixes (e.g., cog_raw, cog_primary_score) that make programmatic variable search easier
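The naming rules above can be illustrated with a short sketch. The helper names `to_snake_case` and `find_variables` are hypothetical and not part of any TDM Core tooling; the sketch only assumes that raw column labels may contain spaces, punctuation, or camelCase.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert an arbitrary column label to lowercase snake_case."""
    # Insert an underscore at camelCase boundaries: "cogRaw" -> "cog_Raw"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result
    return name.strip("_").lower()


def find_variables(columns, prefix="", suffix=""):
    """Return the column names that start with `prefix` and end with `suffix`."""
    return [c for c in columns if c.startswith(prefix) and c.endswith(suffix)]


# Variable names taken from the examples in the convention above
columns = ["promis29_fatigue", "cog_raw", "cog_primary_score"]
cog_columns = find_variables(columns, prefix="cog_")
raw_columns = find_variables(columns, suffix="_raw")
```

Because related variables share a prefix or suffix, a single string match is enough to select a whole family of columns.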
File Formats
- CSV is the preferred data format from the perspective of interoperability
- If other data formats are needed, please communicate this to Nelson Roque
Missing Data
- Missing data are denoted by NA in the file provided for R users and by -999 in the file provided for SAS users.
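For readers working outside R or SAS, the following is a minimal sketch of how both missing-data codes could be handled when reading a CSV with Python's standard library. The function name `read_rows` and the `participant_id` column are assumptions made for illustration; values are left as strings, and only the two codes named above are mapped to `None`.

```python
import csv
import io

# Missing-data codes named in the convention above
MISSING_CODES = {"NA", "-999"}


def read_rows(handle, missing_codes=MISSING_CODES):
    """Yield each CSV row as a dict, mapping missing-data codes to None."""
    for row in csv.DictReader(handle):
        yield {
            key: (None if value in missing_codes else value)
            for key, value in row.items()
        }


# Hypothetical in-memory file standing in for a released CSV
sample = io.StringIO("participant_id,cog_raw\n1,12\n2,NA\n3,-999\n")
rows = list(read_rows(sample))
```

Treating both codes as missing at read time keeps -999 from being mistaken for a real score in later calculations.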
Filenames
- Format: verbose filenames with date saved as the suffix before the file type.
- Example:
tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv
- tidy = processed via the TDM Core
- eas = study shortname (EAS)
- ema_all_surveys = data type
- session_level = granularity of the data
- 2023_01_05_09_40_02 = date and time saved (in EST)
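The filename convention can also be parsed and generated programmatically. The sketch below is one possible approach, not an official TDM Core utility: the names `parse_filename`, `build_filename`, and the field labels `stage`, `study`, and `description` are assumptions. Since the data type and granularity are both multi-word, the sketch keeps them together as one `description` field rather than guessing where one ends and the other begins.

```python
import re
from datetime import datetime
from pathlib import Path

# <stage>_<study>_<description>_<YYYY_MM_DD_HH_MM_SS>
FILENAME_PATTERN = re.compile(
    r"^(?P<stage>[a-z0-9]+)_(?P<study>[a-z0-9]+)_(?P<description>.+)_"
    r"(?P<timestamp>\d{4}(?:_\d{2}){5})$"
)
TIMESTAMP_FORMAT = "%Y_%m_%d_%H_%M_%S"


def parse_filename(filename: str) -> dict:
    """Split a conventionally named file into its components."""
    path = Path(filename)
    match = FILENAME_PATTERN.match(path.stem)
    if match is None:
        raise ValueError(f"Filename does not follow the convention: {filename}")
    parts = match.groupdict()
    # Naive datetime; the convention states that times are saved in EST
    parts["saved_at"] = datetime.strptime(parts.pop("timestamp"), TIMESTAMP_FORMAT)
    parts["extension"] = path.suffix.lstrip(".")
    return parts


def build_filename(stage, study, description, saved_at, extension="csv"):
    """Assemble a filename with the save time as the suffix before the file type."""
    stamp = saved_at.strftime(TIMESTAMP_FORMAT)
    return f"{stage}_{study}_{description}_{stamp}.{extension}"


example = "tidy_eas_ema_all_surveys_session_level_2023_01_05_09_40_02.csv"
parts = parse_filename(example)
rebuilt = build_filename(
    parts["stage"], parts["study"], parts["description"], parts["saved_at"]
)
```

Because the timestamp components are zero-padded and ordered from year to second, sorting filenames alphabetically also sorts them chronologically within a given study and data type.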