Data Access, Usage, and Sharing Guidelines
Effective data sharing is crucial for advancing research, collaboration, and knowledge discovery. This document provides guidelines for different use cases to ensure responsible and ethical data sharing practices, as well as adherence to NIH data management and sharing policies.
This document outlines some policies and use cases to help address data usage by: (1) principal investigators and their labs, (2) secondary data users, and (3) users of highly identified data.
Please note that these guidelines are in draft form and should be adapted and customized based on feedback from the EAS mPIs.
General Guidelines
Origin of data and Closed Datasets: Data generated at Albert Einstein College of Medicine will first be stored on Einstein Aging Study servers and databases. Data generated at other primary sites will be stored in accordance with local regulations and the terms of the applicable data use agreements. Thereafter, digital data will be shared with the EAS Tech and Data Management Core for integration, documentation, validation, and archiving. Once closed datasets are ready, they will be shared with all data users via the EAS SharePoint site.
Emailing data: As the EAS SharePoint site matures, data should ideally be shared through that storage (or Box) rather than by email. If you feel that data MUST be emailed, please consult with Dr. Nelson Roque (nur375@psu.edu) about alternatives first.
Valid emails: Unless special conditions apply (e.g., sharing with an industry partner under a DUA or reliance agreement), data will only be shared with email addresses ending in .edu.
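As an illustration only, the .edu rule above could be checked before a file is sent with a small helper like the one below. This is a minimal sketch, not an EAS tool: the function name `may_share_by_email` and the `approved_exceptions` parameter (standing in for addresses covered by a special condition such as an industry DUA) are hypothetical.

```python
def may_share_by_email(email: str, approved_exceptions: frozenset = frozenset()) -> bool:
    """Return True if data may be emailed to this address under the .edu rule.

    approved_exceptions is a hypothetical allow-list of non-.edu addresses
    that are covered by a special condition (e.g., an industry partner DUA).
    """
    address = email.strip().lower()
    if address in approved_exceptions:
        return True
    # Split on the last "@" and require a non-empty local part and domain.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or "@" in local:
        return False
    # The domain must end in ".edu" and contain more than just the suffix.
    return domain.endswith(".edu") and len(domain) > len(".edu")
```

For example, `may_share_by_email("researcher@example.edu")` passes the check, while `may_share_by_email("researcher@example.com")` does not unless that address is listed in `approved_exceptions`.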
Data Use Agreements: All users of data outside of Albert Einstein College of Medicine will require a data use agreement (DUA). DUAs are initiated by emailing Mindy Katz (mindy.katz@einsteinmed.edu) and copying Dr. Nelson Roque (nur375@psu.edu). Please allow up to 12 weeks for this process (start to finish). Timelines may vary by the nature of the request and type of institution (university vs. industry).
Concept Proposals
- All uses of data must be documented by completing the EAS Concept Proposal Form.
- Concept proposals are reviewed by EAS Study Team members within a reasonable time (~ 1 month or less).
- Feedback is provided that captures any concerns of scientific overlap with existing grant aims, data usage, or other issues.
- Concept proposals may be withdrawn by the proposer at any time via email to the EAS Tech and Data Management Core.
- If and when a concept proposal is approved, the EAS Tech and Data Management Core will be engaged to either prepare a bespoke dataset or share access to an existing closed dataset that encompasses the needs listed in the request.
- Before data is shared, Data Use Agreements (DUAs; as well as MTAs or Reliance Agreements if needed) will be initiated and approved by all relevant institutions.
- Failure to submit concept proposals ahead of usage of data may prohibit future usage.
Principal Investigators (and their labs)
As a principal investigator (PI), you are responsible for managing and sharing data generated by your research and data shared with you from the EAS Tech and Data Management Core.
When lab personnel move on to a new institution, it is the PI's responsibility to notify the EAS Tech and Data Management Core so that IRB approval and the related Data Use Agreements (or Material Transfer Agreements, MTAs) can be put in place for that institution.
Data cannot be shared with a new university that is not covered under existing Data Use Agreements.
Users of De-identified Secondary Data
Like all other users of data, secondary data users beyond the investigative team must fill out a concept proposal to initiate their data request. All other procedures described in this document also apply to this user class.
Users of Highly Identified Data
Highly identified data refers to datasets containing personally identifiable information (PII) or sensitive data (e.g., residential addresses).
Any user of highly identified data will need a Reliance Agreement between the Albert Einstein College of Medicine and their respective institution.
Users of such data must adhere to strict privacy and confidentiality guidelines.
When using highly identified data, you must ensure that it is used only for approved research purposes, that is, for nothing beyond what is listed in the concept proposal submitted to initiate data access.
Implement appropriate de-identification techniques to reduce the risk of re-identification when sharing or publishing research outputs.
Depending on the nature of the data (e.g., GPS), data access may only be permissible inside of a secure digital enclave.
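To make the de-identification guidance above more concrete, here is a minimal sketch of three common techniques: dropping direct identifiers, replacing participant IDs with keyed pseudonyms, and coarsening GPS coordinates. It is an illustration under stated assumptions, not the EAS procedure. All field names (`name`, `residential_address`, `email`, `participant_id`, `latitude`, `longitude`) and the function names are hypothetical, and the right techniques and precision for a given dataset should be agreed with the EAS Tech and Data Management Core.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "residential_address", "email"}


def pseudonymize_id(participant_id: str, secret_key: bytes) -> str:
    """Map an ID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify_record(record: dict, secret_key: bytes, gps_decimals: int = 1) -> dict:
    """Return a copy of the record with reduced re-identification risk."""
    # 1. Drop direct identifiers entirely.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # 2. Replace the participant ID with a keyed pseudonym.
    cleaned["participant_id"] = pseudonymize_id(str(record["participant_id"]), secret_key)
    # 3. Coarsen GPS coordinates by rounding to fewer decimal places.
    for field in ("latitude", "longitude"):
        if field in cleaned:
            cleaned[field] = round(float(cleaned[field]), gps_decimals)
    return cleaned
```

The secret key must be stored separately from the shared data, since anyone holding it can recompute the pseudonyms. Rounding coordinates lowers spatial precision but does not by itself guarantee anonymity, which is one reason some data may still only be accessible inside a secure enclave.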
Data Access and Use
- Familiarize yourself with the data's documentation and any specific usage restrictions or licensing agreements.
- Adhere to any ethical and legal obligations, such as protecting individual privacy and complying with applicable data protection regulations.
- Use data solely for the stated research purpose and avoid unauthorized redistribution or reuse without proper consent.
- Determine appropriate data sharing mechanisms and platforms based on the sensitivity and size of the data.
- Comply with institutional and funding agency policies on data sharing, including any necessary consent requirements.
- Provide metadata and documentation accompanying shared data to enhance its discoverability and reuse potential.
Data Citation and Attribution
Provide proper attribution and acknowledge the data sources used in your research. Cite the shared dataset or publication associated with the dataset when referencing or publishing research outputs.
The following language may be used exactly as written below:
Data were from the Einstein Aging Study (EAS): Study information, codebooks, and a form to request data access are available through the Einstein Aging Study website, https://einsteinagingstudy.com