Research data may pass through several phases over its lifecycle, and its category can change from one phase to the next. Before the actual research starts, the data may be considered public. During the data gathering and analysis phase, the information is likely to fall into the restricted category because of the harm that could result if it were released inadvertently. After this stage, the data may be available for publication, which would move the information into a more public category.
Categorize Your Data
The University has defined three data categories and provides examples of the types of data that belong in each. However, as described above, a single category may not fully represent a data set, because the appropriate category depends on the phase of the research.
To categorize your data appropriately, consider the following.
- What phase is your research in at the moment?
- What level of harm would result if the data were inadvertently disclosed?
- Is the data subject to legislative obligations? Legislatively controlled data may include Protected Health Information (PHI); credit card information used, stored or processed as a merchant; and Personally Identifiable Information (PII), such as a first name or first initial and last name combined with a Social Security Number (SSN), Driver’s License Number, credit card number or bank account number. Research may also require Institutional Review Board (IRB) or other committee approval (http://www.research.psu.edu/orp).
Categories of Data and Level of Harm
Public: Public data are intended for distribution to the general public, both internal and external to the University. The release of the data would cause little or no damage to the institution.
Internal/Controlled: Internal/controlled data are intended for distribution within the University only, generally to defined subsets of the user population. The release of the data has the potential to create moderate damage to the institution. (Such damage may be legal, academic [loss or alteration of intellectual property], financial, or intangible [loss of reputation]).
Restricted: Restricted data are those which the University has legal, regulatory, policy or contractual obligations to protect. Access to restricted data must be strictly and individually controlled and logged. The release of such data has the potential to create major damage to the institution. (Such damage may be legal, academic [loss or alteration of intellectual property], financial, or intangible [loss of reputation]).
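The questions and category definitions above can be read as a simple decision rule. The following Python sketch is illustrative only and is not a University tool: the names `Harm`, `Category` and `categorize` are hypothetical, and the mapping assumes that any legal, regulatory, policy or contractual obligation to protect the data, or a potential for major damage, places the data in the Restricted category.

```python
from enum import Enum


class Category(Enum):
    PUBLIC = "Public"
    INTERNAL_CONTROLLED = "Internal/Controlled"
    RESTRICTED = "Restricted"


class Harm(Enum):
    """Potential damage to the institution if the data were released."""
    MINIMAL = 1   # no or minimal damage
    MODERATE = 2  # moderate damage
    MAJOR = 3     # major damage


def categorize(harm: Harm, has_protection_obligation: bool) -> Category:
    """Map the answers to the questions above onto one of the three categories.

    has_protection_obligation: True if the data carries a legal, regulatory,
    policy or contractual obligation to protect it (for example PHI or PII).
    """
    if has_protection_obligation or harm is Harm.MAJOR:
        return Category.RESTRICTED
    if harm is Harm.MODERATE:
        return Category.INTERNAL_CONTROLLED
    return Category.PUBLIC


# Hypothetical lifecycle of one data set: the harm level is reassessed
# at each phase, so the category can change as the research progresses.
lifecycle = [
    ("before research starts", Harm.MINIMAL, False),
    ("gathering and analysis", Harm.MAJOR, False),
    ("ready for publication", Harm.MINIMAL, False),
]
for phase, harm, obligated in lifecycle:
    print(phase, "->", categorize(harm, obligated).value)
```

Because the harm level is passed in for each phase rather than fixed once, the same data set can be re-evaluated whenever the research moves to a new phase.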
Key Concepts for Data Categorization
(Resources for all four concepts can be found at: http://datacat.psu.edu/data-profile-search)
- Categorize. Know how to categorize data.
- Secure. Do your part and secure the data (everyone is responsible, not just IT staff).
- Store. Store data according to the category in permissible locations only.
- Preserve. Keep data for the proper amount of time and destroy it according to the retention schedule.
Categorize your data accordingly by using the criteria above.
- Store data only where permissible. Free or fee-based cloud resources may offer ease of use, large amounts of space and quick access to the data. However, if non-public data is stored on cloud resources before the proper channels have been followed, your research or other data could be at risk.
- Before selecting a storage solution, consider the following.
- Does the University have a site license available with the vendor?
- If not, get Risk Management and/or Purchasing involved so the proper contractual language can be negotiated with the vendor to address data safeguards, liability, confidentiality, etc.
- Click-through agreements also require special vetting through Risk Management if the data is categorized as Restricted.
- Some research grants require special data safeguards and/or data use agreements. Before signing off on the grant, be sure to involve your local IT staff or consult with Security Operations and Services (firstname.lastname@example.org) to ensure the proper controls are in place. It is especially important to be aware of any legislative obligations the data may carry. For example, Protected Health Information (PHI) falls under the Health Insurance Portability and Accountability Act (HIPAA), which requires certain administrative, physical and technical controls to be in place prior to use, processing or storage.
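
The storage considerations above can be sketched as a short checklist function. This Python example is only an illustration under stated assumptions: the function name `storage_vetting_steps` and its parameters are hypothetical, and it assumes that Public data needs no contract vetting, since the guidance above addresses non-public data. It does not replace consulting Risk Management, Purchasing or Security Operations and Services.

```python
def storage_vetting_steps(
    category: str,
    has_site_license: bool,
    is_click_through: bool,
    grant_requires_safeguards: bool,
) -> list[str]:
    """Return the follow-up steps suggested by the considerations above.

    category: "Public", "Internal/Controlled" or "Restricted".
    All parameter names are hypothetical and used for illustration only.
    """
    steps = []
    # Assumption: only non-public data triggers contract negotiation.
    if category != "Public" and not has_site_license:
        steps.append(
            "Involve Risk Management and/or Purchasing to negotiate "
            "contractual language with the vendor"
        )
    # Click-through agreements need special vetting for Restricted data.
    if category == "Restricted" and is_click_through:
        steps.append("Have Risk Management vet the click-through agreement")
    # Grants with special safeguards or data use agreements need IT input.
    if grant_requires_safeguards:
        steps.append(
            "Involve local IT staff or Security Operations and Services "
            "before signing off on the grant"
        )
    return steps


# Example: Restricted data, no site license, click-through vendor terms.
for step in storage_vetting_steps("Restricted", False, True, False):
    print("-", step)
```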