ELI5: What is Data Classification?

Think about how a library organizes books. Some are on open shelves anyone can grab, some are in a special section you need a library card for, and some rare books are locked in a glass case. Data classification works the same way — a company sorts its information into groups based on how secret or important it is, then decides who can see it and how carefully it needs to be protected. The most sensitive stuff gets the strongest locks.

Overview

Data classification is the process of categorizing data based on its sensitivity, value, and regulatory requirements to determine the appropriate level of protection. Proper classification ensures that the most sensitive data receives the strongest controls while avoiding excessive spending on low-value data. Classification is a prerequisite for effective data loss prevention and access control.
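
As a rough illustration of how sensitivity and regulatory requirements can map to a level of protection, the Python sketch below uses the private-sector levels from this note. The function name `classify`, the low/medium/high impact scale, and the control names are hypothetical and chosen only for this example; a real scheme is defined by each organization's policy.

```python
from enum import IntEnum


class Level(IntEnum):
    """Private-sector levels, ordered so a higher value means more sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical controls per level; a real policy would define its own.
CONTROLS = {
    Level.PUBLIC: {"encrypt_at_rest": False, "access": "anyone"},
    Level.INTERNAL: {"encrypt_at_rest": False, "access": "employees"},
    Level.CONFIDENTIAL: {"encrypt_at_rest": True, "access": "need-to-know"},
}


def classify(regulated: bool, impact_if_disclosed: str) -> Level:
    """Pick a level from two illustrative criteria.

    impact_if_disclosed is assumed to be "low", "medium", or "high".
    """
    if regulated or impact_if_disclosed == "high":
        return Level.CONFIDENTIAL
    if impact_if_disclosed == "medium":
        return Level.INTERNAL
    return Level.PUBLIC


payroll = classify(regulated=True, impact_if_disclosed="medium")
brochure = classify(regulated=False, impact_if_disclosed="low")
print(payroll.name, CONTROLS[payroll])
print(brochure.name, CONTROLS[brochure])
```

Using an ordered enum keeps comparisons such as `Level.CONFIDENTIAL > Level.PUBLIC` meaningful, which is convenient when a rule needs to say "this level or higher".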

Key Concepts

  • Military classifications (highest to lowest): Top Secret, Secret, Confidential, Unclassified
  • Private sector classifications (highest to lowest): Confidential/Restricted, Private/Internal, Public
  • Classification criteria — regulatory requirements, business value, sensitivity, impact if disclosed
  • Data roles:
    • Data owner — senior leader accountable for the data; sets classification level
    • Data custodian — IT staff responsible for implementing controls (backups, encryption)
    • Data steward — ensures data quality and proper use of metadata
    • Data processor — entity that processes data on behalf of the controller (GDPR term)
    • Data controller — entity that determines purposes and means of processing (GDPR term)
  • Data states — data at rest, data in transit, data in use; each requires appropriate protection
  • Labeling and marking — applying headers, footers, watermarks, or metadata tags to classified data
  • Handling procedures — storage, transmission, retention, and destruction rules per classification level
  • Declassification — reducing the classification level when sensitivity decreases over time
  • Information life cycle — creation, classification, storage, usage, archival, destruction
  • PIA (Privacy Impact Assessment) — analysis of how personally identifiable information is collected, used, shared, and protected
  • DPO (Data Protection Officer) — role responsible for ensuring the organization’s compliance with privacy regulations
  • Pseudonymization — replacing identifying fields with artificial identifiers; reversible with the right key (unlike full anonymization)
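
The reversible replacement of identifying fields described in the last bullet can be sketched in Python as follows. This is a minimal illustration, not a production design: the `Pseudonymizer` class and its method names are hypothetical, and the "key" is modeled as an in-memory lookup table that would, in practice, be stored separately from the pseudonymized data and tightly access-controlled.

```python
import secrets


class Pseudonymizer:
    """Swaps identifiers for random tokens and keeps the mapping separately."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def pseudonymize(self, value: str) -> str:
        # Reuse the same token for a repeated value so records stay linkable.
        if value not in self._token_by_value:
            token = "id_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def reidentify(self, token: str) -> str:
        # Only possible for whoever holds the mapping table.
        return self._value_by_token[token]


vault = Pseudonymizer()
record = {"email": "alice@example.com", "purchase": "book"}
safe_record = {**record, "email": vault.pseudonymize(record["email"])}
print(safe_record)
print(vault.reidentify(safe_record["email"]))
```

If the mapping table were destroyed, the tokens could no longer be traced back to the original values, which is the direction full anonymization takes.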

Exam Tips

Remember

Data owner = business leader who decides classification. Data custodian = IT person who implements technical controls. The exam tests these role distinctions heavily. Remember: owner decides, custodian protects.
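
One way to keep the two roles apart is to picture them as separate permissions on the same asset. The Python sketch below is only a memory aid under that assumption; the `DataAsset` class, its fields, and the example names ("cfo", "it-ops") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    owner: str                      # business leader accountable for the data
    custodian: str                  # IT staff who implement the controls
    classification: str = "Internal"
    controls: list = field(default_factory=list)

    def set_classification(self, actor: str, level: str) -> None:
        # Owner decides: only the owner may change the level.
        if actor != self.owner:
            raise PermissionError(f"{actor} is not the data owner")
        self.classification = level

    def apply_control(self, actor: str, control: str) -> None:
        # Custodian protects: only the custodian implements controls.
        if actor != self.custodian:
            raise PermissionError(f"{actor} is not the data custodian")
        self.controls.append(control)


asset = DataAsset(name="payroll", owner="cfo", custodian="it-ops")
asset.set_classification("cfo", "Confidential")
asset.apply_control("it-ops", "encryption")
print(asset.classification, asset.controls)
```

Here a call such as `asset.set_classification("it-ops", "Public")` would raise `PermissionError`, mirroring the rule that the custodian does not choose the classification level.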

Connections

  • Closely tied to data-protection, which implements the technical controls based on classification levels
  • Supports privacy by ensuring personal data is identified and handled according to regulations
  • See also dlp for tools that enforce classification-based data handling rules

Practice Questions

Scenario

See case-data-classification for a practical DevOps scenario applying these concepts.