Unstructured Data Discovery Is Really Organizational Knowledge Management

The purpose of Unstructured Data Discovery

Data discovery is generally used to create an inventory of all corporate data, structured and unstructured, and to identify regulated data (e.g., under CCPA or GDPR) as well as data that is business-sensitive and/or mission-critical. It is also the first step in establishing data-centric security, governance, policies and controls. For such controls to be effective, they need to be closely aligned with how business users share data. The goal of discovery must therefore be to map the discovered data to business processes, business-oriented functions and artifacts. This in turn ensures that discovery aligns with (and actively contributes to) knowledge management.

What is Knowledge Management

“Knowledge Management is the process of capturing, distributing, and effectively using knowledge.” (Davenport)

Knowledge Management is often based on building a catalog or dictionary of information. Software platforms that support knowledge management are almost universally built around a hierarchical taxonomy. For the purpose of governance, a hierarchical catalog of business processes and functions is usually the best option.

Unstructured data discovery requires investment and technology because the volumes of data are often very large.

Why use the Knowledge Management Approach

To protect the corporate landscape against a potential breach, relying on existing data classifications or legacy permissions may not be the best method. Without a well-planned, centralized approach, managing the threat of a breach is a substantial challenge.

Limitations of regional and siloed data protection implementations

Big Picture

Without a unified view of data assets and their business affiliation and associated risk, no business-wide, holistic policy baseline can exist. Furthermore, it is hard to quantify (in financial, legal and brand terms) the scale or business impact of a breach. A unified view makes it much easier to justify what data should be protected and to what extent.


Data classification, DLP, access management and data retention activities vary significantly among business units and departments. When policies, classification in particular, are defined on a per-file or per-folder basis, we rely on end users to make informed decisions. Unfortunately, end users cannot be relied upon to have consistent security knowledge or to prioritize security activities. The resulting variance is considerable, so classification is inconsistent and nearly always incomplete.


It is difficult to recover from large-scale breaches, because identifying what was compromised or lost in a large, cross-silo dataset takes significant resources and time.


Data privacy is a challenge. It is hard to differentiate between the different types of documents that contain PII, or to identify their purpose and authorized use.

What is a Sensitive Data Catalog and How Does it Address the Above Limitations?

To see the big picture and reduce risk management complexity, a data catalog is required. A catalog is a centralized “Yellow Pages” for sensitive / mission critical and regulated information. The catalog maps the data hierarchically within the organization in a way that is comprehensible to security, business, and management.
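To make the hierarchical mapping described above more concrete, here is a minimal sketch of a catalog held as a tree. The `CatalogItem` class and the department and item names are hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    """One node in a hierarchical data catalog (hypothetical structure)."""

    name: str
    children: list[CatalogItem] = field(default_factory=list)

    def add(self, child: CatalogItem) -> CatalogItem:
        """Attach a child item and return it so calls can be chained."""
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        """Return the full path of this item and of every descendant."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        result = [path]
        for child in self.children:
            result.extend(child.paths(path))
        return result


# Illustrative taxonomy: all names below are assumptions.
catalog = CatalogItem("Company")
hr = catalog.add(CatalogItem("HR"))
hr.add(CatalogItem("Employment"))
procurement = catalog.add(CatalogItem("Procurement"))
procurement.add(CatalogItem("Suppliers' contracts"))

# Print the "Yellow Pages" view: one line per catalog item.
for item_path in catalog.paths():
    print(item_path)
```

Each printed path reads from the organization down to a specific type of data, which is the kind of view security, business and management can all follow.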

The data catalog items describe the “What” and answer the question “What data do we actually handle here?”. Each catalog item deals with the type of data, its essence, which makes it possible to subsequently define a specific policy for each data type. As an example, you might have one or more departments that handle “suppliers’ contracts”. This type of data may not be homogeneous: there might be several types of “suppliers’ contracts” across the company, and several related assets associated with those contracts. These may span many physical locations across the company, which makes it difficult, without a catalog, to apply the intended policy to this specific data type everywhere.

A data catalog enables policy baselining for data risk management, protection, and governance across the entire organization, enabling high-quality, consistent and systematic enforcement. Consistency and quality are possible because policies are assigned to catalog items that share the same business use. Departmental SMEs define policies based on the catalog items (and their business usage), rather than relying on end users’ decisions.

How to Build a Data Catalog

Building a sensitive data catalog involves five steps:

  • Catalog taxonomy assembly – Build the catalog directory.
  • Discovery and identification – Create an inventory of all data (business) categories (e.g., employment agreements, offer letters, quotes, price lists).
  • Mapping – Map the discovered data categories to the data catalog (e.g., map employment agreements and offer letters into the “Employment” category item).
  • Policy definition – Define and assign access, DLP, retention and classification policies.
  • Remediation – Remediate according to the policies – first pass and on an ongoing basis.
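The five steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not a description of how any specific tool works: the file paths, retention periods, classification labels and the `remediation_candidates` helper are all hypothetical, and the taxonomy is kept flat for brevity.

```python
from dataclasses import dataclass

# Step 1: catalog taxonomy (flat here for brevity; names are illustrative).
CATALOG_ITEMS = {"Employment", "Pricing"}


# Step 2: discovery output, one record per file with its business category.
@dataclass(frozen=True)
class DiscoveredFile:
    path: str
    category: str
    age_years: int


discovered = [
    DiscoveredFile("//fs1/hr/offer_2015.docx", "offer letters", 9),
    DiscoveredFile("//fs1/hr/agreement_new.docx", "employment agreements", 2),
    DiscoveredFile("//nas2/sales/prices_old.xlsx", "price lists", 4),
    DiscoveredFile("//nas2/sales/quote_recent.pdf", "quotes", 1),
]

# Step 3: map discovered categories to catalog items.
CATEGORY_TO_ITEM = {
    "employment agreements": "Employment",
    "offer letters": "Employment",
    "quotes": "Pricing",
    "price lists": "Pricing",
}


# Step 4: one policy per catalog item (values are assumptions).
@dataclass(frozen=True)
class Policy:
    classification: str
    retention_years: int


POLICIES = {
    "Employment": Policy("Confidential", 7),
    "Pricing": Policy("Internal", 3),
}


# Step 5: remediation, here limited to flagging files past retention.
def remediation_candidates(files):
    """Return (path, catalog item) pairs for files older than their retention period."""
    flagged = []
    for f in files:
        item = CATEGORY_TO_ITEM.get(f.category)
        if item is None:
            continue  # unmapped category: needs mapping before any policy applies
        if f.age_years > POLICIES[item].retention_years:
            flagged.append((f.path, item))
    return flagged


for path, item in remediation_candidates(discovered):
    print(f"{path} exceeds retention for {item}")
```

Because the policy hangs off the catalog item rather than the file or folder, the same rule applies to every location where that type of data is found.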


A data catalog enables a consistent, systematic and unified approach to data-centric security and governance for mid-size and large organizations. If done correctly, it is a one-time effort that can sustain itself over time. It is best to manage an unstructured data catalog separately from the structured data catalog, as their structure and policies differ.
