We hear a common challenge every day: “I have too much stuff on my file shares!” These information sources are crucial, but after decades of use they start to resemble a hoarder’s garage. Get control of file shares with a working understanding of information governance, an information audit, and a thorough cleanup.
Enterprises typically experience a core set of problems when their file shares have grown out of control:
Addressing these risks requires a general approach and the use of common tools.
The first step in a cleanup initiative is to recognize that the content of a file share isn’t homogeneous. Every document varies with respect to:
High-risk information primarily consists of records, and there is a fundamental difference between a record and a non-record. Records are documents that an organization must retain for a specific period to meet particular compliance or regulatory requirements. Ideally, a retention schedule created by legal counsel dictates the requirements of records management. Organizations that fail to distinguish between records and non-records face considerable litigation and compliance risk, and they struggle to organize their information. Without that distinction, IT has to treat everything as a record with an infinite retention requirement and can’t delete anything!
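To make the records/non-records distinction concrete, here is a minimal Python sketch of a retention lookup. The category names and periods in `RETENTION_SCHEDULE` and the `NON_RECORD` label are hypothetical placeholders, not legal guidance. A real schedule comes from legal counsel. Documents with no known category get no disposal date at all, which is the "keep everything forever" trap described above.

```python
from datetime import date
from typing import Optional

# Hypothetical retention schedule (record category -> years to retain).
# A real schedule is defined by legal counsel, not by IT.
RETENTION_SCHEDULE = {
    "invoice": 7,
    "employment_contract": 10,
}

NON_RECORD = "non-record"  # hypothetical label for documents that are not records


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposal_date(category: Optional[str], created: date) -> Optional[date]:
    """Earliest date a document may be destroyed, or None if it must be kept indefinitely."""
    if category == NON_RECORD:
        # Not a record: no retention requirement, so no compliance barrier to cleanup.
        return created
    years = RETENTION_SCHEDULE.get(category)
    if years is None:
        # Unclassified or unknown category: treat as a record with infinite retention.
        return None
    return add_years(created, years)


def may_delete(category: Optional[str], created: date, today: date) -> bool:
    """True only when a known disposal date exists and has passed."""
    deadline = disposal_date(category, created)
    return deadline is not None and today >= deadline
```

Note that `may_delete(None, date(2000, 1, 1), date(2019, 1, 1))` is `False` however old the document is. Classification, not age, is what unlocks deletion. For non-records, `may_delete` only says there is no compliance barrier. Whether the document has business value is a separate question.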
The general approach is to:
The most important task in a file share cleanup is identifying records and other high-risk information. Fortunately, most Info-Tech clients already own a little-known tool that greatly simplifies this process. The File Classification Infrastructure (FCI) is a component of File Server Resource Manager (FSRM) on Windows Server. It lets administrators apply metadata (classification properties) to files either manually or programmatically. Microsoft also offers a Solution Accelerator called the Data Classification Toolkit, as well as specific accelerators for PCI-DSS and NIST SP 800-53. The Data Classification Toolkit creates classification properties that are commonly used in records management schemes, for example:
In practice, an IT administrator could use FSRM and FCI in the following way:
It is critical to be rigorous in identifying high-risk information. Once it is isolated and protected, the enterprise is free to handle the low-risk information however it wants. There is no regulation, for example, preventing administrators from shredding all low-risk information. That course of action, however, will be extremely unpopular with business users!
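FCI’s own classification rules are configured in FSRM itself. The Python sketch below only illustrates the idea behind pattern-based content classification for one kind of high-risk information: payment card numbers, which matter for PCI-DSS. Several pieces are illustrative assumptions, not the actual rules of FCI or the Data Classification Toolkit:

- the regular expression;
- the Luhn checksum used to cut false positives;
- the property names returned by `classify`.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """True if the text contains a Luhn-valid candidate card number."""
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


def classify(text: str) -> dict:
    """Return hypothetical classification properties for a document's text."""
    has_card = contains_card_number(text)
    return {"PCI": has_card, "Confidentiality": "High" if has_card else "Low"}


print(classify("Customer paid with card 4111 1111 1111 1111"))
# {'PCI': True, 'Confidentiality': 'High'}
```

The checksum step reflects the rigor argument above. A bare digit pattern flags many order numbers and phone-like strings, while false negatives are the costlier error. Patterns therefore usually err toward matching, and a human reviews the hits.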
Value is a considerably more subjective consideration than risk. Ideally, high-value non-records carry appropriate descriptors, or tags, that make them easy to find. Enterprise search differs from web search in that filtering mechanisms are much more important. A user looking for an “operations budget,” for example, will not be satisfied with a 10-year-old budget for a different part of the business on the far side of the world. Managing high-value information requires applying descriptors drawn from a standard enterprise taxonomy. Many IT professionals severely underestimate the rigor that taxonomy design requires. It isn’t complicated, but the process should follow the basic rules of ANSI/NISO Z39.19-2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, and focus on creating business-unit-specific terms for a few core facets:
With appropriate terms, a user looking for something like the recent proposal for a foundations project can perform a general search and then apply appropriate filters (proposals, FY2019, San Francisco, Cana2 Foundations) to find all relevant FY2019 proposals related to Cana2 Foundations in San Francisco. Info-Tech’s approach for developing this kind of taxonomy is described in the blueprint Move Away From File Shares and Organize Enterprise Information.
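The search-then-filter behavior described above can be sketched in a few lines of Python. The following are all hypothetical:

- the facet names (`doc_type`, `fiscal_year`, `location`, `client`);
- the sample documents;
- the small synonym table.

The table stands in for the preferred/non-preferred term relationships that a Z39.19-style controlled vocabulary defines. With it, "SF" and "San Francisco" resolve to the same filter value.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: non-preferred term -> preferred term.
PREFERRED_TERMS = {
    "SF": "San Francisco",
    "San Fran": "San Francisco",
    "RFP response": "Proposals",
}


def normalize(term: str) -> str:
    """Map a non-preferred term to its preferred form; leave other terms unchanged."""
    return PREFERRED_TERMS.get(term, term)


@dataclass
class Document:
    name: str
    facets: dict = field(default_factory=dict)


def search(docs: list, text: str, **filters: str) -> list:
    """General keyword search on document names, narrowed by facet filters (all must match)."""
    wanted = {facet: normalize(value) for facet, value in filters.items()}
    hits = [d for d in docs if text.lower() in d.name.lower()]
    return [
        d for d in hits
        if all(normalize(d.facets.get(facet, "")) == value for facet, value in wanted.items())
    ]


docs = [
    Document("Cana2 Foundations proposal v3.docx",
             {"doc_type": "RFP response", "fiscal_year": "FY2019",
              "location": "SF", "client": "Cana2 Foundations"}),
    Document("Cana2 Foundations proposal v1.docx",
             {"doc_type": "Proposals", "fiscal_year": "FY2018",
              "location": "San Francisco", "client": "Cana2 Foundations"}),
    Document("Foundations operations budget 2009.xlsx",
             {"doc_type": "Budgets", "fiscal_year": "FY2009",
              "location": "London", "client": "Cana2 Foundations"}),
]

results = search(docs, "foundations", doc_type="Proposals", fiscal_year="FY2019",
                 location="San Francisco", client="Cana2 Foundations")
print([d.name for d in results])  # ['Cana2 Foundations proposal v3.docx']
```

The keyword "foundations" alone matches all three files. The facets are what discard the stale budget from another region, which is why filtering matters more in enterprise search than in web search.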
It is difficult to apply taxonomy descriptors programmatically via FCI. For example, proposals generally don’t contain recognizable character strings that are amenable to regular expressions. Instead, the IT administrator will have to work with the business steward to ensure that metadata is applied effectively. This task, however, is made easier by the tendency to group similar documents together. Extending our example, administrators might just find a folder exclusively containing proposals from FY2019 or all Cana2 Foundations documentation. FCI enables the administrators to apply metadata at the folder level, which is then inherited by the underlying documents.
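Folder-level inheritance can be modeled as a merge over a file’s ancestor folders. This is a hypothetical Python model of one reasonable rule, in which the nearest folder wins on conflicts. It is not FSRM’s implementation, and FSRM has its own settings for combining property values, which may behave differently. The paths and property names are placeholders. Forward-slash, case-sensitive paths are used for simplicity, whereas real Windows shares use backslashes and case-insensitive names.

```python
from pathlib import PurePosixPath

# Hypothetical folder-level assignments agreed with the business steward.
FOLDER_METADATA = {
    "/shares/bd": {"department": "Business Development", "doc_type": "General"},
    "/shares/bd/proposals": {"doc_type": "Proposals"},
    "/shares/bd/proposals/fy2019": {"fiscal_year": "FY2019"},
}


def effective_metadata(file_path: str, folder_metadata: dict) -> dict:
    """Merge metadata from every ancestor folder; the nearest folder wins on conflicts."""
    result = {}
    # .parents lists the nearest folder first, so walk it in reverse (root first)
    # and let deeper folders overwrite values set higher up.
    for folder in reversed(PurePosixPath(file_path).parents):
        result.update(folder_metadata.get(str(folder), {}))
    return result


print(effective_metadata("/shares/bd/proposals/fy2019/cana2_v3.docx", FOLDER_METADATA))
# {'department': 'Business Development', 'doc_type': 'Proposals', 'fiscal_year': 'FY2019'}
```

Three folder assignments here tag every file below them. That is why grouping similar documents together makes up for the lack of recognizable patterns in proposals.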
The final step is to use the metadata to route information to the right destination: a system of record, SharePoint, or destruction for content identified as kipple (useless clutter).
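The routing step can be expressed as a small decision function over the metadata gathered so far. This Python sketch is one possible encoding of the risk-then-value logic in this article. The property names (`is_record`, `risk`, `value`) and the action labels are hypothetical. Note the default: a document whose value was never assessed goes to a holding (escrow) share rather than being destroyed.

```python
from enum import Enum


class Action(Enum):
    PROTECT = "move to a system of record and protect"
    MIGRATE = "apply taxonomy terms and migrate to SharePoint"
    ESCROW = "move to a restricted escrow share"
    DESTROY = "flag as kipple for destruction"


def triage(meta: dict) -> Action:
    """Route one document using hypothetical classification properties."""
    # Risk is checked first: records and other high-risk items are never destroyed here.
    if meta.get("is_record") or meta.get("risk") == "High":
        return Action.PROTECT
    value = meta.get("value")
    if value == "High":
        return Action.MIGRATE
    if value == "Low":
        return Action.DESTROY
    # Missing or ambiguous value: hold the document instead of deleting it.
    return Action.ESCROW


print(triage({"risk": "Low", "value": "High"}).name)  # MIGRATE
```

Ordering the checks this way encodes the rule that risk dominates value. A low-value record is still protected.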
There will inevitably be documents that are difficult to assess and classify. Some will have none of the characteristics of records or high-value information, yet have some inherent trait that makes them potentially valuable. According to Prospect Theory, people are poor at making these kinds of judgement calls: losses loom larger than equivalent gains, so deleting anything feels risky. One strategy is to place these Documents of Ambiguous Value (DoAV) into a kind of escrow account. For example, move them to a protected file share from which users can restore files only by filing a helpdesk request. Documents in escrow receive an appropriate retention period counted from the date they are moved (e.g., two or five years), and IT shreds them when that period expires.
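The escrow rule reduces to date arithmetic: a file becomes eligible for shredding once its retention period, counted from the date it was moved, has elapsed. Below is a minimal Python sketch in which the file paths and the `EscrowedFile` record are hypothetical. It assumes that files restored through the helpdesk have been moved back out of escrow, so they no longer appear in the list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EscrowedFile:
    path: str
    moved_on: date  # date the file was moved into the escrow share


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def due_for_shredding(files: list, retention_years: int, today: date) -> list:
    """Paths whose escrow retention period (counted from the move date) has expired."""
    return [f.path for f in files if today >= add_years(f.moved_on, retention_years)]


escrow = [
    EscrowedFile("/escrow/old_brochure.pdf", date(2016, 5, 1)),
    EscrowedFile("/escrow/draft_plan.docx", date(2018, 5, 1)),
]
print(due_for_shredding(escrow, retention_years=2, today=date(2019, 6, 1)))
# ['/escrow/old_brochure.pdf']
```

Counting from the move date rather than the creation date gives every file the same grace period after the cleanup, regardless of how old it already was.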
Cleaning up SharePoint, particularly older SharePoint deployments, is a different kind of challenge. The same methodology generally applies, but identifying high-risk information is harder because FCI isn’t available for SharePoint. Modern deployments can use the data loss prevention (DLP) capabilities of Office 365 to create rules that identify various types of potentially high-risk information. Older versions have more limited capabilities:
Clean up those file shares. Use a structured process that rates documents by their inherent risk and value, and use Microsoft’s File Classification Infrastructure to facilitate it.