Following on from my previous post on Commvault File System Optimization, Sensitive Data Analysis is the second feature pack under Commvault Activate licensing. The Commvault Complete license guide describes Sensitive Data Governance as “Extends indexing and analytics into content and provides details on data that may contain information that could be considered sensitive and may need further attention”.
Licensing for these features is covered by the aforementioned license guide. Sensitive Data Analysis is currently licensed per TB for File/VM data, or per user account analysed for on-premises or cloud mailboxes.
Sensitive Data Analysis enables organisations to seek out PII (Personally Identifiable Information) across multiple data types throughout their organisation. Examples of supported data types include:
- Exchange User Mailbox
- File System
- Gmail
- Google Drive
- Microsoft OneDrive for Business
- Microsoft SharePoint Server
- Microsoft SQL Server databases
- Oracle databases
- Virtual Machines (When collect file details is used)
Examples of PII (Personally Identifiable Information) include:
- Credit Card Numbers
- Email Addresses
- Driving Licenses
- Hostnames
- Social Security Numbers
- IP Addresses
- NHS Number
The ability to identify where PII sits in your organisation can aid compliance with regulations such as the EU's General Data Protection Regulation (GDPR).
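To make the idea of entity detection more concrete, here is a minimal Python sketch of pattern-based matching for three of the entity types listed above. It is an illustration only: the regular expressions, the `find_entities` function and the Luhn checksum step are assumptions made for this example, not the entity definitions or the method Commvault's Content Analyzer uses.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not Commvault's
# shipped entity definitions).
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_entities(text: str) -> dict[str, list[str]]:
    """Return candidate PII values found in `text`, grouped by entity name."""
    found = {name: [] for name in PATTERNS}
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Discard dotted quads with an octet above 255.
            if name == "ipv4" and any(int(p) > 255 for p in value.split(".")):
                continue
            # Discard digit runs that fail the card checksum.
            if name == "credit_card" and not luhn_valid(value):
                continue
            found[name].append(value)
    return found


sample = (
    "Contact jane.doe@example.com from 192.168.1.10; "
    "card 4111 1111 1111 1111, not 999.1.1.1"
)
print(find_entities(sample))
```

Every extra pattern means another pass over each document, which gives a rough feel for why selecting more entities increases the load on the analysis server.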
System Requirements
- CommServe
- V11 SP18 for the examples given here
- Server with the Index Store, Content Analyzer, and Web Server packages installed.
- System requirements for 20TB of file data (40 million objects) are as follows:
- 16 vCPU cores
- 32GB RAM
- 1TB Index volume (SSD class)
- System requirements for 15TB of mailbox data (either 150 million objects or 2000 mailboxes) are as follows:
- 16 vCPU cores
- 32GB RAM
- 6TB Index Space
- 1600 IOPS for Index Drive
- Full range of sizing options can be seen here.
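As a quick sanity check against the two reference configurations above, the figures can be captured in a small Python sketch. The `ReferenceSizing` class and `fits` helper are hypothetical names invented for this example. The sketch only tests whether a workload falls inside one of the quoted reference points and does not extrapolate beyond them, so larger environments still need the full sizing guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceSizing:
    """One reference configuration, using the figures quoted in this post."""
    workload: str
    max_data_tb: float
    max_objects: int
    vcpu_cores: int
    ram_gb: int
    index_tb: float


# Values copied from the system requirements listed above.
FILE_DATA = ReferenceSizing("file", 20, 40_000_000, 16, 32, 1)
MAILBOX_DATA = ReferenceSizing("mailbox", 15, 150_000_000, 16, 32, 6)


def fits(ref: ReferenceSizing, data_tb: float, objects: int) -> bool:
    """Return True if the workload is within the reference configuration."""
    return data_tb <= ref.max_data_tb and objects <= ref.max_objects


# Example: 12TB of file data spread over 25 million objects.
if fits(FILE_DATA, data_tb=12, objects=25_000_000):
    print(
        f"Within the file reference point: {FILE_DATA.vcpu_cores} vCPU, "
        f"{FILE_DATA.ram_gb}GB RAM, {FILE_DATA.index_tb}TB index volume"
    )
else:
    print("Outside the reference point; consult the full sizing guide")
```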
Configuration
- From the Java GUI, right-click the Index Servers group and click New Index Server.
- Specify a Cloud Name and location for the Index Data.
- Add the following Roles
- Data Analytics
- EDGE Drive
- Exchange Index (If Required)
- Select the server with the Index Store package from the Nodes tab.
- Click OK.
- You should now see the name of the server with the Index Store package installed, listed under the Content Analyzer Cloud computer group. It will have a “_ContentAnalyzer” suffix added.
- The next step is to add a domain. This is done from the Command Center. You may have already done this if you followed my previous post for File System Optimization. If not, follow the official documentation here.
- Use the Guided Setup Wizard to configure Sensitive Data Analysis. This can be done by clicking Guided Setup from the Command Center, clicking the Activate tab and selecting Sensitive Data Analysis.
- Create a Data Classification Plan. Select the Index Server you created earlier. Click Next.
- On the Content Search tab, select Enable and choose Metadata and content. Click Next.
- On the Content Analysis page, check Entity detection and select a few relevant entities. Only select entities that are required: the more you select, the bigger the resource load. Select the Content Analyzer and click Next.
- Adjust the Advanced options (File Extensions, exclusions, File Sizes etc.) if required and click Save.
- The next step is to create an Inventory. Inventories are the logical groupings of resources from your CommCell environment. Choose a Name and select the Index Server and Name Server created earlier. Click Save.
- You now need to add a File Server (assuming this is what you want to analyze) to the Inventory. Navigate to Activate –> Inventory Manager. Select the “…” to the right of the inventory name and click details. Click Add –> File Server.
- Enter the details for the file server and click Save.
- You can now add a project. From Activate –> Sensitive Data Analysis, create a New Project.
- Add a File System data source to the project. Select the File Server you added to the Inventory earlier.
- On the configuration page, enter the credentials and paths as required. Click Finish.
- Collection should start immediately. You can view the current progress back on the project data sources page.
- Once the analysis has completed, navigate back to Activate –> Sensitive Data Analysis –> (Project Name) to see the results!
Example Results
Once your data has been analysed, you start to get real insight into the levels of PII across your dataset. Some examples are shown below:
This shows a dashboard view of discovered sensitive data for a given dataset:
This shows results filtered by the exception “Accessible by everyone”:
This shows results filtered by files containing IP addresses (an example of PII):
The following options are available for discovered files: