Foldr Search – Setup (v2)

Introduction

Foldr appliance update v4.14.0.0 introduces an updated, powerful search function that allows users to search across multiple storage locations at once. Users can search SMB shares and the following cloud platforms with a single query: Google Drive, OneDrive, SharePoint Online, Box and Dropbox. Alternatively, a user can choose to search a specific network location.

It is possible to search for files by name, filter results by modified date, or search for specific keywords within files. Foldr Search includes an optional Optical Character Recognition (OCR) engine to read text from images, such as scanned documents or graphical PDFs. OCR is not required to search for keywords inside common file formats such as Microsoft Office documents or textual PDFs.

Search results for SMB shares are returned by querying an index held on a dedicated Foldr appliance. This returns results very quickly, regardless of the number of storage areas being indexed and searched. The index is built up by crawl jobs that run on a schedule. The initial crawl of a share may take some time, depending on the amount of data to be indexed; however, subsequent crawls only index new or changed files, so these jobs complete much faster.
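The incremental behaviour can be illustrated with a minimal sketch. It assumes, purely for illustration, that the crawler remembers each file's last-modified time from the previous crawl and re-indexes only files that are new or whose timestamp has changed. Foldr's actual change-detection mechanism is not documented here, and the function name `files_to_index` is hypothetical.

```python
from typing import Dict, List


def files_to_index(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return the paths that need (re)indexing on this crawl.

    Hypothetical model of incremental crawling, not Foldr's internals:
    `previous` and `current` map each file path to its last-modified timestamp.
    """
    changed = []
    for path, mtime in current.items():
        # New file (not seen on the last crawl) or modified file (timestamp differs).
        if path not in previous or previous[path] != mtime:
            changed.append(path)
    return sorted(changed)


# Initial crawl: every file is new, so everything is indexed.
first = files_to_index({}, {"a.docx": 1.0, "b.pdf": 2.0})
# Later crawl: only the modified b.pdf and the new c.txt are indexed.
second = files_to_index({"a.docx": 1.0, "b.pdf": 2.0},
                        {"a.docx": 1.0, "b.pdf": 3.0, "c.txt": 4.0})
```

The second call touches two files instead of three, which is why later crawls finish faster than the first.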

By default, the search feature is disabled and must be enabled and configured by the administrator before it is available for use.

Foldr ensures that users only see appropriate search results, based on the shares available to them under My Files and on the NTFS permissions (ACLs) of the backend file server.
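This kind of result filtering (often called security trimming) can be sketched in a few lines. The sketch assumes each indexed result records the share it came from and the set of users/groups granted read access by its ACL; the `Result` type and `trim_results` function are hypothetical and do not describe Foldr's internal data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Result:
    path: str
    share: str
    allowed: FrozenSet[str]  # users/groups granted read access by the file's ACL


def trim_results(results: Iterable[Result], user_shares: Set[str],
                 user_principals: Set[str]) -> List[Result]:
    """Keep only results the signed-in user should see (hypothetical sketch)."""
    visible = []
    for r in results:
        # 1. The result must come from a share the user sees under My Files.
        if r.share not in user_shares:
            continue
        # 2. The ACL must grant access to the user or one of their groups.
        if r.allowed & user_principals:
            visible.append(r)
    return visible


results = [
    Result("/staff/a.docx", "staff", frozenset({"Staff"})),
    Result("/staff/hr.xlsx", "staff", frozenset({"HR"})),
    Result("/admin/b.pdf", "admin", frozenset({"Staff"})),
]
visible = trim_results(results, {"staff"}, {"jsmith", "Staff"})
```

Here only `/staff/a.docx` survives: the HR file fails the ACL check and the admin file is on a share the user does not have.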

System Requirements & Deploying a Dedicated Search Appliance

The Search role can be resource intensive in terms of both CPU and memory, so it is strongly recommended that a separate virtual appliance is deployed specifically to host the search indexes and perform crawl operations. If regular client access and search are hosted on a single appliance, users may notice an impact on regular file access operations, even if the specifications of the VM are increased. The following minimum specifications are recommended for the Foldr appliance that will host the search role:

2 vCPU
4GB RAM

If you give the search appliance more CPU / RAM than the specification above, the crawl process will, within reason, consume most of the resources available to it during an indexing operation. The above specification is the recommended minimum for Search to operate correctly.

This article describes configuring two Foldr appliances: one acts as the primary client access / infrastructure (database) appliance, and the other as the Search appliance. In an existing installation, the primary appliance is the virtual machine that users currently access. With version 2 Search, it is vital that the Search appliance uses the same backend configuration database as the primary appliance. This was not a requirement with the deprecated v1 search feature.

Ensuring both appliances are using the same backend configuration database

To allow version 2 Search to function correctly, it must be able to read the main configuration database hosted on the primary (or another) appliance. To do this:

This should be done on the CLIENT ACCESS/PRIMARY Appliance:

1. Log into Foldr Settings, go to Infrastructure tab >> Configuration and set Appliance Mode to:

Provide database services to other appliances

2. Enter the IP address of the SEARCH appliance into the box labelled ‘Trusted Servers’

3. Navigate to the Keys tab and make a note of the Hashing Salt – this will be required later

4. Click the orange text link in the Keys tab to reveal the current Encryption Key, and supply the fadmin password in the pop-up dialog

5. Make a note of the Encryption Key – this will be required later

The following should be done on the SEARCH appliance:

6. Log into Foldr Settings, go to Infrastructure tab >> Configuration and set Appliance Mode to:

Connected to other appliance(s) for database services

7. Enter the IP address of the CLIENT ACCESS/PRIMARY DB appliance in the two fields labelled:

  • Use this server for database services
  • (Optional) Use this server for database writes

8. Click the Keys tab, copy/paste the Hashing Salt from the CLIENT ACCESS/PRIMARY DB appliance and click SAVE.

9. Copy/paste the Encryption Key from the CLIENT ACCESS/PRIMARY DB appliance and click SAVE CHANGES.

NOTE – Do not change the Hashing Salt and the Encryption Key at the same time (with a single save action), as the Encryption Key will not be saved successfully.

Confirm database accessibility

The Search appliance should now be able to read/write the database hosted on the primary appliance. Clicking the General or Shares tab should confirm that the configuration (licence keys, service accounts, shares etc.) is now being read from the primary appliance. Configuration changes can be made on either appliance and will be reflected on the other immediately.

Enabling the search index service

The following should be done on the SEARCH appliance:

Within Foldr Settings, navigate to the Search tab and enable the Index Service.



Setting the Crawl Schedule

This should be done on the Search Appliance

If you require shares to be re-indexed on a schedule, enable the ‘Enable scheduled indexing’ toggle on the Service tab. The specific schedule for each share / URI is configured later.

Specify other Foldr appliances (Trusted Servers)

This should be done on the Search Appliance

The IP address of every other Foldr appliance that will use search must be entered in the Trusted Servers field, one per line.

This will change the configuration of the built-in firewall to allow connections from these IP addresses.
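To illustrate the one-per-line format, the sketch below parses the contents of a Trusted Servers field and rejects anything that is not a valid IP address, using only Python's standard `ipaddress` module. Whether Foldr itself accepts hostnames or CIDR ranges in this field is not stated here, so the sketch assumes single IP addresses only; `parse_trusted_servers` is a hypothetical helper.

```python
import ipaddress
from typing import List


def parse_trusted_servers(field: str) -> List[str]:
    """Parse a one-IP-per-line field (assumption: plain IP addresses only)."""
    servers = []
    for lineno, line in enumerate(field.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        try:
            ip = ipaddress.ip_address(entry)
        except ValueError:
            raise ValueError(f"line {lineno}: {entry!r} is not a valid IP address") from None
        servers.append(str(ip))
    return servers
```

For example, `parse_trusted_servers("10.0.0.5\n10.0.0.6")` returns both addresses, while a comma-separated entry on one line is rejected, which catches the most common mistake with this field.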



Creating a Search Core

This should be done on the Search Appliance

A ‘Core’ can be thought of as a container that holds both the configuration and the index files for one or more share paths (URIs).

Whilst the search function can host multiple cores with numerous share URIs in each, it is recommended to configure a single core with all share URIs within it.

On a multi-tenant installation, you should use one core per tenant.

1. To create a core, click the Cores tab >> + Add Core

2. Give the core a suitable name and click ADD CORE. Only lowercase characters are permitted in the name. The core version should be left as Version 2.


3. An empty core will be created
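The idea of a core as a container for one or more share URIs, together with the naming rule above, can be modelled as a small data structure. This is a hypothetical sketch (`SearchCore` is not a Foldr API). It also assumes that "lowercase characters" means the letters a–z only: whether digits, hyphens or underscores are accepted is not stated, so the sketch conservatively rejects them.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumption: only the lowercase letters a-z are allowed in a core name.
CORE_NAME = re.compile(r"[a-z]+")


@dataclass
class SearchCore:
    """Hypothetical model of a search core: a name, a version and its share URIs."""
    name: str
    version: int = 2  # cores should be left at Version 2
    uris: List[str] = field(default_factory=list)  # share paths indexed in this core

    def __post_init__(self) -> None:
        if not CORE_NAME.fullmatch(self.name):
            raise ValueError(f"invalid core name {self.name!r}: use lowercase letters only")

    def add_uri(self, uri: str) -> None:
        # A single core can hold every share URI; avoid adding duplicates.
        if uri not in self.uris:
            self.uris.append(uri)


core = SearchCore("foldr")  # starts empty, like a newly created core
core.add_uri(r"\\fs01\shared")  # example UNC path (illustrative)
```

Keeping all URIs in one core mirrors the recommendation above; on a multi-tenant installation you would create one such core per tenant.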

Enabling Search for SMB shares

This should be done on the Search appliance

1.  Populate the core with the shares to be indexed by Foldr Search. To do this, navigate to Foldr Settings >> Shares and edit a share (double-click the share to edit it).

2.  Click the Search tab and enable the toggle labelled ‘Show as location in Search?’

3.  Ensure that the Search mode is set to ‘Foldr Search (requires a version 2 search core)’

4.  Enter the address and core name in the box labelled ‘Use this host for search services’. This should point to the Search appliance.

5.  Click SAVE CHANGES



Scheduling the Index Operations


Within the Schedule section in Crawl Settings, you can select basic daily, weekly or monthly options to crawl this URI; more granular scheduling is available using the Cron option.

Using Cron for advanced scheduling

Using the Cron option, it is possible to configure granular schedules.  Example Cron syntax is shown in the graphic below:

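As an example, to crawl a URI every Monday, Wednesday and Friday at 8pm, you would use the following expression (assuming standard five-field cron syntax: minute, hour, day of month, month, day of week):

0 20 * * 1,3,5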
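To sanity-check an expression such as 0 20 * * 1,3,5 before saving it, the sketch below evaluates a five-field cron expression against a given date and time. It assumes standard cron semantics (minute, hour, day of month, month, day of week, with Sunday = 0) and supports only `*`, single values, comma lists and ranges. It is not Foldr's scheduler, and whether Foldr accepts extended syntax such as steps or named days is not covered here.

```python
from datetime import datetime
from typing import Set


def expand_field(spec: str, low: int, high: int) -> Set[int]:
    """Expand one cron field ('*', '1,3,5', '1-5') into the set of values it matches."""
    if spec == "*":
        return set(range(low, high + 1))
    values: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on the schedule described by `expr`.

    Simplification: real cron ORs day-of-month and day-of-week when both are
    restricted; this sketch ANDs them, which is fine when one of them is '*'.
    """
    minute, hour, dom, month, dow = expr.split()
    # Cron numbers weekdays 0-6 with 0 = Sunday; isoweekday() gives Mon=1 .. Sun=7.
    cron_dow = when.isoweekday() % 7
    return (
        when.minute in expand_field(minute, 0, 59)
        and when.hour in expand_field(hour, 0, 23)
        and when.day in expand_field(dom, 1, 31)
        and when.month in expand_field(month, 1, 12)
        and cron_dow in expand_field(dow, 0, 6)
    )
```

For instance, `cron_matches("0 20 * * 1,3,5", datetime(2024, 1, 1, 20, 0))` is True because 1 January 2024 was a Monday, while the same time on Tuesday 2 January does not match.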

‘Crawl As’ vs Indexing ACLs (Permissions)

When a share is indexed, the crawl job can be run as an individual user (or as each member of a group). If that is not possible (for example, credentials are unavailable or the user is unknown to Foldr), the crawl is performed as the service account set on the share.

In most cases, ‘Crawl As’ is used on personal shares, such as SMB home folders where the %username% variable is used in the share URI, or cloud storage such as OneDrive and Google Drive. Crawl As allows Foldr to resolve the path for each user in turn (i.e. pull it from Active Directory and index it as required).

Indexing ACLs is generally used on shared locations, such as a common SMB share used by the whole organisation or by groups of users.

Crawl As and Indexing ACLs are mutually exclusive options when configuring Search on a Share.
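The choice of crawl identity described above can be summarised in a short sketch. It assumes a hypothetical share configuration with `crawl_as` (a list of users, e.g. the members of a group), `index_acls` and `service_account` fields, plus a set of users for whom Foldr holds credentials; none of these names come from Foldr itself.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ShareSearchConfig:
    """Hypothetical per-share search settings."""
    service_account: str
    crawl_as: List[str] = field(default_factory=list)  # users, e.g. members of a group
    index_acls: bool = False


def crawl_identities(cfg: ShareSearchConfig, known_users: Set[str]) -> List[Tuple[str, str]]:
    """Return (identity, reason) pairs describing who each crawl runs as."""
    if cfg.crawl_as and cfg.index_acls:
        raise ValueError("Crawl As and Indexing ACLs are mutually exclusive")
    if not cfg.crawl_as:
        # No Crawl As configured: crawl with the share's service account.
        return [(cfg.service_account, "service account")]
    identities = []
    for user in cfg.crawl_as:
        if user in known_users:
            identities.append((user, "crawl as user"))
        else:
            # Missing credentials / user unknown to Foldr: fall back to the service account.
            identities.append((cfg.service_account, "fallback"))
    return identities
```

With `crawl_as=["alice", "bob"]` and credentials only for alice, the sketch crawls as alice and falls back to the service account for bob.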

Recommended settings for SMB Home Folders (%homefolder%)

Where SMB shares are configured in Foldr Settings > Shares using the %homefolder% variable (which dynamically obtains the user's home folder from the Active Directory homeDirectory attribute, as shown on the Profile tab in ADUC), the following should be configured:

Crawl Settings:

1.  Crawl As – an Active Directory Security Group that contains all users that the share applies to
2.  Index ACLs – toggle should be disabled

Recommended settings for central/common shares (SMB)

For common shares, it is recommended that the Crawl As section is left unconfigured and the Index ACLs option is enabled. This ensures that Foldr only returns search results that are applicable to the signed-in user:

Crawl Settings:

1.  Crawl As – should be unconfigured
2.  Index ACLs – toggle should be enabled
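The recommendations for SMB home folders and common shares can be captured as a simple lookup, for example when auditing a list of share definitions. The function name `recommended_crawl_settings` and the dictionary keys are hypothetical; the values mirror the recommendations in this article.

```python
from typing import Dict, Optional


def recommended_crawl_settings(share_uri: str, user_group: Optional[str] = None) -> Dict[str, object]:
    """Suggest Crawl As / Index ACLs settings for an SMB share (hypothetical helper)."""
    if "%homefolder%" in share_uri.lower():
        # Home folders: crawl as each member of an AD security group, no ACL indexing.
        if not user_group:
            raise ValueError("home folder shares need an AD security group for Crawl As")
        return {"crawl_as": user_group, "index_acls": False}
    # Central/common shares: leave Crawl As unconfigured and index ACLs instead.
    return {"crawl_as": None, "index_acls": True}
```

Note that the two outcomes never set both options at once, consistent with Crawl As and Indexing ACLs being mutually exclusive.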

Searching Cloud services (OneDrive, SharePoint Online, Google Drive etc)

Foldr can either use the cloud provider's search API directly or crawl/index cloud locations in the same way as on-premise SMB shares. On shares that use cloud-related variables for the Share URI (%onedrive%, %googledrive% and so on), you can select the required mode within the Search Settings tab:

Using the cloud provider’s search API (no indexing required)

Note that this option does not use the search appliance: no indexing takes place and all search queries are performed ‘live’ against the relevant cloud provider. The cloud provider's API provides basic search capabilities, but certain features (indexing ACLs, file content, scheduling, OCR and so on) do not apply, and some search terms/queries may not be supported.

To configure:

1.  Edit the Share in question (OneDrive, Google Drive etc) in Foldr Settings > Shares

2.  Select the Search tab and select ‘Use service APIs’



3.  Leave the Search host and core name blank / unconfigured

4.  Click SAVE CHANGES

All other search settings should be left unconfigured.

Indexing cloud services with Foldr

All cloud services that Foldr can present to a user may be indexed and stored on the Search appliance. This allows Foldr to index file content, schedule crawls and use the same search terms/queries as on-premise shares.

To configure:

1.  Edit the Share in question (OneDrive, Google Drive etc) in Foldr Settings > Shares

2.  Select the Search tab and select ‘Index with Foldr’



3.  The search host and core name should be configured

Crawl Settings:

1.  Crawl As – specify an Active Directory Security Group that contains all users that the share applies to
2.  Index ACLs – toggle should be disabled

All other Search options can be configured as required.
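The two cloud modes have different requirements, which can be expressed as a configuration check. In ‘Use service APIs’ mode the search host and core should be blank; in ‘Index with Foldr’ mode both are needed, along with a Crawl As group and Index ACLs disabled. The mode strings follow the labels in Foldr Settings, but `check_cloud_share` and its parameters are hypothetical.

```python
from typing import List, Optional


def check_cloud_share(mode: str, host: Optional[str], core: Optional[str],
                      crawl_as: Optional[str], index_acls: bool) -> List[str]:
    """Return a list of problems with a cloud share's search settings (empty if OK)."""
    problems = []
    if mode == "Use service APIs":
        # Queries run live against the provider, so the search appliance is not used.
        if host or core:
            problems.append("leave the search host and core name blank")
    elif mode == "Index with Foldr":
        if not (host and core):
            problems.append("configure the search host and core name")
        if not crawl_as:
            problems.append("set Crawl As to an AD security group")
        if index_acls:
            problems.append("disable Index ACLs")
    else:
        problems.append(f"unknown mode {mode!r}")
    return problems
```

An empty list means the share matches the recommended configuration for its chosen mode.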

Indexing File Content

By default, Foldr Search indexes file names only. To index the textual content of common formats such as Office documents, PDF, TXT and RTF files, enable the ‘Index file contents’ toggle in Search Settings. A predefined template / list of common file formats is already configured.
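A minimal sketch of how a crawler might decide whether to extract text from a file, assuming a template of file extensions similar to the formats named above. The actual contents of Foldr's predefined template are not listed in this article, so `CONTENT_EXTENSIONS` is illustrative only.

```python
from pathlib import PurePath

# Illustrative only: Foldr's predefined template is not reproduced here.
CONTENT_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf", ".txt", ".rtf"}


def index_mode(path: str, index_contents: bool) -> str:
    """Return 'name+content' if the file's text would be indexed, else 'name'."""
    if index_contents and PurePath(path).suffix.lower() in CONTENT_EXTENSIONS:
        return "name+content"
    # Default behaviour: file names only.
    return "name"
```

For example, `index_mode("Report.DOCX", True)` returns `"name+content"`, whereas an image such as `photo.png` is indexed by name only (its text would need OCR).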

 

Optical Character Recognition (OCR)

During an index crawl job, Foldr can optionally process image files (JPG, PNG and GIF) and graphical PDF documents with a built-in OCR engine to extract text, which is then stored in the Search index and becomes searchable by users.

Note – OCR is not required to index the content of Office, PDF or other textual files. It applies only to extracting text from images, or from images inside files.

For example, with OCR enabled, a user could search for an invoice number that appears on a scanned document and quickly locate the matching file on the network.

OCR is enabled on a per-share basis: turn on the toggle when creating or editing a URI within Foldr Settings >> Shares >> Search >> OCR. Enabling OCR makes crawl jobs take longer to complete, as each image file and non-textual PDF must be processed by the OCR engine.
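The rule described in this section, that OCR applies only to images and to PDFs without a text layer, can be sketched as follows. The sketch assumes the crawler already knows whether a PDF contains extractable text (`pdf_has_text`); how Foldr detects this is not described here, and images embedded inside other files are not modelled. Treating `.jpeg` like `.jpg` is also an assumption.

```python
from pathlib import PurePath
from typing import Optional

# JPG, PNG and GIF per this article; .jpeg is assumed to be treated like .jpg.
OCR_IMAGE_TYPES = {".jpg", ".jpeg", ".png", ".gif"}


def needs_ocr(path: str, ocr_enabled: bool, pdf_has_text: Optional[bool] = None) -> bool:
    """Return True if a file would be sent to the OCR engine (hypothetical sketch)."""
    if not ocr_enabled:
        return False
    suffix = PurePath(path).suffix.lower()
    if suffix in OCR_IMAGE_TYPES:
        return True
    if suffix == ".pdf":
        # Textual PDFs are handled by normal content indexing; only graphical ones need OCR.
        return pdf_has_text is False
    # Office documents and other textual formats never need OCR.
    return False
```

This also shows why OCR lengthens crawls: every image and graphical PDF on the share takes the extra OCR path.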



OCR Language Support

By default, Foldr uses English only; however, the administrator can enable the following languages if required: French, German, Spanish, Dutch (Flemish), and Chinese (Simplified and Traditional). If you require a language that is not listed, contact [email protected]

Exclusions

Within the Exclusions section in Search Settings, the administrator can exclude certain file types, or files / folders whose names match a pattern using wildcards (*), from the index. Foldr has predefined exclusions for common files that are not usually of interest to users (such as temporary, system-generated or .DS_Store files), and additional exclusions can be entered one per line. If an exclusion contains a /, it is treated as a directory rather than a file.

Example syntax:

temp.docx – Excludes any file called temp.docx from the index
*.png – Excludes all PNG files from the index
*temp* – Excludes all files containing ‘temp’ in the file name
*Temp*/* – Excludes any directories whose name contains ‘Temp’ (and all subdirectories / files within)
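The example patterns can be tried out with Python's standard `fnmatch` module, which implements shell-style wildcards. This is a sketch under assumptions: Foldr's exact matching rules (case sensitivity, and whether patterns are tested against the file name or the full path) are not documented here. The sketch matches case-insensitively, tests file patterns against the file name, and treats any pattern containing / as a directory pattern tested against each folder in the path. `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_excluded(path: str, patterns: Iterable[str]) -> bool:
    """Return True if `path` matches any exclusion pattern (assumed semantics)."""
    p = PurePosixPath(path)
    folders = [part.lower() for part in p.parts[:-1]]
    name = p.name.lower()
    for pattern in patterns:
        pattern = pattern.lower()  # assumption: case-insensitive matching
        if "/" in pattern:
            # Directory pattern such as '*temp*/*': match the part before the first '/'
            # against every folder, which excludes everything beneath that folder.
            dir_pattern = pattern.split("/", 1)[0]
            if any(fnmatchcase(folder, dir_pattern) for folder in folders):
                return True
        elif fnmatchcase(name, pattern):
            return True
    return False


patterns = ["temp.docx", "*.png", "*temp*", "*Temp*/*"]
```

With these patterns, `/data/Temp/report.docx` is excluded because of its parent folder, while `/data/reports/q1.docx` is still indexed.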

Need more help?

Get in touch and we'll be happy to assist you, [email protected]

© Minnow IT. Registered in England and Wales with company number 07970411.

