Foldr Search – Setup

Posted on 26th April 2018

Introduction

Foldr appliance update v4.3 introduces a powerful search function that allows users to perform searches across multiple storage locations at once.  Users can search both SMB shares and the following cloud platforms with a single search query: Google Drive, OneDrive, SharePoint Online, Box and Dropbox.  Alternatively, a user can choose to search a specific network location.

It is possible to search for files by name, filter results by modified date or search for specific key words within files.  Foldr Search contains an optional Optical Character Recognition (OCR) engine to read text from images, such as scanned documents or PDFs.

Search results for SMB shares are returned by querying an index held within a Foldr appliance.  This keeps searches fast, regardless of the number of storage areas being indexed and searched.  The index itself is built up from crawl jobs that are run on a scheduled basis.  The initial crawl of a share may take some time depending on the amount of data to be indexed.  Subsequent crawls only index changed or new files, so they complete much faster.

By default, the search feature is disabled and must be enabled and configured by the administrator before it is available for use.

Foldr automatically ensures only appropriate search results are returned to users, based upon the shares that are available to them under My Files and by analysing the file server backend NTFS permissions / ACLs.

System Requirements & Deploying a Dedicated Search Appliance

The Search role can be resource intensive both in terms of CPU and memory and as such it is strongly recommended that a separate virtual appliance is deployed specifically to host the search indexes and perform crawl operations.  If regular client access and search is hosted upon a single appliance, it may have an impact on the user experience when performing regular file access operations, even when increasing the specifications of the VM.  The following minimum specifications are recommended for the Foldr appliance that is going to be hosting the search role:

2 vCPU
4GB RAM

If you provide more CPU / RAM resources to the search appliance beyond the specification above, the crawl process will, within reason, consume most of the resources it is provided with during an indexing operation. The above specification is the recommended minimum for Search to operate correctly.

This article will describe configuring two Foldr appliances.  One will act as our primary client access / infrastructure appliance, and the other our Search appliance.  In an existing installation, the primary appliance is the virtual machine that is currently being accessed by users.

Enabling and Configuring the Index Service

This should be done on the Search Appliance

Within Foldr Settings, navigate to the Search tab and enable the Index Service.

Setting the Crawl Schedule

This should be done on the Search Appliance

Enable scheduled indexing and choose a suitable time to run the crawl jobs. Jobs are run sequentially, one URI / share at a time.

Specify other Foldr appliances (Trusted Servers)

This should be done on the Search Appliance

All other Foldr appliance(s) that will be using search must be entered within the Trusted Servers field with each entered onto a separate line.

This will change the configuration of the built-in firewall to allow connections from these IP addresses.
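Because one malformed line in this field could leave an appliance unable to connect, it can help to check the list before pasting it in. The following Python sketch is illustrative only and is not part of Foldr: it assumes every entry is a plain IP address (as the firewall note above implies), and the function name parse_trusted_servers and the sample addresses are hypothetical.

```python
import ipaddress


def parse_trusted_servers(field_text):
    """Split a one-entry-per-line field into valid and invalid IP addresses."""
    valid, invalid = [], []
    for line in field_text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        try:
            valid.append(str(ipaddress.ip_address(entry)))
        except ValueError:
            invalid.append(entry)  # not an IPv4 or IPv6 address
    return valid, invalid


# Hypothetical input: two documentation-range addresses and one hostname.
valid, invalid = parse_trusted_servers("192.0.2.10\n192.0.2.11\nfoldr01")
print(valid)    # entries that parsed as IP addresses
print(invalid)  # entries to correct before saving
```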

Creating a Search Core

This should be done on the Search Appliance

A ‘Core’ can be thought of as a container that holds both the configuration and the index files for one or more share paths (URIs).

Whilst the search function can host multiple cores with numerous share URIs in each, it is recommended to configure a single core with all share URIs within it.

On a multi-tenant installation, you should use one core per tenant.

1. To create a core, click the Cores tab >> + Add Core

2. Give the Core a suitable name and click ADD CORE. Please note only lowercase characters are permitted.

3. A new, empty Search Core will be created. Click + Add URI to add a share path to be indexed.

Add URIs (Shares) to the Core

4. Click + ADD URI and the Add URI dialog will be displayed. Here you must configure the address (the network path) of the share that you wish to index, select a service account that has permission to read the files and finally set the crawl schedule. If no service account is available for selection, one can be configured within the General tab >> Service Accounts. Note that the username should be supplied in UPN format.

Optional – The search appliance may be pointed at an Infrastructure appliance to retrieve the service accounts along with other configuration held upon it.  This would remove the need to manually configure service accounts on the Search appliance.  See the following KB for more information

IMPORTANT – The ‘Address’ (SMB share path) must match the share path EXACTLY as it is configured in Foldr Settings >> Shares. If the share is configured with a fully qualified path, use the same fully qualified path here; if it is configured with a short, unqualified path, use that. If shares are configured under the Shares tab using an IP address rather than a DNS hostname, ensure the URIs for search are configured the same way.
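A quick way to catch mismatches is to compare the two lists as plain strings. The Python sketch below is an illustration under stated assumptions rather than a Foldr tool: the function name find_mismatched_uris and the server names are hypothetical, and the comparison is deliberately strict (no case folding or DNS resolution), in line with the exact-match requirement above.

```python
def find_mismatched_uris(share_paths, search_uris):
    """Return search URIs that do not exactly match any configured share path."""
    configured = set(share_paths)
    return [uri for uri in search_uris if uri not in configured]


# Hypothetical paths as entered in Foldr Settings >> Shares.
shares = [r"\\fs01.example.local\sales", r"\\fs01.example.local\staff"]

# Hypothetical URIs entered in the Search Core; the second uses a short name.
uris = [r"\\fs01.example.local\sales", r"\\fs01\staff"]

for uri in find_mismatched_uris(shares, uris):
    print("No exact match in Shares:", uri)
```

Here the second URI would be reported, because the short server name does not match the fully qualified path used on the share.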

Scheduling

You can choose a basic daily, weekly or monthly schedule for crawling the URI being configured. More granular scheduling is available using the Cron option in the Crawl Schedule tab.

Using Cron

Using the Cron option within the Crawl Schedule tab, it is possible to configure granular schedules on a per-URI basis.  Example Cron syntax is shown in the graphic below:

As an example, to crawl a URI every Monday, Wednesday and Friday at 8pm you would use:
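Assuming the Cron option accepts standard five-field cron syntax (minute, hour, day of month, month, day of week), which this article does not confirm in text, that schedule corresponds to the expression 0 20 * * 1,3,5. The Python sketch below builds the same expression from a time and a list of days; the helper name weekly_cron is hypothetical and is not part of Foldr.

```python
# Day-of-week numbers used by standard cron (0 = Sunday).
DAY_NUMBERS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def weekly_cron(hour, minute, days):
    """Build a five-field cron expression: minute hour day-of-month month day-of-week."""
    if not 0 <= hour <= 23 or not 0 <= minute <= 59:
        raise ValueError("hour must be 0-23 and minute 0-59")
    day_field = ",".join(str(DAY_NUMBERS[d.lower()]) for d in days)
    return f"{minute} {hour} * * {day_field}"


# Every Monday, Wednesday and Friday at 8pm (20:00).
print(weekly_cron(20, 0, ["mon", "wed", "fri"]))  # 0 20 * * 1,3,5
```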

5.  Build up a list of URIs to index (those that are to be searchable by end users). Generally this would include all SMB shares configured for user access on the client access appliance.

Adding URIs for Active Directory Home Folders

This should be done on the Search Appliance

If the %homefolder% variable is being used to present users' home folders, the share path should be confirmed within Active Directory Users & Computers > Properties > Profile tab > Home Folder > Connect, and the root / top level share added as a URI within the Search Core.

If you use multiple paths for different groups of users, each should be added as a separate URI in the Core.
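To work out which URIs are needed, the root share can be derived from a sample of home folder paths. The Python sketch below is a rough illustration only: it takes "root / top level share" to mean the \\server\share portion of the path, which is an assumption, and the server, share and user names are hypothetical.

```python
from pathlib import PureWindowsPath


def root_share(home_folder_path):
    # For a UNC path, .drive is the \\server\share portion.
    return PureWindowsPath(home_folder_path).drive


# Hypothetical Home Folder values read from the Profile tab.
home_folders = [
    r"\\fs01\saleshome\jsmith",
    r"\\fs01\saleshome\akhan",
    r"\\fs02\staffhome\blee",
]

# One URI per distinct root share.
for uri in sorted({root_share(p) for p in home_folders}):
    print(uri)
```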

Example:

For the example above, you would need to add the following URI to the Search Core to allow sales users to search their home folders

Optical Character Recognition (OCR)

During an index crawl job, Foldr can optionally process image files (jpg, png and gif) and graphical PDF documents with a built-in OCR engine to extract text, which is then stored inside the Search index and becomes searchable by the user.  As an example use case, with OCR enabled a user could search for an invoice number and quickly locate the specific scanned image file on the network.

OCR is enabled on a per-share basis. To enable it, turn on the OCR toggle when creating or editing a URI within Foldr Settings >> Search >> Cores.

Note that enabling OCR will result in crawl jobs taking a longer amount of time to complete as each image file and non-textual PDF will be processed by the OCR engine.

OCR Language Support

By default Foldr will use English only, however the administrator can enable the following languages if required: French, German, Spanish, Dutch (Flemish) and Chinese (Simplified and Traditional).  If support is required for a language that is not listed, contact support@foldr.io

Enabling Search for Users

This should be done on the Client Access appliance

On the Foldr appliance(s) that provide client access (typically those that users directly interact with via the web, desktop and mobile apps) you must enable Client Access in Search before users can search.

Navigate into Foldr Settings >> Shares >> Edit a share >> Search tab

Enable the toggle ‘Show as a location in Search’.

Finally, provide the Core Name as configured at step 2 above and click SAVE.

Repeat this process for each SMB share that you wish to enable search for (i.e. those that have a URI configured on your search appliance).  Note that you do not need to specify an address or Core name for cloud services such as Google Drive, OneDrive or Dropbox.  This is because all search queries for cloud platforms are performed by the client access appliance directly against their APIs, rather than by the search appliance (which is used solely to search the index for SMB shares).

Once ‘Show as a location in Search’ is enabled on a client access appliance, the Search feature will become functional in the web, iOS and Android apps.


Search Results & Permissions

As part of its scheduled crawl operations, Foldr Search will by default respect the backend file server permissions / ACLs. This ensures only appropriate search results are returned to a user for each search query, by using the ACLs and by checking which shares the user actually has access to in Foldr itself (My Files).

For example, if a sales department specific network share URI is being crawled, only users who can see the sales share in the Foldr interface will be shown search results relating to files contained upon it.
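The two checks described above can be pictured as a filter applied to every result. The Python sketch below is a conceptual illustration only and does not describe Foldr's internal implementation: the IndexedFile record, its fields, the function visible_results and all sample data are hypothetical, and real NTFS ACL evaluation (deny entries, inheritance, nested groups) is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFile:
    share: str          # share the file was crawled from
    path: str           # path of the file within that share
    readers: frozenset  # principals granted read access by the indexed ACL


def visible_results(results, user_shares, user_principals):
    """Keep files on a share the user can see AND that the ACL lets them read."""
    shares = set(user_shares)
    principals = set(user_principals)
    return [
        f for f in results
        if f.share in shares and f.readers & principals
    ]


results = [
    IndexedFile("sales", "q1/forecast.xlsx", frozenset({"sales-team"})),
    IndexedFile("sales", "managers/bonus.xlsx", frozenset({"sales-managers"})),
    IndexedFile("finance", "ledger.xlsx", frozenset({"finance-team"})),
]

# A sales user who sees only the sales share and is in the sales-team group.
for f in visible_results(results, ["sales"], ["sales-team"]):
    print(f.share, f.path)
```

In this sketch the sales user is shown only the forecast file: the finance file is on a share they cannot see, and the bonus file is excluded by its ACL.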

You can disable this behaviour and configure the search appliance to not index the permissions by enabling the option ‘Do not index file permissions’ when adding / editing a URI in your search core.

Indexing New Files

Foldr will automatically index files as they are uploaded by users and will also update the index when files are moved or deleted. If new files are placed onto an SMB share outside of Foldr (e.g. from a domain-bound workstation in Explorer), the files will be added to the Foldr search index when the next scheduled crawl job takes place.  You can manually start an index / crawl job at any time from the Cores tab.

Manually Starting a Crawl / Index Job

This should be done on the Search Appliance

To start a crawl job manually, click the inline button shown below from the Cores tab for the relevant URI / share path.

Note – You do not need to enable ‘force re-indexing of all files’ in order to pick up new files that may have been uploaded from outside of Foldr.
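The difference between a normal crawl and a forced re-index can be sketched as follows. This Python example is an illustration of the general idea only: how Foldr actually detects changed files is not described in this article, so comparing modification times is an assumption, and the function name files_to_index and the sample data are hypothetical.

```python
def files_to_index(current_mtimes, indexed_mtimes, force=False):
    """Return the paths a crawl would (re)index.

    current_mtimes: path -> modified time as seen on the share now
    indexed_mtimes: path -> modified time recorded at the last crawl
    """
    if force:
        return sorted(current_mtimes)  # forced re-index: every file
    return sorted(
        path for path, mtime in current_mtimes.items()
        if indexed_mtimes.get(path) != mtime  # new or changed since last crawl
    )


indexed = {"a.docx": 100, "b.pdf": 200}
current = {"a.docx": 100, "b.pdf": 250, "c.png": 300}

print(files_to_index(current, indexed))              # changed and new files only
print(files_to_index(current, indexed, force=True))  # everything on the share
```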

 

Need more help?

Get in touch with our friendly help desk who will be happy to assist you, support@foldr.io