Setup and configuration

Requirements #

  • Java 17 or later runtime environment.
  • For best performance, store the index on a device with low random-access I/O latency, such as an SSD.

SICS Search Folder Mode #

SICS Search Folder Mode is not optimal: it can lead to slow searches and, in a multi-user environment, to inconsistencies between the index and the database. It is therefore recommended only for test or demonstration purposes.

This mode does not require an Application Server; a plain SICS installation is sufficient.

  1. Open SICS System Administration Utility and select System Administration -> SICS Search Configuration.
  2. In the SICS Search Mode drop-down menu, select Folder Mode.
  3. In the Index Folder text field, provide the path of the folder where you want to store the index.
  4. In the Results Per Page number field, provide the number of results you want per page.
  5. Click OK.

Each user running SICS must have full read/write access to the folder.
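On Linux, this requirement can be checked with a small shell test run as each SICS user. The following is a minimal sketch; the function name check_index_folder and the path in the usage comment are hypothetical and not part of SICS.

```shell
# check_index_folder DIR
# Succeeds (exit status 0) only if DIR exists and the current user can
# read, write and traverse it. The function name is hypothetical.
check_index_folder() {
    dir=$1
    [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]
}

# Usage (example path):
#   check_index_folder /srv/sics/index && echo "access OK"
```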

If all SICS instances share the same index folder, it is not necessary to rebuild the index.

Concurrent changes to a commonly shared index are supported, but in this case it is highly advisable to use the SICS Search Server or SICS Search Solr Server/Cloud for performance, security, memory consumption and reliability reasons.

SICS Search Server Mode #

SICS Search Server Installation #

SICS Search Server is a Web Application and can potentially be deployed in any Application Server. It has been successfully tested using Tomcat 10.1.

To deploy an instance of SICS Search Server, deploy SicsSearchServer.war using the standard procedure for the chosen Application Server.

The WAR file can be found in the SICS delivery folder.

SICS Search Server Configuration #

Database #

SICS Search Server requires a connection to the SICS database in order to retrieve user access rights. It is therefore necessary to bind the JNDI resource named jndi/sss to the correct JDBC resource. This can be done by editing META-INF/context.xml or through the application server administration pages.

One SICS Search Server instance can serve only one SICS environment/database.

The database schema to be used can be specified by editing WEB-INF/classes/META-INF/persistence.xml

Logging #

Server logging can be configured in WEB-INF/classes/log4j2.xml

Server parameters #

The server parameters can be configured in WEB-INF/web.xml

| Parameter Name | Description | Default Value |
|---|---|---|
| SicsSearchIndexFolder | Specifies the directory where the index is stored. | ${java.io.tmpdir}/${servlet context path}/index |
| SicsSearchMatchesPerPage | Specifies the maximum number of results to be fetched for each query. | 20 |
| UsersCacheRefreshInterval | Specifies the interval (in seconds) between updates of the users’ cache, which contains the list of users and their access rights. The parameter must be an integer value. | 0 (no automatic refresh) |
| SuggesterRebuildInterval | Specifies the interval (in seconds) between rebuilds of the suggester’s cache. The suggester is responsible for providing search string suggestions. The parameter must be an integer value. | 0 (no automatic rebuild) |
| CommitToSafeStorageInterval | Specifies the interval (in seconds) between commits to safe storage. The parameter must be an integer value. A zero or negative value results in changes being persisted at each change. A high value results in better performance, but also in higher memory consumption and potential data loss on Out Of Memory errors. | 0 |

| Parameter Name | Description | Default Value |
|---|---|---|
| UserAuthorizationInterface | Specifies the User Authorization Interface implementation to use. It is relevant only if the database switch for this functionality is on. It must be a canonical class name. | |
| CaseSensitiveUserNames | Tells the server whether to use case-sensitive user names. It must be in sync with the database source setting in SICS Workstation/SysAdmin, stored in conf/sics-database-sources.xml. | false |
| DatabaseSchemaForUAI | Specifies the database schema name to be used for User Authorization Interface invocations. | The database schema name specified in WEB-INF/classes/META-INF/persistence.xml or the default schema for the connection’s database user. |

Secure Socket Layer (SSL) #

To make SICS Search Server redirect all HTTP access to HTTPS, uncomment the security-constraint element in WEB-INF/web.xml and set its transport-guarantee to CONFIDENTIAL.
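Whether the redirect is active can be verified from a shell with curl. This is a sketch under assumptions: the function names are hypothetical, and the URL in the usage comment (host, port and application root) is only an example.

```shell
# is_https_url URL
# Succeeds if URL starts with "https://".
is_https_url() {
    case $1 in
        https://*) return 0 ;;
        *) return 1 ;;
    esac
}

# redirects_to_https URL
# Requests URL without following redirects and succeeds only if the
# server answers with a redirect whose target uses HTTPS.
redirects_to_https() {
    target=$(curl --silent --output /dev/null --max-time 5 \
                  --write-out '%{redirect_url}' "$1") || return 1
    is_https_url "$target"
}

# Usage (example URL):
#   redirects_to_https http://localhost:8080/SicsSearchServer/status \
#       && echo "HTTP is redirected to HTTPS"
```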

Connecting SICS to SICS Search Server #

  1. Open SICS System Administration Utility and select System Administration -> SICS Search Configuration.
  2. In the SICS Search Mode drop-down menu, select SICS Search Server Mode.
  3. In the Server URL text field, provide the server URL.
  4. If the server is running in secure mode (HTTPS), check the Secure Connection check-box.
  5. Click OK.

The configuration will be stored in the database and shared among all its users.

Checking SICS Search Server status (HTTP and JMX) #

The SICS Search Server status can be checked at any time using HTTP or JMX (if JMX is enabled in the host JVM).

The status page is accessible at the address

http://<host>:<port>/<application_root>/status

A simple ping service is available at

http://<host>:<port>/<application_root>/ping
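Both addresses lend themselves to simple monitoring scripts. The sketch below builds the URLs from their parts and probes the ping service with curl; the function names are hypothetical, the host, port and application root in the usage comment are examples, and it is an assumption that the ping service answers with a successful HTTP status when the server is healthy.

```shell
# sss_url SCHEME HOST PORT APP_ROOT ENDPOINT
# Prints the URL of a SICS Search Server endpoint ("status" or "ping").
sss_url() {
    printf '%s://%s:%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# sss_ping SCHEME HOST PORT APP_ROOT
# Succeeds if the ping endpoint answers with a successful HTTP status
# within five seconds (curl --fail turns HTTP errors into failures).
sss_ping() {
    curl --silent --fail --max-time 5 --output /dev/null \
         "$(sss_url "$@" ping)"
}

# Usage (example values):
#   sss_ping http localhost 8080 SicsSearchServer && echo "server is up"
```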

The server will expose JMX beans for each installed instance of SICS Search Server, using the application root for identifying each instance.

Through the JMX bean it is also possible to reset counters and to refresh the User Cache, which provides the access rights of SICS users. JMX beans can be accessed with tools such as JConsole, which is included in the JDK, or VisualVM.
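JMX must be enabled in the JVM that hosts the Application Server before these beans can be reached remotely. For Tomcat, one common approach is to set the standard JVM system properties in bin/setenv.sh. The fragment below is a sketch, not SICS-specific configuration: port 9010 is an arbitrary choice, and disabling authentication and SSL is acceptable only on an isolated test machine.

```shell
# Sketch of a Tomcat bin/setenv.sh fragment that enables remote JMX.
# Port 9010 is an arbitrary example. Authentication and SSL are switched
# off here for brevity; do not do this on a production host.
CATALINA_OPTS="${CATALINA_OPTS:-} -Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.port=9010 \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote.ssl=false"
export CATALINA_OPTS
```

After a restart, a JMX client can connect to the host on the chosen port.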

SICS Search Solr Server and SICS Search SolrCloud Mode #

Apache ZooKeeper installation #

Apache ZooKeeper caches the configuration and distributes it to the nodes. ZooKeeper also keeps track of which nodes are available.

For a reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble is up, the service will be available.

In a production environment, it is recommended to use an odd number of ZooKeeper instances, and at least three, in the ensemble. For example, if you have four ZooKeeper instances in the ensemble, ZooKeeper can only handle the failure of one instance; if two instances fail, the remaining two instances do not constitute a majority. However, if you have five instances in the ensemble, ZooKeeper can handle the failure of up to two instances, because the remaining three instances do constitute a majority.
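The majority rule reduces to a simple calculation: an ensemble of N instances tolerates the failure of (N - 1) / 2 instances, rounded down. A minimal shell sketch (the function name is hypothetical):

```shell
# zk_tolerated_failures N
# Prints how many of N ZooKeeper instances may fail while a strict
# majority (more than half of N) is still running.
zk_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# zk_tolerated_failures 3   prints 1
# zk_tolerated_failures 4   prints 1
# zk_tolerated_failures 5   prints 2
```

The output shows why an even number of instances brings no benefit: four instances tolerate no more failures than three.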

To configure the ZooKeeper ensemble, add the following entry to zoo.cfg for each ZooKeeper instance in the ensemble.

server.<positiveId>=<hostname>:<port1>:<port2>

Example:

server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

Apache ZooKeeper 3.4.13 can be downloaded at:

https://archive.apache.org/dist/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz

Extract zookeeper-3.4.13.tar.gz to a preferred location referred to as <ZK_HOME>.

On Windows, you can use a tool such as 7-Zip or WinRAR. On Linux, create <ZK_HOME> if it does not exist and run:

tar -xzvf zookeeper-3.4.13.tar.gz --strip-components=1 -C <ZK_HOME>

On Linux, all ZooKeeper script files must be made executable:

chmod +x <ZK_HOME>/bin/*

Copy or rename <ZK_HOME>/conf/zoo_sample.cfg to <ZK_HOME>/conf/zoo.cfg.

Open <ZK_HOME>/conf/zoo.cfg in a text editor to edit the ZooKeeper configuration.

  • tickTime
    • The basic time unit, in milliseconds, used by ZooKeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
  • dataDir
    • The location where the in-memory database snapshots are stored and, unless specified otherwise, the transaction log of updates to the database.
  • clientPort
    • The port on which to listen for client connections.
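Put together with the server entries of the ensemble example, a complete configuration might be generated as sketched below. The function name and its arguments are hypothetical; tickTime, initLimit, syncLimit and clientPort use the values shipped in zoo_sample.cfg, and the host names zoo1 to zoo3 come from the example earlier in this section. In an ensemble, each instance also needs a file named myid in its dataDir containing only its own server id, which the sketch writes as well.

```shell
# write_zk_config ZK_HOME DATA_DIR MY_ID
# Writes <ZK_HOME>/conf/zoo.cfg for a three-node ensemble and the myid
# file that tells this instance which server.<id> line refers to itself.
write_zk_config() {
    zk_home=$1 data_dir=$2 my_id=$3
    mkdir -p "$zk_home/conf" "$data_dir"
    cat > "$zk_home/conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
clientPort=2181
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
EOF
    echo "$my_id" > "$data_dir/myid"
}

# Usage on the host zoo2 (example paths):
#   write_zk_config /opt/zookeeper /var/lib/zookeeper 2
```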

To start a ZooKeeper instance on Linux, please do:

<ZK_HOME>/bin/zkServer.sh start

To start a ZooKeeper instance on Windows, please do:

<ZK_HOME>/bin/zkServer.cmd

To stop a running ZooKeeper instance on Linux, please do:

<ZK_HOME>/bin/zkServer.sh stop

To stop a running ZooKeeper instance on Windows, close the command prompt window or press Ctrl+C.

Solr Installation #

Extract the Solr distribution archive to a preferred location.

To override Solr settings, please edit bin/solr.in.sh (Linux) or bin/solr.in.cmd (Windows).

Note that in production SolrCloud environments, the SOLR_HOST variable should be set to the hostname of the server, as this determines the address of the node when it registers with ZooKeeper.
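As an illustration, the relevant lines in bin/solr.in.sh could look like the fragment below. SOLR_HOST and ZK_HOST are standard variables of that file; the host name solr1.example.com and the ZooKeeper addresses are examples only. When ZK_HOST is set there, the -z option does not normally need to be repeated on the command line.

```shell
# Fragment of bin/solr.in.sh (example values).
# Host name under which this node registers itself in ZooKeeper.
SOLR_HOST="solr1.example.com"
# Connect string of the external ZooKeeper ensemble, including the
# optional chroot (/solr) at the end.
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
```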

To start Solr for the first time after installation, simply do:

bin/solr start

This will launch a standalone Solr server in the background of your shell, listening on port 8983.

Alternatively, you can launch Solr in “cloud” mode, which allows you to scale out using sharding and replication.

To launch Solr in cloud mode with the embedded ZooKeeper instance, do:

bin/solr start -cloud

To launch Solr in cloud mode with an external ZooKeeper ensemble, do:

bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181

To see all available options for starting Solr, please do:

bin/solr start -help

You can also install Solr as a Windows or Linux service.

To see the available options for installing Solr as a service, do:

bin/install_solr_service -help

After starting Solr, follow the instructions depending on whether Solr is running in server or cloud mode.

Server mode #

To create the cores, do:

bin/configure create_cores -p 8983

Cloud mode #

If your ZooKeeper ensemble is or will be shared among other systems besides Solr, you should consider defining application-specific znodes, or a hierarchical namespace that will only include Solr’s files.

Once you create a znode for each application, you add its name, also called a chroot, to the end of your connect string whenever you tell Solr where to access ZooKeeper.

Creating a chroot is done with a bin/solr command:

bin/solr zk mkroot /solr -z localhost:2181
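With a chroot in place, the connect string consists of the comma-separated host:port pairs followed once by the chroot. A small sketch of how such a string is assembled (the function name is hypothetical; the host names are examples):

```shell
# zk_connect_string CHROOT HOST:PORT...
# Joins the given ZooKeeper addresses with commas and appends the
# chroot once, at the very end of the string.
zk_connect_string() {
    chroot_path=$1
    shift
    hosts=$(IFS=,; echo "$*")
    printf '%s%s\n' "$hosts" "$chroot_path"
}

# zk_connect_string /solr zk1:2181 zk2:2181 zk3:2181
#   prints zk1:2181,zk2:2181,zk3:2181/solr
#
# Usage when starting Solr in cloud mode:
#   bin/solr start -cloud -z "$(zk_connect_string /solr zk1:2181 zk2:2181 zk3:2181)"
```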

In SolrCloud mode, SICS authentication is not enabled by default. To enable it, you must upload the security.json file to ZooKeeper.

To enable authentication (e.g. with ZooKeeper running at localhost:2181), do:

bin/configure enable_auth -z localhost:2181

To upload the configuration to ZooKeeper (e.g. with ZooKeeper running at localhost:2181), do:

bin/configure upconfig -z localhost:2181

To create the collections, do:

bin/configure create_collections -p 8983

To see available options for create_collections, please do:

bin/configure create_collections -help

Post Solr Installation #

After starting Solr, direct your Web browser to the Solr Admin Console at:

http://localhost:8983/solr/

When finished with your Solr installation, shut it down by executing:

bin/solr stop -all

The -p PORT option can also be used to identify the Solr instance to shut down when more than one Solr instance is running on the machine.

SICS Search Solr Configuration #

Database #

Each SICS Search Solr Node requires a connection to the SICS database in order to retrieve user access rights. The database connection must be set to the same SICS environment/personal database which is going to be configured to use this node. It is therefore necessary to bind the JNDI resource named jndi/sss to the correct JDBC resource. This can be done by editing server/solr-webapp/webapp/WEB-INF/jetty-env.xml. The JDBC driver can be copied to server/lib/ext folder.

The database schema to be used can be specified by editing server/solr-webapp/webapp/WEB-INF/classes/META-INF/persistence.xml

Logging #

Server logging can be configured in server/resources/log4j2.xml

Secure Socket Layer (SSL) #

To configure SSL, you must edit etc/jetty-ssl.xml. In this file you can set the path and password of the Java Key Store (JKS) files that you have generated using the Java Key and Certificate Management Tool (keytool).

The key store file must include your public/private key pair and its intermediate certificate chain. A JKS file used as a trust store only needs to include the certificate chain.

To make Solr redirect all HTTP access to HTTPS, you must edit server/solr-webapp/webapp/WEB-INF/web.xml and uncomment the security-constraint element.

To make the Solr cores communicate over HTTPS/SSL, you must edit server/solr/solr.xml. In this file you can set the host port directly or reference the Java system property jetty.ssl.port.

Connecting SICS to SICS Search Solr Server #

  1. Open SICS System Administration Utility and select System Administration -> SICS Search Configuration.
  2. In the SICS Search Mode drop-down menu, select Apache Solr Server Mode.
  3. In the Server URL text field, provide the server URL, e.g. localhost:8983/solr
  4. If the server is running in secure mode (HTTPS), check the Secure Connection check-box.
  5. Click OK.

Connecting SICS to SICS Search SolrCloud #

  1. Open SICS System Administration Utility and select System Administration -> SICS Search Configuration.
  2. In the SICS Search Mode drop-down menu, select Apache Solr Cloud Server Mode.
  3. In the Server URL text field, provide the ZooKeeper host address, e.g. zk1:2181,zk2:2181,zk3:2181/solr
  4. If the server is running in secure mode (HTTPS), check the Secure Connection check-box.
  5. Click OK.

Configuring Access Rights #

User access in SICS Search Server, SICS Search Solr Server or SICS Search Solr Cloud Node is managed using the configuration of the SICS database to which the server or cloud node instance is connected.

This configuration can be inspected and changed in the security section of the SICS System Administration Utility.

Relevant security use cases for the SICS Search functionality are:

| Security Use Case | Description |
|---|---|
| SICS Search (execute) | Grants access to the search functionality from the Workstation. |
| SICS Search Configuration (read and update) | Grants access to inspecting and changing the SICS Search configuration for the database in use. This allows the user to enable or disable the functionality and to configure access to a SICS Search Server, SICS Search Solr Server or SICS Search Solr Cloud Node instance. |
| SICS Search Index (create and delete) | The Create right allows the user to mass-load data into the index and to rebuild the suggester cache. The Delete right allows the user to empty/reset the index. These functions are available either through the Update SICS Search Index functionality in the System Administration Utility or through the related SICS Search Index Update Batch Job. |

When an instance of SICS Search Server, SICS Search Solr Server or SICS Search Solr Cloud Node is processing a request from SICS Workstation or SysAdmin, it will automatically authenticate the user. For SICS Search Server, if a request comes through a web browser, a login page will be offered to the user in order to establish the correct set of access rights to use.

Creating or Rebuilding the Index #

Creation of the index can be done through the Update SICS Search Index functionality in SICS System Administration Utility (under Database Updating) or through a scheduled job.

The index can be recreated by indexing different types of objects at the same time, and several updates can run concurrently. For some object types, it is also possible to specify a filter on the data to be indexed. This allows multiple concurrent indexing processes for the same object type, or a selective index rebuild.

When the index for a specific object type is rebuilt, objects of that type are cleared from the index (taking any data filter into account). This is necessary in order to remove objects that are no longer present in SICS but are still present in the index.

During a complete or partial index rebuild, the index contains partial data. To avoid affecting regular SICS users, it is possible to rebuild the index in the background by applying changes to a “secondary” index. When the re-indexing is complete, the old index can be replaced with the new one.

SICS checks for the presence of duplicates. If an object that is already present in the index is re-indexed, the new version replaces the old one.

To make sure that all data in the index is consistent with what is stored in the database, it is advisable to run scheduled index rebuilds.

Index maintenance examples #

The following examples describe how to perform standard maintenance operations on the SICS Search index. These steps assume that SICS is already configured to use the index and that the user has all relevant access rights. All operations can be performed in the SICS System Administration Utility, through Database Updating -> Update SICS Search Index.

Creation or complete rebuild of the index #

If we want to remove all data present in the index, including documents not indexed by SICS, select “Reset index”.

Enable the Extractors we want to execute. Each extractor corresponds to a specific SICS object type. For a complete rebuild, simply click on “Enable All”.

Select Rebuild Suggester. This option makes sure that relevant suggestions are presented to the user when searching. The Suggester is rebuilt only after all the data for the selected Extractors has been re-indexed.

The configuration below will remove all objects in the index, rebuild the index completely and rebuild the suggester.

sics_search_index_window.png

Concurrent Creation or complete rebuild of the index #

This example explains how to rebuild the index using several parallel processes. Each process needs its own SICS System Administration Utility.

The Reset Index option may be selected only for the process that starts first. Otherwise, some data might be missing from the index after the re-indexing is complete.

Each process should select a different set of extractors. If an extractor is used in more than one process, its filter parameters should be set so that the same object is not extracted twice. Extracting an object twice does not lead to duplicates in the index, but it does degrade performance. To enter filter parameters for a specific Extractor, select the extractor and enable it. If any filtering options are available, they will be shown on the right.

The Rebuild Suggester option should be selected in the process that terminates last. Selecting it for all processes, however, has no undesired effect.

The configuration below will remove all objects in the index and re-index only Worksheets with a Due Date between the 1st of January 2001 and the 31st of December 2019.

sics_search_index_ws_extractor.png

Rebuild index in the background #

To rebuild the index in the background, select the Apply Changes To Secondary Index option. The changes will then not affect the index used for current searches. When this option is selected, the Start Using Secondary Index When Done option is also presented to the user. If the index is rebuilt with several concurrent processes, this second option must be selected only for the process that terminates last.

The changes become visible to new searches only when the main index is replaced with the secondary index. This operation deletes the secondary index and moves the old main index to the backup sub folder in the index directory.

Any change to data in the database performed by SICS will affect both the main and the secondary index.

The configuration below will rebuild the index completely in the background, make the new index available for searching only when the re-indexing is complete and rebuild the suggester.

sics_search_index_rebuild.png

Restore an old main index from a backup #

When a Secondary index is promoted to be a main index, the old main index is moved to the backup sub folder.

The backup can be restored by stopping the SICS Search Server, SICS Search Solr Server, SICS Search SolrCloud or SICS in Folder Mode, deleting all files (not folders) in the index folder and moving the entire content of the backup folder to the index folder.

When the SICS Search Server, SICS Search Solr Server, SICS Search SolrCloud or SICS in Folder Mode is restarted the old index will be used for searches again.
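On Linux, the file operations of this restore procedure can be sketched as follows. This is an illustration under assumptions, not a supplied tool: the function name is hypothetical, the backup sub folder is assumed to be named backup and to sit directly inside the index folder, and the server or SICS must already be stopped before the function is called.

```shell
# restore_index_backup INDEX_DIR
# Deletes the files (not the folders) directly inside INDEX_DIR, then
# moves everything from INDEX_DIR/backup into INDEX_DIR.
restore_index_backup() {
    index_dir=$1
    backup_dir=$index_dir/backup
    if [ ! -d "$backup_dir" ]; then
        echo "no backup folder found in $index_dir" >&2
        return 1
    fi
    # Remove regular files at the top level only; sub folders, including
    # the backup folder itself, are left untouched.
    find "$index_dir" -maxdepth 1 -type f -exec rm -f {} +
    # Move every entry of the backup folder (hidden ones included) up
    # into the index folder.
    find "$backup_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$index_dir"/ \;
}

# Usage (example path):
#   restore_index_backup /srv/sics/index
```

The empty backup folder is kept so that it can receive the next replaced index.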

Index External Documents #

Indexing external documents is only supported by SICS Search in SolrCloud or Solr Server Mode.

SICS Search Solr Nodes are responsible for reading, parsing and indexing the external documents. For that reason, each node must be able to access the directory to be indexed.

  1. Open SICS System Administration Utility and select Database Updating -> Update SICS Search Index.
  2. Select and enable the External Documents Extractor.
  3. Select the data import command: full or delta-import.
  4. Enter or browse to the base directory.
  5. Check clean if you want to remove previously indexed external documents from the index.
  6. Click Execute to start indexing.

The following document types will be indexed in the base directory and its subdirectories.

  • Microsoft Word Document (DOC)
  • Microsoft PowerPoint (PPT)
  • Microsoft Excel (XLS)
  • Office Open XML Document (DOCX)
  • Office Open XML Presentation (PPTX)
  • Office Open XML Workbook (XLSX)
  • Portable Document Format (PDF)
  • Text Document (TXT)
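Before starting an import, it can be useful to see which files under a base directory fall into these categories. The sketch below lists them by file extension; the function name is hypothetical, and matching by extension regardless of letter case is an assumption about how the document types are recognized.

```shell
# list_indexable_documents BASE_DIR
# Prints every regular file under BASE_DIR (sub directories included)
# whose extension matches one of the supported document types.
list_indexable_documents() {
    find "$1" -type f \( \
        -iname '*.doc'  -o -iname '*.docx' -o \
        -iname '*.ppt'  -o -iname '*.pptx' -o \
        -iname '*.xls'  -o -iname '*.xlsx' -o \
        -iname '*.pdf'  -o -iname '*.txt' \)
}

# Usage (example path): count the candidate files
#   list_indexable_documents /srv/documents | wc -l
```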

sics_search_index_external_docs.png