Using Martini with SolrCloud
In Solr, it's possible to configure a cluster of Solr servers that combines fault tolerance and high availability. Solr refers to this type of cluster as SolrCloud.
Unlike embedded and stand-alone Solr servers, where Solr cores are stored on a single machine, SolrCloud abstracts cores into collections. A collection is a core whose parts are distributed among multiple Solr servers. Storing data across several servers enables replication and sharding. In other words, SolrCloud provides distributed indexing and search capabilities.
Familiarizing yourself with SolrCloud
We recommend reading Solr's guide on SolrCloud if you are not yet acquainted with it.
Solr is the engine behind Tracker and the invoke monitor's family of features. Custom search indices also rely on Solr. If you plan to use any of these search-reliant components extensively, then optimizing Martini's Solr server(s) [1] should be a priority. Using SolrCloud is one way of making Solr more reliable [2]. SolrCloud keeps Solr cores available when needed and makes your systems resilient to outages. It's highly recommended to use this set-up in a production environment.
Luckily, connecting your SolrCloud cluster to your Martini instance [3] is a piece of cake. The complicated parts are configuring your ZooKeeper ensemble and SolrCloud cluster. This three-part guide walks you through what you need to do to run Martini with SolrCloud. It's recommended you read the documents in order:
The steps are illustrated with examples that set up three instances of ZooKeeper and three instances of SolrCloud, much like a production environment would be configured. For ease of configuration and installation, the instances are connected to shared storage via a network-attached storage (NAS) device, and a shared /datastore is mounted across all servers. Setting up the shared storage server, however, is not covered in this guide. The diagram below summarizes the set-up:
Of course, nothing in this example configuration is set in stone; to each their own! We recommend analyzing how your organization uses Martini and its Solr-dependent features, and deciding on a configuration from there.
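To make the example topology more concrete, here is a minimal Python sketch that writes down a three-node ZooKeeper ensemble and derives the comma-separated connection string that SolrCloud nodes are typically pointed at. The host names and the helper functions are hypothetical and only serve as illustration; 2181 is ZooKeeper's default client port, and the optional chroot suffix (such as /solr) is an assumption about how you might lay out your ensemble.

```python
# Hypothetical host names for the three-node ZooKeeper ensemble in the example.
ZOOKEEPER_HOSTS = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]


def zk_connect_string(hosts, port=2181, chroot=""):
    """Build a ZooKeeper connection string such as 'a:2181,b:2181,c:2181/solr'.

    `chroot` is an optional path suffix (for example '/solr'); leave it
    empty if your ensemble does not use one.
    """
    servers = ",".join(f"{host}:{port}" for host in hosts)
    return servers + chroot


def tolerated_failures(ensemble_size):
    """Number of ZooKeeper nodes that can fail while a majority remains."""
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    print(zk_connect_string(ZOOKEEPER_HOSTS, chroot="/solr"))
    print(tolerated_failures(len(ZOOKEEPER_HOSTS)))
```

With three ZooKeeper instances, the ensemble keeps a majority as long as no more than one instance is down, which is one reason a three-node ensemble is a common starting point.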
Inability to create new cores on its own
As with remote instances of Solr, Martini cannot create new cores when it is connected to SolrCloud. If a package needs its own Solr collections, you have to create them manually through the Solr Collections API; otherwise, the package will not start.
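As a rough illustration of creating a collection manually, the Python sketch below builds a Collections API CREATE request URL. The base URL, collection name, shard count, replication factor, and the helper function itself are assumptions for illustration only; the collection names a given Martini package actually expects are not covered here, so check the package's requirements before creating anything.

```python
from urllib.parse import urlencode


def build_create_collection_url(solr_base_url, name, num_shards=1,
                                replication_factor=3, config_name=None):
    """Return a Solr Collections API URL that creates a collection.

    All argument values are illustrative; pick shard and replica counts
    that match your own cluster.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    if config_name is not None:
        # Name of a configset already uploaded to ZooKeeper (assumed to exist).
        params["collection.configName"] = config_name
    return f"{solr_base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr node and collection name.
    print(build_create_collection_url(
        "http://solr1.example.com:8983/solr", "my_package_collection"))
```

The resulting URL can be opened with any HTTP client (for example `curl` or `urllib.request.urlopen`) against one of the Solr nodes in the cluster.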
[1] Martini uses an embedded instance of Solr by default.
[2] SolrCloud adds some overhead when processing data (e.g. network latency, distribution of data in the cluster). When indexing small amounts of data, the embedded version of Solr performed better, but the difference is negligible. Solr in SolrCloud mode provides better performance when indexing large volumes of data. In addition, it increases the reliability and availability of the Solr cores.