Toro Cloud Dev Center


Indexing a document to a custom search index

In this guide, we will show you how to add a new document to your custom search index. We will create a script that indexes movie data and discuss the objects and methods that script uses to do the indexing. For simplicity's sake, the data we index will be entered manually via service inputs.

Stuff you need to know...

This guide assumes that you have already created a custom Solr core or collection, and that you know how to create services in Gloop or Groovy.

Get the code!

The scripts mentioned in this guide are available in the examples package. As a bonus, you can find other services in the examples package that demonstrate the use of functions from the SolrMethods class, as well as other Solr-related functionality.

Preparation

Before we get to indexing documents, we must ensure that our custom Solr core is already set up and connected to the Martini package we're going to use.

Here's the outline of our setup:

  • Our package is called examples. This is where our scripts will reside.
  • Our target Solr core is embedded and named movie-core. As a result, the directory structure of the examples package is:

    examples
    ├── classes
    ├── code
    ├── conf
    ├── web
    └── solr
        └── movie-core
            ├── core.properties
            └── conf
                ├── schema.xml
                └── solrconfig.xml
    
  • The examples package's package.xml file has already been edited to make the embedded Solr core known:

    <package>
        <!-- ... -->
        <solr-cores>
            <solr-core name="movie-core" enabled="true" />
            <!-- ... -->
        </solr-cores>
    </package>
    

Creating the model

We need to create a model that can hold the data we want to index. In this case, we need to create a model for holding movie data.

You can manually create your Gloop model from scratch, or you can extract the fields defined in the schema.xml file and create a model based on them. In our case, we will do the latter using the SchemaToGloopModelGenerator service:

Model-generating service

We have placed this script in the examples package's code directory, under solr.customSolrCore.model. You should be able to use this script to parse your own schema.xml file, although you may need to tweak it depending on your setup. Here's a breakdown of the Gloop steps it contains:

  • In Line 1, we have a map step that calls GroovyMethods.getPackage() to get the Martini package where the script resides. The return value is then stored in a variable called martiniPackage.

  • In Line 2, we have another map step that declares and initializes a Path variable that points to schema.xml's location. We'll use martiniPackage#getHome() as the base path and from there, we can traverse to schema.xml's actual location, like so:

    Paths.get(martiniPackage.getHome(), 'solr', 'movie-core', 'conf', 'schema.xml')
    
  • In Line 3, we have added a third map step but this time, we use it to declare and initialize a String variable containing schema.xml's content. We did that this way:

    Files.readAllBytes(movieCorePath);
    

    Gloop conversion

    You may have noticed that the snippet above reads a byte array, yet the variable it initializes is a String. This is possible thanks to Gloop's ObjectToCharSequenceConverter.

  • In Line 4, we create an invoke step that calls SolrMethods.solrSchemaToGloopModel(String, String, String, String, List<GloopModel>). This method will create the Gloop model MovieDocument in solr.customSolrCore.model, based on the schema.xml file.

    SolrMethods.solrSchemaToGloopModel("MovieDocument", schemaContent, null, "solr.customSolrCore.model", null)
    

All you have to do now is run the service and voilà! You now have your schema.xml-based Gloop model. If you're following along with our example, this will produce MovieDocument.model in solr.customSolrCore.model. We'll use this model later.
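
The path-building and byte-to-string steps of the generator service can be approximated outside Gloop with plain JDK calls. The following is a minimal Groovy sketch under stated assumptions, not the generator service itself: packageHome is a hypothetical stand-in for the value returned by martiniPackage.getHome(), and a temporary directory with a placeholder schema file is used so that the snippet runs on its own. It also makes explicit the conversion that Gloop otherwise performs for you.

```groovy
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical stand-in for martiniPackage.getHome(); a temp directory keeps the sketch runnable.
String packageHome = Files.createTempDirectory('examples').toString()

// Same traversal as in the guide: <home>/solr/movie-core/conf/schema.xml
Path schemaPath = Paths.get(packageHome, 'solr', 'movie-core', 'conf', 'schema.xml')

// Write a placeholder schema so there is something to read (assumption for this sketch only).
Files.createDirectories(schemaPath.parent)
Files.write(schemaPath, '<schema name="movie-core"/>'.getBytes(StandardCharsets.UTF_8))

// Files.readAllBytes returns byte[]; decode it explicitly instead of relying on Gloop's converter.
String schemaContent = new String(Files.readAllBytes(schemaPath), StandardCharsets.UTF_8)

println schemaContent
```

Naming the charset explicitly avoids depending on the platform default when the schema contains non-ASCII characters.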

<package-name>
└── code
   └── solr
       └── customSolrCore
           └── model
               ├── MovieDocument.model
               └── SchemaToGloopModelGenerator.gloop

The MovieDocument model has the following fields:

  • id (String)
  • movieTitle (String)
  • director (String)
  • cast (String[])

If you are working in Groovy instead of Gloop, a bean class, MovieDocument.groovy, will hold the movie data we want to index. We'll place it under the solr.customSolrCore.model package. Its content will be:

package solr.customSolrCore.model

import org.apache.solr.client.solrj.beans.Field

class MovieDocument {

    String id;

    @Field
    String movieTitle;

    @Field
    String director;

    @Field
    String[] cast;

}

The @Field annotations indicate which fields we want to index.

Fields defined in the schema

If you take a look at movie-core's schema.xml file, you will notice that its documents have six fields: id, movieTitle, director, cast, _version_, and text.

  • id is the identifier for our documents; its value is automatically generated by Solr because of the UpdateRequestProcessorChain configuration in solrconfig.xml
  • _version_ is also supplied automatically by Solr; it is an internal field used by partial updates, the update log, and SolrCloud, and it is required for optimistic concurrency
  • text is a compilation of copied fields, and is used as the default search field when clients run queries

The other fields are provided by the client.
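
To see which fields a schema declares without opening the file by hand, the field definitions can be listed with the JDK's DOM parser. This is only a sketch of the idea, not how Martini generates models: the inline schema fragment is hypothetical (the field names come from this guide, but the types and attributes are assumed), so substitute the content of your own schema.xml.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.w3c.dom.NodeList

// Hypothetical schema fragment: field names are from the guide, types and attributes are assumed.
String schemaContent = '''<schema name="movie-core" version="1.6">
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="movieTitle" type="text_general" indexed="true" stored="true"/>
  <field name="director" type="text_general" indexed="true" stored="true"/>
  <field name="cast" type="text_general" indexed="true" stored="true" multiValued="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
</schema>'''

// Parse the XML with the JDK's DOM parser.
def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
def document = builder.parse(new ByteArrayInputStream(schemaContent.getBytes('UTF-8')))

// Collect each <field> element as a "name -> is it multi-valued?" entry, in declaration order.
NodeList nodes = document.getElementsByTagName('field')
Map<String, Boolean> fields = [:]
for (int i = 0; i < nodes.length; i++) {
    Element field = (Element) nodes.item(i)
    // getAttribute returns an empty string when the attribute is absent, so this defaults to false.
    fields[field.getAttribute('name')] = field.getAttribute('multiValued') == 'true'
}

fields.each { name, multiValued -> println "${name}${multiValued ? ' (multi-valued)' : ''}" }
```

In this assumed fragment, cast is the only client-supplied field marked multi-valued, which lines up with the String[] type it has in the model.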

Indexing the model's data

Since our model is ready, we can now create a service that gets and indexes the model's data. We'll populate our models manually to make things simpler.

Insert in bulk

You can use the SolrMethods.insertMany(...) functions to insert documents in bulk.

The MovieIndexer service will be responsible for indexing our MovieDocument's data. Here's a preview of the steps we will have in this service:

The `MovieIndexer` service's steps

MovieIndexer's sole input parameter is called movieDocument, based on the MovieDocument Gloop model we created earlier. Because of this, we will be prompted to enter four fields when we run the service: id, movieTitle, director, and cast. Martini will build the movieDocument parameter from our inputs and from there, we can index movieDocument via SolrMethods.index(String, String, GloopModel).

The bullet points below explain each step in the service:

  • In Line 1, we have a try-catch block step. It mirrors Java's try-catch: steps that could throw an exception go in the try block, and the catch block performs the "rescue".
  • In Line 3, under the try block, we have an invoke step that calls SolrMethods.index(String, String, GloopModel). This is where the actual indexing happens. It indexes movieDocument so that it will be available for querying in the examples package's movie-core Solr core later.
  • In Line 5, under the catch block, we have another invoke step that calls LoggerMethods.error(String). This simply logs the exception if anything goes wrong while indexing.
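
For readers more at home in code than in Gloop steps, the try-catch structure described above can be sketched in plain Groovy. Everything here is illustrative: indexSafely is a hypothetical helper, the indexer closure stands in for the real indexing call (which is not reproduced), and java.util.logging replaces LoggerMethods.error.

```groovy
import java.util.logging.Level
import java.util.logging.Logger

// Hypothetical helper: `indexer` stands in for the real indexing call made in the Gloop service.
boolean indexSafely(Map movieDocument, Closure indexer, Logger log) {
    try {
        indexer(movieDocument)   // the step inside the try block
        return true
    } catch (Exception e) {
        // the step inside the catch block: log the failure instead of propagating it
        String message = "Failed to index '${movieDocument.movieTitle}': ${e.message}"
        log.log(Level.SEVERE, message)
        return false
    }
}

Logger logger = Logger.getLogger('MovieIndexer')

// A stand-in indexer that just records what it was given.
def indexed = []
boolean first = indexSafely([movieTitle: 'Forrest Gump'], { doc -> indexed << doc }, logger)

// A stand-in indexer that fails, to exercise the catch branch.
boolean second = indexSafely([movieTitle: 'Broken'], { doc -> throw new IllegalStateException('core unavailable') }, logger)

println "first=${first}, second=${second}, indexed=${indexed.size()}"
```

Returning a boolean is a choice made for this sketch only; the Gloop service described above just logs the error.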

Running the service will prompt you to populate the required MovieDocument model. You can enter whatever values you want to index. If invoked successfully, the service should return a response similar to the one below:

`MovieIndexer`'s sample successful response

In Groovy, we'll instead create a Spring-based endpoint whose request parameters map to the MovieDocument bean's fields. Calling this endpoint triggers the indexing.

Simply create a Groovy file named MovieSolrAPI in solr.customSolrCore and edit it so that it contains the code below.

package solr.customSolrCore

import java.util.Map
import javax.servlet.http.HttpServletRequest

import org.springframework.web.bind.annotation.*
import org.springframework.web.bind.annotation.RequestMethod
import org.springframework.http.MediaType
import org.springframework.http.ResponseEntity
import org.apache.solr.servlet.SolrRequestParsers

import io.toro.integrate.core.service.annotation.InputType
import io.toro.integrate.core.service.annotation.InputTypeField

import solr.customSolrCore.model.MovieDocument

@RestController
@RequestMapping('/solr-package')
class MovieSolrAPI {
  @RequestMapping(produces = [MediaType.APPLICATION_JSON_VALUE, MediaType.APPLICATION_XML_VALUE],
                  method = RequestMethod.POST)
  ResponseEntity<?> addDocument(@RequestParam String movieTitle,
                                @RequestParam String director,
                                @RequestParam String[] casts) {
    def document = new MovieDocument(movieTitle: movieTitle, director: director, cast: casts)
    'movie-core'.writeToIndex( null, document ).toString()
    return ResponseEntity.ok(document)
  }
}

As you may notice, in:

  • Line 25, we constructed a MovieDocument object (the document variable) from the parameters of our request.
  • Line 26, we called writeToIndex(...) on the core name 'movie-core' to index the document for us. The result of the chained toString() call is not used; the response body is the MovieDocument returned on Line 27.

With that said, a call to the endpoint will trigger the indexing of your movie data. For example:

curl -X POST \
  'http://localhost:8080/api/solr-package?movieTitle=Forrest%20Gump&director=Robert%20Zemeckis&casts=Tom%20Hanks&casts=Robin%20Wright&casts=Gary%20Sinise&casts=Mykelti%20Williamson&casts=Sally%20Field' \
  -H 'accept: application/json' \
  -H 'cache-control: no-cache'
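
If you would rather build that query string programmatically than type the percent-encoding by hand, a sketch along the following lines may help. It uses only the JDK's URLEncoder; the host, port, and path are copied from the curl example, and the variable names are illustrative.

```groovy
import java.net.URLEncoder

// Request parameters from the curl example; 'casts' repeats once per cast member.
String movieTitle = 'Forrest Gump'
String director = 'Robert Zemeckis'
List<String> casts = ['Tom Hanks', 'Robin Wright', 'Gary Sinise', 'Mykelti Williamson', 'Sally Field']

// URLEncoder targets form encoding and emits '+' for spaces; swap in '%20' to match the URL above.
// A literal '+' in the input is encoded as %2B first, so this replacement only touches spaces.
def encode = { String value -> URLEncoder.encode(value, 'UTF-8').replace('+', '%20') }

List<String> pairs = ['movieTitle=' + encode(movieTitle), 'director=' + encode(director)]
casts.each { pairs << ('casts=' + encode(it)) }
String query = pairs.join('&')

println 'http://localhost:8080/api/solr-package?' + query
```

Repeating the casts parameter is what lets Spring bind the values to the String[] casts argument of the endpoint.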

Try out the service via the service invoker

You can click the run button shown at the beginning of a method's signature to run that method.

Invoking a Groovy service via the service invoker