The schema should be defined before any data is inserted and indexed.
if you change the schema, you will need to remove and recreate all indices.
name of the field.
type of the field (string, text, long, double, int, etc.).
indexed (whether you want to search by that field or not)
store (whether you want to store and retrieve that field or not)
multiValued (whether the field can hold multiple values or not)
default value for the field, if any.
whether the field is required.
which field is the unique key (primary key).
how each field is indexed and searched (determined by its Analyser, which consists of a Tokenizer and Filters)
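The attributes listed above map onto attributes of a `<field>` element in schema.xml (the multi-value flag is spelled `multiValued` there). The unique key is declared separately, once per schema, with a `<uniqueKey>` element. The sketch below renders such elements from Java purely to show that mapping. `SchemaFieldSketch` and its methods are hypothetical helpers, not part of Solr or SolrJ. The field type names ("string", "text") are assumptions and must match types defined in your own schema. Values are not XML-escaped.

```java
// Hypothetical helper (not part of Solr/SolrJ) that renders schema.xml
// <field> and <uniqueKey> elements from the attributes listed above.
class SchemaFieldSketch {

    static String fieldXml(String name, String type, boolean indexed, boolean stored,
                           boolean multiValued, boolean required, String defaultValue) {
        StringBuilder sb = new StringBuilder();
        sb.append("<field name=\"").append(name).append('"')
          .append(" type=\"").append(type).append('"')
          .append(" indexed=\"").append(indexed).append('"')        // searchable?
          .append(" stored=\"").append(stored).append('"')          // retrievable?
          .append(" multiValued=\"").append(multiValued).append('"')
          .append(" required=\"").append(required).append('"');
        if (defaultValue != null) {
            sb.append(" default=\"").append(defaultValue).append('"'); // optional default
        }
        return sb.append("/>").toString();
    }

    // The unique key is declared once per schema, outside the <field> elements.
    static String uniqueKeyXml(String fieldName) {
        return "<uniqueKey>" + fieldName + "</uniqueKey>";
    }

    public static void main(String[] args) {
        System.out.println(fieldXml("id", "string", true, true, false, true, null));
        System.out.println(fieldXml("title", "text", true, true, false, false, null));
        System.out.println(uniqueKeyXml("id"));
    }
}
```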
When data is inserted into SOLR and the indexes are created, the data is 'analysed'.
A tokenizer can be defined for a field type; it determines how the field's value is split into tokens (e.g., into individual words or individual characters, optionally removing some characters).
The tokens are created by the Tokenizer and then normalized by Filters (e.g., lower casing, reducing words to their stems).
When a search is done, it is matched against these tokens, not against the original text (the query text is usually analysed the same way).
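The analysis steps above (a Tokenizer followed by Filters) can be illustrated with a toy chain. This is only a model of the idea, not Solr's own analyzer classes, which are configured in schema.xml. The tokenizer splits on non-alphanumeric characters, the filters lower-case tokens and drop a small stop-word list (a hypothetical list chosen for illustration), and stemming is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an analysis chain: one tokenizer, then filters applied in order.
class AnalysisSketch {

    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Tokenizer: split on any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // skip the empty leading piece, if any
        }
        return tokens;
    }

    // Filter 1: lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Filter 2: drop stop words.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) if (!STOP_WORDS.contains(t)) out.add(t);
        return out;
    }

    // Full chain: tokenizer first, then the filters.
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hola, the Senorita!")); // [hola, senorita]
        System.out.println(analyze("HOLA"));                // [hola]
    }
}
```

Because the query "HOLA" is analysed into the same token as the indexed "Hola", the two match even though the original text differs in case.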
Example SOLR queries
q=title:Hola => search for Hola in the title.
q=title:Hola* => search for anything starting with Hola in the title.
q=title:Hola*Senorita => search for a word starting with Hola and ending with Senorita.
q=title:"Hola Senorita" => search for "Hola Senorita" in title.
q=title:"Hola Senorita" AND body:"Hello" => search for "Hola Senorita" in title and "Hello" in the body.
q=title:Hola -title:Senorita => search for "Hola" in title but not "Senorita".
q=date:[20020101 TO 20030101] => search for docs whose date is between 20020101 and 20030101 (inclusive).
q=title:"Hola Senorita"~4 => search for docs where Hola and Senorita occur within 4 words of each other in the title.
q=(title:Hola)^10 OR (body:Hola) => search for Hola in the title or the body, with matches in the title boosted by a factor of 10 when scoring.
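When queries like the ones above are sent over HTTP, the query syntax characters (spaces, quotes, colons, brackets, `^`, `~`) have to be URL-encoded in the `q` parameter. The sketch below does this with the JDK's `URLEncoder` (Java 10+). The base URL, port 8983 and the core name `mycore` are assumptions for illustration; adjust them to your installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a /select request URL for a query string.
// BASE (host, port, core name "mycore") is an assumption for illustration.
class QueryUrlSketch {

    static final String BASE = "http://localhost:8983/solr/mycore/select";

    static String selectUrl(String q) {
        // URLEncoder turns spaces into '+' and percent-encodes ':', '"', '[', '^', '~', etc.
        return BASE + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("title:\"Hola Senorita\"~4"));
        System.out.println(selectUrl("date:[20020101 TO 20030101]"));
        System.out.println(selectUrl("(title:Hola)^10 OR (body:Hola)"));
    }
}
```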
SolrJ, a Java client library, can be used to write clients that interact with SOLR.
applied at query time.
e.g., boosting a term if it occurs in the title:
q=(title:Hola)^10 OR (body:Hola)
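A query-time boost is just part of the query string, so it can be generated from a field-to-boost mapping. `BoostQuerySketch.boostedOr` is a hypothetical helper written for illustration. It leaves out a boost of 1 (the default) and relies on a `LinkedHashMap` to keep the clauses in a predictable order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds "(field:term)^boost OR ..." from a field -> boost map.
class BoostQuerySketch {

    static String boostedOr(String term, Map<String, Integer> fieldBoosts) {
        StringJoiner q = new StringJoiner(" OR ");
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            String clause = "(" + e.getKey() + ":" + term + ")";
            // A boost of 1 is the default, so it is omitted from the clause.
            q.add(e.getValue() == 1 ? clause : clause + "^" + e.getValue());
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>(); // keeps insertion order
        boosts.put("title", 10);
        boosts.put("body", 1);
        System.out.println(boostedOr("Hola", boosts)); // (title:Hola)^10 OR (body:Hola)
    }
}
```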
applied at document indexing time.
a field can be given a boost factor when documents are indexed (note: index-time boosts were removed in Solr 7.0; on newer versions use query-time boosting instead).
TODO : GIVE EXAMPLE HERE.
String type vs. Text type:
String type does not split the value into tokens, so a document is returned only if the entire string is searched.
If a field contains the string Hola Senorita, a search for Hola will not return it, but a search for Hola Senorita will.
Text type splits the value into tokens and applies other changes such as lower casing, so searches can be made on a per-word basis.
If a field contains the string Hola Senorita, a search for Hola, Senorita, or Hola Senorita will return it.
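The Hola Senorita example above can be modelled with a toy matcher. This is a simplification, not how Solr scores or matches internally. The "string" matcher compares the whole value exactly. The "text" matcher splits value and query on whitespace, lower-cases them, and requires every query word to be present. Requiring every word is an assumption that resembles `q.op=AND` and ignores word positions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy comparison of String-type vs Text-type matching (simplified model).
class FieldTypeMatchSketch {

    // String type: the whole value is a single token, compared exactly.
    static boolean stringFieldMatches(String stored, String query) {
        return stored.equals(query);
    }

    // Text type: both sides are split into lower-cased words; every query word
    // must appear among the stored words (assumed AND semantics, positions ignored).
    static boolean textFieldMatches(String stored, String query) {
        return analyze(stored).containsAll(analyze(query));
    }

    static List<String> analyze(String s) {
        return Arrays.asList(s.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    public static void main(String[] args) {
        String stored = "Hola Senorita";
        for (String q : new String[] {"Hola", "Senorita", "Hola Senorita"}) {
            System.out.printf("%-14s string=%-5b text=%b%n", q,
                    stringFieldMatches(stored, q), textFieldMatches(stored, q));
        }
    }
}
```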