Solr installation
Hi, I thought I would revisit my solr install to clarify a few points where people were having problems loading the solr-cell class and getting binary files to index, with errors such as this:
Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
So, hopefully by following this tutorial you should get a working solr install:
Install base OS, 64 bit variant, in this case Ubuntu 12.04 64 bit
Create a solr user and log in with that user
Go to 'Software centre' and install Synaptic Package Manager
install tomcat6 + tomcat6-common + tomcat6-admin
sudo apt-get install tomcat6
sudo apt-get install tomcat6-admin
Modify the Tomcat admin configuration to allow a new manager login.
sudo gedit /etc/tomcat6/tomcat-users.xml
Add the following to the file, commenting out other similar lines
Restart tomcat
sudo service tomcat6 restart
Check the root of Tomcat works by visiting: http://localhost:8080
Then log into the Tomcat manager page by visiting http://localhost:8080/manager/html and entering the Tomcat username and password that you set up previously
Prepare the solr environment
Create a temp folder for the solr binary
cd /home/solr/
mkdir -p tmp/solr/
Navigate to that folder
cd tmp/solr/
Download the relevant Apache Lucene solr binary using wget (you will need to get the URL for the version you wish to install)
wget http://apache.mirrors.timporter.net/lucene/solr/3.6.2/apache-solr-3.6.2.tgz
Decompress the binary file, copy files into their relevant locations and give the tomcat user rights to the working folder
tar xzvf apache-solr-3.6.2.tgz
cd apache-solr-3.6.2
sudo cp dist/apache-solr-3.6.2.war /var/lib/tomcat6/webapps/solr.war
sudo cp -fr example/solr /var/lib/tomcat6/
sudo chown -R tomcat6:tomcat6 /var/lib/tomcat6/solr
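Before restarting, it can be worth confirming that the copy landed where Tomcat expects it. The following is only a minimal sketch and not part of the original procedure: the function name check_solr_layout is made up for illustration, and it assumes the copied example/solr folder contains conf/solrconfig.xml and conf/schema.xml, as the paths used later in this tutorial suggest.

```shell
#!/bin/sh
# Hypothetical helper: check that the war file and the solr home were copied.
# $1 is the Tomcat base directory (/var/lib/tomcat6 in this tutorial).
check_solr_layout() {
    base="$1"
    missing=0
    for f in webapps/solr.war solr/conf/solrconfig.xml solr/conf/schema.xml; do
        if [ ! -e "$base/$f" ]; then
            echo "missing: $base/$f"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "layout ok"
    fi
    return "$missing"
}

# Usage on the install described here:
# check_solr_layout /var/lib/tomcat6
```

The function only tests that the files exist; it does not check ownership, so the chown step above is still needed.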
Restart tomcat6
sudo service tomcat6 restart
Set the solr home directory
sudo gedit /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml
Modify the file by uncommenting the solr/home env-entry and setting its value to the path below:
/var/lib/tomcat6/solr/
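A quick way to confirm the edit was saved is to search web.xml for the value. This is a rough sketch under a few assumptions: check_solr_home is a hypothetical name, the value is assumed to sit on one line inside an env-entry-value element, and the check cannot tell whether the entry is still inside a comment, so it is worth looking at the file as well.

```shell
#!/bin/sh
# Hypothetical helper: report whether web.xml contains the given solr home.
# $1 = path to web.xml, $2 = expected solr home directory.
# Note: this does not detect an entry that is still commented out.
check_solr_home() {
    if grep -qF "<env-entry-value>$2</env-entry-value>" "$1"; then
        echo "solr home set to $2"
    else
        echo "solr home $2 not found in $1"
        return 1
    fi
}

# Usage on the install described here:
# check_solr_home /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml /var/lib/tomcat6/solr/
```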
Restart tomcat6
sudo service tomcat6 restart
Check you can start solr
http://localhost:8080/solr
Set up solr-cell for binary file (word, pdf etc) indexing
First we need to load the solr-cell module in solrconfig.xml
sudo gedit /var/lib/tomcat6/solr/conf/solrconfig.xml
Find and change the solr-cell line to match the following:
<lib dir="/var/lib/tomcat6/webapps/solr/WEB-INF/lib/" regex="apache-solr-cell-\d.*\.jar" />
Save the file
Now we must copy all the relevant .jar files into the classpath library
cd /home/solr/tmp/solr/apache-solr-3.6.2/
sudo cp contrib/extraction/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
sudo cp contrib/dataimporthandler/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
sudo cp dist/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
Restart tomcat6
sudo service tomcat6 restart
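If the "Error loading class" message still appears, one thing to check is whether a jar matching the regex from the lib line actually exists in the library folder. The sketch below is an illustration only: find_solr_cell_jar is a hypothetical name, and \d is written as [0-9] because grep -E does not understand \d.

```shell
#!/bin/sh
# Hypothetical helper: list jars in directory $1 whose names match the
# solr-cell pattern from the lib line (\d rewritten as [0-9] for grep).
# Exits non-zero when no matching jar is found.
find_solr_cell_jar() {
    ls "$1" | grep -E '^apache-solr-cell-[0-9].*\.jar$'
}

# Usage on the install described here:
# find_solr_cell_jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
```

With the 3.6.2 download used in this tutorial the match should be the solr-cell jar copied from dist/; no output means the copy step did not put it there.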
Test our solr installation
First we must install the curl command to send binary files via http to the request handler
sudo apt-get install curl
Create a text (non image) PDF file: write a text file in Gedit containing a keyword, such as 'otter', then print it to the file test.pdf (save this in your /home/documents/ folder)
Now we will take the test file and 'curl' it into solr
cd /home//Documents/
curl "http://localhost:8080/solr/update/extract?literal.id=smoketest&commit=true" -F "myfile=@test.pdf"
Now test to see if you can search the index for the id
From a remote PC enter the following URL:
http://
In the query string box enter the following:
id:smoketest
Press search and look for the 'numFound' value in the results. It should show 1 matching hit for 'smoketest', which confirms the system is working correctly.
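The same check can be done from the command line with curl instead of the browser. This is a minimal sketch and rests on some assumptions: num_found is a hypothetical helper name, the query goes to the default /select handler on localhost, and the XML response is assumed to carry numFound as an attribute on the result element.

```shell
#!/bin/sh
# Hypothetical helper: read a solr XML response on stdin and print the
# first numFound value it contains.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Usage on the install described here (assumes the default /select handler):
# curl -s "http://localhost:8080/solr/select?q=id:smoketest" | num_found
```

A printed value of 1 corresponds to the single hit described above; 0 means the document was not indexed.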
Once the basic system is running it is time to apply the custom solr schema (if you have/need one). The schema resides in the following location:
/var/lib/tomcat6/solr/conf/schema.xml