Ubuntu 12.04 LTS 64-bit: Apache Solr on Tomcat 6 install with Tika and other JARs
We have chosen this particular Linux variant for its long-term support options for updates etc.
- Install base OS, 64 bit variant
- Install tomcat6 + tomcat6-common + tomcat6-admin
sudo apt-get install tomcat6
sudo apt-get install tomcat6-admin
- Modify the admin configuration to allow a new login (on Ubuntu the Tomcat user list is /etc/tomcat6/tomcat-users.xml), then restart Tomcat.
sudo service tomcat6 restart
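As a rough illustration of the login step, the snippet below writes a candidate user file to a scratch location for review. The user name solradmin, the password CHANGE_ME and the choice of the manager and admin roles are assumptions, not values from this guide; check which role names your Tomcat 6 build expects before copying the file into place.

```shell
# Write a sample tomcat-users.xml to a scratch file (nothing under /etc is touched).
# User name, password and roles are placeholders, not values from this guide.
USERS_FILE="$(mktemp)"
cat > "$USERS_FILE" <<'EOF'
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="manager"/>
  <role rolename="admin"/>
  <user username="solradmin" password="CHANGE_ME" roles="manager,admin"/>
</tomcat-users>
EOF
echo "Sample written to $USERS_FILE"
# After reviewing it:
#   sudo cp "$USERS_FILE" /etc/tomcat6/tomcat-users.xml
#   sudo service tomcat6 restart
```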
- Check Tomcat works by visiting: http://localhost:8080
- Prepare the solr environment
- Create a temp folder for the solr binary
mkdir -p ~/tmp/solr/
- Navigate to that folder
cd ~/tmp/solr/
- Download the Apache Lucene Solr binary using wget (look up the current URL for the version you wish to install; this guide uses 3.6.0)
wget http://apache.ziply.com/lucene/solr/3.6.0/apache-solr-3.6.0.tgz
- Decompress the binary file
tar xzvf apache-solr-3.6.0.tgz
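Because the version number appears in several later paths, it can help to keep it in one variable. The following is only a sketch: the variable names are made up for illustration, and the mirror is the one used above, which may no longer carry this release.

```shell
# Keep the version in one place so later paths cannot drift apart.
# Variable names are illustrative; MIRROR is the mirror used in this guide.
SOLR_VERSION="3.6.0"
MIRROR="http://apache.ziply.com/lucene/solr"
SOLR_DIR="apache-solr-${SOLR_VERSION}"
SOLR_TGZ="${SOLR_DIR}.tgz"
SOLR_URL="${MIRROR}/${SOLR_VERSION}/${SOLR_TGZ}"

# Dry run: print the commands instead of executing them.
echo "wget ${SOLR_URL}"
echo "tar xzvf ${SOLR_TGZ}"
```

Removing the two echo wrappers turns the dry run into the real download and extract commands.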
- Create a folder for the main solr files (this is where the main SOLR config files are, not the expanded webapp)
sudo mkdir -p /var/solr
- Copy the compressed solr webapp into the /var/solr folder
sudo cp apache-solr-3.6.0/dist/apache-solr-3.6.0.war /var/solr/solr.war
- Copy the main Solr files into /var/solr (you can copy either the default single-core example, as shown here, or the multicore example)
sudo cp -R apache-solr-3.6.0/example/solr/* /var/solr/
- Give the tomcat6 service owner rights to the /var/solr folder
sudo chown -R tomcat6 /var/solr/
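Before moving on, it may be worth confirming that the copy produced the layout the later steps rely on. This is a minimal sketch; the function name check_solr_home is hypothetical, and the file list only covers the files this guide refers to.

```shell
# Return 0 if the given directory contains the files this guide relies on.
check_solr_home() {
    dir="$1"
    status=0
    for f in solr.war conf/solrconfig.xml conf/schema.xml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
# Usage:
#   check_solr_home /var/solr && echo "layout looks complete"
```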
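- Set the Solr home system property (Solr needs to know where its config is) and change the Java memory allocation by adding the JAVA_OPTS line below to the Tomcat init script. Note that these options apply to the whole Tomcat JVM rather than to a single webapp.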
sudo gedit /etc/init.d/tomcat6
JAVA_OPTS="$JAVA_OPTS -Dsolr.home=/var/solr -Xms1g -Xmx4g"
- Restart tomcat6
sudo service tomcat6 restart
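A quick way to confirm the edit was saved is to search the script for the new option. The helper below is a sketch with a hypothetical name; it only checks that an uncommented JAVA_OPTS line mentions solr.home, not that Tomcat has picked it up.

```shell
# Return 0 if the file has an uncommented JAVA_OPTS line that sets solr.home.
has_solr_home_opt() {
    grep -Eq '^[[:space:]]*JAVA_OPTS=.*-Dsolr\.home=' "$1"
}
# Usage:
#   has_solr_home_opt /etc/init.d/tomcat6 || echo "solr.home is not set"
```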
- Tell Tomcat (Catalina) where to find the Solr webapp by creating a context file. The webapp itself is extracted to and run from:
/var/lib/tomcat6/webapps/solr/
echo -e '\n ' | sudo tee -a /etc/tomcat6/Catalina/localhost/solr.xml
echo 'TOMCAT6_SECURITY=no' | sudo tee -a /etc/default/tomcat6
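The XML that the echo command should write into solr.xml is not shown in this post. As a sketch only, a context fragment of the kind commonly used for Solr on Tomcat looks like the following; the attribute values are assumptions derived from the paths used in this guide, not the author's original file.

```shell
# Write a sample Tomcat context fragment for Solr to a scratch file.
# Attribute values are assumptions based on the paths used in this guide.
CONTEXT_FILE="$(mktemp)"
cat > "$CONTEXT_FILE" <<'EOF'
<Context docBase="/var/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/var/solr" override="true"/>
</Context>
EOF
echo "Sample written to $CONTEXT_FILE"
# After reviewing it (install sets a mode the tomcat6 user can read):
#   sudo install -m 644 "$CONTEXT_FILE" /etc/tomcat6/Catalina/localhost/solr.xml
```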
- As we need to index binary files such as Word and PDF we need to copy extra libraries from your extracted solr binary folders
cd ~/tmp/solr/
sudo cp apache-solr-3.6.0/contrib/extraction/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
sudo cp apache-solr-3.6.0/contrib/dataimporthandler/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
- Restart the server
sudo reboot
- We now need to test our solr installation
- First we must install the curl command to send binary files via http to the request handler
sudo apt-get install curl
- Create a text (non image) PDF file by creating a text file in Gedit and printing the result to the file test.pdf (save this in your /home/documents/ folder)
- Now we will take the test file and 'curl' it into solr
cd /home//Documents/ 
curl "http://localhost:8080/solr/update/extract?literal.id=smoketest&commit=true" -F "myfile=@test.pdf"
- If all goes well you should receive output similar to the following:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">124</int></lst> </response>
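If you want to check the result in a script rather than by eye, the status value can be pulled out of the response. This is a minimal sketch with a hypothetical function name; it uses sed pattern matching rather than a real XML parser, so it assumes the response has the shape shown above.

```shell
# Read a Solr XML response on stdin and print the value of the "status" field.
solr_status() {
    sed -n 's/.*<int name="status">\([0-9][0-9]*\)<\/int>.*/\1/p' | head -n 1
}
# Usage (0 means the update was accepted):
#   curl -s "http://localhost:8080/solr/update/extract?literal.id=smoketest&commit=true" \
#        -F "myfile=@test.pdf" | solr_status
```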
- Now test to see if you can search the index
- From a remote PC enter the following URL, where <server> is the hostname or IP address of the Solr machine: http://<server>:8080/solr/admin
- In the query string box enter the following: id:smoketest
- Press search
- If all works well the id will be discovered in the index and results similar to the following will be displayed:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">id:smoketest</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="content_type">
        <str>application/pdf</str>
      </arr>
      <str name="id">smoketest</str>
    </doc>
  </result>
</response>
- The 'numFound' attribute shows 1 matching hit for 'smoketest', so we can confirm the system is working correctly
- Once the basic system is running it is time to apply the custom Solr schema (if you have/need one). The schema resides in the following location: /var/solr/conf/schema.xml
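The same search can also be run from the command line on the server itself. The sketch below assumes the default /solr/select handler is enabled and uses a hypothetical helper name; like any sed-based extraction it relies on the response looking like the example above.

```shell
# Read a Solr query response on stdin and print the numFound count.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}
# Usage (prints the number of documents matching the query):
#   curl -s "http://localhost:8080/solr/select?q=id:smoketest" | num_found
```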
 