Monday 17 September 2012

Ubuntu 12.04 and SOLR 3.6 Install + TIKA

Installing Apache Solr on Tomcat 6 under Ubuntu 12.04 LTS (64-bit), with Tika and other JARs


We chose this particular Linux variant for its long-term support (LTS), which guarantees updates for an extended period.


  • Install base OS, 64 bit variant
  • Install tomcat6 and tomcat6-admin (tomcat6-common is pulled in as a dependency of tomcat6)
sudo apt-get install tomcat6
sudo apt-get install tomcat6-admin
  • Modify the admin configuration to allow a new login (on Ubuntu the users file is /etc/tomcat6/tomcat-users.xml), then restart Tomcat
sudo service tomcat6 restart
  • Check Tomcat works by visiting: http://localhost:8080
  • Prepare the solr environment
  • Create a temp folder for the solr binary
mkdir -p ~/tmp/solr/
  • Navigate to that folder
cd ~/tmp/solr/
  • Download the Apache Solr binary with wget (look up the current URL for the version you wish to install; this guide uses 3.6.0)
wget http://apache.ziply.com/lucene/solr/3.6.0/apache-solr-3.6.0.tgz
  • Decompress the binary file
tar xzvf apache-solr-3.6.0.tgz
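The version number appears in the download URL, the archive name and the extracted folder name, so it may help to derive all three from a single variable. The following is only a sketch: SOLR_VERSION, solr_archive and solr_url are illustrative names, and the Apache archive host is assumed here because it keeps older releases; substitute your preferred mirror.

```shell
#!/bin/sh
# Sketch: derive the Solr archive name and download URL from one version string.
# SOLR_VERSION, solr_archive and solr_url are illustrative names, not part of Solr.
SOLR_VERSION="${SOLR_VERSION:-3.6.0}"

solr_archive() {
    # e.g. 3.6.0 -> apache-solr-3.6.0.tgz
    printf 'apache-solr-%s.tgz\n' "$1"
}

solr_url() {
    # Assumes the Apache archive layout; swap in your preferred mirror.
    printf 'http://archive.apache.org/dist/lucene/solr/%s/%s\n' "$1" "$(solr_archive "$1")"
}

# Usage (commented out so that sourcing this file has no side effects):
# wget "$(solr_url "$SOLR_VERSION")"
# tar xzvf "$(solr_archive "$SOLR_VERSION")"
```

Using the same variable in the later copy steps avoids mixing folder names from different releases.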
  • Create a folder for the main solr files (this is where the main SOLR config files are, not the expanded webapp)
sudo mkdir -p /var/solr 
  • Copy the compressed solr webapp into the /var/solr folder
sudo cp apache-solr-3.6.0/dist/apache-solr-3.6.0.war /var/solr/solr.war
  • Copy the main Solr files into /var/solr (you can copy either the default single-core example, example/solr, or the multicore example, example/multicore)
sudo cp -R apache-solr-3.6.0/example/solr/* /var/solr/
  • Give the tomcat6 service owner rights to the /var/solr folder
sudo chown -R tomcat6 /var/solr/ 
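  • Set the Solr home variable (Solr needs to know where its config is) and adjust the Java memory allocation to suit your webapp by adding the line below to the Tomcat init script; note that the heap settings apply to the whole Tomcat JVM, not to a single webapp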
sudo gedit /etc/init.d/tomcat6
JAVA_OPTS="$JAVA_OPTS -Dsolr.home=/var/solr -Xms1g -Xmx4g" 
  • Restart tomcat6
sudo service tomcat6 restart
  • Tell Tomcat (Catalina) where to find the Solr webapp and its home folder by creating a context file for it. Tomcat extracts the webapp to, and runs it from:
/var/lib/tomcat6/webapps/solr/ 

echo -e '\n\n' | sudo tee -a /etc/tomcat6/Catalina/localhost/solr.xml
echo 'TOMCAT6_SECURITY=no' | sudo tee -a /etc/default/tomcat6
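For reference, a Tomcat context file for Solr usually holds a Context element pointing at the .war file and an Environment entry naming the Solr home. The sketch below writes such a fragment, assuming the paths used in this guide (/var/solr/solr.war and /var/solr). The helper name write_solr_context is hypothetical, and the attribute set follows the commonly documented Solr-on-Tomcat fragment rather than anything confirmed by this post.

```shell
#!/bin/sh
# Sketch: write a Tomcat context fragment for Solr.
# write_solr_context is a hypothetical helper; the paths are the ones this guide uses.
write_solr_context() {
    # $1 = file to write, $2 = path to solr.war, $3 = Solr home folder
    cat > "$1" <<EOF
<Context docBase="$2" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$3" override="true"/>
</Context>
EOF
}

# Usage: write to a temp file first, inspect it, then copy it into place.
# write_solr_context /tmp/solr.xml /var/solr/solr.war /var/solr
# sudo cp /tmp/solr.xml /etc/tomcat6/Catalina/localhost/solr.xml
```

Writing to a temporary file first makes it easy to check the XML before Tomcat reads it.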
  • As we need to index binary files such as Word and PDF, copy the extra libraries from the extracted Solr folders into the expanded webapp
cd ~/tmp/solr/

sudo cp apache-solr-3.6.0/contrib/extraction/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
sudo cp apache-solr-3.6.0/contrib/dataimporthandler/lib/*.jar /var/lib/tomcat6/webapps/solr/WEB-INF/lib/
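After copying, it can be worth confirming that every library actually arrived in the webapp, for example if the source folder name did not match the version you extracted. This is a small sketch for that check; missing_jars is a hypothetical helper that lists the .jar files present in a source folder but absent from a destination folder, and SOLR_DIR is an assumed variable naming the folder you extracted earlier.

```shell
#!/bin/sh
# Sketch: list jars that exist in the source folder but not in the destination.
# missing_jars is an illustrative helper name, not part of Solr or Tomcat.
missing_jars() {
    # $1 = source lib folder, $2 = destination lib folder
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue          # glob matched nothing
        name=$(basename "$jar")
        [ -e "$2/$name" ] || printf '%s\n' "$name"
    done
}

# Usage (SOLR_DIR is the folder extracted earlier, e.g. apache-solr-3.6.0):
# missing_jars "$SOLR_DIR/contrib/extraction/lib" /var/lib/tomcat6/webapps/solr/WEB-INF/lib
```

No output means every library from the source folder is present in the destination.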
  • Restart the server
sudo reboot
  • We now need to test our solr installation
  • First we must install the curl command to send binary files via http to the request handler
sudo apt-get install curl
  • Create a text (non-image) PDF file by typing some text in Gedit and printing it to a file named test.pdf (save this in the Documents folder of your home directory)
  • Now we will take the test file and 'curl' it into solr
cd ~/Documents/

curl "http://localhost:8080/solr/update/extract?literal.id=smoketest&commit=true" -F "myfile=@test.pdf"
  • If all goes well you should receive output similar to the following:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">124</int></lst>
</response>
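A status of 0 in the response header indicates success. If you want to check this from a script rather than by eye, the value can be pulled out of the XML with sed. This is a rough sketch; solr_status is a hypothetical helper that reads the response on standard input and uses plain text matching, not a real XML parser.

```shell
#!/bin/sh
# Sketch: pull the status code out of a Solr XML response read from stdin.
# solr_status is an illustrative helper name; it uses text matching, not XML parsing.
solr_status() {
    # Print the first <int name="status">N</int> value found.
    sed -n 's/.*<int name="status">\([0-9][0-9]*\)<\/int>.*/\1/p' | head -n 1
}

# Usage:
# curl -s "http://localhost:8080/solr/update/extract?literal.id=smoketest&commit=true" \
#      -F "myfile=@test.pdf" | solr_status
```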

  • Now test to see if you can search the index
  • From a remote PC enter the following URL:
http://<server-address>:8080/solr/admin
  • In the query string box enter the following:
id:smoketest
  • Press search
  • If all works well the id will be discovered in the index and results similar to the following will be displayed:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">id:smoketest</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst></lst>
<result name="response" numFound="1" start="0">
<doc>
<arr name="content_type">
<str>application/pdf</str>
</arr><str name="id">smoketest</str>
</doc>
</result>
</response>
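The same query can also be run from the command line against the select handler, which is convenient for a scripted smoke test. The sketch below extracts the hit count from the response; solr_num_found is a hypothetical helper, and the localhost URL in the usage line assumes you run it on the Solr server itself.

```shell
#!/bin/sh
# Sketch: extract the numFound attribute from a Solr query response on stdin.
# solr_num_found is an illustrative helper name, not part of Solr.
solr_num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Usage: fail the smoke test if the document is not in the index.
# hits=$(curl -s "http://localhost:8080/solr/select?q=id:smoketest" | solr_num_found)
# [ "${hits:-0}" -ge 1 ] && echo "smoke test passed" || echo "smoke test failed"
```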

  • The 'numFound' attribute shows 1 matching hit for 'smoketest', which confirms that the document was indexed and the system is working correctly
  • Once the basic system is running it is time to apply the custom solr schema (if you have/need one). The schema resides in the following location:
/var/solr/conf/schema.xml









