integrate-apache-solr-in-drupal7

Initial Solr Setup:

1. Install the latest Java JDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html. (Make sure to select the 64-bit version if you need it.)

2. Download Solr 1.4.1 from one of the mirrors at http://archive.apache.org/dist/lucene/solr/1.4.1/. (At the time of writing, not all mirrors seem to be hosting 1.4.1, but most seem to have at least 1.4.0.)

3. Unzip the Solr download and open the “example” folder.

4. Copy the “etc”, “lib”, “logs”, “solr”, and “webapps” folders and the “start.jar” file to C:\solr (you will need to create the folder at C:\solr).

5. Now open the C:\solr\solr folder and copy the contents back to the root C:\solr folder. When you are done you can delete the C:\solr\solr folder.

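6. At this point your C:\solr directory should contain the “etc”, “lib”, “logs”, and “webapps” folders, the “start.jar” file, and everything that was inside C:\solr\solr (including the “conf” folder).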

7. Solr can now be started from the command line. Change your directory to c:\solr and then run:
java -Dsolr.solr.home=c:/solr/ -jar start.jar

8. If you go to http://localhost:8983/solr/ you should be greeted with the Welcome to Solr message.
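
The check in step 8 can also be scripted. The following is a minimal Python sketch, assuming Python 3 is available on the machine; the function names solr_url and solr_is_up are hypothetical helpers, not part of Solr. It only tests whether the welcome page from step 8 answers with HTTP 200.

```python
import urllib.error
import urllib.request


def solr_url(host="localhost", port=8983):
    """Build the Solr welcome-page URL used in step 8."""
    return f"http://{host}:{port}/solr/"


def solr_is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = solr_url()
    print(url, "is up" if solr_is_up(url) else "is not reachable")
```

If the script reports that Solr is not reachable, check that the command from step 7 is still running in its command prompt.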

Setting Up Jetty to Run as a Windows Service Using NSSM:

Now that Solr is up and running, we can work on getting Jetty to run as a Windows service. Since Jetty comes bundled with Solr, all we need is a way to run it as a service. There are several ways to do this, but the one that I have found works best and is the most compatible across Windows environments is NSSM, the Non-Sucking Service Manager.

1. Once you download NSSM, open the win32 or win64 folder as appropriate and copy nssm.exe to your c:\solr folder.

2. Open an elevated command prompt, change the directory to C:\solr, and then run:
nssm install Solr

3. A dialog will open. For the “Application”, select java.exe, located at C:\Windows\System32\.

4. In the “Options” input box, enter the following as a single line with no line breaks:
-Dsolr.solr.home=C:/solr/ -Djetty.home=C:/solr/ -Djetty.logs=C:/solr/logs/ -cp C:/solr/lib/*.jar;C:/solr/start.jar -jar C:/solr/start.jar

5. Click Install service. You should get a Service successfully installed message.

6. Finally, run:
net start Solr
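
Because the “Options” value in step 4 must be entered as one line, it can help to generate it rather than paste it. The sketch below, assuming Python 3, assembles the same arguments shown in step 4; nssm_options is a hypothetical helper name, and the default path C:/solr is the one used throughout this tutorial.

```python
def nssm_options(solr_home="C:/solr"):
    """Return the NSSM "Options" value from step 4 as a single line."""
    parts = [
        f"-Dsolr.solr.home={solr_home}/",
        f"-Djetty.home={solr_home}/",
        f"-Djetty.logs={solr_home}/logs/",
        "-cp",
        f"{solr_home}/lib/*.jar;{solr_home}/start.jar",
        "-jar",
        f"{solr_home}/start.jar",
    ]
    # Joining with single spaces guarantees there is no line break.
    return " ".join(parts)


if __name__ == "__main__":
    print(nssm_options())
```

The printed line can then be copied into the “Options” box of the NSSM dialog.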

Setting Up Apache Maven on Windows:

1. Download Apache Maven from http://maven.apache.org/download.cgi.

2. Unzip the distribution archive, e.g. apache-maven-3.1.1-bin.zip, to the directory in which you wish to install Maven 3.1.1. These instructions assume you chose C:\maven.

3. Add the MAVEN_HOME environment variable: open the system properties, select the “Advanced” tab, click the “Environment Variables” button, then add MAVEN_HOME to the system variables with the value C:\maven. Be sure to omit any quotation marks around the path even if it contains spaces. Note: For Maven 2.0.9, also be sure that MAVEN_HOME doesn’t have a ‘\’ as its last character.

4. In the same dialog, add the MAVEN environment variable to the system variables with the value %MAVEN_HOME%\bin.

5. Optional: In the same dialog, add the MAVEN_OPTS environment variable to the system variables to specify JVM properties, e.g. the value -Xms256m -Xmx512m. This environment variable can be used to supply extra options to Maven.

6. In the same dialog, update or create the Path environment variable in the system variables and prepend the value %MAVEN% to make Maven available on the command line.

7. In the same dialog, make sure that JAVA_HOME exists in your user variables or in the system variables and that it is set to the location of your JDK, e.g. C:\Program Files\Java\jdk1.5.0_02, and that %JAVA_HOME%\bin is in your Path environment variable.

8. Open a new command prompt and run mvn --version to verify that Maven is correctly installed.
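
If mvn --version fails, the environment variables from steps 3 to 7 are the usual suspects. The following Python 3 sketch checks them against the rules stated above; check_maven_env is a hypothetical helper, and the trailing-backslash check reflects the Maven 2.0.9 note in step 3, so it may be stricter than newer Maven versions require.

```python
import os


def check_maven_env(env):
    """Return a list of problems found in a Windows-style environment mapping."""
    problems = []
    for name in ("MAVEN_HOME", "JAVA_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    maven_home = env.get("MAVEN_HOME", "")
    if maven_home.endswith("\\"):
        problems.append("MAVEN_HOME ends with a backslash")
    if '"' in maven_home:
        problems.append("MAVEN_HOME contains quotation marks")
    # Compare Path entries case-insensitively, as Windows does.
    path_entries = [
        entry.strip().rstrip("\\").lower()
        for entry in env.get("Path", "").split(";")
        if entry.strip()
    ]
    for name in ("MAVEN_HOME", "JAVA_HOME"):
        home = env.get(name)
        if home:
            expected = (home.rstrip("\\") + "\\bin").lower()
            if expected not in path_entries:
                problems.append(f"{name}\\bin is not on Path")
    return problems


if __name__ == "__main__":
    for problem in check_maven_env(os.environ) or ["No problems found"]:
        print(problem)
```

Run it from a new command prompt so that it sees the variables you have just saved.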

Setting Up Apache Tika:

1. Download Apache Tika from http://www.apache.org/dyn/closer.cgi/tika/tika-1.4-src.zip.

2. Unzip the archive, i.e. tika-1.4-src.zip, to the directory in which you wish to install Tika 1.4. These instructions assume you chose C:\tika-1.4.

3. Open a new command prompt and navigate to C:\tika-1.4.

4. Once you have navigated to C:\tika-1.4, run the command:
mvn install
Running this command in the base directory builds the sources and installs the resulting artifacts in your local Maven repository. This will take some time.

5. Once the build has completed, the directory of interest is C:\tika-1.4\tika-app\target. Inside this directory you will find a “tika-app-1.4.jar” file.

6. Open a new command prompt, navigate to C:\tika-1.4\tika-app\target\, and then execute the command:
java -jar tika-app-1.4.jar -t [path to your document]

For example, if you give the path to a .pdf document in place of “[path to your document]”, the contents of that .pdf document are displayed in the command prompt. This confirms that Apache Tika is working.

7. To use the Apache Tika GUI, open a new command prompt, navigate to C:\tika-1.4\tika-app\target\, and then execute the command:
java -jar tika-app-1.4.jar -g

This opens the Apache Tika GUI. You can drag and drop documents into it and then view their contents.

In the Apache Tika GUI, you can choose the desired view mode from the options listed under the “View” menu.
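
The text extraction from step 6 can also be called from a script, which is roughly what the Drupal module does later when it uses Tika as a local Java application. This is a minimal Python 3 sketch under the paths assumed in this tutorial; tika_text_command and extract_text are hypothetical helper names, and the only Tika option used is the -t flag shown in step 6.

```python
import subprocess
from pathlib import PureWindowsPath

TIKA_DIR = PureWindowsPath(r"C:\tika-1.4\tika-app\target")
TIKA_JAR = "tika-app-1.4.jar"


def tika_text_command(document, tika_dir=TIKA_DIR, jar=TIKA_JAR):
    """Build the argument list for extracting plain text from a document."""
    return ["java", "-jar", str(tika_dir / jar), "-t", str(document)]


def extract_text(document):
    """Run Tika and return the extracted text (needs Java and the built jar)."""
    result = subprocess.run(
        tika_text_command(document),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if java exits with an error
    )
    return result.stdout
```

Passing the command as a list avoids quoting problems when the document path contains spaces.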

Setting Up Drupal 7.x:

1. Download the php_uploadprogress.dll file.

2. Copy php_uploadprogress.dll into {PHP_PATH}\ext folder.

3. Open php.ini and add the following:
extension=php_uploadprogress.dll
uploadprogress.file.filename_template="C:\WINDOWS\TEMP\upt_%s.txt"

4. Restart the web server.

5. Install a fresh copy of Drupal 7.x.

6. Install the following modules into your Drupal installation.
Apache Solr Search from https://drupal.org/project/apachesolr.
Apache Solr Attachments from https://drupal.org/project/apachesolr_attachments.

7. Open the “solr-conf” directory, located at {DRUPAL_INSTALLATION_PATH}\sites\all\modules\apachesolr\.

Then open the directory named “solr-1.4” (because in this tutorial we are using Apache Solr 1.4.1).

Copy the Solr configuration files from that directory (these include schema.xml and solrconfig.xml).

Then open c:\solr\conf\ and paste the files there. When you are asked whether to replace the existing files, confirm the replacement.
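
The copy in step 7 can be scripted as well. The sketch below, assuming Python 3, copies every regular file from one directory into another and overwrites files of the same name; copy_solr_conf is a hypothetical helper, and copying all files in “solr-1.4” is an assumption, since the exact file selection depends on your module version.

```python
import shutil
from pathlib import Path


def copy_solr_conf(source_dir, target_dir):
    """Copy every regular file from source_dir into target_dir, overwriting."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            # copy2 replaces an existing file with the same name.
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied
```

For this tutorial the source would be the “solr-1.4” directory inside the apachesolr module and the target would be c:\solr\conf. Back up the existing conf folder first if you want to be able to revert.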

8. Open a new command prompt and start the Apache Solr server.

9. Open your Drupal site and enable the Apache Solr framework, Apache Solr Search and Apache Solr Attachments in the modules section.

10. Then go to the DRUPAL_ROOT/admin/config/search/apachesolr page and select the following in the “Configuration” section.

[Screenshot: selections in the “Configuration” section]

11. Then click the “Settings” tab, which lists the search environments. Since we set this up on the local host, “localhost server” is shown as the default search environment.

For the “localhost server” entry there is an “Edit” link under the “OPERATIONS” column.

Click the “Edit” link to open the edit page. At the bottom of that page is a button labelled “Test connection”.

Click “Test connection”. A success message confirms that our Drupal site has contacted the Apache Solr server that we set up in the earlier steps.

12. Then go to DRUPAL_ROOT/admin/config/search/apachesolr/attachments.

In the “Extract using” section, “Tika (local java application)” is selected by default.

In the “Tika directory path” section, give the path to the directory that contains “tika-app-1.4.jar”. In this case the path is “C:\tika-1.4\tika-app\target\”.

In the “Tika jar file” section, give the name of the Tika jar file. In this case it is “tika-app-1.4.jar”.

Then save the configuration.

Then, in the “Actions” section, click the button labelled “Test your tika extraction”. If a success message appears, we have successfully integrated Apache Tika into our Drupal installation.

13. Create or edit an existing content type to add a field of “file” type.

In the “Allowed file extensions” section, list the document file types that may be uploaded. Example: txt pdf.

14. Then create a node of the above content type and upload a document to it.

15. Then go to DRUPAL_ROOT/admin/config/search/apachesolr.
– Click the “Queue all content for reindexing” button.
– After that, click “Index all queued content”.
– The “Indexed” value, shown under the “TYPE” and “VALUE” column headers, will then change to display the number of nodes that Apache Solr has indexed.

16. Finally, go to the Drupal site’s home page and search for a keyword in the “Site” section. The results will include matches from the uploaded document’s content.
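
To confirm that the index itself contains the document, independently of Drupal, Solr can be queried directly. This Python 3 sketch assumes that the standard /select request handler and the JSON response writer (wt=json) are enabled in your Solr configuration; select_url and num_found are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def select_url(keyword, base="http://localhost:8983/solr"):
    """Build a Solr select URL that asks for JSON output."""
    query = urllib.parse.urlencode({"q": keyword, "wt": "json"})
    return f"{base}/select?{query}"


def num_found(response_text):
    """Return the number of matching documents from a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


def count_matches(keyword):
    """Query the local Solr server and return the number of hits."""
    with urllib.request.urlopen(select_url(keyword), timeout=10) as response:
        return num_found(response.read().decode("utf-8"))
```

A count of zero for a word that appears in the uploaded document suggests that the content has not been indexed yet, so repeat step 15.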

Note: To index large files with Apache Solr and Apache Tika, make this small change:

– Open the “my.ini” file located at {mysql_path}/bin

– Change “max_allowed_packet” from “1M” to “16M”

– Save the “my.ini” file

– Restart MySQL
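
The edit to “my.ini” can be sketched in Python 3 as a text substitution. The helper name set_max_allowed_packet is hypothetical, and the function changes every max_allowed_packet line it finds, which is an assumption you may want to narrow if your file sets the value in more than one section.

```python
import re


def set_max_allowed_packet(ini_text, value="16M"):
    """Return ini_text with every max_allowed_packet setting changed to value."""
    pattern = re.compile(r"^(\s*max_allowed_packet\s*=\s*)\S+", re.MULTILINE)
    # Keep the original key and spacing, replace only the value.
    return pattern.sub(lambda match: match.group(1) + value, ini_text)


if __name__ == "__main__":
    sample = "[mysqld]\nport=3306\nmax_allowed_packet = 1M\n"
    print(set_max_allowed_packet(sample))
```

To apply it, read “my.ini”, pass its text through the function, write the result back, and then restart MySQL as described above.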
