Ingenuity: Engineering Awesome: haystack + SOLR search and wkhtmltopdf in Django

Two of the most useful features in web app development are search and PDF export. I will be introducing Django apps that do just those. There are official tutorials, but here I'll be discussing the basics of customization.

Haystack Search + SOLR

First up, the search app. The most popular Django app for searching, as far as I know, is django-haystack. It supports several search backends: the more advanced SOLR, ElasticSearch, and Xapian, and the simpler Whoosh.

I have tried Whoosh but so far, I like SOLR better. Haystack supports functions for SOLR that are not in Whoosh; one in particular is faceted search.

SOLR runs inside a Java servlet container, so you'd need something like Apache Tomcat running to set it up (see the setup guide on their website). If you lack the resources or just don't want to be bothered with setting up Tomcat and SOLR, there is a free SOLR service on the web: http://opensolr.com/. You can create a free account and set up a SOLR core on which to store your indexes. All you need to do is upload your schema.xml, which describes the structure of your indexes and which I will discuss later.

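The first thing you need to do, assuming you already have a SOLR backend set up, is install haystack into your virtual environment with a simple pip install django-haystack. Note that the code below uses the Haystack 1.x API (search_sites.py, site.register, RealTimeSearchIndex), which Haystack 2.0 removed, so install a 1.x release if you want to follow along as written. Create a search_sites.py file inside your app: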

#!/project/your_app/search_sites.py

import haystack
haystack.autodiscover()

Then add these to your django settings file:

INSTALLED_APPS = (
    ...
    'haystack',
    ....
)

# Haystack 1.x settings
HAYSTACK_SITECONF = 'your_app.search_sites'  # path to your search_sites.py file
HAYSTACK_SEARCH_ENGINE = 'solr'
HAYSTACK_SOLR_URL = 'http://localhost:8080/solr/'  # URL of your SOLR backend
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 20

# Haystack 2.x reads its backend from HAYSTACK_CONNECTIONS instead;
# 1.x ignores this setting.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': HAYSTACK_SOLR_URL
    },
}
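
If you switch between a local SOLR and a hosted core such as opensolr, one option is to read the URL from the environment instead of hard-coding it. This is only a minimal sketch: SOLR_URL is a variable name I made up for the example, not something Haystack looks for on its own.

```python
import os

# SOLR_URL is a hypothetical environment variable chosen for this sketch;
# fall back to the local URL used above when it is not set.
HAYSTACK_SOLR_URL = os.environ.get('SOLR_URL', 'http://localhost:8080/solr/')

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': HAYSTACK_SOLR_URL,
    },
}
```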

Haystack + SOLR works by creating an index of your data. The index is what the SOLR backend will use for searching. To create that index, you need to create a search_indexes.py file in your app directory. Haystack automatically searches for any such file. Inside, you'll create the structure of your index like so:

#!/project/your_app/search_indexes.py

from datetime import datetime
from django.template.defaultfilters import slugify
from haystack import indexes, site

from your_app.models import YourModel

class YourModelIndex(indexes.RealTimeSearchIndex):
    """
       This class will be the structure of your index.
       It must inherit from one of haystack's index types.
       I recommend RealTimeSearchIndex as it automatically updates
       your index when a model instance is added or modified.

       Notes:
       * each index class must have exactly one field marked
       document=True; by convention it is named text.
       * model_attr associates an index field with a model field.
       It takes a string - the name of your model field.
    """
    text = indexes.CharField(document=True, use_template=True)
    field1 = indexes.CharField(model_attr='model_field1', null=True)
    field2 = indexes.MultiValueField(null=True)
    modified = indexes.DateTimeField(model_attr='model_date_modified', null=True)
    sort_field = indexes.CharField(indexed=True)

    def get_model(self):
        return YourModel

    def prepare_field2(self, model_instance):
        """
           prepare_[one of your model index's field names] methods are added 
           if further processing of data is needed before being indexed, 
           for example, a list.
        """
        return [item.name for item in model_instance.your_multivalue_field.all()]

    def prepare_sort_field(self, model_instance):
        """
            I use this to manipulate data that I can manually sort later.
            Useful if you do a lot of sorting. 
            You can add as many sort fields as you need.
        """
        return slugify(model_instance.lastname+' '+model_instance.firstname)
    
    def index_queryset(self):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(model_date_modified__lte=datetime.now())


site.register(YourModel, YourModelIndex)
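
The effect of prepare_sort_field is easier to see outside Haystack. The sketch below uses a standard-library stand-in for Django's slugify (simple_slugify is my own approximation, not Django's code) and made-up names. It shows that sorting on the normalized key ignores case and accents, which a plain string sort does not.

```python
import re
import unicodedata


def simple_slugify(value):
    """Rough stdlib approximation of Django's slugify filter."""
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)


def sort_key(lastname, firstname):
    """Same recipe as prepare_sort_field: lastname first, then firstname."""
    return simple_slugify(lastname + ' ' + firstname)


# Hypothetical (lastname, firstname) pairs with mixed case and an accent.
people = [('Rizal', 'José'), ('bonifacio', 'Andres'), ('Aguinaldo', 'emilio')]

plain = sorted(people)                               # uppercase sorts before lowercase
by_key = sorted(people, key=lambda p: sort_key(*p))  # case and accents are ignored

print(plain)
print(by_key)
```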

Next, you'll need to create the template for your index. The template must be in templates/search/indexes/your_app/yourmodel_text.txt. You'll have to make one for each model you need to index.

{% autoescape off %}
{{ object.lastname }}
{{ object.firstname }}
{% endautoescape %}
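
To make the idea of an index more concrete, here is a toy version in plain Python. It is only a sketch of the concept: the records and the build_document, build_index and search helpers are made up for illustration, and SOLR's real index does far more (text analysis, scoring, facets).

```python
from collections import defaultdict

# Hypothetical records standing in for model instances.
records = {
    1: {'lastname': 'Rizal', 'firstname': 'Jose'},
    2: {'lastname': 'Bonifacio', 'firstname': 'Andres'},
    3: {'lastname': 'Rizal', 'firstname': 'Paciano'},
}


def build_document(record):
    """Mimic the text template: one field per line."""
    return '%s\n%s' % (record['lastname'], record['firstname'])


def build_index(records):
    """Map every lowercased word to the ids of the records containing it."""
    index = defaultdict(set)
    for pk, record in records.items():
        for word in build_document(record).lower().split():
            index[word].add(pk)
    return index


def search(index, query):
    """Return the ids of records that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


index = build_index(records)
print(sorted(search(index, 'rizal')))       # ids of both Rizal records
print(sorted(search(index, 'rizal jose')))  # only the record matching both words
```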

Once that's done, you'll need to create the schema for your SOLR. In your terminal, run python manage.py build_solr_schema. Copy the output to your SOLR's schema.xml configuration file and restart your server to load the changes. To start the indexing process, run python manage.py rebuild_index.

You can now start searching by using the default search page; all you need to do is point a URL to a haystack view. If you'd rather customize your search, you can begin by subclassing one of haystack's views, e.g. SearchView. By subclassing, you can override the template and the form, which is essential for customizing the look and behavior of your search.
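
For reference, wiring up the URLs could look like the sketch below. It assumes Django 1.4 or later (where django.conf.urls exists; Django 4.0 later removed url()), that the custom search view from this post lives in your_app/views.py, and URL prefixes and a URL name of my own choosing.

```python
# urls.py
from django.conf.urls import include, url

from your_app import views

urlpatterns = [
    # Default search page shipped with haystack.
    url(r'^search/', include('haystack.urls')),
    # Custom view; the 'my-search/' prefix and the name are just examples.
    url(r'^my-search/$', views.search, name='my_search'),
]
```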

#!/project/your_app/views.py
from haystack.query import SearchQuerySet
from haystack.views import SearchView

# Assumes MySearchForm (defined later in this post) lives in your_app/forms.py
from your_app.forms import MySearchForm


class MySearchView(SearchView):
    __name__ = 'MySearchView'
    template = 'my-search-template.html'

    def __init__(self, *args, **kwargs):
        # Needed to switch out the default form class.
        if kwargs.get('form_class') is None:
            kwargs['form_class'] = MySearchForm

        # Pass the subclass to super(); super(SearchView, self) would skip
        # SearchView.__init__ entirely.
        super(MySearchView, self).__init__(*args, **kwargs)


def search(request):
    # The view builds and validates the form itself, so just hand it
    # the form class and the base SearchQuerySet.
    sqs = SearchQuerySet()
    return MySearchView(form_class=MySearchForm, searchqueryset=sqs)(request)

Your form must subclass the SearchForm class from haystack. You can change the behavior of the search by adding form fields (e.g. select fields and radio button choices) and overriding the form's search method.

from django import forms

from haystack.forms import SearchForm

from your_app.models import YourModel

# Placeholder choices for the "Search In" field; replace with your own.
CHOICES = (
    ('field1', 'Field 1'),
)


class MySearchForm(SearchForm):
    #q is the default field in all SearchForm classes
    q = forms.CharField(required=False, label='Search') 
    field1 = forms.ChoiceField(choices=CHOICES, label='Search In')

    def __init__(self, *args, **kwargs):
        super(MySearchForm, self).__init__(*args, **kwargs)
        #I usually put here the dynamic choices for my ChoiceFields

    def search(self):
        """Customize your search behavior here."""
        #call the search method by the superclass to narrow the search
        #but can be modified to set all rows for searching
        sqs = super(MySearchForm, self).search()

        #if you have multiple indexed models, 
        #you can specify which model to search on
        sqs = sqs.models([YourModel])

        #the most basic way to search is to use filter
        sqs = sqs.filter(lastname='Sample') #you can filter multiple fields
        sqs = sqs.filter(firstname='Sample') #filter uses AND 
        #so these two filters match the record with Sample as first and lastname
        
        #you can have multiple SearchQuerySets;
        #all() returns a copy without touching the private _clone()
        sqs2 = sqs.all()
        
        #if you need to filter thru several fields using OR
        sqs2 = sqs2.filter_or(lastname='Sample2')
        sqs2 = sqs2.filter_or(firstname='Sample2')
        
        #combining multiple SQS: results must match both
        sqs = sqs & sqs2
        
        #OR, alternatively: results may match either
        #sqs = sqs | sqs2
        
        #if your search is too complex you can always use raw SOLR queries;
        #narrow() sends the string to SOLR as a filter on the results
        sqs = sqs.narrow(u"age:[18 TO 30] AND height:[170 TO 200]")
        
        #sort
        sqs = sqs.order_by('sort_field')
        return sqs
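
Raw query strings like the one passed to narrow() are easy to get wrong when the bounds come from form input. Below is a small helper sketch for building them. solr_range and narrow_query are hypothetical names of mine, not part of Haystack, and a missing bound is written as *, which SOLR treats as open-ended.

```python
def solr_range(field, low=None, high=None):
    """Build a SOLR range clause such as age:[18 TO 30]."""
    low = '*' if low is None else low
    high = '*' if high is None else high
    return '%s:[%s TO %s]' % (field, low, high)


def narrow_query(ranges):
    """Join (field, low, high) tuples into one AND-ed query string."""
    return ' AND '.join(solr_range(*item) for item in ranges)


query = narrow_query([('age', 18, 30), ('height', 170, 200)])
print(query)
# Inside MySearchForm.search() this would be used as: sqs = sqs.narrow(query)
```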

WKHTMLTOPDF

The second tool is django-wkhtmltopdf. This app is a wrapper for the wkhtmltopdf tool, which renders an HTML file to a PDF file using WebKit. Basically, the wrapper renders a template, like Django's render_to_response function, writes it to a temporary HTML file, calls the wkhtmltopdf tool to convert the file to PDF, and then returns the PDF as the response.
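
In plain Python, that flow looks roughly like the sketch below. This is not the wrapper's actual code: build_command and html_to_pdf are hypothetical helpers, and it assumes the wkhtmltopdf binary is on your PATH. The option dictionary mirrors how I understand the wrapper's cmd_options to work, where a key like page_size becomes the --page-size flag.

```python
import os
import shutil
import subprocess
import tempfile


def build_command(options, input_path, output_path):
    """Turn {'page_size': 'A4', 'quiet': True} into a wkhtmltopdf command."""
    command = ['wkhtmltopdf']
    for name, value in sorted(options.items()):
        if value is None or value is False:
            continue  # option switched off
        command.append('--' + name.replace('_', '-'))
        if value is not True:
            command.append(str(value))  # flags set to True take no argument
    return command + [input_path, output_path]


def html_to_pdf(html, options=None):
    """Write html to a temp file, convert it, and return the PDF bytes."""
    workdir = tempfile.mkdtemp()
    try:
        input_path = os.path.join(workdir, 'input.html')
        output_path = os.path.join(workdir, 'output.pdf')
        with open(input_path, 'w') as handle:
            handle.write(html)
        subprocess.check_call(build_command(options or {}, input_path, output_path))
        with open(output_path, 'rb') as handle:
            return handle.read()
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


print(build_command({'page_size': 'A4', 'quiet': True}, 'in.html', 'out.pdf'))
```

In a Django view you would then return those bytes in an HttpResponse with the application/pdf content type, which is the part PDFTemplateResponse takes care of for you.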

Their site includes a straightforward setup guide so I won't dig into that. However, the wrapper's docs don't discuss much about customizing the view so I'll be discussing that.

Here's a sample Django view:

from django.conf import settings
from wkhtmltopdf.views import PDFTemplateResponse


def toPDF(request, template='custom/template.html'):
    # getData() stands in for however you load the data to render.
    data = getData()
    context = {
        'data': data
    }
    # Fall back to no extra options if the setting is not defined.
    cmd_options = getattr(settings, 'WKHTMLTOPDF_CMD_OPTIONS', {})

    #return render_to_response(template, context)
    return PDFTemplateResponse(
        request=request,
        context=context,
        template=template,
        filename='filename',
        header_template='pdf/header.html',
        footer_template='pdf/footer.html',
        cmd_options=cmd_options,
    )

You create the templates as you would for a webpage, but here you have the option to specify separate template files for the header and footer. For more on how to customize the output, see the wkhtmltopdf documentation.

That's it! Hope these tools and my examples help you in your own coding.

Monday, April 8, 2013
