
Geospatial searches with ThinkingSphinx

One of our recent projects required geo-aware searches of business addresses. It is no secret that Hashrocket prefers ThinkingSphinx for full-text search. Sphinx, and by extension ThinkingSphinx, also supports searching around a geographical point.

Understanding How It Works

This was my first foray into geospatial searches. ThinkingSphinx makes the task fairly straightforward and only requires a geocoded data set. Specifically, it assumes that a latitude and longitude are associated with the data you are searching. ThinkingSphinx will accept the column names lat or latitude for latitude and lon, long, or longitude for longitude. One obvious missing abbreviation is lng for longitude. I have submitted a patch; in the meantime, the set_property method can be used if your data do not match these expectations. Sphinx expects these geographical attributes in radians, which avoids the overhead of converting them on the fly. Indexes and subsequent searches are therefore expected to provide coordinates, including the anchor point, in radians and not in degrees.
  
    define_index do
      indexes :name
  
      # Explicitly convert column to radians at the time of indexing.
      # Note: RADIANS() is run in the context of MySQL/PostgreSQL
      has 'RADIANS(lat)', :as => :lat, :type => :float
      has 'RADIANS(lng)', :as => :lng, :type => :float

      # If your lat/lng data are in an associated table provide the full path.
      # has 'RADIANS(addresses.lat)', :as => :lat, :type => :float
      # has 'RADIANS(addresses.lng)', :as => :lng, :type => :float
  
      # ThinkingSphinx expects your latitude and longitude attributes to be
      # named lat, latitude, lon, long or longitude. If that's not the case,
      # you will need to explicitly describe them in your index.
      set_property :longitude_attr => "lng"
      set_property :delta => true
    end
  

Putting It To Use

Our latitude and longitude data are stored as BigDecimal values in degrees. A small extension to the class allows quick conversion from degrees to radians.
  
    require 'bigdecimal'

    class BigDecimal
      # Convert a value in degrees to radians, returned as a Float.
      def to_radian
        to_f * Math::PI / 180
      end
    end
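
If you would rather not reopen a core class, the same conversion can live in a small module. The following is only a sketch: GeoAnchor, to_radians and from_degrees are hypothetical names chosen for illustration and are not part of ThinkingSphinx. It accepts anything that responds to to_f (Integer, Float, BigDecimal) and returns plain Floats.

```ruby
# Hypothetical helper module; not part of ThinkingSphinx.
module GeoAnchor
  RADIANS_PER_DEGREE = Math::PI / 180

  # Convert a degree value (anything responding to #to_f) to Float radians.
  def self.to_radians(degrees)
    degrees.to_f * RADIANS_PER_DEGREE
  end

  # Build a [latitude, longitude] pair in radians from two degree values.
  def self.from_degrees(lat_degrees, lng_degrees)
    [to_radians(lat_degrees), to_radians(lng_degrees)]
  end
end
```

The resulting pair could then be passed as the :geo option in place of the two to_radian calls.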
  
After defining the index, stop the daemon, re-index, and restart it. Then you can run a search.
   
    Model.search('term', 
        :geo => [@address.lat.to_radian, @address.lng.to_radian])
  
At this point the results will look eerily similar to those of an ordinary search, except that the Sphinx result set now carries a new attribute called @geodist. That gives us a new arsenal: we can sort by distance, filter by a distance range, and even cluster results by geographical region.
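
To get a feel for what a value like @geodist represents, here is a rough sketch of a great-circle distance between two points given in radians, using the haversine formula. This is an illustration only: the function name and the Earth radius constant are my own assumptions, and Sphinx's internal calculation may differ in its details.

```ruby
# Mean Earth radius in meters; an assumed constant for this sketch.
EARTH_RADIUS_METERS = 6_371_000.0

# Great-circle distance in meters between two points whose coordinates
# are given in radians, computed with the haversine formula.
def haversine_distance(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  # Clamp to 1.0 so rounding error cannot push asin out of its domain.
  2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt([a, 1.0].min))
end
```

Because the inputs are radians, the same values you hand to the :geo option can be fed to this function when you want to sanity-check a result by hand.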

Sorting

@geodist exists only in the context of the Sphinx result, so we need to refer to it as a string. As with all string order clauses in ThinkingSphinx, you must provide the direction (ASC or DESC) or no results will be returned.
  
    Model.search('term', 
        :geo => [@address.lat.to_radian, @address.lng.to_radian], 
        :order => '@geodist ASC')
  

Filtering

To filter, you need an anchor point and a distance range. Again, we refer to @geodist as a string.
  
    Model.search('term', 
        :geo => [@address.lat.to_radian, @address.lng.to_radian], 
        :with => { '@geodist' => Range.new(0.0, 10.0) })
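
Sphinx reports @geodist in meters, so a range of 0.0 to 10.0 covers only ten meters around the anchor. If you think in miles or kilometers, a couple of small helpers can build the range for you. This is a sketch: within_miles and within_kilometers are hypothetical names, not ThinkingSphinx methods.

```ruby
METERS_PER_MILE = 1609.344
METERS_PER_KILOMETER = 1000.0

# Range in meters from the anchor point out to the given number of miles.
def within_miles(miles)
  0.0..(miles * METERS_PER_MILE)
end

# Range in meters from the anchor point out to the given number of kilometers.
def within_kilometers(kilometers)
  0.0..(kilometers * METERS_PER_KILOMETER)
end

# Possible usage, assuming the index defined earlier:
#   Model.search('term',
#       :geo  => [@address.lat.to_radian, @address.lng.to_radian],
#       :with => { '@geodist' => within_miles(10) })
```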
  

Grouping Clauses

Documentation for group_clauses is lacking, but the pattern should be fairly clear. ThinkingSphinx itself does nothing more than pass grouping clauses through to Sphinx. Have a look at the Sphinx documentation for the relevant details.

Comments

Anderson 09.16.08 @ 1:32pm

Great post!

Congratulation

li0n 09.28.08 @ 8:12am

Very nice, BT!