In the past few weeks I've been implementing advanced search at Plaxo, working quite closely with the Solr enterprise search server. Today, I saw this relatively detailed comparison between Solr and its main competitor, Sphinx (full credit goes to StackOverflow user mausch, who had been using Solr for the past 2 years). For those still confused, Solr and Sphinx are full-text search servers, similar in purpose to MySQL FULLTEXT search, or for those even more confused, think Google (yeah, this is a bit of a stretch, I know).
Similarities
- Both Solr and Sphinx satisfy the requirements laid out in the original StackOverflow question. They're fast and designed to index and search large bodies of data efficiently.
- Both have a long list of high-traffic sites using them (Solr, Sphinx)
- Both offer commercial support. (Solr, Sphinx)
- Both offer client API bindings for several platforms/languages (Sphinx, Solr)
- Both can be distributed to increase speed and capacity (Sphinx, Solr)
Differences
- Solr, being an Apache project, is Apache2-licensed. Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license.
- Solr is easily embeddable in Java applications.
- Solr is built on top of Lucene, which is a proven technology over 7 years old with a huge user base (this is only a small part). Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the devs committing to Solr are also Lucene committers.
- Sphinx integrates more tightly with RDBMSs, especially MySQL.
- Solr can be integrated with Hadoop to build distributed applications
- Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
- Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
- Solr comes with a spell-checker out of the box.
- Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
- Sphinx doesn't allow partial index updates for field data.
- In Sphinx, all document ids must be unique unsigned non-zero integer numbers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
- Solr supports field collapsing to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
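To make a few of the Solr features above a bit more concrete, here is a rough Python sketch of how a single query asking for facets, spell-check suggestions, and collapsed (grouped) results might be put together. The parameter names (q, rows, wt, facet, facet.field, spellcheck, group, group.field) are regular Solr request parameters, but the host, the "contacts" core, and the "title" and "company" fields are made up for illustration. Whether spell-checking and grouping actually work depends on your Solr version and on how the request handler is configured, so treat this as a sketch rather than a recipe.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_solr_query_url(base_url, text, facet_field=None,
                         group_field=None, spellcheck=False, rows=10):
    """Build a Solr /select URL; it only assembles the string, no request is sent."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if facet_field:
        # Faceting: ask Solr for value counts on one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if group_field:
        # Field collapsing (result grouping): group hits sharing a field value.
        params += [("group", "true"), ("group.field", group_field)]
    if spellcheck:
        # Assumes a spell-check component is enabled on the request handler.
        params.append(("spellcheck", "true"))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host, core name, and field names.
url = build_solr_query_url(
    "http://localhost:8983/solr/contacts",
    "title:engineer",
    facet_field="company",
    group_field="company",
    spellcheck=True,
)
print(url)
```

The point is mostly that these features are switched on per request with a handful of parameters, which is a big part of what "out of the box" means in the list above.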
Related questions
- http://stackoverflow.com/questions/1284083/choosing-a-stand-alone-full-text-search-server-sphinx-or-solr
- http://stackoverflow.com/questions/1132284/full-text-searching-with-rails
- http://stackoverflow.com/questions/737275/pros-cons-of-full-text-search-engine-lucene-sphinx-postgresql-full-text-searc
Conclusion
In my experience, Solr is very fast on the query side. It is also very powerful. Indexing, on the other hand, is quite CPU- and memory-intensive, an unfortunate side effect of having such a feature-rich, fast application. Nevertheless, I highly recommend Solr.
As a disclaimer, I have not had much experience with Sphinx and, again, all credit for this comparison goes to mausch.