AltSearchEngines Day 2009

This Monday I attended the Alt Search Engines conference in SOMA in San Francisco. Many thanks to Charles Knight of altsearchengines.com for organizing this interesting event! Clearly there are quite a few startups who are unhappy with the status quo of search = ten blue web links on white page :)

Judging by how diverse the participants were, the definition of "search engine" appears to be very flexible. Few actually covered the whole spectrum of crawling, indexing, ranking, and UI/presentation. Most startups focused on some vertical or functional piece, such as UI, crawling, semantic analysis, etc. Also, the level of sophistication and polish of the products varied quite a bit, and many were still alpha or pre-release.

Here are some of my notes and observations:

Real Time Search:

Demo by Collecta.com. They crawl blogs, blog comments, forums, and even online chat rooms through chat bots. Like Twitter search, results are displayed chronologically (this is the primary ranking function) on a live Ajax-y results page. A nice touch (on an admittedly pre-release UI) was to show a ticking digital clock close to the search box, and a time stamp next to each result to highlight just how fresh it is. Interestingly enough, their approach is to ingest (crawl) live content and show it immediately on the SRP via simple string match (grep) because their indexing is not actually real time. I expect there will be a sacrifice of precision (relevance) and probably coverage / recall, since content classification, filtering, query expansion, ranking, etc. are not easy to do in real time. I hope their final product exceeds these expectations!
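To make the grep-style idea concrete, here is a minimal Python sketch of plain substring matching over freshly ingested items, shown newest first. It is only my illustration of the approach as described in the demo, not Collecta's code; the Item class, the live_search function, and the sample data are all made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    """One piece of freshly crawled content (hypothetical structure)."""
    text: str
    seen_at: datetime


def live_search(items, query):
    """Return items containing the query as a substring, newest first."""
    needle = query.lower()
    hits = [item for item in items if needle in item.text.lower()]
    # Recency is the only ranking signal in this sketch.
    return sorted(hits, key=lambda item: item.seen_at, reverse=True)


# Made-up sample stream, for illustration only.
stream = [
    Item("New search startup demo in SOMA", datetime(2009, 1, 1, 9, 0)),
    Item("Lunch menu for today", datetime(2009, 1, 1, 9, 5)),
    Item("Live notes from the search conference", datetime(2009, 1, 1, 9, 10)),
]
results = live_search(stream, "search")
for item in results:
    print(item.seen_at.strftime("%H:%M"), item.text)
```

Note what is missing: no stemming, no query expansion, no spam filtering, and no relevance score. That is exactly where I would expect precision and recall to suffer.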

Image and Visual Search:

Imprezzeo - search stock photography images, 15 million of them. Showed search based on metadata. After searching for "car" or "mountain stream", you can select several results and then "refine search" which also seems to be basically matching the largest subset of metadata (stock photos are heavily tagged). Claims that they can also search based on similarity in image data itself (computer vision analysis). Biz model is to license technology / service to others instead of being a consumer destination.

GazoPa - Japanese image search engine developed by Hitachi (huge JP corporation). Image similarity search. Draw something and GazoPa will find it! Gave example of drawing a blue t-shirt (input with mouse into something similar to MS Paint), and then a number of product photos come up as results. They see product search as a way to monetize. Reminds me of Like.com, formerly known as Riya, which is a more mature company and product in the same space.

Viewzi - visual search with multiple views and presentation layouts on the results page, depending on vertical. They think of themselves as a discovery engine. Powergrid view of SRP: when looking for U2 albums, they query Amazon, get all the image info, and upon click, you can see larger images, listen to tracks (from Amazon), etc. It also looks at related tags and metadata and creates a 3d tag and image cloud SRP, e.g. U2 -> Bono -> Edge, etc. Many SRP views based on vertical, e.g. for celeb photos it looks more like a collage. The model is B2B: they spice up site- or company-specific SRPs for media properties and other sites. Gave example with Chris Pirillo's site, whose search box is powered by Viewzi. The customer site can specify how to rank, what back ends to query, etc.

CoolIris - more of a presentation layer than a true search engine. For example, it can display an unlimited (in size) 3d wall of images as a results page. Ranking and results are the same as Google's, but the presentation is different, and you do not need to keep clicking next, etc. You can also view images as a slideshow or in other views. Also searches videos with a cool browse experience on the SRP. They have "discovery" experiences where they show fresh results from RSS feeds from sports news sites, etc., with 15-30 min latency, near real time. Nice eye candy!

Pixsy - visual search and content syndication company. Looks similar to and competes with Blinkx. Pixsy aggregates and distributes visual content from sources like People.com, HBO, BBC, Bloomberg, Getty Images, etc. Distributes to Lycos, Rediff, NameMedia, etc. White label video search is not a good revenue model according to them. Video syndication more interesting and lucrative than search in that regard. Images are hard to monetize too, but they think high res premium images from stock libraries, displayed with bottom overlay ads, may work in the future.

One thing that came across was that monetization for visual search is a challenge at this point.

UI and results presentation:

SearchMe - view result pages like iPhone shows music album covers, by animated scrolling. They have been around for a while. I do not think this slow browsing is a very compelling user experience on a PC. They mentioned TV and other "lean back" form factors and experiences. Might work for that?

Tazti - PC-based client for voice powered search, developed by a self-funded Midwest startup. Basically, they use voice commands like "search X: Y" where X is the name of the search engine (besides the big 3, they offer some alternative and more offbeat ones), the colon is a beep prompt from the system, and Y is the query text (which could also be spelled out letter-by-letter). They are purely an alternative form of input to established keyword based search engines. There is no voice indexing, transcription of content, ranking changes, etc.

Kosmix - modular search result pages with rich content based on search topic. Good for exploring a topic if you want to learn about it. Example they gave: pop art. Results include definitions, forums (with indication of level of activity), answers, images, etc. Also shows a rich set of related queries and topics, "related in the Kosmos". Does disambiguation for queries like "apache" or "apple". If apple = food/fruit, show recipes, if technology related, show blogs, etc. For tourist destinations, it shows you things to do, maps, events, restaurant and hotel reviews, etc. Kosmix claims they have the most extensive taxonomy on the web. They use it to find related content and for what types of content to display and how, which they claim makes them better than many/most meta search engines. Launched in December 2008, English language only.

SurfCanyon - IE or Firefox plugin which reranks Google / Yahoo / MSN results on the fly based on user behavior. After user clicks a result, the rest of the results are reranked based on inferred intent. If you click Next on page 1 to go to SRP page 2, it is already re-ranked by Surf Canyon to include results that normally may be on page 24 etc. Claimed increase in average number of clicks on second result page by 30-40% compared to control set. However, they do not distinguish between good / bad, or long / short clicks, and do not have any other form of human evaluation of user intent or result relevance, nor do they measure ranking quality like major search engines do. Regardless, looks like a useful enhancement to the mainstream search engines.
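Surf Canyon did not describe how they infer intent, so the following Python sketch is only a toy stand-in for the general idea of click-based re-ranking. It assumes a deliberately naive signal: word overlap between the clicked result and the remaining ones. The function names and the sample titles are hypothetical.

```python
def tokens(text):
    """Lowercased set of words in a result title."""
    return set(text.lower().split())


def rerank_after_click(results, clicked_index):
    """Reorder the not-yet-clicked results by word overlap with the clicked one."""
    clicked = tokens(results[clicked_index])
    rest = [r for i, r in enumerate(results) if i != clicked_index]
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(rest, key=lambda r: len(tokens(r) & clicked), reverse=True)


# Made-up result titles in the order the search engine returned them.
results = [
    "apple pie recipe",
    "apple iphone review",
    "apple store locations",
    "easy pie crust recipe",
]
reranked = rerank_after_click(results, 0)
print(reranked)
```

Here a click on the pie recipe pulls the other recipe result ahead of the technology results. A real system would presumably use richer signals, and a sketch like this shares the limitation noted above: it treats every click as a good click.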

Crawling / Indexing:

80legs.com - Houston, TX based utility computing service for web crawling. Sort of a "regex for the web." They simply do web crawling (which can be constrained to specific domains) + content filtering / classification based on a filtering function supplied by the customer. The value add is that the customer does not need to worry about parallelism or building a web scale crawler. This is useful, for example, for verifying ad placement by looking for specific HTML tags / JavaScript, or for looking for copyright violations. They do not provide indexing or ranking, so it is kind of a niche application. Not openly launched yet.
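As a rough picture of the "customer supplies the filter, the service supplies the crawl" split, here is a small Python sketch for the ad verification case. This is not the 80legs API, which was not shown in detail; run_crawl, has_ad_tag, the ads.example.com domain, and the in-memory pages are all assumptions for illustration.

```python
import re


def has_ad_tag(html):
    """Customer-supplied filter (hypothetical): does the page load the expected ad script?"""
    return re.search(r'<script[^>]+src="[^"]*ads\.example\.com', html) is not None


def run_crawl(pages, keep):
    """Apply the customer's filter to each crawled page and return matching URLs."""
    return [url for url, html in pages.items() if keep(html)]


# Stand-in for pages the crawler fetched; a real crawl would fetch these over HTTP.
pages = {
    "http://example.com/a": '<html><script src="http://ads.example.com/tag.js"></script></html>',
    "http://example.com/b": "<html><p>No ads here</p></html>",
}
matches = run_crawl(pages, has_ad_tag)
print(matches)
```

The customer only writes something like has_ad_tag. Everything hidden behind run_crawl (fetching, politeness, parallelism across many machines) is the part the service is selling.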

Yahoo! BOSS - by far the most mature, scalable and polished product presented, it is a whole white label search engine. Yahoo! BOSS provides its users with a ranked set of web, image, or news results from the Yahoo! index, and gives developers flexibility to re-rank and display results as they see fit. Interesting new features are planned for the upcoming releases. Read more about it on developer.yahoo.com/boss

Overall, an interesting bunch of products. It is clear that it is challenging for small web search engines to succeed and gain traction in a market dominated by Google. The three big search engines (Google, Yahoo, Microsoft) have quite a lead in terms of talent, know-how, and infrastructure compared to these startups. I am curious to see how some of these smaller players will develop and innovate in the future.