Going Nookular

Is the Barnes & Noble Nook better than the Amazon Kindle? Yes, at least right now. I bought one recently and I love it.

Here is what is great about it:

  • looks great, sexier than all current Kindles

  • books are EPUB, an open book format also supported by most other readers

  • good native support for PDF; adding files is as simple as drag-and-drop

  • 3G + Wi-Fi, with a great shopping experience from the reader

  • color LCD touch screen for navigation + soft keyboard

  • great screen contrast, especially outdoors

  • removable battery

  • borrow digital books from the local library in EPUB or ADE-PDF – I already checked out several from the San Francisco Public Library while sitting at home, and read a couple

  • lend books to friends for 14 days; all they need is to download the Nook software on their PC or phone

  • read any full ebook when inside a B&N store for free, and it remembers where you were when you come back

  • ebook prices usually same as Amazon, and no sales tax

  • decent web browser

  • plays music through speakers or headphones, so it supports audio books or you can listen to music while you read

  • Android OS – some people have hacked it to load Pandora and Google Reader apps

  • great accessories – I got a pretty cool cover which also keeps it upright for reading on a table

  • in-store specials like free Godiva chocolates last weekend and free downloads

 

The Nook solves most of the issues I wrote about in my earlier post about the Kindle. In fact, I think the Kindle 3, which is coming out this week, is Amazon's response to the current Nook and an attempt to catch up with it. Overall, the Nook feels like a crossbreed between a Kindle and an iPhone (with its color LCD and soft keyboard).

Of course, not everything is perfect. The book selection is a bit smaller than that at Amazon; not a huge difference, but noticeable. Still, a much wider selection than Apple's iBooks. Also, battery life is much shorter than on Kindle. I am trying to gauge how long it lasts, but it looks like it may be half of what Kindle has. At least it has removable batteries so I can pop another one in when stranded in the jungle in the middle of a novel!

I am pretty happy with my Nook and I am going to keep it. I am sure that in the future most ebook reading will be happening on smartphones and tablets – users will just load the Kindle, Nook, or iBooks apps on them. For hardcore readers, however, a dedicated device is a must these days.

Those who say “I prefer the feel and smell of real paper” just haven't experienced how much better a digital reader really is!

 

The Real Time Web: Imperative or Insanity?

[Photo from the event]
I just came back from tonight's panel discussion at Stanford GSB, organized by MIT/Stanford Venture Lab (VLAB). Truly inspiring event, and the Bishop Auditorium at Stanford was packed. Bit.ly was the star of the night, represented by their co-founder Todd Levy whom I enjoyed meeting again.

Who was on the panel:

  • Andreas Weigend - moderator, lecturer at Stanford, former Chief Scientist at Amazon - he kept us entertained with his witty humor
  • Todd Levy - cofounder of leading URL shortener and social analytics engine Bit.ly
  • Jan Pedersen - Chief Scientist for Bing search, ex-Yahoo :)
  • Kevin Burton - founder and CEO of Spinn3r and co-creator of RSS 1.0
  • George Zachary - Partner at VC firm Charles River Ventures, invested in Twitter, Yammer ("corporate twitter"), GupShup ("the Twitter of India"), advisor to X PRIZE

The discussion opened with some cool stats, such as what happens each minute on the web: how many blog posts go live (mostly in Asia!), 100,000 bit.ly links clicked, 500,000 items shared on Facebook. See the photo in this post. Andreas has a wonderful way of presenting things, straight from what seemed to be a Microsoft Office document, and editing what was on the screen as the discussion went on.

Todd shared the history of Bit.ly, which fascinatingly enough started with a handful of employees and to this day has a single digit number of engineers. Their head of science is a woman (this came as an answer to the inevitable question why so few women are in tech). Bit.ly grew out of the needs of other startups getting incubated at Betaworks in NYC - they needed a URL shortener, and they needed to track social analytics, so why not kill two birds with one stone? Today bit.ly offers free analytics, and a premium Pro package for companies at $995 per month. I have been in touch with Bit.ly for a long time, am a huge fan of the company and the team, and plan to look into (read: buy) their Pro offering soon.

There was an interesting discussion about social implications of the real time web. Does the ability to output bite-sized pieces of info in real time, and keep a bunch of people updated on where you are and what you are doing actually diminish the need for real face to face human contact? In the past you needed to catch up with people, talk at length, and nurture a personal connection. Now you just check on Facebook and voila, you learn all about their last vacation or new baby. Jan made the point that services like Facebook make it much easier to keep in touch with people with whom you normally would not, such as acquaintances or remote friends, and so those relationships are strengthened. Todd made an interesting observation: in the early days of the internet when email was king, you basically had long-latency asynchronous communication. Now with low latency protocols and links, you have real time, sort of what face to face communication is, albeit digital. So we are coming back to the synchronous more direct communication that we lost when email appeared. Interesting...

George Zachary was passionate about real time social services disrupting outmoded corporate and government hierarchies (think Yammer, but also the public services). Coase's theory of the firm holds that firms and other organizations exist to lower the cost of transacting business, but these days meetings and bureaucracy make it harder to get things done in-house than if you were to use IM or call someone outside the organization. George expects profound social structure changes in the next 20-30 years as a result of technology allowing us to communicate and express ourselves in real time globally.

Privacy was mentioned surprisingly little. Jan Pedersen spoke about Bing's reluctance to show public Facebook updates which appear to be intended for private use based on some sort of automated analysis (or perhaps he was describing a policy choice, not clear). Here I was reminded of the public expression freedom that Twitter allows in comparison. That, and how I could easily tweet (and did) using SMS with just one bar of phone signal reception, while posting to Facebook on my iPhone would have been a drag (and although it can be done via SMS, who does it?).

The event was superb, one of the best I have attended in recent memory. The handout they gave to the audience had things like a chart of the real time web space (who the players are in the various niches), some interesting analytics of the growth in content over the past year, and definitions of terms like Pubsubhubbub. Kudos to VLAB, and I look forward to attending more of their events!

Twitter Chirp Developer Conference

In the past two days, I attended Chirp, the first developer conference organized by Twitter. It was a fun event with plenty of familiar faces and interesting new announcements.

 

What was most interesting?

Annotations were by far the most intriguing concept mentioned at Chirp. Being able to attach up to 2 KB of metadata to each tweet has set the developer blogosphere on fire.

Here is a great and rather technical post by Marcel Molina from Twitter about annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/thread/fa5da2608865453

Dave Winer predicted that annotations would be the most exciting thing at Chirp, and has eloquently explained some product ideas here: http://www.scripting.com/stories/2010/04/09/howTwitterCanKillTheTwitte.html

It looks like most people are thinking that annotations will just add another 2k bytes to remove some of the limitations of the 140 characters – for example, removing the need for URL shorteners, or at least removing the need to dereference a short URL. Or using the 140 char tweet as a title / summary, and attaching more text in the annotation. Or adding richer geolocation tags in the annotations.

I am personally more intrigued by the possibility of using opaque human-unreadable binary annotations which turn Twitter into a real time message bus. For example, imagine traffic lights tweeting their status (red, green, maybe even congestion estimate via traffic camera), and your new GPS receiving tweets from traffic signals in the vicinity, decoding them, and deciding which route will be fastest. No need for you to personally read such tweets, right? Or you can have encrypted annotation payloads which can be a way to broadcast premium content, while possibly keeping the title in the tweet for everyone to see. By the looks of it, though, Twitter probably wants all annotations to be in clear text and usable by the whole community of developers, discouraging building proprietary advantage.
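To make the traffic light idea a bit more concrete, here is a minimal sketch of what a machine-readable payload could look like. Everything in it is an assumption for illustration: the field names, the use of JSON (a real device might prefer a packed binary format), and the helper names encode_signal and decode_signal are made up, and the only number taken from Chirp is the 2 KB limit. It does not use any actual Twitter API.

```python
import json

MAX_ANNOTATION_BYTES = 2048  # the "up to 2 KB" limit mentioned at Chirp


def encode_signal(signal_id, state, congestion):
    """Pack a traffic signal status into a compact payload (hypothetical format)."""
    payload = json.dumps(
        {"id": signal_id, "state": state, "congestion": congestion},
        separators=(",", ":"),  # no extra whitespace, to save bytes
    ).encode("utf-8")
    if len(payload) > MAX_ANNOTATION_BYTES:
        raise ValueError("payload exceeds annotation limit")
    return payload


def decode_signal(payload):
    """Turn a received payload back into a dictionary."""
    return json.loads(payload.decode("utf-8"))
```

A GPS unit on the receiving end would call decode_signal on each payload from nearby signals and feed the state and congestion values into its routing logic.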

 

@anywhere and Yahoo!

There is no doubt that the most ready-for-primetime new Twitter product discussed at Chirp was @anywhere: http://dev.twitter.com/anywhere

The hack day on Day 2 was also largely focused on this. Yahoo! announced several integrations with @anywhere which will no doubt be among the most visible on the web.

Here is a presentation and video by Cody Simms, head of the Yahoo! Developer Network: http://developer.yahoo.net/blog/archives/2010/04/a_report_from_chirp_twitters_developer_conference.html

 

What about Monetization?

Sponsored tweets sound interesting. However, the concept seemed far from ready or well thought out, and there are a number of issues with it. Tweets will be sold on a CPM (per-impression) model. They will initially be available on search.twitter.com only, and that is not a big destination; most of the action still happens in third party Twitter client applications. The ad quality metric called “resonance” (which also seems to be the overall tweet relevance metric developed by Twitter) is rather fuzzy and unrelated to click/action performance, and there is no behavioral or demographic targeting. Basically, ads on Google or Facebook still sound a lot more compelling and better targeted. Which is not to say there isn't a lot of potential for sponsored tweets, but I was hoping for a lot more of that to be spelled out at the conference.

 

How to make Chirp better

It seemed like the conference this year was all about Twitter and what they want developers to hear and do (i.e. adopt @anywhere and think of ideas for annotations). It felt too much like a PR event for Twitter. It would have been better to hear the voices of the ecosystem directly. By that I do not mean just Q&A using hashtags on Twitter. Unlike what Microsoft does with their developer conferences for example, there was no dedicated space (booths etc) for partners and developers to show off their applications and talk with potential partners and customers. There were random connections and conversations for sure, but there are a ton of great companies building products around Twitter that I wanted to chat with but could not – and I am sure they were actually at Chirp! Perhaps if the venue were larger and had an area with tables where attendees could post signs and logos, this already would be a major improvement. That said, Twitter did an awesome job organizing the conference this year and getting the developers excited, and I am looking forward to their next developer event!

Yahoo Search integrates Twitter content for breaking news

Here is something exciting we launched last night: for breaking news topics, search users will see a module on top of the page which combines official news sources, photos, videos, and tweets. This is probably the most visible and useful integration of Twitter content in a major search engine until now, but I may be biased :)

The new module looks like this:

[Screenshot: the breaking news module on top of the search results page]

See the tabs on top of the page right below the search box?

Here is what it looks like when you click the Twitter tab:

[Screenshot: the Twitter tab of the breaking news module]

Here you see a couple of relevant tweets and a couple of related videos shared recently on Twitter.

How does this work? To decide whether to show this news experience in the first place, we use an algorithm to identify hot spiking queries. When it comes to selecting which tweets to show, we use a ranking algorithm which takes into account not only how recent the tweet is, but also how popular the author is, any URL that may be shared in the tweet, linguistic analysis, and some other secret sauce. The algorithms are not perfect, but they seem to work reasonably well and we are always looking for ways to improve them.
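For readers who like to see the shape of such a ranking function, here is a toy sketch. It is emphatically not the production algorithm: the one-hour half-life, the follower count as a stand-in for author popularity, and the 1.5x bonus for tweets with links are all assumptions picked for illustration, and the linguistic analysis is left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    age_seconds: float      # time since the tweet was posted
    author_followers: int   # crude stand-in for author popularity
    has_url: bool           # whether the tweet shares a link


def tweet_score(tweet, half_life_seconds=3600.0):
    """Combine recency, author popularity, and link presence into one score."""
    recency = 0.5 ** (tweet.age_seconds / half_life_seconds)  # halves every hour
    popularity = math.log10(1 + tweet.author_followers)       # dampen huge accounts
    link_bonus = 1.5 if tweet.has_url else 1.0
    return recency * popularity * link_bonus


def rank_tweets(tweets, limit=2):
    """Return the top-scoring tweets, best first."""
    return sorted(tweets, key=tweet_score, reverse=True)[:limit]
```

Multiplying the factors means a stale tweet cannot win on popularity alone, which matches the intuition that freshness matters most for breaking news.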

I want to thank the team that worked on this, and especially Gilad, Nitzan, and Shiv, without whom this would not have been possible.

Stay tuned for more interesting search product launches along these lines in the future!

The Nigma Search Engine: Russia's Answer to Powerset and Wolfram Alpha?

Today I discovered Nigma.ru -- a highly innovative and brand new search engine developed by a small team in Russia. Check it out: www.nigma.ru

Unfortunately, if you don't speak Russian, it will be hard to figure out what they do :) They extract structured data from Wikipedia and other sites (e.g. shopping sites, music sites, etc) and aim to give you an "instant answer" as you type, or on top of the results page.

Here is an example where they solve a quadratic equation for you:

Here is an example where they solve an inorganic chemical reaction:

And now an even more complex chain reaction:

[Screenshot: Nigma solving a chain of chemical reactions]

Super smart search suggestions as you type are where Nigma excels: they give you the direct answer as you type, if possible. The screenshot below says "capital of b" and the dropdown suggest menu says "capital of brazil ANSWER: Brasilia", "capital of bashkiria ANSWER: Ufa", "capital of bulgaria ANSWER: Sofia", etc. Naturally, you can directly click on the links and learn more about Sofia and Brasilia without having to stop at the intermediate search result page.
 
[Screenshot: Nigma suggestions with direct answers for "capital of b"]
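The basic idea behind answer-bearing suggestions can be sketched in a few lines. This is only a guess at the general technique, not Nigma's implementation: the FACTS table is a hypothetical stand-in for structured data extracted from Wikipedia and other sites, and a real engine would use a proper prefix index and popularity ordering rather than scanning and sorting a dictionary.

```python
# Hypothetical structured data: query -> direct answer
FACTS = {
    "capital of brazil": "Brasilia",
    "capital of bulgaria": "Sofia",
    "capital of bashkiria": "Ufa",
    "capital of france": "Paris",
}


def suggest(prefix, facts=FACTS, limit=5):
    """Return suggestions that start with the typed prefix, with their answers."""
    prefix = prefix.lower().strip()
    matches = sorted(q for q in facts if q.startswith(prefix))
    return [f"{q} ANSWER: {facts[q]}" for q in matches[:limit]]
```

Calling suggest("capital of b") returns the three "b" capitals with their answers attached, so the user never has to visit a results page.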

If you type the query "apple inc", you will see the drop-down below. It tells you that Apple is a public company, its revenues for the last year, notable people (Steve Jobs and Wozniak), etc. Pretty cool.

[Screenshot: Nigma drop-down for "apple inc"]

Another cool feature is price comparison right in the suggest box. (BTW, comparison shopping engines seem to be popular in Russia to begin with.) Here, the user types "note" as in notebook computer, or laptop, and the menu shows prices for popular laptops, with min, max, and average prices:

[Screenshot: Nigma price comparison suggestions for "note"]

You can also look up restaurants, and receive structured and well annotated results back, similar to Yahoo's Search Monkey, like in this example for "Sup Cafe phone number":

[Screenshot: Nigma result for "Sup Cafe phone number"]

Music search is very impressive too. Here is an example of a result for "modern talking":

[Screenshot: Nigma music result for "modern talking"]

You see what Google calls a OneBox, and Yahoo calls a Wow module: a large direct result on top of the page, which combines photos, album information, play button, lyrics, and even a link to directly download the songs (which in Russia, being a free country unencumbered by copyright issues and RIAA, is free).

Pretty cool, huh? Of the major search engines, Yahoo! leads in music search, but Nigma may have leapfrogged everyone here. Nigma also has a similar and pretty awesome book search, which finds all sorts of metadata and links to an electronic version of the book or an online bookstore to buy a paper copy. Russians are avid readers, and Nigma has made a great product for them.

Nigma claims that they are working with Hector Garcia Molina from Stanford. It is a verified fact that they keep adding new and cooler features just about every week! Everything they do is built upon structured data.

Unfortunately, Nigma's current product is almost entirely in Russian. They are a company to watch! Given how much more structured data there is in English, I would not be surprised to hear more about them soon in the US as well.

 

Real Time Web & Real Time Search

There has been a lot of hype lately about real time web and search, and total Twitter mania. I have been working on these issues inside Yahoo! Search lately, and here are some of my observations.

News is published in real time now. By news I mean both traditional, global/official stories, and local/socially relevant updates. RSS feeds from news sites, blogs -- these get updated on the web as soon as some newsworthy thing happens and the writer finishes his article. Your friends and family update their Facebook and Twitter status all the time and upload photos straight from their phone. Newspapers can't really compete with the web in providing information this fresh, which is a big part of the reason for their demise.

However, news is not consumed in real time. In fact, it is not humanly consumable in real time. Unless you are a news junkie or a day trader, you do not look up every single story that goes on the wire. This is where Twitter breaks down, at least today. It is like a stock ticker. There is way too much content, and old stuff gets buried by the new, even if some of the older pieces are actually more relevant and interesting. Facebook also suffers from some of this information overload.

So how does one follow real time web updates? One example is mainstream news sites, like Yahoo! News or CNN.com: they highlight the most important stories on the front page. This browse model works pretty well for sites that have human editors who digest the stories and keep the good stuff for you to read. Even these editors, however, use the other methods below to determine what's hot. A key observation about news: there are not that many original and interesting events (core news stories, as it were), and most stories are duplicates (based on Reuters / AP syndicated feeds). Likewise, a huge proportion of Twitter traffic simply refers to these core news stories; it is not news by itself.

Real time search is the best way to consume the latest updates. Search does not mean that you actually type a query into a box. The same algorithms that rank results when you enter a query can also be used to keep news content fresh and ranked by relevance when you go to a news site or your Facebook home page.

However, there are challenges to building good real-time search. The main one is just the speed at which new updates are published, especially on sites like Twitter, which calls its feed the Firehose, with good reason. Normal web crawling techniques are too slow -- they leisurely construct a list of links and relationships in order to determine popularity and PageRank, and by the time a brand new web update has enough PageRank, it may be too stale to be relevant. The way forward is to ingest most of the fresh new content through feeds with a push mechanism. The search engine indexes the feed instead of trying to discover and crawl the content. As for ranking, the level of duplication is a good proxy for popularity, and so are number of clicks, appropriately weighted over time. Another important metric is authority of the content publisher. Not all Twitter users are the same; a Tweet from President Obama or CNN is probably more interesting than thousands of unknown authors chatting about the same topic. Likewise, you don't care equally about all your Facebook friends and what they are up to. Last but not least, real time spam seems to be a growing problem, and identifying its sources and patterns is more akin to the techniques for email spam fighting (which tends to be real time and message based) than the traditional web spam and link farming.
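Here is a rough sketch of how those three signals (duplication, time-weighted clicks, and publisher authority) might be combined. All names and numbers are assumptions for illustration: story_key stands for whatever clustering step groups duplicates together, authority is an arbitrary per-author weight, and the 30 minute half-life is a guess. No real system is being described.

```python
from collections import defaultdict


def decayed_clicks(click_times, now, half_life=1800.0):
    """Sum of clicks, each weighted by how recently it happened."""
    return sum(0.5 ** ((now - t) / half_life) for t in click_times if t <= now)


def story_scores(updates, now, half_life=1800.0):
    """Score stories from a stream of updates.

    updates: iterable of (story_key, author_authority, click_times).
    Updates sharing a story_key are treated as duplicates of one story,
    so a heavily duplicated story accumulates a higher score.
    """
    scores = defaultdict(float)
    for story_key, authority, click_times in updates:
        # Each duplicate adds its author's authority plus its decayed clicks.
        scores[story_key] += authority + decayed_clicks(click_times, now, half_life)
    return dict(scores)
```

Because every duplicate contributes its author's authority, one update from a high-authority source can outweigh several from unknown authors, and old clicks fade away instead of propping up stale stories.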

So the challenge is on -- let's see who will build the best real time search experience!

 

On URL Shorteners

Joshua (of Delicious fame) wrote today about the dangers of URL shorteners (see joshua.schachter.org). Dave Winer (of RSS fame) agreed with him (see http://bit.ly/P76VL).

Philosophically, of course, they are right. URL shorteners add a step of indirection, and when they fail, it can be rather annoying. Short URLs can also be security risks because you do not know where they lead, whether the landing page is trustworthy or some scam. And they can be plain confusing to non-techie end users.


However, there are practical considerations that mitigate the problems above:

Any search engine worth its salt can handle redirects, and will handle short URLs right as they become more prevalent. Plus, given that we search by keywords in the content, rather than the URL itself, search engines will always be able to surface what you are looking for as they do today, regardless of whether the shortener is down or not.

Browser anti-phishing features and antivirus programs today handle unsafe links, detect spyware, etc. In fact, given that we usually click on anchor text without checking or seeing the destination URL, ordinary links are not much different from short URLs. This level of protection will only get more comprehensive in the future.

As for user friendliness: these URLs usually look ugly and inscrutable, but I bet users will learn to live with them. On many occasions the sites that display them could hide the URL and just say "link" or something visually friendlier. Another option is to pre-expand short URLs, either server side or through a browser extension. No big deal.
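Pre-expansion is simple enough to sketch. The function below follows a chain of redirects until it reaches a URL that no longer redirects, with a hop limit and loop detection. The resolve argument is a hypothetical caller-supplied function that returns the redirect target for a URL (or None if there is none); in a real deployment it would issue an HTTP request and read the Location header, which is left out here so the sketch stays self-contained.

```python
def expand_url(url, resolve, max_hops=5):
    """Follow redirects until a final URL is reached.

    resolve(url) is a caller-supplied (hypothetical) function returning the
    redirect target for url, or None when url does not redirect.
    """
    seen = {url}
    # Allow up to max_hops redirects, plus one final check of the last URL.
    for _ in range(max_hops + 1):
        target = resolve(url)
        if target is None:
            return url
        if target in seen:
            raise ValueError("redirect loop at " + target)
        seen.add(target)
        url = target
    raise ValueError("too many redirects")
```

With a plain dictionary as the redirect table, expand_url("http://short.example/a", table.get) walks the chain and returns the long destination URL, which a site could then display or check against a blacklist before the user ever clicks.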

Philosophically speaking, short URLs are used for bookmarking ephemeral content. For quick sharing of huge URLs in email and now increasingly on Twitter as well. For sharing things in the now, relevant in the present. How often do you revisit ancient email or random thoughts on Twitter? It does not matter in fact if those links stop working -- over time, this old content becomes irrelevant anyhow. Most of it was short-lived to begin with.


In conclusion: short URLs are here to stay, and if anything, they should become easier to use, straight from the browser.

AltSearchEngines Day 2009

This Monday I attended the Alt Search Engines conference in SOMA in San Francisco. Many thanks to Charles Knight of altsearchengines.com for organizing this interesting event! Clearly there are quite a few startups who are unhappy with the status quo of search = ten blue web links on white page :)

The definition of "search engine" appears to be very flexible based on how diverse the participants were. Few actually did the whole spectrum of crawling, indexing, ranking, and UI/presentation. Most startups focused on some vertical or functional piece, such as UI, crawling, semantic analysis, etc. Also, the level of sophistication and polish of the products varied quite a bit, and many were even alpha or pre-release.

Here are some of my notes and observations:

Real Time Search:

Demo by Collecta.com. They crawl blogs, blog comments, forums, and even online chat rooms through chat bots. Like Twitter search, results are chronologically displayed (this is the primary ranking function) on a live Ajax-y results page. A nice touch (on an admittedly pre-release UI) was to show a ticking digital clock close to the search box, and a time stamp next to each result to highlight just how fresh it is. Interestingly enough, their approach is to ingest (crawl) live content and show it immediately on SRP via simple string match (grep) because their indexing is not actually real time. I expect there will be a sacrifice of precision (relevance) and probably coverage / recall since content classification, filtering, query expansion, ranking, etc are not easy to do real time. I hope their final product exceeds these expectations!
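To illustrate the "ingest and grep" approach described above, here is a small sketch. It is an assumption about the general technique, not Collecta's code: the LiveMatcher class, its capacity limit, and the case-insensitive substring match are all made up for the example.

```python
from collections import deque


class LiveMatcher:
    """Keep the newest items in memory and match queries by substring."""

    def __init__(self, capacity=1000):
        self.items = deque(maxlen=capacity)  # oldest entries fall off the end

    def ingest(self, timestamp, text):
        """Add a freshly crawled item; no index is built."""
        self.items.append((timestamp, text))

    def search(self, query, limit=10):
        """Return matching items, newest first (chronological ranking only)."""
        needle = query.lower()
        hits = [(ts, text) for ts, text in self.items if needle in text.lower()]
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return hits[:limit]
```

The precision problem shows up immediately: a query for "cat" also matches "catalog", and there is no notion of relevance beyond freshness.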

Image and Visual Search:

Imprezzeo - search stock photography images, 15 million of them. Showed search based on metadata. After searching for "car" or "mountain stream", you can select several results and then "refine search" which also seems to be basically matching the largest subset of metadata (stock photos are heavily tagged). Claims that they can also search based on similarity in image data itself (computer vision analysis). Biz model is to license technology / service to others instead of being a consumer destination.

GazoPa - Japanese image search engine developed by Hitachi (huge JP corporation). Image similarity search. Draw something and GazoPa will find it! Gave example of drawing a blue t-shirt (input with mouse into something similar to MS Paint), and then a number of product photos come up as results. They see product search as a way to monetize. Reminds me of Like.com, formerly known as Riya, which is a more mature company and product in the same space.

Viewzi - visual search with multiple views and presentation layouts on the results page, depending on vertical. Think of themselves as a discovery engine. Powergrid view of SRP: when looking for U2 albums, they query Amazon, get all the image info, and upon click, you can see larger images, listen to tracks (from Amazon) etc. Looks at related tags and metadata and creates a 3d tag and image cloud SRP, e.g. U2 -> Bono -> Edge, etc. Many SRP views based on vertical, e.g. for celeb photos it looks more like a collage. B2B for media properties and sites that spices up their site or company specific SRPs. Gave example with Chris Pirillo's site whose search box is powered by Viewzi. Customer site can specify how to rank and what back ends to query etc.

CoolIris - more of a presentation layer than true search engine. For example, it can display an unlimited (in size) 3d wall of images as a results page. Ranking and results same as Google's, but presentation is different, and you do not need to be clicking next, etc. You can also view images as a slideshow or other views. Also searches videos with a cool browse experience on the SRP. They have "discovery" experiences where they show fresh results from RSS feeds from sports news sites, etc, with 15-30 min latency, near real time. Nice eye candy!

Pixsy - visual search and content syndication company. Looks similar to and competes with Blinkx. Pixsy aggregates and distributes visual content from sources like People.com, HBO, BBC, Bloomberg, Getty Images, etc. Distributes to Lycos, Rediff, NameMedia, etc. White label video search is not a good revenue model according to them. Video syndication more interesting and lucrative than search in that regard. Images are hard to monetize too, but they think high res premium images from stock libraries, displayed with bottom overlay ads, may work in the future.

One thing that came across was that monetization for visual search is a challenge at this point.

UI and results presentation:

SearchMe - view result pages like iPhone shows music album covers, by animated scrolling. They have been around for a while. I do not think this slow browsing is a very compelling user experience on a PC. They mentioned TV and other "lean back" form factors and experiences. Might work for that?

Tazti - PC-based client for voice powered search, developed by a self-funded Midwest startup. Basically, they use voice commands like "search X: Y" where X is name of the search engine (besides the big 3, they offer some alternative and more offbeat ones), the colon is a beep prompt from the system, and Y is the query text (which could also be spelled out letter-by-letter). They are purely an alternative form of input to established keyword based search engines. There is no voice indexing, transcription of content, ranking changes, etc.

Kosmix - modular search result pages with rich content based on search topic. Good for exploring a topic if you want to learn about it. Example they gave: pop art. Results include definitions, forums (with indication of level of activity), answers, images, etc. Also shows a rich set of related queries and topics, "related in the Kosmos". Does disambiguation for queries like "apache" or "apple". If apple = food/fruit, show recipes, if technology related, show blogs, etc. For tourist destinations, it shows you things to do, maps, events, restaurant and hotel reviews, etc. Kosmix claims they have the most extensive taxonomy on the web. They use it to find related content and for what types of content to display and how, which they claim makes them better than many/most meta search engines. Launched in December 2008, English language only.

SurfCanyon - IE or Firefox plugin which reranks Google / Yahoo / MSN results on the fly based on user behavior. After user clicks a result, the rest of the results are reranked based on inferred intent. If you click Next on page 1 to go to SRP page 2, it is already re-ranked by Surf Canyon to include results that normally may be on page 24 etc. Claimed increase in average number of clicks on second result page by 30-40% compared to control set. However, they do not distinguish between good / bad, or long / short clicks, and do not have any other form of human evaluation of user intent or result relevance, nor do they measure ranking quality like major search engines do. Regardless, looks like a useful enhancement to the mainstream search engines.
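Click-based reranking of this kind can be sketched as follows. This is a guess at the general idea, not SurfCanyon's method: using word overlap with the clicked title as the signal of inferred intent, and a boost of 2.0 per shared word, are both assumptions made for the example.

```python
def rerank_after_click(results, clicked_index, boost=2.0):
    """Reorder the remaining results after the user clicks one of them.

    results: list of (title, base_score), best first.
    Results sharing words with the clicked title get their score boosted.
    """
    clicked_words = set(results[clicked_index][0].lower().split())
    rescored = []
    for i, (title, score) in enumerate(results):
        if i == clicked_index:
            continue  # the clicked result itself is not shown again
        overlap = len(clicked_words & set(title.lower().split()))
        rescored.append((title, score + boost * overlap))
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored
```

If the user clicks "jaguar car review", a low-ranked "used jaguar car prices" jumps ahead of "big cat facts", which is the kind of page-24-to-page-2 promotion described above.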

Crawling / Indexing:

80legs.com - Houston TX based utility computing service for web crawling. Sort of a "regex for the web." They simply do web crawling (which can be constrained to specific domains) + content filtering / classification based on a filtering function supplied by the customer. The value add is that customer does not need to worry about parallelism or building a web scale crawler. This is useful, for example, for verifying ad placement by looking for specific html tags / JavaScript, or for looking for copyright violations. They do not provide indexing or ranking, so kind of a niche application. Not openly launched yet.

Yahoo! BOSS - by far the most mature, scalable and polished product presented, it is a whole white label search engine. Yahoo! BOSS provides its users with a ranked set of web, image, or news results from the Yahoo! index, and gives developers flexibility to re-rank and display results as they see fit. Interesting new features are planned for the upcoming releases. Read more about it on developer.yahoo.com/boss

Overall, an interesting bunch of products. It is clear that it is challenging for small web search engines to succeed and gain traction in a market dominated by Google. The three big search engines (Google, Yahoo, Microsoft) have quite a lead in terms of talent, know-how, and infrastructure compared to these startups. I am curious to see how some of these smaller players will develop and innovate in the future.