29 January

Service storage and bandwidth cost calculator

This page allows you to simulate the costs of storage and bandwidth for a simple web service like Facebook or Snapchat. For more info on why this might be interesting, please read my post about infrastructure in ephemeral networks.

The defaults are based on a number of publicly available data sources about Facebook. The system is assumed to consist of a set of people who upload media (photos, for this analysis). In order to model growth, I assume that the number of items uploaded grows according to an exponential function. The default values have been fit using publicly available data about Facebook from between 2009 and 2012.
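A minimal sketch of what such an exponential growth model could look like is shown here. It is not the simulator's actual code: the function names, the monthly time step, and every parameter are assumptions made for illustration, and no fitted Facebook values are included.

```python
import math


def uploads_in_month(month, initial_uploads, monthly_growth_rate):
    """Items uploaded during `month` (0-based) under exponential growth.

    Hypothetical model: uploads(t) = initial_uploads * exp(rate * t).
    """
    return initial_uploads * math.exp(monthly_growth_rate * month)


def cumulative_storage_bytes(months, initial_uploads, monthly_growth_rate,
                             bytes_per_item):
    """Total bytes stored after `months` months if nothing is ever deleted."""
    total_items = sum(
        uploads_in_month(m, initial_uploads, monthly_growth_rate)
        for m in range(months)
    )
    return total_items * bytes_per_item
```

With a growth rate of zero this reduces to a constant upload rate, which is a handy sanity check before plugging in fitted values.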

These media are then assumed to be consumed later by other users of the system. In the case of services-with-history (e.g. FB), I model the peak and average QPS of data as a fraction of the total amount of data cumulatively stored. In the case of ephemeral networks (e.g. Kik, Snapchat), I model storage and bandwidth by estimating the fanout of each message, and the number of messages that are never received (leftover messages are assumed to be stored forever).
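The two consumption models above might be sketched roughly as follows. This is an illustration under stated assumptions rather than the simulator's real code: the function and parameter names are hypothetical, reads are expressed as a daily fraction of stored items, and leftover storage is counted per undelivered copy (the description above does not say whether it is per copy or per message).

```python
SECONDS_PER_DAY = 86400


def history_service_qps(stored_items, read_fraction_per_day, peak_to_average):
    """Average and peak read QPS for a service that keeps history.

    Assumption: each day, a fixed fraction of all stored items is read.
    """
    avg_qps = stored_items * read_fraction_per_day / SECONDS_PER_DAY
    return avg_qps, avg_qps * peak_to_average


def ephemeral_service_load(messages_per_day, fanout, unreceived_fraction,
                           bytes_per_message):
    """Daily bandwidth and leftover storage for an ephemeral network.

    Assumption: each message is uploaded once, sent to `fanout` recipients,
    and copies that are never received are kept forever.
    """
    deliveries = messages_per_day * fanout
    received = deliveries * (1 - unreceived_fraction)
    return {
        "upload_bytes": messages_per_day * bytes_per_message,
        "download_bytes": received * bytes_per_message,
        "leftover_bytes": deliveries * unreceived_fraction * bytes_per_message,
    }
```

The point of the split is that a history service's read load scales with everything ever stored, while an ephemeral service's load scales with the daily message volume and only the unreceived remainder accumulates.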

Storage and bandwidth costs are estimated based on costs on EC2 and other similar systems.

Feel free to play around with the numbers.

The code is available on github: Web service storage cost simulator.

Questions or comments? Use twitter to reach me: @vijayp (Vijay Pandurangan)

11 June

Colours in movie posters since 1914

Edit: Buy the movie poster hues (1914-2012) poster

A couple of weeks ago, I was having brunch with Kim-Mai Cutler — we were discussing the new startup I’m building in the enterprise space (if you’re a ui/ux person or awesome engineer looking for something fun to do, drop me a line!) — and I mentioned how I felt that most movie posters these days were very blue and dark. She didn’t fully believe me and challenged me to prove it. I looked around, and found some people had done this with a few posters over the last few years, but I became curious about the longer-term trends and what they would show. So, as any engineer would do, I wrote some code! (The code is open source and lives on github: image analysis.)

Edit: this post is up on Flowing Data (an awesome data visualization blog), YC Hacker News, and Gizmodo. I will be doing a follow-on post with much better analysis and much more data. Follow @vijayp on twitter and stay tuned!

Visualizations:

The number of posters I was able to get varied based on the year:

I first made a unified view of colour trends in movie posters since 1914. Ignoring black and white, I generated a horizontal strip of hues in HSL. The width of each hue represents the amount of that hue across all images for that year, and the saturation and lightness are the weighted averages over all matching pixels. Since HSL has a fixed hue order, comparisons can be made between years visually. (You can buy the movie poster hues poster here.) Click on the image below for a more detailed view:
 

Next, I made a similar unified view of generic colour trends in movie posters since 1914, but here lightness and saturation are both ignored. This makes the distribution of hues much clearer, but hides the average “darkness” of the posters.
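The hue strips described above can be sketched along these lines. This is a rough illustration, not the code from the image analysis repository: the bin count and the thresholds used to discard near-black, near-white, and grey pixels are assumptions chosen for the example. It uses Python's standard `colorsys` module, whose `rgb_to_hls` returns hue, lightness, and saturation in that order.

```python
import colorsys
from collections import defaultdict


def hue_strip(pixels, hue_bins=36, min_saturation=0.1,
              min_lightness=0.1, max_lightness=0.9):
    """Summarise RGB pixels (0-255 tuples) as a list of hue segments.

    Returns (bin_index, width_fraction, avg_saturation, avg_lightness)
    tuples sorted by hue. The thresholds are hypothetical cut-offs for
    "black and white" pixels, which are skipped.
    """
    counts = defaultdict(int)
    sat_sum = defaultdict(float)
    light_sum = defaultdict(float)
    for r, g, b in pixels:
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        if s < min_saturation or l < min_lightness or l > max_lightness:
            continue  # treat as black, white, or grey and ignore
        bin_index = min(int(h * hue_bins), hue_bins - 1)
        counts[bin_index] += 1
        sat_sum[bin_index] += s
        light_sum[bin_index] += l
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (i, counts[i] / total, sat_sum[i] / counts[i], light_sum[i] / counts[i])
        for i in sorted(counts)
    ]
```

For the hue-only view, the same segments can be drawn with a fixed saturation and lightness instead of the per-bin averages.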
 

Finally, I have created a pie chart representing the colour distribution of a specific year’s movie posters. (This should probably be animated and a line graph; more on that in the future work section.)

Rationale:

First off, it is true that movie posters are much more blue, and much less orange than they used to be. QED :) This page also talks about the blue/orange colours in movies.

This does appear to be a steady trend since 1915. Could this be related to the evolution of the physical process of poster printing? What’s the effect of the economics and difficulty of producing posters over time? I also wonder whether moviemakers have become better at figuring out the “optimal” colour distribution of posters over time, and whether we’re asymptotically approaching some quiescent distribution.

I was a bit concerned that some of this might be due to bias in the data: some movies would be over-represented in the intra-year average (remember that some movies have multiple posters and I normalize over posters, not movies). I think this is not actually a huge issue because it’s reasonable to assume that a movie’s marketing budget is roughly proportional to the number of posters that it has produced for itself. This means that the skew, if any, would be similar to the perceived average.

I presented these preliminary data to some friends of mine who are more steeped in the world of graphics and arts. Cheryle Cranbourne (she used to be a graphic designer and has just finished a Masters in interior architecture at RISD) had a number of good thoughts:

[Edit: I had misquoted this earlier] The movies whose posters I analysed “cover a good range of genres. Perhaps the colors say less about how movie posters’ colors as a whole and color trends, than they do about how genres of movies have evolved. For example, there are more action/thriller/sci-fi [films] than there were 50-70 years ago, which might have something to do with the increase in darker, more ‘masculine’ shades.”

This is backed up a bit by data from under consideration’s look at movie posters. They didn’t go back very far, but there did seem to be a reasonable correlation between movie age rating and palette.

She also pointed out that earlier posters were all illustrated or hand-painted, with fewer colors and less variation in tone. Perhaps the fact that white and black have become more prevalent is due to the change from illustration to photography. Painted skin might also over-represent orange and under-represent other hues that occur in real life.

Methodology:

I downloaded ~35k thumbnail-sized images (yay wget; “The Social Network” inspired me to not use curl) from a site that has a lot of movie posters online. I then grouped the movie posters by the year in which the movie they promoted was released. For each year, I counted the total number of pixels of each colour across that year’s posters. After normalizing and converting to HSL coordinates, I generated the above visualizations.
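The grouping and normalizing steps could look something like the sketch below. It assumes the posters have already been decoded into lists of RGB tuples (image loading is left out), and the function name and input shape are hypothetical, not taken from the actual code.

```python
from collections import Counter, defaultdict


def colour_fractions_by_year(posters):
    """Count pixels per colour per year, then normalise within each year.

    `posters` is an iterable of (year, pixels) pairs, where `pixels` is a
    list of RGB tuples for one poster (an assumed input format).
    Returns {year: {colour: fraction_of_that_year's_pixels}}.
    """
    counts = defaultdict(Counter)
    for year, pixels in posters:
        counts[year].update(pixels)

    fractions = {}
    for year, colour_counts in counts.items():
        total = sum(colour_counts.values())
        fractions[year] = {
            colour: n / total for colour, n in colour_counts.items()
        }
    return fractions
```

Normalizing within each year matters because the number of posters varies a lot from year to year; without it, years with more posters would simply look "bigger" rather than differently coloured.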

Inspirations:

I was inspired by Tyler Neylon’s great work on colour visualizations. I ended up writing my own code to do these image analysis visualizations, but I will try to integrate it with his work.

Future work:

There’s a bunch of stuff I still have to / want to do, but since I’m working on my startup, I don’t really have much time to focus on it right now. Here’s a long list of stuff:

  1. Follow up on all the open questions about the reasons for this change.
  2. Use other metadata (not just year) for movies to search for patterns. A simple machine learning algorithm should suffice if I throw all the attributes in at once. This should be able to highlight whether genre is important, and what other factors are crucial.
  3. “Main colour” analysis. I should run some kind of clustering (as Tyler does in his code). His code uses a handwritten (?) k-means clustering algorithm, which is a bit slow when faced with thousands of pictures’ worth of data. There are some faster, albeit slightly less accurate, versions that I could use.
  4. I need to move the pie charts to use the gcharts js api, so they’re interactive.
  5. I should probably make nicer/fancier js onhover stuff.
  6. I should look at Bollywood and other sources to see whether this holds across countries.
  7. My visualizations and JavaScript aren’t so good. I have to learn how to do this stuff better!

14 January

Android calendar syncing is broken for me!

For the past couple of weeks (since shortly after my Nexus S upgraded itself to ICS), the calendar on my phone has not been syncing with Google. This has required me to use the calendar website on my phone, which is not a pleasant experience at all. So today, I hooked my phone up to my computer and decided to do some debugging. Using adb logcat, I found this stack trace:

E/AndroidRuntime(15353): FATAL EXCEPTION: SyncAdapterThread-2
E/AndroidRuntime(15353): android.util.TimeFormatException: Parse error at pos=2
E/AndroidRuntime(15353): at android.text.format.Time.nativeParse(Native Method)
E/AndroidRuntime(15353): at android.text.format.Time.parse(Time.java:440)
E/AndroidRuntime(15353): at com.android.calendarcommon.RecurrenceSet.populateContentValues(RecurrenceSet.java:189)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.EventHandler.entryToContentValues(EventHandler.java:1138)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.EventHandler.applyEntryToEntity(EventHandler.java:616)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.getServerDiffsImpl(CalendarSyncAdapter.java:2223)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.getServerDiffsForFeed(CalendarSyncAdapter.java:1954)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.getServerDiffsOrig(CalendarSyncAdapter.java:945)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.innerPerformSync(CalendarSyncAdapter.java:417)
E/AndroidRuntime(15353): at com.google.android.syncadapters.calendar.CalendarSyncAdapter.onPerformLoggedSync(CalendarSyncAdapter.java:302)
E/AndroidRuntime(15353): at com.google.android.common.LoggingThreadedSyncAdapter.onPerformSync(LoggingThreadedSyncAdapter.java:33)
E/AndroidRuntime(15353): at android.content.AbstractThreadedSyncAdapter$SyncThread.run(AbstractThreadedSyncAdapter.java:247)
W/ActivityManager( 153): Force finishing activity com.google.android.calendar/com.android.calendar.AllInOneActivity
V/CalendarSyncAdapter(15353): GDataFeedFetcher thread ended: mForcedClosed is true

Thanks to Evan, I was able to clone the git repo for the Calendar app (https://android.googlesource.com/platform/packages/apps/Calendar.git), and spent some time today trying to track down this bug.

Unfortunately, the buggy code is in calendarcommon, which isn’t included in that git repo, and is actually nearly impossible to find. At any rate, with some more digging, the closest I could get is the code here:

http://git.insignal.co.kr/?p=mirror/aosp/platform/frameworks/opt/calendar.git;a=blob;f=src/com/android/calendarcommon/RecurrenceSet.java

I think there needs to be a try/catch block around that whole method (around line 189) that returns false if an exception is thrown. For some reason, TimeFormatException is derived from RuntimeException (!!). The common code doesn’t seem to be installed as part of the calendar app. From quickly looking at the code, it appears to be installed as part of the OS, and to register itself as the handler for calendar URIs.

So if I wanted to fix this myself, I wonder whether I would have to fork the code above, install it as a new handler, and then somehow hide the one that ships with the OS. I have to think about this a bit more. The other problem is that since this is a common library, many other calendar apps might suffer from the same exception when they attempt to sync.

In the meantime, I’m going to try to figure out what event is causing this error (not easy since there are no logs that can help me) and/or think of buying an iPhone.

If you know anyone on Android who could help with this, please let me know.

Edit:
I’m downloading the entire Android source code, and I think I’m going to try to re-build a patched version of the common code, uninstall the existing common code, and push the new one over it. I’ll update this post with progress …

10 March

KindleFeeds, an RSS reader Andrew and I wrote

Check out Kindle Feeds, a cool little app Andrew and I wrote. It lets you subscribe to RSS feeds, and generates a Kindle-compliant “book” that you can download onto your Kindle. The book has a table of contents and a nice link at the top which will fetch a new version of the book with only new posts. It’s really quite easy and you can update anywhere you have cell phone coverage.


In case you’re wondering, here’s a bit of background:

A while back, Andrew wrote this cool application in Python that let you store static HTML pages from the web as Kindle docs that you could then download at will. (The old version of this was called Bibliorize and still lives up on www.bibliorize.com. But I’m gonna merge the two this week sometime and maybe find a new home for it somewhere.)

This seemed awesome, but had a few drawbacks, the main one being that you had to manually add each page you wanted to read. That’s a lot of work for a lazy person like me, and not so useful when most of the things I read come from RSS(/atom/whatever) feeds. So, before a recent flight to SF, it dawned on me that the solution was to expand Andrew’s app to allow users to subscribe to RSS feeds. Also, this app needed to be able to update itself nicely, so that no posts were repeated. Updating OTA was cool, of course. I wanted to remove the need for users to log in on their Kindle, a painstaking and error-prone process. So I spent the six or so hours of the flight coding (coding without access to the Internet is surprisingly difficult!). And a few more hours later, I think this app does all that!

There are a lot of bugs and things to improve though, not the least of which is the hideous colour scheme. We also need to pretty up the handling of dates and make the table of contents a bit more usable. Also we have to settle on a name for it.

Stay tuned!!