Archive for the ‘The Science of It’ Category

A Quick Idea – Image Sensor Based on Time-to-saturate

Apologies (as always) for the infrequent updates to this blog. This semester has been a lot rougher than in the past, so I don’t know if I’ll have time to post anything more until the end of break.

I had a quick idea I wanted to jot down, and I haven’t found anything on it. I feel like someone out there must have thought up something similar already, or it’s already in the works at some black lab of a sensor company or something.

The idea I have is an image sensor that measures light intensity based on time-to-saturate – the time it takes for a particular photowell (representing a pixel) to saturate to its maximum capacity. The concept I’ve come up with has some interesting theoretical advantages in dynamic range over conventional photon-counting designs used today.

Imaging today – photon counting

First, a layman’s overview of how the conventional photon-counting design works in today’s sensors:

The sensor is a light sensitive device, and whenever photons come into contact with it, they are absorbed and a proportional number of photoelectrons are “knocked out” by the photon energy and collected in a photowell. From this photowell, a voltage measurement is taken, and this ultimately translates to a brightness value in the resulting image. In essence: Image brightness ∝ voltage reading ∝ electrons collected ∝ photons collected.

When taking an image, there is a set exposure duration, often referred to as a “shutter speed” in photography terms. This defines the period during which the sensor is exposed to and detecting light – the exposure starts, light hits the sensor, the exposure stops, and then we count the photons.

A limiting factor in this design is the photowell capacity. The number of electrons that can be stored in a well is finite, and once the photowell is saturated, any additional electrons are not collected and hence the photons they correspond to are not counted. On the flipside, there is also a noise floor: enough electrons must be gathered to produce a signal that is discernible from the random signal variation due to various forms of dark (thermal), electronics (read), and shot (photon) noise.

These two attributes lead to a problem of dynamic range – in scenes where light intensity differs greatly between the darkest and brightest areas, the sensor is simply unable to measure the full range of brightnesses and must cap measurements above and/or below a certain threshold. This leads to the “blown highlights” and “crushed shadows” often found in photos of large dynamic range scenes.
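To make the two limits concrete, here is a minimal sketch of a single photon-counting pixel. The 100k well capacity is the example figure used later in this post; the noise floor of 100 photons and the function name are assumptions picked purely for illustration, not measurements from any real sensor.

```python
WELL_CAPACITY = 100_000   # photons; the example capacity used in this post
NOISE_FLOOR = 100         # photons; hypothetical value, chosen only for illustration


def photon_count_reading(photons_per_ms, exposure_ms):
    """Return (recorded_count, status) for a conventional photon-counting pixel."""
    arrived = photons_per_ms * exposure_ms
    recorded = min(arrived, WELL_CAPACITY)  # the well cannot hold more than its capacity
    if recorded >= WELL_CAPACITY:
        status = "saturated"          # blown highlight: true brightness is lost
    elif recorded < NOISE_FLOOR:
        status = "below noise floor"  # crushed shadow: signal is buried in noise
    else:
        status = "ok"
    return recorded, status


if __name__ == "__main__":
    # Three parts of one scene, all captured with the same 1 ms exposure.
    for flux in (50, 50_000, 200_000):
        print(flux, photon_count_reading(flux, exposure_ms=1))
```

With a single fixed exposure, only the middle pixel gets a usable reading; the other two are capped at one end or the other.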

Time-to-saturate

The idea behind a time-to-saturate sensor is fairly simple. What we aim to measure in an image is light intensity – the flux of photons per time per area. Because the photosite behind each pixel has a fixed area, the area term drops out of the equation, so the measure we are really after is photons per time, for each pixel.

With photon counting, we fix a shutter speed (time duration), and then count the number of photons (via voltage measurement of photoelectrons) captured in that span, and use both to derive the intensity:

Intensity = photons / time = photons recorded / shutter speed

In time-to-saturate, the photon count is fixed at the capacity of the photowell, and the variable we measure is the time it takes for an individual well to saturate fully to the capacity.

Intensity = photons / time = max photon capacity / time-to-reach max-photon-capacity

How would the system work exactly? With a time-to-saturate sensor, we use as long a shutter speed as needed to fully saturate all photowells (in a conventional sensor, this is the minimum shutter speed to generate an all-white (max brightness) image). At the moment a photowell reaches capacity, it records a timestamp which will indicate how long it took to reach capacity. Once the exposure is finished, we are then left with a two-dimensional array of saturation times, rather than photon counts. Rather than recording 100k photons at one photosite, and 50k photons at a neighboring photosite where light was half as intense, the readings we get from this sensor would be along the lines of 1 millisecond time-to-saturate for the first photosite, and 2 millisecond time-to-saturate for the second, half-intensity photosite.
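As a rough sketch of the readout step, the snippet below turns a two-dimensional array of saturation times into intensities using the formula above (intensity = max photon capacity / time-to-saturate). The function names and the 100k capacity are assumptions for illustration only.

```python
WELL_CAPACITY = 100_000  # photons; example capacity from this post


def intensity_from_saturation_time(t_sat_s, well_capacity=WELL_CAPACITY):
    """Photons per second implied by a well that filled up in t_sat_s seconds."""
    if t_sat_s <= 0:
        raise ValueError("time-to-saturate must be positive")
    return well_capacity / t_sat_s


def intensity_map(time_map_s, well_capacity=WELL_CAPACITY):
    """Convert a 2D list of saturation times (seconds) into photons per second."""
    return [[intensity_from_saturation_time(t, well_capacity) for t in row]
            for row in time_map_s]


if __name__ == "__main__":
    # The two neighboring photosites from the example: 1 ms and 2 ms.
    print(intensity_map([[0.001, 0.002]]))
```

The photosite that took twice as long to saturate comes out at half the intensity, which matches the 100k/50k comparison in the photon-counting case.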

Key Advantages

There are two key advantages in our ability to take light intensity readings, both ultimately advancing dynamic range:

  • There is virtually no limitation to the range of highlights we can capture, unlike the limitation imposed by the photowell capacity with photon-counting sensors. In our example, if there were a third photosite with double the intensity of the first 100k photosite, exposed to 200k photons, it would only end up recording 100k photons since this is the capacity of the photosite. Both pixels would thus record the same white (max brightness) value, even though the 200k photosite clearly represents a brighter area in the scene than the 100k photosite. A time-to-saturate measurement, by contrast, would simply produce a shorter time measurement: the 200k photosite saturates in 0.5 milliseconds, which we can compare to the 1 millisecond measurement for the first photosite and clearly conclude that the 200k photosite is twice as bright.
  • Noise levels are reduced to the level of a maximally-saturated photowell. In a photon-counting sensor, any photosite that does not record a max white value by definition recorded fewer photons, and thus produces a sub-optimal signal-to-noise ratio (SNR). Photon or “shot” noise has a standard deviation equal to the square root of the signal – thus for 100k photons we have √(100,000) = 316.2 photons of standard deviation, and an SNR of N/√(N) = √(N) = 316.2. For 50k photons, however, we have an SNR of √(50,000) = 223.6. In contrast, all photosites in a time-to-saturate sensor reach the max well capacity, and will thus all have the max SNR. This ensures that all photosites record values well above the noise floor, and additionally reduces photon noise for all pixels to the level of a maximally saturated photosite (the 100k photon, 316.2 SNR in this example).
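Both advantages can be checked with a few lines of arithmetic. This is only a sketch using the numbers from the examples above (100k well capacity, photosites receiving 50k, 100k, and 200k photons per millisecond); the function names are made up for illustration, and shot noise is the only noise source modeled.

```python
import math

WELL_CAPACITY = 100_000  # photons; example capacity from this post


def shot_noise_snr(photons):
    """Shot-noise-limited SNR: N / sqrt(N) = sqrt(N)."""
    return math.sqrt(photons)


def conventional_count(photons_per_ms, exposure_ms=1):
    """Photon-counting reading: anything past the well capacity is lost."""
    return min(photons_per_ms * exposure_ms, WELL_CAPACITY)


def saturation_time_ms(photons_per_ms):
    """Time-to-saturate reading: how long the well takes to fill."""
    return WELL_CAPACITY / photons_per_ms


if __name__ == "__main__":
    for flux in (50_000, 100_000, 200_000):
        count = conventional_count(flux)
        print(
            f"{flux:>7} photons/ms | counted {count:>6}, "
            f"SNR {shot_noise_snr(count):.1f} | "
            f"saturates in {saturation_time_ms(flux)} ms, "
            f"SNR {shot_noise_snr(WELL_CAPACITY):.1f}"
        )
```

The two brightest photosites give identical counts in the conventional case but distinct saturation times (1 ms versus 0.5 ms), and every time-to-saturate pixel ends up with the full-well SNR.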

In theory, such a sensor would have an infinite dynamic range – the brightest intensities are simply recorded as short time-to-saturate durations, and enough samples are recorded from the darkest areas to place the measurement well above the noise floor. This would have huge implications for large dynamic range photography and imaging in general: we could record the entire dynamic range of a scene in a single exposure, without having to resort to processing tricks like selective shadow/highlight adjustment or high dynamic range (HDR) blending.

Potential Feasibility Issues

I’m not aware of any sources that have thought of this idea before, but if there are then there must be some large feasibility (or perhaps cost) issues that have prevented its development thus far. I can imagine a few issues, though in theory none of them seems like a dealbreaker and none would make performance any worse than that of photon-counting methods:

  • Timing accuracy/precision of photowell saturation. While photon-counting relies on accurate and precise voltage readings from the photowells, a time-to-saturate sensor would need good accuracy and precision in recording the time at which a photowell reaches saturation. How precise does the timing need to be to equal the theoretical precision of today’s cameras? Taking the contemporary example of a 100k photon capacity photowell, hooked up to a sensor/imaging pipeline with a 14-bit analog-to-digital converter (found on most high-end cameras today), we would need to quantize measurable photon counts into 2^14 = 16,384 steps. 100,000 / 16,384 = ~6.1 photons, which is the step size our time-to-saturation measurement needs to resolve. Most high-end cameras today operate with a minimum shutter speed of 1/8000 second (125 microseconds) – a 100k photowell that fully saturates in this time (this is the maximum light intensity the photon-counting sensor is able to record, under any settings) thus sees 100,000 photons / 125 microseconds = 800,000,000 (0.8 billion) photons / second. Finally, we use this intensity along with our ~6.1 photon steps to arrive at 6.1 photons / (0.8 billion photons/second) = ~7.6 nanoseconds. This is the precision with which a time-to-saturate sensor needs to record time. Of course, depending on the application the numbers can vary – with fewer bits per pixel, we would need less precision (an 8-bit JPEG in this example would need just ~0.5 microseconds of precision), with lower photowell capacity we would need greater precision, and with a larger minimum exposure time we would need less precision.
  • To take advantage of the greater dynamic range capabilities of a time-to-saturate sensor, the exposure duration must be longer than that of a conventional photon-counting sensor, to capture more light. For static scenes, this is unlikely to be an issue, but for dynamic scenes (e.g. moving subjects), the exposure duration can only be stretched so far before issues such as motion blur or camera shake blur are introduced. At worst, however, the exposure can simply stop after a defined maximum exposure time – at this point any photowells which have not reached capacity simply output a voltage reading like in a conventional sensor – this reading is then used to extrapolate a time-to-saturate which can then be compared with the other photosites. In the worst case, the maximum exposure time is the same as the exposure time in a conventional photon-counting sensor, and would produce an equal noise level and at least the same dynamic range, if not a greater dynamic range captured in the highlights. For any exposure duration exceeding that of the conventional sensor however, noise levels will be reduced and a greater dynamic range in the shadow regions will be achieved as well.
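The arithmetic in both points above can be written out as a short sketch. The first function reproduces the timing-precision estimate (one ADC step's worth of photons divided by the brightest recordable flux); the second is one possible way to do the fallback extrapolation for wells that have not filled by the maximum exposure time, assuming the light stays constant during the exposure. Function and parameter names are hypothetical.

```python
def required_timing_precision_s(well_capacity=100_000, adc_bits=14,
                                min_exposure_s=1 / 8000):
    """Time resolution needed to match one ADC step at the brightest recordable flux."""
    photons_per_step = well_capacity / 2 ** adc_bits   # ~6.1 photons at 14 bits
    max_flux = well_capacity / min_exposure_s          # 0.8 billion photons/second
    return photons_per_step / max_flux


def extrapolated_saturation_time_s(photons_collected, max_exposure_s,
                                   well_capacity=100_000):
    """Estimate time-to-saturate for a well that was cut off before filling.

    Assumes constant light, so the fill rate so far is the fill rate throughout.
    """
    if photons_collected <= 0:
        raise ValueError("no signal to extrapolate from")
    return max_exposure_s * well_capacity / photons_collected


if __name__ == "__main__":
    print(f"14-bit: {required_timing_precision_s() * 1e9:.1f} ns")
    print(f" 8-bit: {required_timing_precision_s(adc_bits=8) * 1e6:.2f} us")
    # A well only a quarter full when a 4 ms exposure is cut off:
    print(f"extrapolated: {extrapolated_saturation_time_s(25_000, 0.004) * 1e3:.1f} ms")
```

Note that the well capacity cancels out of the first function, leaving just the minimum exposure time divided by the number of ADC steps (125 microseconds / 16,384 at 14 bits).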

What do you think?  Any potential pitfalls or feasibility issues I might have missed? I’m especially interested if anyone has come across a source with similar ideas before. Feel free to post links in the comments!

An explanation of Fujifilm’s Super CCD EXR sensor

A look at Fujifilm’s innovative EXR sensor, the latest iteration of its flagship Super CCD sensor, along with some analysis of images from production cameras. Admittedly this would have been more interesting as a speculative piece a year ago, but better late than never.

tl;dr: Fujifilm’s EXR sensor is extraordinary, mostly for its dynamic range. If you’re after the best non-DSLR image quality around, your choices start at the Fuji F200EXR, F70EXR, S200EXR, and end there.

Fujifilm has long been a leader in revolutionary sensor technology, particularly at the smaller scale sensor market where the majority of manufacturers have long been content pumping out traditional, vanilla CCD sensors with square grid-based Bayer Filter Arrays.

In September of 2008, Fujifilm announced plans for their latest sensor: the Super CCD EXR, which combines the unique color filter array (CFA) and pixel binning features of various previous sensors into a single “switchable” sensor that can be optimized in one of several areas (which are typically mutually exclusive when designing a sensor): high resolution, high dynamic range, and low noise.

High resolution

High resolution mode is the default mode, which utilizes the full set of photosites on the sensor and produces an image with a corresponding pixel on each photosite – nothing too special here, though Fuji claims the diagonal layout of photosites (as opposed to simple square grid) helps to improve resolution.

High sensitivity

A comparison of a typical Bayer CFA (left) and the CFA on Fujifilm's new EXR sensor (right)

The second mode of operation for the EXR sensor is a high-sensitivity mode which Fuji calls “Pixel Fusion Technology”, which is fancy marketspeak for pixel-binning (combining readings from adjacent pixels together to produce a better signal). With the EXR’s pair-based CFA layout, Fujifilm claims that interpolation (and thus color resolution) will be more accurate because the binned pixels are closer together (e.g. the paired blue pixels are pretty much in the same location, while they’re separated by two pixel lengths in a standard square-grid Bayer array). I’m not sure I buy this argument – it’s true that same-color pixel values will be more accurate since they’re closer, but you can’t get something for nothing: for example, the average distance from red-to-blue is going to be increased, which lowers accuracy for interpolating blue values at red pixels.
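To illustrate why binning gives “a better signal”, here is a minimal sketch that sums the readings from a pair of same-color photosites and compares the shot-noise-limited SNR before and after. It assumes shot noise only (read noise is ignored) and equal light on both photosites, and it is not a description of how Fujifilm's hardware actually combines pixels; the function names are invented for this example.

```python
import math


def bin_pair(reading_a, reading_b):
    """Combine two same-color photosite readings into one binned value."""
    return reading_a + reading_b


def shot_noise_snr(photons):
    """Shot-noise-limited SNR: sqrt(N)."""
    return math.sqrt(photons)


if __name__ == "__main__":
    single = 400                      # photons on each photosite of the pair
    binned = bin_pair(single, single)
    gain = shot_noise_snr(binned) / shot_noise_snr(single)
    print(f"single SNR {shot_noise_snr(single):.1f}, "
          f"binned SNR {shot_noise_snr(binned):.1f}, gain x{gain:.3f}")
```

Under these assumptions the binned pair improves SNR by a factor of √2, at the cost of halving the pixel count.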


The Demosaic Project

So over the past couple of weeks I’ve been working on a little project called Demosaic. It’s an online demo that interpolates image data from (simulated) raw sensor output, similar to what almost every digital camera in use today has to do.

http://www.thedailynathan.com/demosaic/
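For readers unfamiliar with the term, the sketch below shows the general idea in a few lines: simulate raw sensor output by keeping only one color channel per pixel, then rebuild full RGB values by averaging nearby samples of each color. It assumes an RGGB Bayer layout and uses simple bilinear-style neighborhood averaging; this is a generic illustration, not necessarily the algorithm the demo at the link above uses.

```python
def bayer_color(row, col):
    """Channel index (0=R, 1=G, 2=B) sampled at a photosite in an RGGB layout."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def mosaic(rgb):
    """Simulate raw sensor output: keep only the filtered channel at each pixel."""
    return [[px[bayer_color(r, c)] for c, px in enumerate(row)]
            for r, row in enumerate(rgb)]


def demosaic(raw):
    """Rebuild RGB by averaging same-color samples in each 3x3 neighborhood."""
    h, w = len(raw), len(raw[0])
    if h < 2 or w < 2:
        raise ValueError("need at least a 2x2 mosaic to see every color")
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for rr in range(max(0, r - 1), min(h, r + 2)):
                for cc in range(max(0, c - 1), min(w, c + 2)):
                    ch = bayer_color(rr, cc)
                    sums[ch] += raw[rr][cc]
                    counts[ch] += 1
            px = [sums[ch] / counts[ch] for ch in range(3)]
            px[bayer_color(r, c)] = raw[r][c]  # keep the measured sample unchanged
            out_row.append(tuple(px))
        out.append(out_row)
    return out


if __name__ == "__main__":
    flat = [[(10, 20, 30)] * 4 for _ in range(4)]  # a uniform test image
    print(mosaic(flat)[0])
    print(demosaic(mosaic(flat))[0][0])
```

A uniform image survives the round trip exactly; real images with edges and fine detail are where the choice of interpolation method starts to matter.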
