University of Cambridge

CSaP

The Centre for Science and Policy

An Innovation Odyssey


From Basic Research to the World's Fastest-selling Consumer Electronics Product

The fifth event in CSaP's Distinguished Lecture Series featured Professor Christopher Bishop, Distinguished Scientist at Microsoft Research Cambridge. He delivered a fascinating talk about the relevance of blue-skies, non-mission-driven research, which turned out to be essential for integrating the key technologies that later reached the market as an add-on to Microsoft's video game console, the Xbox 360.

In his lecture "An Innovation Odyssey: From Basic Research to the World's Fastest-selling Consumer Electronics Product", Professor Bishop presented a case study of Kinect for Xbox 360, the first sensing device that enables a controller-free gaming experience by allowing real-time, full-body motion tracking. To give the audience a flavour of what the technology is about, Professor Bishop opened the talk by running a demonstration of a video game designed for Kinect. However, before moving on to telling the story behind the development of this device, he presented the organisational context that was instrumental in this endeavour.

Microsoft Research was founded 20 years ago in Redmond, USA. The Cambridge laboratory was the first to be opened outside the United States, and it was followed by Beijing, Bangalore and, more recently, Cambridge, Massachusetts. Today, about 900 people are part of Microsoft Research, equivalent to 1% of the firm's total headcount. The Cambridge lab has around 130 permanent staff, a figure that grows to about 200 people in the summer, when interns join to work on temporary projects.

Microsoft Research has three well-defined missions: 1) to advance the state of the art in computer science (basic research), 2) to transfer technologies into Microsoft's businesses (licensing and adding value to the firm), and 3) to constitute a reservoir of technologies for the future of the firm. In terms of organisational culture, four aspects define the research ethos of Microsoft Research:

  • There is no top-down managerial guidance. Researchers are independent and are given the freedom to innovate. Microsoft Research is funded by a corporate tax imposed on the company's businesses.
  • Researchers work on projects they choose for 100% of their time.
  • Interaction among the staff is highly encouraged.
  • Academic success and technology transfer carry equal weight.

Part of the overall success of Microsoft Research can be attributed to the fact that Rick Rashid, Senior Vice President, Research, has enjoyed direct access to Bill Gates, thus ensuring a constant exchange of ideas and the support of key decision makers.

Turning to the story of how Kinect came into the world, Professor Bishop stressed the critical role that the Cambridge lab played in the development of this device. Of the research centre’s five main focus areas, two were fundamental for this project: machine learning and perception. In particular, the field of perception – also known as computer vision – had progressed significantly with advances in projective geometry. However, a major change came about in the early 2000s when the two fields intersected, and rapid progress has been made ever since. Professor Bishop presented two brief examples of applications that draw on these fields: the image background removal tool that is now part of the latest version of Office, and the movie recommender, which helps predict individual preferences.

As part of their mission of tackling basic science issues, the Microsoft Cambridge lab decided that they would make object recognition one of their blue-skies projects. The underlying problem, image understanding through computer vision, was hard and one where progress was needed. Most of the efforts to resolve it with the help of artificial intelligence had failed, revealing the need for a new approach. The Microsoft scientists realised that machine learning could be the key to cracking the puzzle. They began experimenting with machine learning by trying to recognise and classify simple objects: cows, sheep, bicycles, cars and buildings. A further difficulty was then identified: how to track moving objects in real time.

In parallel with these research efforts, Microsoft had made their first attempt to enter the video game market in 2001 with the launch of the Xbox, their first generation of consoles. Four years later, their next-generation console, the Xbox 360, hit the shelves. It had more processing power and improved graphics. Some months on, the Nintendo Wii altered the landscape of the industry through the introduction of a motion-sensing controller, which allowed the user to interact with the screen via gesture recognition. Sony’s EyeToy and PlayStation Eye were other attempts to introduce gesture recognition devices into the market. Although some advances were made in terms of object recognition, a limitation of these solutions was that they relied on tracking the controller, which necessarily had to be in the hands of end-users.

This is how Project Natal, Microsoft's code name for Kinect, was born. By the time the Xbox 360 was five years old, Microsoft was searching for a new kind of games controller, completely hands-free and able to track the motion of the entire body. Although Hollywood had devised a motion-capture solution some years earlier, its requirements made it completely unsuitable for the video games market. This is why Microsoft decided to build upon their knowledge of machine learning to approach the problem from a different angle. The challenge was made harder still by the fact that a workable solution had to cope with variation in body positions, sizes, shapes, clothing, colours and backgrounds.

The solution worked out by Microsoft was embodied in an add-on peripheral, Kinect for Xbox 360. Kinect is built around two key technologies: an infrared emitter and an infrared camera. Together these measure the depth, or distance, of a body from the screen and provide full-body 3D motion capture. In addition, it has a standard RGB camera for person recognition and a phased-array microphone for voice recognition.

Professor Bishop went on to explain the workings of this device. One important principle was the use of structured light: the infrared emitter projects a pattern of pixels on to an object. The deformation of the pixels caused by the surface of the object is then used to calculate its depth. Then the camera can provide a side view and a top view of the object, reducing the difficulty of the problem. However, tracking the movement of a body on depth video was still an issue.
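The structured-light principle described above can be illustrated with a minimal sketch. It assumes the textbook triangulation relation, in which depth is inversely proportional to the sideways shift (disparity) of a projected dot as seen by the camera. The lecture report does not say that Kinect computes depth in exactly this way, and the focal length and emitter-to-camera baseline below are hypothetical values chosen only for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px=600.0, baseline_m=0.1):
    """Estimate depth in metres from the shift of one projected dot.

    Assumed model (not taken from the lecture): depth = f * b / d, where
    f is the camera focal length in pixels, b is the distance between the
    infrared emitter and the infrared camera in metres, and d is the
    observed shift of the dot in pixels. Default values are hypothetical.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def depth_map(disparities):
    """Apply the same relation to a 2D grid of per-pixel disparities."""
    return [[depth_from_disparity(d) for d in row] for row in disparities]


if __name__ == "__main__":
    # A dot that shifts less is further away: 30 px -> about 2 m, 60 px -> about 1 m.
    for row in depth_map([[30.0, 60.0]]):
        print([round(z, 3) for z in row])
```

Under this assumed model, halving the observed shift doubles the estimated distance, which is why nearby surfaces deform the projected pattern more visibly than distant ones.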

While the researchers considered using prediction methods for solving the tracking problem, such methods had to rely on a known starting position. Further, they proved not to work accurately for fast motion. A Cambridge researcher then came up with the innovative idea of carrying out 'template matching'. Using object recognition techniques, the idea was to define around 30 body parts, take depth images of the body, and classify each pixel in real time as one of those body parts. This had the advantage that each part of the body would have its own 'identity', enabling it to move independently from the others. It therefore sufficed to recognise the location of the joints and create a skeleton in order to track a full body. Once the solution was devised, a machine-learning algorithm was used to train the system so that it would be able to classify each pixel according to the most probable body class. This was carried out with data corresponding to a million different body positions. In this manner, Kinect was able to infer body parts very precisely and make accurate hypotheses about the location of 3D joints.
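The two steps described above, labelling each pixel with its most probable body part and then deriving joint positions from those labels, can be sketched as follows. This is an illustration under stated assumptions rather than the method used in Kinect: the per-pixel probabilities are taken as given (in the real system they come from the trained classifier), and each joint is proposed simply as the average position of the pixels assigned to a part. The function names and the tiny 2x2 example are hypothetical.

```python
def label_pixels(probabilities):
    """Assign each pixel the index of its most probable body part.

    probabilities[row][col] is a list of scores, one per body part.
    These scores are assumed to come from an already trained classifier.
    """
    return [
        [max(range(len(scores)), key=scores.__getitem__) for scores in row]
        for row in probabilities
    ]


def joint_proposals(labels, depth):
    """Propose one 3D point per body part.

    Assumption for illustration: a joint is the mean (column, row, depth)
    of all pixels carrying that part's label.
    """
    sums = {}
    for r, (label_row, depth_row) in enumerate(zip(labels, depth)):
        for c, (part, z) in enumerate(zip(label_row, depth_row)):
            sx, sy, sz, n = sums.get(part, (0.0, 0.0, 0.0, 0))
            sums[part] = (sx + c, sy + r, sz + z, n + 1)
    return {
        part: (sx / n, sy / n, sz / n)
        for part, (sx, sy, sz, n) in sums.items()
    }


if __name__ == "__main__":
    # Hypothetical 2x2 depth image with two body parts (0 and 1).
    probs = [
        [[0.9, 0.1], [0.8, 0.2]],
        [[0.3, 0.7], [0.2, 0.8]],
    ]
    depth = [[2.0, 2.0], [3.0, 3.0]]
    labels = label_pixels(probs)
    print(labels)
    print(joint_proposals(labels, depth))
```

Because every frame is labelled independently, this kind of approach needs no known starting position, which is the limitation the report attributes to the prediction-based methods.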

From the moment when Redmond reached out to Microsoft Research in Cambridge, it took nine months to integrate the technology into the Xbox 360 and take it to market in December 2010. In March 2011, Microsoft announced that Kinect for Xbox 360 had sold 10 million units worldwide, winning the Guinness World Record for the fastest-selling consumer electronics device.

Professor Bishop ended his talk by offering his view of the significance of Kinect: an affordable technology which can have myriad future applications, not only in gaming but also in user interfaces, surveillance, physiotherapy, dance and sports, to name but a few. He emphasised that this success was made possible only by the firm's research culture and by the vision and persistence of a number of scientists in working on a very hard problem.

The lecture closed with a lively Q&A session in which students, faculty members and policy makers raised a range of issues: the future of research in computer vision; the conditions most fruitful for undertaking basic research within a firm characterised by stringent business objectives; and how interactions between Microsoft Research scientists and the company's business units are promoted.


This report was prepared by Alberto Garcia-Mogollon.

BlueSci has also written up this lecture.