One of the challenges everyone faces when developing mobile apps is making them easy to use for the broader public. Getting feedback on the user experience (UX) of an app is crucial to making systems more intuitive, and there are many ways to gather that feedback. Surveying users is probably the most common option, but surveys have limitations and cannot cover every possible situation.
One of the most powerful approaches we have been investigating is eye tracking, which is invaluable for learning how people focus on different areas of a website or app. It is rarely used in practice, however, because it traditionally requires expensive tracking devices and software.
As part of our recent collaboration with the Data Science Institute at Bournemouth University, we have been developing a low-cost eye tracking app that works with any modern Android tablet with a front-facing camera. Our app combines state-of-the-art computer vision algorithms for gaze detection with machine learning to produce accurate predictions.
Our Android mobile app (available on GitHub) combines several open-source components:
- OpenCV: the leading open-source computer vision library.
- Cambridge face tracker: a Constrained Local Model (CLM) framework developed by Tadas Baltrušaitis from the University of Cambridge.
- dlib: a C++ library for machine learning.
- Boost: a collection of C++ libraries.
- Weka: a Java library of machine learning algorithms.
All these components have been carefully integrated using the Android SDK and NDK.
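Since OpenCV, the Cambridge face tracker, dlib, and Boost are C++ libraries, the vision pipeline runs in native code and is reached from Java through JNI. The sketch below is purely illustrative (the library name `gaze_tracker` and the method `estimateGazeVector` are hypothetical, not our app's actual API), but it shows the general shape of such a bridge:

```java
// Hypothetical JNI bridge between the Java UI layer and the native
// C++ pipeline (OpenCV + CLM + dlib). Names are illustrative only.
public class GazeTrackerBridge {
    static {
        System.loadLibrary("opencv_java");  // OpenCV Java bindings (name varies by version)
        System.loadLibrary("gaze_tracker"); // hypothetical native library built with the NDK
    }

    // Implemented in C++ and exposed through JNI: takes one camera frame
    // in NV21 format and returns the estimated 3D gaze vector {x, y, z},
    // or null if no face was detected.
    public static native float[] estimateGazeVector(byte[] nv21Frame,
                                                    int width, int height);
}
```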
Our app works as follows:
- In an initial training phase, the user is asked to look at a point moving around the screen. At the same time, we process the frames captured by the camera to extract the 3D vectors representing the user's gaze. We then use the list of points and vectors to build two predictive models, one for each screen coordinate.
- After that, the real-time detection phase starts. Here we continuously calculate the 3D gaze vectors for each incoming frame and use them as input to predict the region of the screen at which the user is looking. To make these predictions we use k-NN models that have been personalised with the user's own calibration data (a minimal sketch follows this list).
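To make the two phases concrete, here is a minimal sketch (assuming Weka 3.7+) of how one of the per-coordinate k-NN models could be built with Weka's `IBk` classifier. The class name `GazeModel`, the attribute names, and the choice of k = 5 are assumptions for illustration; two such models would be trained, one for the x coordinate and one for y:

```java
import java.util.ArrayList;
import weka.classifiers.lazy.IBk;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class GazeModel {
    private final Instances dataset;
    private final IBk knn = new IBk(5); // k = 5 neighbours, illustrative

    public GazeModel() {
        // Features: the 3D gaze vector; class: one screen coordinate.
        // A numeric class attribute makes IBk perform regression.
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("gx"));
        attrs.add(new Attribute("gy"));
        attrs.add(new Attribute("gz"));
        attrs.add(new Attribute("screenCoord"));
        dataset = new Instances("calibration", attrs, 0);
        dataset.setClassIndex(3);
    }

    // Training phase: store one (gaze vector, known screen point) pair.
    public void addCalibrationSample(double gx, double gy, double gz,
                                     double screenCoord) {
        dataset.add(new DenseInstance(1.0,
                new double[] {gx, gy, gz, screenCoord}));
    }

    public void train() throws Exception {
        knn.buildClassifier(dataset);
    }

    // Real-time phase: predict one screen coordinate from a gaze vector.
    public double predict(double gx, double gy, double gz) throws Exception {
        DenseInstance query = new DenseInstance(1.0,
                new double[] {gx, gy, gz, 0}); // class value is a placeholder
        query.setDataset(dataset);
        return knn.classifyInstance(query);
    }
}
```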
The most time-consuming part is finding the face landmarks (see picture below). First, a cascade classifier based on local binary patterns (LBP) is used to detect the face; a Haar cascade would be more robust, but also slower than LBP. Then a CLM mesh is fitted to the face, which provides the locations of the 70 landmarks. The 3D gaze vectors are then computed using the position of the head and the location of the pupils. A more detailed explanation can be found in the paper Rendering of Eyes for Eye-Shape Registration and Gaze Estimation.
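The face detection step itself is straightforward to reproduce with OpenCV's Java bindings. The following sketch is illustrative rather than our actual native implementation (the cascade file path and the detection parameters are assumptions):

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;

public class FaceDetector {
    // lbpcascade_frontalface.xml ships with OpenCV; on Android it must
    // first be copied from the app's resources to a readable path.
    private final CascadeClassifier cascade =
            new CascadeClassifier("lbpcascade_frontalface.xml");

    // Returns bounding boxes of detected faces; the CLM fitting step
    // would then run inside the largest of these regions.
    public Rect[] detect(Mat frameRgba) {
        Mat gray = new Mat();
        Imgproc.cvtColor(frameRgba, gray, Imgproc.COLOR_RGBA2GRAY);
        Imgproc.equalizeHist(gray, gray); // boost contrast before detection

        MatOfRect faces = new MatOfRect();
        // scale step 1.1, 3 minimum neighbours, minimum face size 80x80 px
        cascade.detectMultiScale(gray, faces, 1.1, 3, 0,
                new Size(80, 80), new Size());
        return faces.toArray();
    }
}
```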
A major issue we encountered during development is the limited CPU power of tablets compared with desktops. Frame-by-frame processing is quite CPU intensive, and if it is not fast enough it leads to a poor user experience. Nevertheless, we believe that in a few years this problem will be overcome by more powerful processors.
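In the meantime, a common general technique for easing the per-frame load (not something specific to our app) is to downscale each frame before running detection and map the results back to full resolution. A minimal sketch, with an assumed scale factor of 0.5:

```java
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class FramePreprocessor {
    // Illustrative factor; the right accuracy/frame-rate trade-off
    // depends on the device and camera resolution.
    private static final double SCALE = 0.5;

    // Shrinks the camera frame before the expensive detection step;
    // coordinates detected on the small frame must later be
    // multiplied by 1 / SCALE to map back to the original frame.
    public Mat downscale(Mat frame) {
        Mat small = new Mat();
        Imgproc.resize(frame, small, new Size(), SCALE, SCALE,
                Imgproc.INTER_AREA); // INTER_AREA suits shrinking
        return small;
    }
}
```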
In addition, the rapid progress of deep learning in recent years is boosting what computers can learn (see e.g. Virtual Eyes Train Deep Learning Algorithm to Recognize Gaze Direction). In the near future we will see these algorithms improve the robustness of gaze detection, among many other applications.
Our preliminary results have been very promising, and we are motivated to take this project further since it has multiple use cases and could benefit a wide variety of people. Possible applications include hands-free control of devices (from drivers to people with reduced mobility), usability testing, video games, and more.