Semantic Mapping for Visually Impaired
There are more than 285 million people in the world living with sight loss, which has a significant impact on their daily lives. Over 85% of these individuals have some remaining vision ...
Incremental 3D Reconstruction and Semantic Segmentation
As we navigate the world, we constantly perceive the 3D structure of the environment around us and recognise objects within it. Such capabilities help us in our everyday lives and allow us free and accurate movement ...
Distributed Optimization in Large-scale Random Fields
While recent advances in combinatorial optimization have focused on important guarantees of convergence, this is not sufficient to achieve desired efficiency on large scale problems ...
Streaming Video Scene Analysis
Per-frame semantic segmentation algorithms often output temporally inconsistent (‘flickering’) predictions. Extending these techniques to temporal sequences of images is very challenging due to the dynamic aspect of videos ...
Evaluation of Local Feature Detectors and Descriptors
There have been a number of evaluations focused on various aspects of local features; however, there have been no comparisons considering the accuracy and speed trade-offs of recent extractors such as BRIEF, BRISK, ORB, LIOP ...
Rapid Vanishing Point Estimation
Vanishing point estimation has been widely used in many robotics tasks, especially in detection of ill-structured roads. The main drawback of such approaches is the computational complexity ...
Semi-supervised Lifelong Learning
Reliable road detection is a challenging problem due to varying environments and lighting conditions. However, temporal sequences of images provide an infinite source of training data ...

Bio

I am a DPhil student in Engineering Science at the University of Oxford co-advised by Philip Torr and Patrick Perez. I am funded by Technicolor and closely collaborate with Microsoft Research (I3D), the Nuffield Department of Clinical Neurosciences (Oxford) and groups at Stanford.

My main research interests lie in computer vision and machine learning applied to wearable/mobile robotics and film post-production, with a focus on understanding dynamic aspects of videos. Recently, I have been involved in the development of smart glasses for the visually impaired, distributed inference in large-scale graphical models and temporal consistency for scene understanding algorithms.

Before that, I had the pleasure of working at the Center for Machine Perception (CMP) at Czech Technical University in Prague and of visiting the Robotics Institute at Carnegie Mellon University and the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey. I received my MSc. and Bc. degrees from Brno University of Technology.

Research

Interactive Semantic Mapping for Visually Impaired
There are more than 285 million people in the world living with sight loss, which has a significant impact on their daily lives. Over 85% of these individuals have some remaining vision. Recently, there has been interest in developing smart glasses, which seek to provide these people with additional information from the nearby environment through stimulation of the residual vision. The aim is to convey more information about the nearby environment using depth and/or image edges.

We have been developing smart glasses with which a user can interactively capture a full 3D map, segment it into objects of interest and refine both the segmentation and the 3D parts of the model during capture, all by simply exploring the space and ‘painting’ or ‘brushing’ onto the world with a handheld laser pointer. The resulting enhanced images are then displayed to the user on head-mounted AR glasses, stimulating the user's residual vision.

Our group was involved in the winning entry (‘People's Choice’) for the Google Impact Challenge 2014.
Distributed Inference in Large Scale Graphical Models
Probabilistic graphical models such as MRFs/CRFs have become ubiquitous in computer vision for a variety of important, high-dimensional, discrete inference problems such as per-pixel object labelling, image denoising, and disparity and optical flow estimation. While recent advances in combinatorial optimization have focused on important guarantees of convergence, this is not sufficient to achieve desired efficiency on large-scale problems (millions of pixels with thousands of labels).

As a consequence, algorithms that work well on smaller benchmarks can become impractical on very large-scale problems. This concern is at the heart of the present work; in particular, given the limited number of CPU cores, the speed limitations of hard drives and the high cost of shared-memory systems, massively parallel processors present an appealing computing paradigm. It is therefore of paramount importance that new optimization algorithms can run in a parallel and distributed fashion on modern clusters and GPUs.
Streaming Video Scene Analysis
3D reconstruction and segmentation: project page | ICRA 2015 paper
streaming inference: project page | ICRA 2013 paper
lifelong learning: project page | ICRA 2011 paper | ICRA 2012 paper
As we navigate the world, for example when driving a car from home to the workplace, we constantly perceive the environment around us and recognise objects within it. Such capabilities help us in our everyday lives and allow us free and accurate movement even in unfamiliar places. Building a system that can automatically perform real-time semantic segmentation and 3D reconstruction is a crucial prerequisite for a variety of applications, including robot navigation, semantic mapping and assistive technology.
Many works have investigated this problem using various models applied per-frame; however, such approaches do not benefit from motion and often output temporally inconsistent (‘flickering’) predictions. Extending these techniques to temporal sequences of images, as would be seen from a robot, is very challenging due to the dynamic aspect of videos. I have been developing algorithms for incremental (i.e. not batch) and (near) real-time semantic segmentation and 3D reconstruction, with a focus on temporal consistency and on-the-fly semi-supervised lifelong learning from videos.

Awards

2012 - Werner von Siemens Excellence Award
2012 - Czech & Slovakia ACM chapter - Student Project of the Year
2012 - The Master Thesis of Year 2012 in Informatics
2012 - The Prize of the Dean (outstanding master’s thesis)
2012 - ABB University Award, the best MSc project (Robotics)
2011 - Student EEICT 2011, best paper award (MSc projects: cybernetics and automation)
2010 - The Prize of the Dean (outstanding bachelor's thesis)
2007 - Ceska hlava, national prize for the best pre-college scientific publication
2007 - The Herbert Hoover Young Engineer Award
2007 - The Prize of the Dean

Collaborators

Philip Torr, Patrick Perez, Vibhav Vineet, Jiri (George) Matas, Martial Hebert, Shahram Izadi, Drew Bagnell, Krystian Mikolajczyk, Daniel Munoz, Matthias Nießner, Morten Lidegaard, Stephen Hicks, Tomas Vojir, Jan Sochman, Petr Petyovsky, Ludek Zalud, Pavel Jura, Miloslav Richter, Ram Prasaath, ...

Publications

2015

The Semantic Paintbrush: Interactive 3D Mapping and Recognition in Large Outdoor Spaces
Miksik O.*, Vineet V.*, Lidegaard M., Prasaath R., Nießner M., Golodetz S., Hicks S.L., Perez P., Izadi S. and Torr P.H.S.
In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI) 2015, Seoul, South Korea
* Joint first authors
PDF | Project Page | Show BibTex | Show Details


Incremental Dense Semantic Stereo Fusion for Large-Scale Semantic Scene Reconstruction
Vineet V.*, Miksik O.*, Lidegaard M., Nießner M., Golodetz S., Prisacariu V.A., Kähler O., Murray D.W., Izadi S., Perez P. and Torr P.H.S.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2015, Seattle, USA
* Joint first authors
PDF | Project Page | Show BibTex | Show Details

2014

Distributed Non-Convex ADMM-inference in Large-scale Random Fields
Miksik O., Vineet V., Perez P. and Torr P.H.S.
In Proceedings of the British Machine Vision Conference (BMVC) 2014, Nottingham, UK
PDF | Project Page | Show BibTex | Show Details

2013

Efficient Temporal Consistency for Streaming Video Scene Analysis
Miksik O., Munoz D., Bagnell J.A. and Hebert M.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2013, Karlsruhe, Germany
PDF | Project Page (with datasets) | Show BibTex | Show Details

2012

Evaluation of Local Detectors and Descriptors for Fast Feature Matching
Miksik O. and Mikolajczyk K.
In Proceedings of the International Conference on Pattern Recognition (ICPR) 2012, Tsukuba Science City, Japan
PDF | Show BibTex | Show Details


Rapid Vanishing Point Estimation for General Road Detection
Miksik O.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2012, St. Paul, USA
PDF | Project Page | Show BibTex | Show Details

2011

Robust Detection of Shady and Highlighted Roads for Monocular Camera Based Navigation of UGV
Miksik O., Petyovsky P., Zalud L. and Jura P.
In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2011, Shanghai, China
PDF | Project Page | Show BibTex | Show Details | Poster | Errata


Adapting Polynomial Mahalanobis Distance for Self-supervised Learning in an Outdoor Environment
Richter M., Petyovsky P. and Miksik O.
In Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA) 2011, Honolulu, USA
PDF | Show BibTex | Show Details | Poster

Thesis

Dynamic Scene Understanding for Mobile Robot Navigation
Miksik O.
Master's thesis, advisor: Dr Ludek Zalud, supervisor: prof. Martial Hebert
consultants: Daniel Munoz, Dr J. Andrew Bagnell
Brno University of Technology, 2012
PDF | Show BibTex | Show Details


Fast Feature Matching for Simultaneous Localization and Mapping
Miksik O.
Bachelor's thesis, advisor: Dr Krystian Mikolajczyk
Brno University of Technology, 2010
PDF | BibTex | Show Details