VÍCTOR FERNÁNDEZ-CARBAJALES | 12/02/2019
Deep Learning, also known as deep-structured or hierarchical learning, is not a novel concept: the general approach has been discussed in the scientific literature since the 1950s. If Deep Learning has been around for so long, why has it become such a hot topic over the last several years?
The key development, emerging at the end of the 2000s, was the advent of truly reliable methods for training deep neural networks. Today, several factors contribute to the importance of Deep Learning in the fields of Artificial Intelligence and Machine Learning, chief among them advances in hardware (mainly GPUs, and more recently TPUs), coupled with the large accumulated volume and exponential growth of available data.
Furthermore, Deep Learning is an active area of research and is revolutionizing other domains within the broader field of artificial intelligence. One of these areas is artificial vision, or Computer Vision. As a scientific discipline, computer vision is considered a subfield of artificial intelligence, concerned with acquiring, processing and analyzing frames and/or frame sequences of the real world in order to generate information that can be processed by a machine. One simple way to understand this technology is to compare it directly with the human visual system.
As in other fields of artificial intelligence, and prior to the emergence of Deep Learning (especially deep neural networks), artificial vision research focused on a traditional Machine Learning approach. That approach relies on developers manually engineering features to extract the most salient or significant aspects of the data at hand; in this case, frames or temporal sequences of frames, that is, videos.
In this case, both scientific research and application development have centered on identifying the most significant image elements: those that allow, for example, facial and body recognition of the people appearing in the images, tracking them from one frame to the next, or classifying the vehicles moving through a given area. Once this meaningful data has been extracted, statistical methods such as clustering, support-vector machines (SVMs), and filtering algorithms (linear, non-linear, regression) are employed to transform the representation into a so-called "understanding" of the real visual environment.
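To make this two-stage pipeline concrete, here is a minimal sketch in Python. The two hand-engineered descriptors (mean intensity and a crude edge-energy score) and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity; a real traditional pipeline would use richer descriptors and a statistical back-end such as an SVM, as described above.

```python
import numpy as np

def handcrafted_features(image):
    """Extract two simple hand-engineered descriptors from a grayscale
    image: mean intensity and a crude edge-energy score.
    (Hypothetical descriptors, for illustration only.)"""
    mean_intensity = image.mean()
    # Mean horizontal gradient magnitude as a stand-in for edge content.
    edge_energy = np.abs(np.diff(image, axis=1)).mean()
    return np.array([mean_intensity, edge_energy])

class NearestCentroidClassifier:
    """A minimal stand-in for the statistical back-end (SVMs, clustering,
    filtering) used in traditional artificial-vision pipelines."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Assign each sample to the class with the nearest feature centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Two synthetic "classes" of 8x8 frames: flat dark vs. noisy bright.
rng = np.random.default_rng(0)
dark = [np.full((8, 8), 0.1) + rng.normal(0, 0.01, (8, 8)) for _ in range(10)]
bright = [rng.uniform(0.5, 1.0, (8, 8)) for _ in range(10)]
X = np.array([handcrafted_features(im) for im in dark + bright])
y = np.array([0] * 10 + [1] * 10)

clf = NearestCentroidClassifier().fit(X, y)
print(clf.predict(handcrafted_features(np.full((8, 8), 0.05)).reshape(1, -1)))
```

Note that all of the "intelligence" here lives in `handcrafted_features`: if the descriptors fail to capture what distinguishes the classes, no choice of classifier can recover it, which is precisely the limitation discussed next.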
This means that the merit of any given application lies in how well researchers and developers are able to extract data from the raw processed frames and transform it into useful structured data. Moreover, because these descriptors are designed by humans, the relationships they capture are normally linear, cause-and-effect, or intentional-functional, which makes it extremely difficult to generate descriptors or characterizations outside this formally logical line of thinking.
With this in mind, we can see that such traditional approaches to artificial vision only permit an analytical approach closely related to the specific purpose or question, since they rely on taxonomies that remain closely bound to the parameters of the problem as stated. This leads to a situation in which the approach to a solution must be adjusted or tweaked even to solve problems that are similar, but which take place under different conditions.
This methodological framework changed when Deep Learning approaches began to be applied. Initially, the most novel developments within Deep Learning did not address artificial vision, owing to the complexity involved in processing multimedia information. However, with the arrival of the first deep neural networks specifically designed for image data (AlexNet, VGGNet, DarkNet, and others), a vast new domain for innovation emerged.
Neural networks, principally convolutional neural networks (CNNs, or ConvNets), have allowed researchers to discover non-linear relationships in the data that were too complex to tease out using traditional approaches to artificial vision. Although this may seem trivial, it means that almost any previously solved problem can now potentially be improved upon by uncovering hidden relationships in the data and, consequently, improving its characterization. Current work focuses mainly on exploiting these networks as efficiently as possible, principally on how they should be adjusted and how best to train them.
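The core operation inside such a network can be sketched in a few lines. The snippet below implements a single-channel, valid-mode 2D convolution (strictly, cross-correlation, as deep learning frameworks compute it) and applies a classic vertical-edge kernel; the kernel values and the synthetic image are illustrative. The key difference from the traditional approach is that a CNN learns many such kernels from data during training instead of having them hand-designed.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the core operation of a
    convolutional layer (single channel, no stride or padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with each local image patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-designed vertical-edge kernel (Sobel). In a trained CNN,
# filters playing a similar role emerge automatically from the data.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Synthetic 5x6 image: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

response = conv2d(img, sobel_x)
print(response)  # strongest responses where windows straddle the edge
```

Stacking many such layers, interleaved with non-linearities, is what lets a CNN compose simple local patterns like this edge into the complex, non-linear relationships described above.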
These advantages come at a price: the large volume of data that such neural networks require for training. While the input data required under traditional approaches to computer vision were comparatively limited, since researchers extrapolated the solutions "on their own", any solution using Deep Learning requires the machine to learn by processing information that is both relevant and irrelevant to the decisions to be made. One example is the Common Objects in Context (COCO) database, used to train neural networks in the detection, segmentation and classification of objects in images; it consists of 330,000 images covering 80 object categories. However, the same impetus that has led to the creation of new networks has also led to the creation of extensive public datasets (COCO, ImageNet, MNIST, and similar), a development that has opened the door to building applications without the need to generate very complex datasets requiring significant human effort.
In short, Deep Learning technology has made it possible to improve existing applications by improving the processing of available data, as well as by trading manual effort (development of descriptors) for machine processing (training). Moreover, this means that adjustments to novel situations no longer normally entail complex application development, but rather machine training for the specific tasks to be solved.
Currently, artificial vision is advancing rapidly, with potent solutions enabling more complex and robust approaches. Traditional artificial vision methodology (predating the application of Deep Learning) is still widely employed whenever the use case does not allow large volumes of data to be obtained (low-probability events, costly annotation, and so on), or when no suitable pre-built datasets exist; either condition rules out the use of Deep Learning.
Furthermore, traditional artificial vision methods work in close symbiosis with these new approaches through pre-processing of the images that are then used to train neural networks. Such pre-processing improves the quality of neural network inference by feeding the system more meaningful data: for example, extracting parameters for tracking people as a temporal sequence instead of using the video images directly, or reducing the inference data by detecting and pre-treating significant parts of the image, such as using face detection in an image to enhance a personal identification network. Conversely, the improvement of computer vision methods through Deep Learning has made it possible to tackle more variable datasets, whether they contain more complex and robust relationships or experience unexpected changes in prevailing conditions.
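The "detect, then pre-treat" step can be sketched as follows. This is a simplified illustration, not a production routine: the `(x0, y0, x1, y1)` box format, the nearest-neighbour resize, and the min-max normalization are assumptions made for brevity, standing in for the output of a face detector and the input pipeline of an identification network.

```python
import numpy as np

def crop_and_normalize(frame, box, out_size=(4, 4)):
    """Crop a detected region of interest (e.g. a face box produced by
    an earlier detector) and normalize it, so a downstream
    identification network receives a small, meaningful input rather
    than the whole frame."""
    x0, y0, x1, y1 = box
    roi = frame[y0:y1, x0:x1].astype(float)
    # Nearest-neighbour resize to the network's fixed input size.
    rows = np.linspace(0, roi.shape[0] - 1, out_size[0]).round().astype(int)
    cols = np.linspace(0, roi.shape[1] - 1, out_size[1]).round().astype(int)
    resized = roi[np.ix_(rows, cols)]
    # Scale pixel values into [0, 1] before inference.
    span = resized.max() - resized.min()
    return (resized - resized.min()) / span if span > 0 else resized * 0.0

frame = np.arange(100, dtype=float).reshape(10, 10)  # stand-in video frame
patch = crop_and_normalize(frame, (2, 2, 8, 8))      # hypothetical face box
print(patch.shape, patch.min(), patch.max())
```

The point of the symbiosis is visible in the shapes: the network never sees the full 10x10 frame, only a small normalized patch carrying the information that matters for its task.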
Treelogic possesses a team of professionals highly trained in both traditional and novel approaches to artificial vision, backed by over 15 years of experience bringing innovative solutions to market and notable participation in research projects at the European level. Thanks to this ongoing research, our team remains at the forefront of developments in the field and is well placed to continuously assess emerging state-of-the-art approaches for the deployment of well-tested techniques in stable production environments.
This solid track record allows Treelogic to develop and deploy advanced image and video processing solutions that address critical issues across a wide range of industries. In serving our clients, only the most efficient methods are brought to bear, considering both software application development and the choice of the most suitable camera technologies for each particular case.
Human beings have a visual system: our eyes capture what is around us and our brain processes that information, and thanks to this we are able to assess a situation and use it as the basis for making decisions. Computer vision aims to achieve the same effect by automatically obtaining visual information from cameras, digitizing and processing the images so as to understand what is happening, and transmitting the result so that the appropriate parties can act accordingly.