WP1 Development, implementation and validation of machine learning algorithms

D1.1 Report requirements from users, work scenarios and performance indicators

By combining deep neural networks with classical statistical methods, suspicious behavior can be identified with high accuracy. This is done by determining the characteristics of the object, taking the operational scenario into account and studying the object's movement. The end result is a smaller number of alarms, and the alarms that are raised are much more relevant to the operator than alarms that ignore the behavior of the subject.

D1.2 Report standards of interest, legislative framework and compliance issues

Using the ONVIF standard to connect to video sources makes it possible to connect to almost any video source. The standard supports the transmission of both video data and the metadata associated with it. The Modbus standard allows communication between a wide variety of devices over different interfaces, which lets the system control any Modbus-capable peripheral device. To comply with the GDPR, the system supports several ways of anonymizing data. Together, these choices cover a wide variety of scenarios into which the project fits.

D1.3 Report techniques for implementing algorithms based on neural networks

We have chosen the YOLO model, which is built on the Darknet framework, as the starting point for the implementation of the algorithm. For the desired system, the third version of the model was chosen because it better suits the current situation. While the YOLOv2 model has fewer layers in its convolutional network, it has problems detecting smaller objects. In contrast, the improvements in YOLOv3 have led to much better accuracy on small objects, while processing remains fast enough for real-time use. The network has been modified and trained to recognize objects of four classes considered potentially dangerous to the selected area of interest: person, car, animal and luggage. The first stage of the algorithm, prior to detection, is the processing of the video stream from the surveillance cameras. The next stage is the detection itself, in which the network outputs, for every detection in the image, the coordinates of the object of interest, the class it belongs to and the confidence score that the network associates with the detection. The final stage passes the remaining detections through a data fusion step. This is accomplished by a tracking algorithm, which associates each detection with a track. The tracker keeps a history of all the detections, so the path of each detected object can be followed. In conclusion, the aim was to obtain a system that is as efficient as possible and robust to variations in the position of objects, variations in brightness, changes in weather (strong sun, rain, fog, etc.) and even occlusions that may occur in the area of interest.
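The detection and tracking stages described above can be illustrated with a minimal sketch. It is not the project's tracker: the greedy intersection-over-union (IoU) matching, the names Detection, Track and update_tracks, and the thresholds min_score and min_iou are all assumptions chosen for illustration. It only shows how detections carrying coordinates, a class and a confidence score could be filtered and then attached to tracks that keep a history.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2) in pixels
    label: str    # one of: person, car, animal, luggage
    score: float  # confidence reported by the detector


@dataclass
class Track:
    track_id: int
    label: str
    history: list = field(default_factory=list)  # boxes, oldest first


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, detections, min_score=0.5, min_iou=0.3):
    """Drop weak detections, then attach each remaining one to a track.

    Hypothetical greedy rule: a detection joins the same-class track whose
    last box overlaps it most, otherwise it starts a new track. Tracks are
    never deleted here, so len(tracks) is a safe source of new ids.
    """
    kept = [d for d in detections if d.score >= min_score]
    unmatched = list(tracks)  # tracks not yet extended in this frame
    for det in kept:
        best = max(
            (t for t in unmatched if t.label == det.label),
            key=lambda t: iou(t.history[-1], det.box),
            default=None,
        )
        if best is not None and iou(best.history[-1], det.box) >= min_iou:
            best.history.append(det.box)
            unmatched.remove(best)
        else:
            tracks.append(Track(len(tracks), det.label, [det.box]))
    return tracks
```

Calling update_tracks once per frame with that frame's detections grows each track's history, which is what allows the path of an object to be followed. A production tracker would typically also add motion prediction and track expiry, which are left out here.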

D1.4 Technical report on the created database

This activity was dedicated to collecting and indexing the images identified as relevant for the database used to train and validate the automatic image analysis algorithms. The ultimate goal of the project is to create a system that automatically handles the security of an area of interest. To do this, the system must be able to detect and recognize objects belonging to the four classes declared as posing a potential danger to the area of interest: person, luggage, car and animal.

D1.5 Technical report on available embedded hardware systems for distributed processing

Suitable hardware solutions currently exist for all the processing needs encountered in this project. For processing video sources, the NVIDIA Jetson platform has enough computing power. Modern video processing algorithms are optimized for GPU processing and therefore benefit from the architecture of the NVIDIA platform. For processing metadata and information received from non-video sensors, the Raspberry Pi platform is well suited: its ARM architecture provides sufficient processing power, and its GPIO pins allow connection to a multitude of external devices.

D1.6 Validated algorithm for detecting suspicious behavior using deep neural networks

By combining deep neural networks with classical statistical methods, suspicious behavior can be identified with high accuracy. This is done by determining the characteristics of the object, taking the operational scenario into account and studying the object's movement. The end result is a smaller number of alarms, and the alarms that are raised are much more relevant to the operator than alarms that ignore the behavior of the subject.
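As an illustration of what "studying the object's movement" can mean in practice, the following sketch flags an object that has stayed near its current position for a long time. This is a hypothetical rule, not the validated algorithm: the function name is_loitering, the track format (timestamp, x, y) and the thresholds max_radius and min_seconds are assumptions made for the example.

```python
import math


def is_loitering(track, max_radius=50.0, min_seconds=60.0):
    """Return True if the object stayed near its latest position long enough.

    track: list of (timestamp_seconds, x, y) tuples sorted by time
    (assumed format). Walking backwards from the latest point, the dwell
    period extends until the object is found farther than max_radius away.
    """
    if not track:
        return False
    t_end, x_end, y_end = track[-1]
    start = t_end
    for t, x, y in reversed(track):
        if math.hypot(x - x_end, y - y_end) > max_radius:
            break
        start = t
    return t_end - start >= min_seconds
```

A rule of this kind raises an alarm for a person who lingers in front of a gate but not for one who simply walks past, which is the sense in which behavior-aware alarms are fewer and more relevant.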

WP2 Design, development and implementation of experimental models of the proposed solution

D2.1 Functional dedicated GRAVI interface software module

The proposed systems, namely the integrated platform and the interconnection module / data feeder, have been developed with the main purpose of receiving video data streams, generating alarms and aggregating data. To these is added the GRAVI module, which brings new functionality by allowing algorithms to be combined in flexible ways (for example, some acting as triggers for others). The interface as a whole is intuitive and easy to use, which lets the beneficiary focus on the results of the platform and how to exploit them rather than on how to operate it.

D2.2 Integrated and functional classification module using deep neural networks on embedded hardware

GoogLeNet networks are currently among the most suitable options for developing an integrated and functional classification module using deep neural networks on embedded hardware. They combine high classification performance with relatively low computing requirements, which allows them to run in real time on limited hardware.

D2.3 Integrated and functional module for analysis and recognition of objects with deep neural networks

At the moment, YOLO-type networks are among the best options for detecting objects and people in images. They offer major benefits in terms of both detection performance and the computing power they require.

D2.4 Integrated and functional video analysis module and GRAVI data fusion

The integrated and functional module for video analysis and GRAVI data fusion brings new functionality dedicated to the security environment. The results are promising and show improved system performance.

D2.5 Integrated and functional integration with field equipment

The proposed module for integration with field equipment is fully functional, tested and validated, and brings new functionality dedicated to the security environment. In the ongoing pilot project, testing will continue, the identified malfunctions will be corrected and the necessary improvements will be made.

D2.6 Laboratory integrated test report

At the moment, YOLO networks are among the best options for detecting objects and people in the environment. They offer major benefits in terms of both detection performance and the computing power they require, and the results determined experimentally at this stage are more than satisfactory. In the thermal spectrum, the results obtained are among the best in the literature. Within the project, the algorithms were integrated into the platform and the prototype model was developed. The conclusions of the integrated laboratory testing were positive and, once the detected malfunctions had been corrected, allowed us to validate the developed solution. Based on this validation, the consortium took the next step in the maturity of the technology and its validation, namely the implementation of a pilot system.

WP3 Pilot model development, testing and final validation

D3.1 Complete documentation of the GRAVI dedicated algorithms validated by testing in real conditions

The dedicated algorithms aim to detect any potential dangers that enter the area of interest and to generate alarms that notify the user of their presence. Four classes were considered as potential hazards: person, car, animal and luggage. The implemented algorithms use neural networks and data fusion to increase performance.

D3.2 Validated modules for dedicated processing

Three modes of information processing have been implemented:

(i) Server-side topology. All processing takes place on a dedicated server, which receives the video streams from every video source in the system, analyzes them and issues alarms. The intelligent video analysis algorithms therefore run on the server.

(ii) Hybrid topology. This mode combines the other two architectures, pairing distributed analysis with server-side processing and database management. In this case, the motion detection and hazard detection algorithms are also implemented on the central server.

(iii) On-edge (distributed) topology. The video streams are analyzed locally, so the full stream does not need to be transmitted over the network. The cameras run the algorithms on their own video streams and generate a metadata stream, which they transmit over the network to the central server. The cameras used in this topology must have sufficient video processing power and must implement the algorithms for motion detection and for detecting objects of interest.
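To make the on-edge case more concrete, the sketch below shows what a single record in such a metadata stream might look like and how it could be serialized for transmission to the central server. The record layout is an assumption: the class DetectionMetadata, its field names and the JSON encoding are hypothetical and are not taken from the project.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DetectionMetadata:
    # All field names are hypothetical, chosen for illustration only.
    camera_id: str
    timestamp: float  # seconds since the epoch
    label: str        # detected class, e.g. "person"
    score: float      # detector confidence
    box: tuple        # (x1, y1, x2, y2) in pixels


def encode_metadata(meta):
    """Serialize one record to UTF-8 JSON bytes for sending to the server."""
    return json.dumps(asdict(meta)).encode("utf-8")


def decode_metadata(payload):
    """Rebuild the record on the server side."""
    data = json.loads(payload.decode("utf-8"))
    data["box"] = tuple(data["box"])  # JSON turns tuples into lists
    return DetectionMetadata(**data)
```

A record like this occupies on the order of a hundred bytes, whereas a single uncompressed 1920x1080 RGB frame (an assumed resolution) occupies about 6 MB, which is why the on-edge topology avoids sending the full stream.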

D3.3 Dashboard mode and report generation validated

During this activity, the interface software module was improved and new specific functions were added: alarm carousel, quick views (viewing the image captured when the alarm was triggered, viewing the camera associated with the alarm, viewing the video associated with the alarm, downloading the video associated with the alarm), alarm.

D3.4 Validated integrated platform that includes GRAVI dedicated processing modules

The video processing module is designed and implemented in a fully modular way. This approach offers a number of advantages, the most important being that the solution can be easily adapted to the needs of each client. The algorithms in the processing module fall into three broad categories: metadata generators, metadata processors and alarm generators. The metadata generating algorithms available in the video processing module are: Motion Detection, Object Classification. The metadata processing algorithms available are: group validation, label validation. The alarm generating algorithms available are: Sterile Zone, Virtual Barrier (Tripwire), Heat Map, Abandoned Luggage, Loitering, Mask Presence, GDPR Masker.
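The three-category structure can be sketched as a small pipeline in which metadata produced for a frame passes through the processors and is then handed to the alarm generators. This is only an illustration under stated assumptions: the function names, the dictionary-based metadata format, the interpretation of label validation as a class filter, and the use of the bottom-centre of a bounding box as the test point for a sterile zone are all hypothetical.

```python
ALLOWED_LABELS = ("person", "car", "animal", "luggage")


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_validation(detections):
    """Metadata processor (assumed behavior): keep only known classes."""
    return [d for d in detections if d["label"] in ALLOWED_LABELS]


def sterile_zone(detections, zone):
    """Alarm generator: raise an alarm for each object standing in the zone."""
    alarms = []
    for d in detections:
        x1, y1, x2, y2 = d["box"]
        foot_x, foot_y = (x1 + x2) / 2, y2  # bottom-centre of the box
        if point_in_polygon(foot_x, foot_y, zone):
            alarms.append({"type": "sterile_zone", "label": d["label"]})
    return alarms


def run_pipeline(detections, processors, alarm_generators):
    """Apply every processor in order, then collect alarms from every generator."""
    for process in processors:
        detections = process(detections)
    alarms = []
    for generate in alarm_generators:
        alarms.extend(generate(detections))
    return alarms
```

With this shape, adapting the solution to a client amounts to choosing which processors and alarm generators are placed in the two lists, for example run_pipeline(detections, [label_validation], [lambda d: sterile_zone(d, zone)]).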

D3.5 Documentation of supported hardware processing platforms and concurrent processing instances

For the centralized architecture, the NVIDIA GeForce RTX 2080Ti graphics card was used. It has enough processing power to handle 64 video streams from the cameras installed in the system. In addition, it can load complex neural networks with a large number of convolutional layers, which leads to very high detection accuracy. For the hybrid architecture, a dedicated NVIDIA Jetson TX2 processing device was chosen. It can process up to 8 video streams in parallel. Because it is installed at the edge of the system, near the video cameras, the infrastructure for transporting streams through the network is much simpler. The distributed architecture requires video sources with on-board processing capability. AXIS Q8742-E cameras were used, which have enough processing power to run a simplified version of the algorithms developed in the project. A group of up to 8 cameras can send metadata to the same Raspberry Pi 4 module built into the system.
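The capacities quoted above translate directly into a sizing calculation. The sketch below estimates how many processing devices a deployment needs for a given number of cameras; it assumes that the quoted figures (64 streams per RTX 2080Ti, 8 streams per Jetson TX2, 8 cameras per Raspberry Pi 4) act as hard per-device limits, and the names STREAMS_PER_DEVICE and devices_needed are hypothetical.

```python
import math

# Per-device capacities quoted in the text, treated here as hard limits.
STREAMS_PER_DEVICE = {
    "centralized": 64,  # video streams per NVIDIA GeForce RTX 2080Ti
    "hybrid": 8,        # video streams per NVIDIA Jetson TX2
    "distributed": 8,   # cameras sending metadata to one Raspberry Pi 4
}


def devices_needed(n_cameras, architecture):
    """Smallest number of processing devices able to serve n_cameras."""
    if n_cameras < 0:
        raise ValueError("n_cameras must be non-negative")
    capacity = STREAMS_PER_DEVICE[architecture]
    return math.ceil(n_cameras / capacity)
```

For example, a site with 100 cameras would need 2 graphics cards in the centralized architecture, or 13 Jetson TX2 devices in the hybrid one, since 100 / 8 = 12.5 rounds up to 13.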

D3.6 Complete documentation of manufacture, installation, configuration, usage

Product documentation has been developed, including installation instructions, an installation and operating manual, and maintenance instructions. The server system is based on the Linux operating system, and the main installation steps have been documented.

D3.7 Report on exploitation strategy and sustainability

Good practices for the exploitation and sustainability strategy have been identified: direct exploitation of the results, through their use in further research and development, in developing, creating or marketing a product or process, in creating and providing services based on them, or in standardization activities; and indirect exploitation of the results, through the transfer of results, licensing or spin-offs.