Seeing the Potential of AI in Video

Artificial intelligence (AI) is a broad category that includes, among other things, a wide range of neural networks with different capabilities. Neural networks are distinguished by how they approach a particular unstructured data set or problem: with a process, an algorithm or a machine learning approach.

The Core of Learning, Shallow or Deep

Because hardware processing power used to be limited, machine learning could apply only shallow learning to very large data sets. This shallow learning looks at data in just three dimensions. With recent, significant advances in the processing power of graphics processing units (GPUs), we can now use a deep learning approach that looks at data in many more levels or dimensions, hence the word “deep”.

Milestone has moved to this GPU-compute platform by re-coding our software to use a technique called parallelization. Software parallelization is a coding technique for breaking a single problem into hundreds or thousands of smaller problems. The software can then run those 100 or 1,000 processes on as many processing cores at the same time, instead of waiting for one core to process the data 1,000 times.

With parallelization, there is a huge leap forward in how fast we can solve a problem. And the faster we can solve a problem, the deeper we can go into it and the deeper the data sets we can process.
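To make the idea of breaking one problem into many smaller ones concrete, here is a minimal sketch in Python. It is an illustration only, not Milestone's code: the function names, the chunk size and the worker count are all assumptions, and a thread pool stands in for GPU cores. In standard Python, threads will not actually speed up this kind of arithmetic; the point is the shape of the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Break one large list into smaller, independent pieces."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def brightness_sum(chunk):
    """The 'small problem': add up the pixel values in one chunk."""
    return sum(chunk)


def parallel_brightness_sum(pixels, chunk_size=1000, workers=8):
    """Solve the big problem by handing the chunks to a pool of workers.

    chunk_size and workers are hypothetical defaults chosen for the demo.
    """
    chunks = split_into_chunks(pixels, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed independently; results come back in order.
        partial_sums = list(pool.map(brightness_sum, chunks))
    # Combine the small answers into the answer to the original problem.
    return sum(partial_sums)


# Made-up data: 100,000 pixel values between 0 and 255.
pixels = [i % 256 for i in range(100_000)]
total = parallel_brightness_sum(pixels)
```

The same split, process and combine pattern is what lets thousands of GPU cores work on one video frame at once, although real GPU code would be written with a GPU toolkit rather than a thread pool.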

IoT Frameworks

Milestone’s role as a video management platform company is aggregation: developing broad support for all relevant devices. We have a vision to support all top IoT frameworks. What is a framework? ONVIF, for example, is an IoT framework. Many think of it as a camera firmware standard, but it is actually an IoT standard. If the camera is an IoT device, then the standard it follows is an IoT framework.

Our focus is to continue bringing more and more of those devices, across different frameworks, into a common data center. Then, we will continue to advance GPU technology to create a whole new level of processing, helping companies that use the GPU for parallelization, like BriefCam, to run those functions right on the unused GPU capacity already in the hardware.

Our development partner NVIDIA invented the GPU, and they are driving machine-to-machine communication at an exponential rate. One of NVIDIA’s GPUs offers 5,000 cores, meaning roughly 5,000 operations can be processed in parallel, and we are hardly using any of those cores yet! We are decoding the video and detecting slight motion, but a significant amount of resources is still available.

By allowing companies to plug right into that pipeline, the VMS can process all of their data without any hardware plug-ins. The processing power is there. Then we can extract all the metadata out of that, have it aggregated and start creating automation. From there, we can create new types of visual presentations for this information.

Advanced rendering is about creating a whole different type of mixed reality. Some data are artificial, some data are real. As humans, we will use both to create a more interesting picture of a problem. The BriefCam Synopsis system is an example of mixed reality. It uses real video, extracts objects of interest and then provides an overlay of augmented reality. Humans cannot look at 24 hours of video in nine minutes. But, with Synopsis, our intelligence can be augmented.

Actualized Potential of Augmentation

AI and machine learning are being applied so that AI-enabled devices and machines become very good at low-cognitive functions. For example, humans cannot sit and watch all cameras simultaneously, all the time; our attention does not work that way. But machines are extremely good at this and attentive to detail. We do not see pixels, we see objects. The machine sees the finest detail available to it, which is the pixel, and within the pixel, it can see more details, which are the shades of color of that image. By aggregating data and allowing machines to automate responses and solutions, we can augment human interaction and our environment.
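As a rough illustration of what working at the pixel level means, the sketch below compares two tiny grayscale frames and counts the pixels whose shade changed noticeably. This is simple frame differencing, one basic way to detect motion, and it is not presented as the method any particular VMS or analytics product uses. The frame format (lists of rows of 0 to 255 values), the function names and both thresholds are assumptions made for the example.

```python
def changed_pixel_count(previous, current, threshold=25):
    """Count pixels whose brightness differs by more than `threshold`.

    Frames are assumed to be equal-sized lists of rows of 0-255 values.
    """
    changed = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_px, cur_px in zip(prev_row, cur_row):
            if abs(cur_px - prev_px) > threshold:
                changed += 1
    return changed


def motion_detected(previous, current, threshold=25, min_changed=2):
    """Report motion when at least `min_changed` pixels changed noticeably."""
    return changed_pixel_count(previous, current, threshold) >= min_changed


# Two made-up 3x3 frames: a bright object appears in the second one.
frame_a = [[10, 10, 10],
           [10, 10, 10],
           [10, 10, 10]]
frame_b = [[10, 10, 10],
           [10, 200, 180],
           [10, 10, 12]]

print(motion_detected(frame_a, frame_b))
```

A person looking at the second frame would see "an object"; the machine sees nine numbers, two of which moved well past the threshold. The small change from 10 to 12 in the corner is ignored as noise.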

I think everything is about to change. Just in how we review and utilize video and data, we are going to see massive advancements.

Imagine an interaction between a near-eye lens, medium-distance viewing glass and large video screens. There may be an overlay of detailed text data on my small lens, augmented video in the medium distance, with the big scene view on a large screen. The live video, augmented visuals, and text data will be in concert. When I’m looking at a large scene, data will change what I am seeing in my near-eye screen. With this intelligence augmentation, the system will know that I am looking at a face or building or license plate; it will help me figure out who or what I am looking at and show some related information. That is actually all possible today.

A Vision Forward

The City of Hartford is a great example of technology as a force multiplier. Milestone and our partners have worked with the City of Hartford C4 Crime Center in creating an intelligent city beyond human capability. The Crime Center uses BriefCam Synopsis technology with the Milestone VMS platform and other analytics like ShotSpotter and The Hawkeye Effect GPS location. It also uses Axis cameras and all the devices it has aggregated in Hartford to solve crimes that it could not solve before.

Not only are many crimes now solvable, but the crime center also does not have to spend 30 hours doing low-cognitive, manual tasks, like freezing on a rooftop to watch a drug house all day and night. Officers can now sit at their desks and, within just a few minutes, know exactly where a drug house is by seeing an augmented reality of foot traffic over time, accordioned into a useful overview.

Officers can simply go into the data and extract the problem. That precision and efficient use of resources is a game-changer in how we as humans will work in our normal jobs. This is just one of many examples that we will start to see as we identify and address the problems we want to solve using new technologies.

An Intelligence Revolution

Having machines take over low-cognitive tasks will be the big trend for years to come. With proper aggregation of information, machines can be better at low-cognitive tasks than humans are, and often deliver a better quality of service than humans.

Amazon is applying this to retail stores where the concept of a checkout is being replaced by customers simply walking out. By using data from smartphones, cameras, sensors, purchase histories and other data points, Amazon is making it possible for us to walk into a store, pick up what we need and walk out. Everything else is taken care of by machines. This type of thinking and tool creation is in its infancy but will continue to address problems that are of more value to our lives.

In the book ‘The Inevitable’, Kevin Kelly writes (and I paraphrase) that the next 10,000 startups will be based on bringing artificial intelligence to something, much as happened during the industrial revolution when everything was electrified. We have seen washing machines, for example, go from manual operation to electric, and now to having a network port and some level of AI.

The intelligent industrial revolution is beginning to happen all around us. It will be very disruptive within the security and surveillance industry, but also insightful and liberating as we free human effort for higher cognitive work and address the larger challenges.

By Keven Marier, Director of Technology Business Development at Milestone Systems

(A version of this article was published in the September 2018 issue of .)
