The COVID-19 pandemic has the entire world on edge. I’m not going to get into any medical details because we’re in the electronics business, not the medical one. So why the article? The incubation period of the COVID-19 virus is approximately 5 days; however, during those first 5 days infected people are already infectious even though they are not showing any symptoms. To make matters slightly worse, even after the first few symptoms start to show, especially when those symptoms are mild, there is enough doubt for an infected person to continue with their normal lifestyle.

Fast forward to the positive test result: an infected person has now been in public while infectious for up to about 7 days. It is easier to trace big single events like a church gathering, as was the case in South Korea and, closer to home, in the Free State. However, it is almost impossible to trace unknown people who have simply crossed paths ever so briefly with the infected person. For example, suppose an infected person does a small amount of shopping: they drive from home and spend 10 minutes in a shop where there is only one other customer, who happens to stand in front of them in the pay point queue. The shop cannot send out a general notice because those 10 minutes only affect one random person.

In a traditional track and trace function, an investigator would approach the shop and find the records of all the customers who happened to be in the shop at the same time. To ensure that they cover everyone (because some people could go into the shop but not purchase anything) they may need to review the video surveillance and try to track each person down via other traditional means. This is an extraordinarily time-consuming task that results in massive gaps in the track and trace. Even when the pandemic affects just a few hundred people, the number of direct contacts to track and trace quickly reaches the thousands or even tens of thousands, and it becomes impossible to trace them all fully.

Since tracing all contacts quickly is vital to the world’s health, there needed to be a more accurate, faster and cheaper method of tracking and tracing all the contacts. There is: the world’s most popular internet of things device, the cell phone. The cell phone alone is not sufficient; the data collected needs to be processed with very advanced data analytics.

This is how it works. Cellular networks are able to provide a generally accurate location in cities using a technology commonly called triangulation (strictly speaking, a fix based on distances rather than angles is trilateration). It works by timing the signal sent between the individual’s cell phone and the cell phone towers. When the cell phone generates a signal, it is received by multiple cell towers.

Why is time relevant? We know the speed of radio waves, which allows us to calculate an approximate distance from the tower. Let’s assume we know that one individual is standing 2.5km from the tower. This gives us our first point of data. However, that could imply the person is standing anywhere in 360 degrees around the tower. When we apply a second tower’s data, which might put the person 1.5km away, it narrows our location dramatically. The towers are at known points, and there are at most two locations on the map where it is possible to be 2.5km from one tower and simultaneously 1.5km from another. Once the third tower’s data point is included, only one of those locations remains and we have a co-ordinate that, in the best case, is accurate to within a few meters, which approaches the accuracy of a GPS. See the image below to give a better visualization of what is required to triangulate a location.
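To make the timing-and-distance idea concrete, here is a minimal Python sketch. It assumes a flat map with co-ordinates in meters, perfectly measured one-way signal delays, and three towers that are not in a straight line. The function names and the tower positions are made up for illustration; real networks have to deal with noisy measurements and use more elaborate methods.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # radio waves travel at the speed of light


def distance_from_delay(one_way_delay_s):
    """Convert a one-way signal travel time (seconds) into a distance (meters)."""
    return SPEED_OF_LIGHT_M_PER_S * one_way_delay_s


def trilaterate(towers, distances):
    """Estimate (x, y) from three towers at known positions and measured distances.

    Each tower defines a circle. Subtracting the first circle's equation from
    the other two leaves two linear equations in x and y, which are solved
    here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    d1, d2, d3 = distances

    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        raise ValueError("towers are in a straight line; position is ambiguous")

    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# Hypothetical layout: the phone is 2.5km, 1.5km and 2.0km from three towers.
towers = [(0.0, 0.0), (2000.0, 0.0), (0.0, 1500.0)]
distances = [2500.0, 1500.0, 2000.0]
position = trilaterate(towers, distances)
print(position)  # approximately (2000, 1500)
```

A signal that takes roughly 8.3 microseconds to arrive corresponds to the 2.5km distance used in the example, which shows how precise the timing has to be.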

Knowing where each person is at any given moment is a major step, but we still have a substantial amount of work to do. This kind of project requires overlaying locations that are close to each other and comparing them over time to ensure there was a genuine and substantial overlap.

Let’s start understanding the data.

Let’s first look at the scope and size of this data set. South Africa has an estimated 20 to 22 million smartphones and up to an estimated 90 million mobile devices. I assume many of the 90 million mobile devices are not active cell phones (e.g. iPads), so let’s use the lowest of the numbers, 20 million devices, even though this is likely an underestimation.

Our data set needs co-ordinates recorded over time; the more often a record is generated, the more accurate the analytics.

Let’s assume we have one location record every 5 seconds. There are 86 400 seconds in a day (60 seconds * 60 minutes * 24 hours). This equates to 17 280 (86 400 / 5) location records per device per day.

Now, this is where it gets complicated. 20 million devices are each producing about 17 thousand records per day.

20 million * 17 280 = 345,600,000,000

That’s 345.6 billion records per day!

But we’re not done yet! Since the incubation period averages 5.2 days and there is a lag in test results, our data set needs to contain historical data. I think two weeks of historical data will suffice (do note, however, that the data set grows by another 345.6 billion new records with every day that passes).

345.6 billion * 14 days = 4,838,400,000,000 (4.838 trillion records)
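The arithmetic above is easy to reproduce and to rerun with different assumptions. The short Python sketch below uses only the figures from this article (20 million devices, one record every 5 seconds, 14 days of history); the variable names are my own.

```python
DEVICES = 20_000_000          # lowest estimate of smartphones used above
SAMPLE_INTERVAL_S = 5         # assumed: one location record every 5 seconds
HISTORY_DAYS = 14             # assumed: two weeks of historical data

SECONDS_PER_DAY = 60 * 60 * 24

records_per_device_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_S
records_per_day = DEVICES * records_per_device_per_day
records_in_history = records_per_day * HISTORY_DAYS

print(f"{records_per_device_per_day:,} records per device per day")
print(f"{records_per_day:,} records per day")
print(f"{records_in_history:,} records in {HISTORY_DAYS} days")
```

Changing SAMPLE_INTERVAL_S or DEVICES shows how sensitive the totals are to the two numbers we do not actually know.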

If you’re still following, this is where it gets the most complicated. Our data set of 4.838 trillion records contains the movement history of 20 million devices. Within this data set we need to compare the location of each record to the location of every other record taken at around the same time to determine if they overlap. If they overlap, then we need to know by how much (while also determining what the person was doing at the time).
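Comparing every record against every other record directly would be hopeless at this scale. One common way to cut the work down, shown here purely as an illustrative sketch and not as the method any network or government actually uses, is to drop each record into a bucket keyed by time slot and map grid cell, and then only compare devices that share a bucket. The record layout, the 10 meter cell size and the function name are assumptions made for this example, and for brevity the sketch ignores devices that are close together but fall just across a cell boundary.

```python
from collections import defaultdict
from itertools import combinations

CELL_SIZE_M = 10    # assumed grid cell size in meters
SLOT_SECONDS = 5    # matches the assumed 5-second record interval


def find_overlaps(records):
    """Count how many (time slot, grid cell) buckets each pair of devices shares.

    records: iterable of (device_id, timestamp_s, x_m, y_m) tuples.
    Returns a dict mapping (device_a, device_b) to the number of shared buckets.
    """
    buckets = defaultdict(set)
    for device_id, timestamp_s, x_m, y_m in records:
        key = (
            int(timestamp_s // SLOT_SECONDS),
            int(x_m // CELL_SIZE_M),
            int(y_m // CELL_SIZE_M),
        )
        buckets[key].add(device_id)

    overlaps = defaultdict(int)
    for devices in buckets.values():
        # Only devices in the same bucket are compared with each other.
        for pair in combinations(sorted(devices), 2):
            overlaps[pair] += 1
    return dict(overlaps)


# Tiny made-up sample: A and B stand together for 15 seconds, C is far away.
sample = [
    ("A", 0, 1.0, 1.0), ("B", 0, 3.0, 2.0), ("C", 0, 500.0, 500.0),
    ("A", 5, 1.5, 1.0), ("B", 5, 3.0, 2.5), ("C", 5, 505.0, 500.0),
    ("A", 10, 2.0, 1.0), ("B", 10, 3.0, 3.0), ("C", 10, 510.0, 500.0),
]
print(find_overlaps(sample))
```

The number of shared buckets multiplied by the slot length gives a rough overlap duration, which is the input the rules below need.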

Let’s understand some of the rules –

If two records are close, we need to try to understand whether there was significant contact. For example, perhaps the records overlapped for just one time step (5 seconds) because two people were driving in opposite directions. This scenario should be marked as insignificant.

If, on the other hand, the records overlapped significantly and the movement history and speed suggest that two people shared a vehicle, then the overlap should be marked as very significant.
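The two rules above could be written down roughly as follows. This is only a sketch: the thresholds (5 minutes of overlap, 5 meters per second as "vehicle speed") and the function name are assumptions chosen for illustration, and a real rule set would need many more cases.

```python
SLOT_SECONDS = 5                 # assumed 5-second record interval
MIN_SHARED_TRIP_SECONDS = 300    # assumed: 5 minutes together counts as a shared trip
VEHICLE_SPEED_M_PER_S = 5        # assumed: faster than this suggests a vehicle


def classify_contact(shared_slots, avg_speed_m_per_s):
    """Label an overlap between two devices using the two rules described above.

    shared_slots: number of 5-second time slots in which both devices overlapped.
    avg_speed_m_per_s: average speed of the pair during the overlap.
    """
    duration_s = shared_slots * SLOT_SECONDS

    # Rule 1: a single time step of overlap, e.g. two cars passing each other.
    if shared_slots <= 1:
        return "insignificant"

    # Rule 2: a long overlap while moving at vehicle speed, e.g. a shared car.
    if duration_s >= MIN_SHARED_TRIP_SECONDS and avg_speed_m_per_s >= VEHICLE_SPEED_M_PER_S:
        return "very significant"

    # Everything else is not covered by the two rules in this article.
    return "unclassified"


print(classify_contact(1, 16.0))    # two cars passing
print(classify_contact(120, 15.0))  # ten minutes together at driving speed
print(classify_contact(10, 0.0))    # 50 seconds standing near each other
```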

I don’t want to get too stuck on the rules or the methods for analyzing the data set because this process would become enormously complex and difficult to explain. I think it is more important and practical to simply state that data analytics and the processing of these extremely large data sets is a direction data scientists have been moving towards for many years. A project like this might seem complex and, given the huge number of records, might seem impossible; however, the truth is that this type of analysis is easier than many other scenarios because each individual record is quite simple.

Let’s look at some conclusions.

Once our data set is analyzed, we can identify the historical movements of a patient who tested positive and identify any individual who would be deemed to be high risk. Interestingly, as each patient is diagnosed, our data set will become better and better at predicting the types of overlap exposure that lead to a positive test. In our opinion, this is the most wonderful usage of an IoT device like a cell phone: predictive analysis of the data set that results in ultra-useful information.

Saving time, costs and ultimately lives using IoT and Predictive Analytics on a global scale!

Right, so now that you understand broadly how the data set can help track and trace, that exact same data set can be used to track people breaking the lockdown rules. By applying a similar set of rules, the data set can provide insight into people’s movement. If movement is continuous and in a circle, then it might indicate a runner. If an individual travels to the shops every day for a short period, it might indicate that they are shopping as an outing. An individual who arrives at the shops every day at 8 am and leaves at 5 pm, on the other hand, is probably an employee of the shop.
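As a rough illustration of the shop example above, the sketch below labels a device by how long its daily shop visits last. The thresholds (at least 6 hours for an employee, at most 1 hour for a shopper) and the function name are assumptions for this example only; they are not taken from any real lockdown regulation or system.

```python
from datetime import datetime

EMPLOYEE_MIN_HOURS = 6   # assumed: a visit this long looks like a work shift
SHOPPER_MAX_HOURS = 1    # assumed: a visit this short looks like a shopping trip


def classify_visitor(visits):
    """Label a device from its shop visits.

    visits: list of (arrive, leave) datetime pairs, one per visit.
    """
    if not visits:
        return "unclear"

    hours = [(leave - arrive).total_seconds() / 3600 for arrive, leave in visits]

    if all(h >= EMPLOYEE_MIN_HOURS for h in hours):
        return "likely employee"
    if all(h <= SHOPPER_MAX_HOURS for h in hours):
        return "likely shopper"
    return "unclear"


# Made-up visits: one person stays 8 am to 5 pm, another pops in for 20 minutes.
worker = [(datetime(2020, 4, d, 8, 0), datetime(2020, 4, d, 17, 0)) for d in (1, 2, 3)]
shopper = [(datetime(2020, 4, d, 10, 0), datetime(2020, 4, d, 10, 20)) for d in (1, 2, 3)]

print(classify_visitor(worker))
print(classify_visitor(shopper))
```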

Data used correctly can be incredibly powerful. It can literally save the world. TechThrive is an expert IoT implementer that can facilitate the large-scale acquisition of data and the subsequent predictive analysis to ensure the data is being used effectively.

Edit: We are pleased to note that the South African government has appointed an independent judge to manage the data. We alluded to the fact that a data set this large could also be used maliciously, so once again we’re very pleased that data security and personal information are being respected.

Second edit: The reports published in the mainstream media suggest that the start date for the data set is the 5th of March. This equates to roughly twice our estimated numbers. Of course, we don’t know the frequency of the records or the exact number of cellular devices.