1st May 2020

Coronavirus and Bluetooth tracing apps

There has been some discussion recently about the need to balance privacy concerns against the need to do contact tracing.
As far as I can see this is not a valid concern, or rather it does not need to be. There should be no need to risk anybody's privacy or security.

I have (very quickly) written here a brief initial draft of how such a system could work, designed so that the agency operating the service could not be accused of using the app for any other purpose.

This would increase public trust, which would in turn increase take-up of the app (vital for it to work at all), whilst improving the security and privacy of individuals.

See "NHSX Concerns" at the end of this page for some non-technical perspectives.

I am not a software architect by profession, and would welcome input and corrections if I have missed something here. However, as I see it, this basic design for an app and its workflow would be effective and would respect users' privacy. It could also be open-source, and users could choose to download the compatible implementation of their choice, from the developer of their choice.

Image: coronavirus and Bluetooth (https://pixabay.com/illustrations/coronavirus-symbol-corona-virus-5062354/), from pixabay.com (Pixabay License)

Suggested Architecture and Design

The basic idea of these apps is that most of us carry a Bluetooth-enabled smartphone when we are out and about. A few people don't, or won't want to use one for this, but if the majority of us have such a phone, running an app that tracks coronavirus contacts, then the outliers who can't or won't run the app will not significantly affect the overall outcome.

When two Bluetooth devices come into range of each other, they can recognise that they are both running a Coronavirus Tracking app, and each can send its unique ID to the other device.

Both devices then store the fact that they came into contact with each other, and can timestamp that interaction too. The timestamp need not be precise to the millisecond; indeed, it might record only the day, so "On 3rd May 2020 this device came into contact with another tracking device, which had a unique ID of 3858f62230ac3c915f300c664312c63f".

The other device will store a similar record, that "On 3rd May 2020, this device came into contact with 96948aad3fcae80c08a35c9b5958cd89".

Your device then stores a list of records: date and device ID. You know nothing about the other device except its Unique ID, which it randomly generated for itself.

From here on, we'll refer to these two example devices as 3858 and 9694, for readability.
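As a rough illustration of how little each phone needs to keep, here is a minimal Python sketch of the local contact log. It assumes (my assumption, not part of any real app) that the app keeps only the most recent date on which it saw each ID, at day precision; `ContactLog` and its methods are hypothetical names.

```python
from datetime import date


class ContactLog:
    """On-device store mapping another device's ID to the most recent contact date.

    Hypothetical sketch: nothing about the other device is stored except the
    random ID it broadcast and the day on which the two devices were in range.
    """

    def __init__(self) -> None:
        self._last_seen: dict[str, date] = {}

    def record(self, other_id: str, when: date) -> None:
        # Keep only the most recent date for each ID; older sightings are ignored.
        previous = self._last_seen.get(other_id)
        if previous is None or when > previous:
            self._last_seen[other_id] = when

    def records(self) -> list[tuple[str, date]]:
        # The (ID, date) pairs are all the app ever needs to share.
        return sorted(self._last_seen.items())


# Device 9694's view of meeting device 3858 on 3rd May 2020.
log_9694 = ContactLog()
log_9694.record("3858f62230ac3c915f300c664312c63f", date(2020, 5, 3))
log_9694.record("3858f62230ac3c915f300c664312c63f", date(2020, 5, 1))  # older, ignored
print(log_9694.records())
```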

Centralised Server

The UK Government - or even a larger, or smaller, or independent organisation - could then run a centralised service, which would simply store a list of the Unique IDs of patients who have tested positive for coronavirus, and when they were tested. It would offer three functions:
1) "May I register with this Unique ID?" (just to ensure there are no accidental duplicates)
2) "Here is a list of Unique IDs I have met, and when I met them; have any of them tested positive for coronavirus recently?"
3) "I am a medical professional; here are my credentials. I can confirm that the owner of device 9694 tested positive for coronavirus on this date."

The agency running the service could decide what "recently" means. Say 3858 last came into contact with 9694 on 3rd May 2020, and 9694 then tested positive on 20th August 2020. If 3858 queried on 21st August 2020, the centralised service might count that as a "No", given the medical understanding at the time (August is a long time after May). This policy could be changed centrally as required, as new scientific information about the virus is learned.
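The three functions above, plus the centrally managed "recently" policy, could look something like the following in-memory Python sketch. The class and method names are hypothetical, the 14-day window is an illustrative assumption rather than medical advice, and a real service would need persistent storage and authentication.

```python
from datetime import date, timedelta


class TracingService:
    """Hypothetical in-memory sketch of the three central functions.

    The service stores only IDs reported positive and the date of each
    positive test. `window` is the centrally set meaning of "recently": a
    contact counts only if it happened no earlier than `window` before the test.
    """

    def __init__(self, window: timedelta = timedelta(days=14)) -> None:
        self.window = window
        self._registered: set[str] = set()
        self._positive: dict[str, date] = {}

    # 1) "May I register with this Unique ID?"
    def register(self, device_id: str) -> bool:
        if device_id in self._registered:
            return False  # duplicate: the device should generate a new ID
        self._registered.add(device_id)
        return True

    # 3) A medical professional confirms a positive test.
    #    (Checking their credentials is left out of this sketch.)
    def report_positive(self, device_id: str, tested_on: date) -> None:
        self._positive[device_id] = tested_on

    # 2) "Have any of the IDs I met tested positive recently?"
    #    The answer is a plain yes/no; the positive list itself is never sent.
    def any_recent_positive(self, contacts: list[tuple[str, date]]) -> bool:
        for other_id, met_on in contacts:
            tested_on = self._positive.get(other_id)
            if tested_on is not None and met_on >= tested_on - self.window:
                return True
        return False


service = TracingService(window=timedelta(days=14))  # 14 days is an assumed policy
service.report_positive("96948aad3fcae80c08a35c9b5958cd89", date(2020, 8, 20))
met = [("96948aad3fcae80c08a35c9b5958cd89", date(2020, 5, 3))]
print(service.any_recent_positive(met))  # False: 3rd May is long before 20th August
```

Because the window lives only on the server, changing the policy as scientific understanding improves means changing one value centrally, with no app update.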

The apps store only which IDs they came into contact with, and when. The rest of the logic is centralised.

Process

If and when the owner of a device is found to have the virus, they notify their app, and the app in turn notifies the centralised server. It will literally just say "Device 9694 is now believed to have the virus". To avoid spam or prank messages, it may be better for the medical facility to interact with the app, so that they are the ones authorised to send the message to the centralised server.

You put your app into an "I've just been told I've got the virus" mode, and the medical practitioner links their app with yours; their app then sends your ID to the centralised server. If you are unable to do that (maybe you're unconscious and can't unlock your phone), your app will still be broadcasting your Unique ID, so the medical version of the app could pick that up (taking care to filter out anybody else's phone!) and send it that way.
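One way the "only medical staff can report" idea might work is for the service to issue each practitioner a secret key and accept only reports signed with it. The sketch below uses an HMAC from Python's standard library; the practitioner IDs, the key and the message format are all hypothetical. A shared secret means the service could forge reports itself, so a real system would more likely use public-key signatures or existing NHS credentials. Note that the report carries the patient's device ID but nothing about who the patient is.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical: the service issues each medical practitioner a secret key.
PRACTITIONER_KEYS: dict[str, bytes] = {"practitioner-001": b"example-secret-key"}


def _message(practitioner: str, device_id: str, tested_on: str) -> bytes:
    # Hypothetical message format covering every field of the report.
    return f"{practitioner}|{device_id}|{tested_on}".encode()


def sign_report(practitioner: str, key: bytes, device_id: str, tested_on: date) -> dict:
    """Run by the medical version of the app after reading the patient's ID."""
    day = tested_on.isoformat()
    signature = hmac.new(key, _message(practitioner, device_id, day), hashlib.sha256)
    return {"practitioner": practitioner, "device_id": device_id,
            "tested_on": day, "signature": signature.hexdigest()}


def verify_report(report: dict) -> bool:
    """Run by the central service before accepting a 'tested positive' message."""
    key = PRACTITIONER_KEYS.get(report["practitioner"])
    if key is None:
        return False  # unknown practitioner: reject
    message = _message(report["practitioner"], report["device_id"], report["tested_on"])
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison, so the signature can't be guessed piece by piece.
    return hmac.compare_digest(expected, report["signature"])


report = sign_report("practitioner-001", b"example-secret-key",
                     "96948aad3fcae80c08a35c9b5958cd89", date(2020, 8, 20))
forged = dict(report, device_id="3858f62230ac3c915f300c664312c63f")  # tampered copy
print(verify_report(report), verify_report(forged))
```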

There is no need for any party to disclose their identity at any stage. When requesting the status, you do not need to send your Unique ID, only the list of people you met, and when.

On a regular basis - daily, hourly, every 15 minutes, or maybe even on-demand - any user's app can contact the centralised server for the current list of positive-tested devices. It compares that list against its local list, and if there is a match, then the app notifies its user that they have been in contact with somebody who tested positive.
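In this "download the list" variant, the comparison on the phone is just a set intersection. A minimal sketch, assuming the downloaded list is a set of hex IDs; the function name and the second ID are made up, and the dates are carried along but not used here (applying the "recently" window on the phone would need the test dates as well).

```python
from datetime import date


def contacts_who_tested_positive(local_log: dict[str, date], positive_ids: set[str]) -> list[str]:
    """Return the IDs in this phone's contact log that appear in the downloaded list.

    The comparison happens entirely on the phone, so the server never
    learns which IDs this device has met.
    """
    # dict key views support set operations, so this is a plain intersection.
    return sorted(local_log.keys() & positive_ids)


# Device 3858's local log; the second ID is made up for illustration.
local_log = {
    "96948aad3fcae80c08a35c9b5958cd89": date(2020, 5, 3),
    "0f1e2d3c4b5a69788796a5b4c3d2e1f0": date(2020, 5, 4),
}
downloaded = {"96948aad3fcae80c08a35c9b5958cd89"}  # current positive-tested list

matches = contacts_who_tested_positive(local_log, downloaded)
if matches:
    print("You have been in contact with somebody who tested positive.")
```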

Storage Concerns

The example IDs above (3858 and 9694) are 128-bit numbers. 128 bits allow for 2^128, or 340,282,366,920,938,463,463,374,607,431,768,211,456, distinct IDs (see https://en.wikipedia.org/wiki/128-bit_computing for a bit more information on such large numbers). That is large enough for each device to generate its own ID at random without realistically clashing with another device. Upon generating an ID, the device could check it against the centralised system to make sure that it is unique.
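To make the "large enough to pick at random" argument concrete: a device can generate its 128-bit ID with Python's `secrets` module, and the birthday-bound approximation gives the chance that any two devices ever pick the same ID. The function names are hypothetical, and 8 billion devices is simply a generous illustrative figure.

```python
import math
import secrets


def new_device_id() -> str:
    """A random 128-bit ID, hex-encoded like the examples above (32 hex digits)."""
    # secrets draws from the operating system's cryptographically secure RNG.
    return secrets.token_hex(16)  # 16 bytes = 128 bits


def collision_probability(devices: int, bits: int = 128) -> float:
    """Birthday-bound approximation of P(any two devices pick the same ID)."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when p is tiny.
    return -math.expm1(-devices * (devices - 1) / 2 ** (bits + 1))


print(new_device_id())
print(collision_probability(8_000_000_000))  # on the order of 1e-19
```

Even with one device for every person on Earth, the chance of a clash is roughly one in 10^19, so the uniqueness check with the centralised system is a safety net rather than something that should ever fire.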

One of the first questions I asked myself was how much storage space on my phone would be used up every time I come within 10 metres of somebody?

Number of interactions per day	100	1,000
Data retention in days	100	100
Bytes per ID (128 bits)	16	16
Total storage in bytes (IDs only)	160,000	1,600,000
Total storage in megabytes (MB)	0.15 MB	1.5 MB
Total storage in gigabytes (GB)	0.00015 GB	0.0015 GB

If you store all interactions over 100 days, and come into contact with (within 10 metres of) 1,000 people every day, the IDs alone will take up about 1.5 MB of storage on your phone.

You need to store when you last came into contact with each person, too. So you need a timestamp (really, just the date) of your most recent contact with them, which means each of those 100 x 1,000 = 100,000 records needs a date associated with it. At 3 bytes per date (https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html), that is 300,000 more bytes, or about 0.3 MB, for a total of about 1.8 MB.
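A quick check of the arithmetic above, assuming 16 bytes per 128-bit ID, 3 bytes per date, and 1 MB taken as 1,024 × 1,024 bytes (which is how 1,600,000 bytes comes out at about 1.5 MB). The function name is hypothetical.

```python
BYTES_PER_ID = 16    # one 128-bit ID
BYTES_PER_DATE = 3   # the size of MySQL's DATE type, used here as a benchmark
MIB = 1024 * 1024    # bytes per megabyte, in the binary sense used above


def storage_bytes(contacts_per_day: int, retention_days: int, with_dates: bool = True) -> int:
    """Total bytes to keep every contact for `retention_days` days."""
    records = contacts_per_day * retention_days
    per_record = BYTES_PER_ID + (BYTES_PER_DATE if with_dates else 0)
    return records * per_record


for per_day in (100, 1_000):
    ids_only = storage_bytes(per_day, 100, with_dates=False)
    total = storage_bytes(per_day, 100)
    print(f"{per_day} per day: IDs {ids_only:,} bytes ({ids_only / MIB:.2f} MB), "
          f"with dates {total:,} bytes ({total / MIB:.2f} MB)")
```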

Privacy Concerns

There are still some privacy concerns with this approach; I'll state each concern first, followed by suggested responses:

Somebody else's app may be tracking these IDs and storing detailed timestamp and location information about when their device came into contact with yours. That in itself should not be overly concerning, since all they have is your Unique ID, not any more information about you.
But if they did store such detailed information and you then became infected, their more-detailed app would be able to say "Device ID 3858, which is the person you met at 14:32 on 3rd May 2020 at your home, has now been reported as positive with coronavirus." If they live alone, and know that they had only one visitor at their house that afternoon (quite plausible under lockdown conditions), then they would know that it was that visitor who has the virus.

I would suggest resolving this in the protocol itself: when you ask your phone "Have I been in contact with anyone who now has the virus?", your device uploads its list of Unique IDs and dates, and the service simply replies "Yes" or "No" as to whether anybody in that list has been reported positive. The centralised service could then also be the one in charge of managing timescales.

That then opens the door to a malicious app which sends each Unique ID in turn, asking "Does this match?" "Does this match?" "How about this one?". A simple answer to that is rate-limiting, to (say) 10 queries per device per hour. That might be a necessary restraint to avoid a DDoS on the centralised system anyway.

Repeatedly sending who you met, and when, effectively builds an identity profile for you on the centralised server. When you then say "I met these people, plus this other person", it could be deduced that you are the person who made the previous request (particularly if also using the anti-DDoS token, as mentioned below). But it still reveals nothing about who you are, only that you have made these requests.
However, this could be used for mapping groups of people: you've been with A, B and C, whilst A has met D, E and F, B has met G, H and I, and C has met J, K and L. But that is pretty much the point of this application. So long as none of you can be identified as individuals (so maybe the anonymous proxy should be a mandatory part of the protocol?), that is the desired effect: we want to do contact tracing. Indeed, this could be used for secondary tracing; maybe D to L need to be told, as well as A to C, if you test positive.
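The secondary-tracing idea amounts to a breadth-first search over the contact graph, stopping after two hops. A minimal sketch, assuming the server has assembled a graph mapping each ID to the IDs it reported meeting; the function name is hypothetical, letters stand in for real IDs, and the dates and "recently" window are ignored for brevity.

```python
from collections import deque


def contacts_within(graph: dict[str, set[str]], source: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search returning every ID within `max_hops` contacts of `source`.

    `graph` maps an ID to the IDs it reported meeting. The result maps each
    reached ID to its number of hops (1 = direct contact, 2 = contact of a contact).
    """
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # don't expand beyond the requested depth
        for neighbour in graph.get(current, set()):
            if neighbour not in hops:  # skip IDs already reached
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    del hops[source]
    return hops


graph = {
    "you": {"A", "B", "C"},
    "A": {"D", "E", "F"},
    "B": {"G", "H", "I"},
    "C": {"J", "K", "L"},
}
reached = contacts_within(graph, "you", max_hops=2)
first = sorted(i for i, h in reached.items() if h == 1)
second = sorted(i for i, h in reached.items() if h == 2)
print(first, second)
```

Because both devices record every meeting, a real graph would list each contact in both directions; the search copes with that, as IDs already reached are skipped.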

Conclusion

I am not suggesting that I have created some genius design here - quite the opposite. I am just pointing out one simple mechanism, which on further research seems to be quite similar to the Google / Apple system. I am sure that their system has further advantages (I would certainly hope so, given their amassed brainpower compared to mine!)
What I am saying is that any argument that we should be required to install an app which demands more access to our phones than simple Bluetooth access is entirely flawed. A system can be created which addresses the problem without granting additional rights on our phones, which, as we are probably now more aware than ever before, are central to many of our lives, and know more about us than some of our own family members!

This implementation would still put a lot of decision-making capability in the hands of the centralised service, but there would be no need for an individual to register, or to share any information about their smartphone or other device with the Government or whichever agency runs the service. Nor would they need to authorise the app to access anything beyond Bluetooth and the Internet; importantly, the app would not need access to any location information, any files on the device, or any contact information on the device.

The centralised service would also have access to information like the IP addresses of the devices that contact it. This could be used to identify the carrier network and possibly, via the carrier itself, the identity of the person using the device. To avoid this, a trusted proxy service, similar to Lavabit or some VPNs, could relay these requests without storing any data about the transactions.

Such a proxy would, however, interfere with the rate-limiting suggested above, since many users would appear to come from the same address. To handle this, each response to a valid request could include a token to be used in the subsequent request. That way, a malicious user making rapid requests through the proxy would be refused, but a legitimate user, through the same proxy, would present their previously-granted token to show "I last requested access 2 hours ago, and I now request access again", and would receive a valid response along with a new token for their next request.
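The token scheme could be sketched as follows. Everything here is an assumption for illustration: the class name, the 360-second minimum interval (one request every 6 minutes, i.e. at most 10 per hour per token chain), and the small hourly allowance for token-less requests, which is how a new user would get their first token. Expiring old tokens is left out.

```python
import secrets
from typing import Optional


class TokenGate:
    """Hypothetical sketch of token-based rate-limiting behind a shared proxy.

    Each served request returns a fresh single-use token. A request carrying a
    token issued at least `min_interval` seconds earlier is served; requests
    without a token share a small hourly allowance.
    """

    def __init__(self, min_interval: float = 360.0, anonymous_limit: int = 10) -> None:
        self.min_interval = min_interval        # 360 s = at most 10 requests per hour
        self.anonymous_limit = anonymous_limit  # token-less requests allowed per hour
        self._issued: dict[str, float] = {}     # unused token -> time it was issued
        self._anon_hour = -1
        self._anon_used = 0

    def request(self, token: Optional[str], now: float) -> Optional[str]:
        """Serve a query and return a new token, or return None to refuse it."""
        if token is None:
            hour = int(now // 3600)
            if hour != self._anon_hour:         # new hour: reset the shared allowance
                self._anon_hour, self._anon_used = hour, 0
            if self._anon_used >= self.anonymous_limit:
                return None
            self._anon_used += 1
        else:
            issued_at = self._issued.get(token)
            if issued_at is None or now - issued_at < self.min_interval:
                return None                     # unknown, already used, or too soon
            del self._issued[token]             # each token can be used only once
        new_token = secrets.token_urlsafe(16)
        self._issued[new_token] = now
        return new_token


gate = TokenGate()
first = gate.request(None, now=0.0)          # first request: no token yet
too_soon = gate.request(first, now=60.0)     # refused, but `first` stays valid
second = gate.request(first, now=7200.0)     # "I last requested access 2 hours ago"
reused = gate.request(first, now=9000.0)     # refused: `first` was already spent
```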

NHSX Concerns

There have been concerns raised that NHSX, "a joint unit bringing together teams from the Department of Health and Social Care and NHS England and NHS Improvement" are proposing developing their own app, rather than - say - one proposed by Google (Android) and Apple (iPhone): https://www.bbc.co.uk/news/technology-52441428 - where "NHS says it has a way to make the software work "sufficiently well" on iPhones without users having to keep it active and on-screen." - just to repeat that: The NHS knows more about writing iPhone apps than Apple does.
Further reading:
Dominic Cummings accused of conflict of interest over NHS fund
UK government using confidential patient data in coronavirus response
Palantir Coronavirus contract did not go to competitive tender
The Professors v the Government: Surveillance Edition
Michael Veale
"Warning against plotting applications" (in French, but automated translations are available)
Inside Dominic Cummings’s coronavirus meeting with big tech

Tracing coronavirus contacts without violating privacy