OpenAI Forum
+00:00 GMT
Sign in or Join the community to continue

On Trust: Backdoor Vulnerabilities and their Mitigation with Turing Recipient, Shafi Goldwasser

Posted May 03, 2024 | Views 585
Shafi Goldwasser
Shafi Goldwasser
Shafi Goldwasser
Director @ Simons Institute for the Theory of Computing, University of California, Berkeley

Shafi Goldwasser is Director of the Simons Institute for the Theory of Computing, and Professor of Electrical Engineering and Computer Science at the University of California Berkeley. Goldwasser is also the RSA Professor (post tenure) of Electrical Engineering and Computer Science at MIT and Professor Emeritus of Computer Science and Applied Mathematics at the Weizmann Institute of Science, Israel. Goldwasser holds a B.S. Applied Mathematics from Carnegie Mellon University (1979), and M.S. and Ph.D. in Computer Science from the University of California Berkeley (1984).

Goldwasser's pioneering contributions include the introduction of probabilistic encryption, interactive zero knowledge protocols, elliptic curve primality testings, hardness of approximation proofs for combinatorial problems, combinatorial property testing, and pseudo deterministic algorithms.

Goldwasser was the recipient of the ACM Turing Award in 2012, the Gödel Prize in 1993 and in 2001, the ACM Grace Murray Hopper Award in 1996, the RSA Award in Mathematics in 1998, the ACM Athena Award for Women in Computer Science in 2008, the Benjamin Franklin Medal in 2010, the IEEE Emanuel R. Piore Award in 2011, the Simons Foundation Investigator Award in 2012, and the BBVA Foundation Frontiers of Knowledge Award in 2018. Goldwasser is a member of the NAS, NAE, AAAS, the Russian Academy of Science, the Israeli Academy of Science, the London Royal Mathematical Society and a Foreign Member The Royal Society. Goldwasser holds honorary degrees from Ben Gurion University, Bar Ilan University, Carnegie Mellon University, Haifa University, University of Oxford, and the University of Waterloo, and has received the UC Berkeley Distinguished Alumnus Award and the Barnard College Medal of Distinction.

+ Read More

Shafi Goldwasser is Director of the Simons Institute for the Theory of Computing, and Professor of Electrical Engineering and Computer Science at the University of California Berkeley. Goldwasser is also the RSA Professor (post tenure) of Electrical Engineering and Computer Science at MIT and Professor Emeritus of Computer Science and Applied Mathematics at the Weizmann Institute of Science, Israel. Goldwasser holds a B.S. Applied Mathematics from Carnegie Mellon University (1979), and M.S. and Ph.D. in Computer Science from the University of California Berkeley (1984).

Goldwasser's pioneering contributions include the introduction of probabilistic encryption, interactive zero knowledge protocols, elliptic curve primality testings, hardness of approximation proofs for combinatorial problems, combinatorial property testing, and pseudo deterministic algorithms.

Goldwasser was the recipient of the ACM Turing Award in 2012, the Gödel Prize in 1993 and in 2001, the ACM Grace Murray Hopper Award in 1996, the RSA Award in Mathematics in 1998, the ACM Athena Award for Women in Computer Science in 2008, the Benjamin Franklin Medal in 2010, the IEEE Emanuel R. Piore Award in 2011, the Simons Foundation Investigator Award in 2012, and the BBVA Foundation Frontiers of Knowledge Award in 2018. Goldwasser is a member of the NAS, NAE, AAAS, the Russian Academy of Science, the Israeli Academy of Science, the London Royal Mathematical Society and a Foreign Member The Royal Society. Goldwasser holds honorary degrees from Ben Gurion University, Bar Ilan University, Carnegie Mellon University, Haifa University, University of Oxford, and the University of Waterloo, and has received the UC Berkeley Distinguished Alumnus Award and the Barnard College Medal of Distinction.

+ Read More

Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. 

Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial. Can backdoors be mitigated even if not detectable? Shafi will discuss a few approaches toward mitigation.

This talk is based largely on collaborations with Kim, Shafer, Neekon, Vaikuntanathan, and Zamir.

+ Read More

I think that he's a big fan, which I think is the biggest compliment a parent can have, if the child is still a fan. Anyway, thank you very much for the invitation, and it was very nice to see also a lot of faces here from my MIT days, some people that took classes from me, and from Berkeley.

This is a picture of the design institute, and I'm the director there. Anyway, the title of my talk is Trust, Factors, Vulnerabilities, and Possible Mitigations. So those of you who sort of took my classes and know a little bit more about my research know that I really have done my PhD and most of my research in modern cryptography.

And cryptography, really, you can think of it in many ways, but one way is that it's a bunch of tools. Encryption, digital signatures, zero-knowledge proofs, secure collaboration, homomorphic encryption, blockchains, and so forth. And a lot of these tools, in fact, I would venture to say all of these tools, where I developed these theoretical ideas.

and pen and paper at a time when there wasn't necessarily application that was clear. And a lot of these things have actually made their way into application and have changed, made an impact in the real world. And the reason I'm starting with that is not just to introduce myself. It's because another way to think about cryptography is that what it does, really, is just a set of tools, all of which enable us to trust technology, even when adversaries are present.

So really, in cryptography, the adversaries are bait into the definition of a problem. So if you want to communicate over the channel, if there are no, nobody's listening, you could just send your messages in the clear. If you want to access a website and nobody's trying to pretend to be you, you don't need an identification method, and so forth. So cryptography, by definition, comes to enable us to trust technology, even though we know there's an enemy out there, but we can use it as if it were lurking around. And trust is going to be a topic that is going to be relevant to my topic here today.

The thing is, over the years, since when that has been research for a very long time, at least I've got my PhD at Berkeley in 84.

And I think the first few papers, which everybody knows who have done some computer science degree, like RSA and the film, are really from, like, 76 or something like that. But by now, we have sort of a recipe of what we have to do when we want to solve a cryptography problem, which is in tape, which I want you to think about as if it is there to build trust.

The recipe is like this. We define the task, obviously. It could be private communication. It could be access, control, whatever. And then we model the adversary. So it's extremely important to have a precise model of who this adversary is. And then we define what it means to be solutionally secure. It depends on the adversary. So the less powerful he is, the more, maybe, it's possible to achieve security.

Then we build a primitive. So we build a method. It could be encryption, digital signature, and so forth. And then what was very important in theoretical cryptography is that we have a proof. So security proofs are really part of the sort of DNA of this film. But when we say a proof, often what it looks like is, you know, we will prove that this primitive is secure with respect to the device.

definition that we laid out under some assumption. But the assumption could be, of course, you could assume it's secure. But when I say assumption, I mean something very crystal clear, which is independent of the problem at hand. And the type of assumptions might be that there's some computational hardness. So some problem like to solve factoring integers or to find some solution, a geometric solution to a geometric problem. You'll see some of these problems later on. So you might assume that that's a hard problem for any computer. And under that assumption, your primitive is secure.

Other assumptions have nothing to do with complexity, but they might be that there's several people in the system, but not everybody colludes. And just by the fact that not everybody colludes, you can provide a security proof. Another assumption might be that there's trusted hardware. And again, the point is that you want to really have a reduction from the assumption to the security of the scheme. So that if the security of the scheme is violated, at least you get something for it. You might solve a hard problem that you didn't know you could solve, because now you prove that if the problem was hard, you cannot break it.

you could break the scheme, the problem is easy. And similarly, for all these things, you violate something very specific which is independent of the scheme. So that's the recipe, okay? And you could use this recipe also to show that actually, you shouldn't trust these. So there's no solution for a problem. So everything would be violated.

You know, nothing could be secure. Again, so again, all the steps are similar except what you do is you don't show a way to encrypt, say, or you might show that there's no way to encrypt it has certain properties. And again, proof, surprisingly, sometimes can also rely on assumptions. So you show that it's impossible to solve a problem under some assumptions.

And we will see during this talk both a possibility and an impossibility. But that wasn't in my abstract. I wasn't saying that I was gonna talk about cryptography.

So the issue is that a proposal that I've been sort of working in the last few years on is, and there's a community that works on it, although it's not a very large community, is that you could address ML, machine learning trust problems, which there are many and you know more than I do.

But I think any person with, what do you say, a brain in their head? Is that maybe a translation from another language? Anyways, because everybody has a brain in their head, but half a brain.

So the proposal is to address machine learning trust problems using this kind of crypto recipe. And more than just a recipe, but maybe the paradigms and the tools, and maybe even the proof methods. Maybe even the assumptions, or you have other assumptions.

So here, you would specify whatever machine learning tasks you have, and we'll see some examples in the talk. You will define who the adversary is, and now it's a machine learning adversary. It's not necessarily the same cryptographic adversary. You would define what is a good enough solution for your purpose, and then you build a solution.

And you, again, you want to focus, at least in my talk, on theory. And you would have the proof. So you ensure that under maybe an assumption, maybe without an assumption, your solution is good enough. Or maybe there is no solution that's good enough.

So we just want to use this kind of recipe, which is very simple, but it took years to come up with it. Because in a sense, it would have one problem.

So the other, and then it turns out that, really, you could just sort of crank it out, assuming that you sort of had the technical skills. But at least the steps of what you're supposed to go through are clear. So I just want to say that, obviously, in contrast with crypto, machine learning is not designed for adversarial context. You deliver many things that have nothing to do with having enough trust when an adversary is present. So it's not an integral part of the definition of a problem.

But this is a very, very, as we all know, very attractive target. And this is a very attractive target. And therefore, this idea that you should do adversarial modeling. So you sort of should assume, it's not the way to think about the problem is that there's an adversary. And the goal of the adversary is to break the trust. And then you solve something in spite of the fact that there is an adversary. It's not a bad way to think about it. Now, who's the adversary? I have a picture there, the couch or something. But who's the adversary, really? So we really don't know.

So we don't prepare for a worst case adversary. We don't know how the adversary is going to behave. We don't have any idea about what the algorithm of the adversary is. But we are going to make an assumption almost everywhere in this talk that the adversary is not a problem.

adversary is computationally bounded, so it doesn't run at exponential time. So in other words, for example, the adversary will be able to solve problems which require exponential time by the fastest algorithm. So it's going to be bounded in time, a lot of time, but still bounded. And yet, within that amount of time, you can employ any strategy possible. OK?

All right, so now let's go into the talk, really. So who are the adversaries? I mean, the adversaries, I think you know probably a lot better than me who the adversaries are. So why take on this as naive? But even for this naive take, there's something that one can say. So if I kind of divide the time in the pipeline, is during the development of whatever algorithm you're developing, machine learning, or generator, then post, and then further into the future where people start taking it and using it, then you could think about an adversary in every stage.

So if I think, so before, actually, one last word before I kind of tell you what adversaries I'm thinking about, obviously, I do theory. And this is a place where you actually build models, and you will do practice.

thing about the adversaries, they apply to both. The adversaries I can write about in theory, but they also exist in practice. And the definitions that we may come up with also would apply to both.

The methods, in principle, could also apply to both. But often, efficiency at scale is something that stops the immediate translation from theory methods to practice.

And when you think about scale, you could think of three axes. One is what computation you're doing. If you're doing just linear regression, or logistic, or neural nets, or large language models, that's a very different type of scale.

And another thing is what assumption you're willing to make. Are you willing to just, do you want to just talk about difficulty of solving hard problems, like factoring, or solving a system of equations?

Do you want to introduce multi-servers and assume that there's no collusion? Or do you want to use trusted hardware, which I am assuming that that's part of a lot of what the AI companies are doing now. So they're putting their trust in hardware that's going to make your problem easier.

to scale up. And then another thing you could talk about is who's the adversary? So that could change also.

I don't mean it in terms of worst case versus not worst case, but what are the goals of the adversary? Is it just curious? Do they wanna really sabotage everything? And so forth.

So depending where you are in this three-dimensional thing, you might scale better or worse.

So in other words, if you wanna do logistic regression and your adversary is just curious, but it's not gonna deviate from what they're supposed to do, and you have computational hardness, you can solve it at fairly large scale, and so on and so forth.

And this is true both for training phase and also for auditing, which is gonna be a big part of my talk. It's not as bad when you get to the situation where you've already developed a model and now you just wanna use it.

So I guess in the context of OpenAI, when you wanna make the query, you want one shows privacy and things of that sort. It's much harder earlier in the pipeline.

So if we had like a, if we want to sort of think about all the access, and I'm sure that's not all of it, we could talk about.

and I will talk today about, you could talk about three adversaries, okay? What adversary, which might be the trainer, that is you, not you, because I know most of you are guests, but trainer, I mean a company that's training a model, okay? And it's not necessarily the company policy, but it could be people inside, or it could be mistakes that are made. So it could be whoever is training the algorithm.

It could be randomness source, could be the one that's using randomness, and they seem like, seems like good randomness, but it's doctored. And it could be the data contributors. For example, those would be examples, and what would you want to be interested in? What might be a goal? You might care about privacy, robustness, accuracy, fairness, data usage, and so on and so forth. So obviously we can't talk about all of this, and I have a lot of the answers for most of it, but I will talk today about this little square here.

So the case of adversarial trainer, the case of adversarial randomness source, and these two columns, the privacy and robustness. And in some sense, the pink is rosy, so when we work privacy, there's actually a lot of things.

things we can do. When we work on BASFIS, part of what I'm going to show today is that there's a very, very strong attack on robustness of models. And there's some ways to mitigate it, but it's certainly not a solved problem. So in some sense, this is a positive, and this is more of a negative.

We're going to start with privacy. And faithful to the book, so what do we want to do? Let's talk about private training. So you want to have a lot of data. Maybe it's labeled data, maybe it's not labeled data. And you run a training algorithm, but the data is being contributed by sources that don't want to reveal the data. But they do want to contribute to the model. And so the adversary now is, let's think of it as the trainer who's curious, whoever has access to the trainer, who's curious, but they want to train a model.

And a good enough solution is the following. So there's a definition here, even though maybe when you read it, it seems, ah, yeah, that makes sense. So the definition is, OK, the trainer at the end wants to have a model that they manage to train on the data. They're not supposed to know what the data is, and yet they're supposed to train. What's the best we can do? The best we can do is we say whatever the trainer.

So you can say, well, maybe the model just copies all the data. You know, that's a model that reveals a lot of things. So what I'd like to show is that this is a general, how to do this generally.

So in other words, it could be for a training procedure that outputs at the end a model that satisfies differential privacy say. How many people have heard about that? How many people haven't heard about that?

Ah. Okay. Differential privacy. So I didn't have slides on that. But differential privacy is an extremely strong requirement for a model. In some sense, it says if there's a lot of data items that were contributed to the model, then the model will behave the same regardless of whether a single item was or wasn't in the data set. So a single item is not very sensitive to small groups of items. Okay.

So in that sense, you know, not only that it doesn't reveal a lot of data, but it also doesn't reveal a lot of information.

So you can say, well, maybe the model just copies all the data set. You know, that's a general, how to do this generally. So what I'd like to show is that this is a general, how to do this generally.

So in other words, it could be for a training procedure that for a preparatory procedure that I want you to leave a.

But I want you to talk about some of the twenty types of interfaces that we're talking about today, so you know, information about some of the systems, functionality, how to manage, some of the order, integrate and suicide for some of these specific technologies.

So we have a goal to target one or two or more groups of data models. And you can then use the differentiating codes that you precisely want to do these two or more groups of mistakes and make sure that the model behaves the same.

There are three of them. The first is the humble behavior. And the second is the jungle behavior. And there are four of them. This one here is the humbling behavior.

Now, you should maybe keep in mind, you shouldn't lose the chance of correctly selecting the model that the employees were going to probably manipulate the model and back' whether an item was or wasn't in the data set. Okay?

So for example, in the high-risk delivery, for example, vendors that are

I'll show you is that you can actually train models like that, okay, on real data of contributors, and at the end, you won't know the data, and you won't even be able to tell whether somebody participated in it or not from the model that you hold in your hands, okay? Sometimes you don't want such a strong requirement of privacy. You just maybe don't want to identify exactly whose data it is. If you think about it as genome data or something, you want to be able to get the signal, but you don't want to know whether someone whose name it is.

Essentially, how is this done? I'm going to give you a blueprint of it and then quote two works that I've been involved in where we do it, actually, in medical applications. There's been a lot of works actually in doing this, and how to use privacy tools in order to train in machine learning. They all use tools that we are old tools from cryptography. Old, they keep on piling up. There's something called secret tearing where you can take a piece of information and split it into many shares, so that no share tells you anything about the complete piece of information, but enough of them you can reconstruct.

There's something called multi-party secure competition.

which is enables, let's say, three people in an example that have different inputs, X, Y, and Z, to compute a function on X, Y, and Z, and without telling each other what the X, Y, and Z are. And you know, I think the first protocol for that was by Michael Raven many years ago at Harvard, and he always describes it as if there's three professors and they wanna know what the average salary is.

I won't tell you what the average salary was in the 1980s, especially here, it would be embarrassing. But I can tell you that when I came to MIT in 1984, the salary was $18,000 a year. So average salary is not very interesting, but you can imagine that this could be any function on many pieces of data that you'd like to find out and you don't wanna tell people what your data is.

Then there is private information retrieval. This is the idea that you can ask things from a database and you will get back the answers, but they won't know what you asked for. There is something called homomorphic encryption, which I'm gonna tell you more about. Okay, so there are lots of tools. These tools are very useful for private machine learning training, and in particular, secret sharing and homomorphic encryption.

So let me show you what's the connection.

So the idea is this, let's think about this box as some sort of a place, you're training algorithm, it might be with reinforcement learning with whatever, there's a training algorithm and it starts up with access to data, contributed by people or by institutions, which don't want to reveal the data.

So the data comes in encrypted and if it's supervised learning, so I think about it as X and Y, X is the input, Y might be a label. Inputs come on encrypted, then this magic happens, which I'll talk about in a minute, which is somehow it's possible using something called homomorphic encryption, to compute, to take this training algorithm, think of it as essentially a Boolean circuit with operations, ANDs and ORs or an arithmetic circuit, we're doing pluses and times, some basic operations and this homomorphic encryption is able to essentially do all of these operations under encryption.

So if you think of just about addition, you could take an encryption of X, encryption of Y, and give you out an encryption of X plus Y. If you do this for all the basic operation and we do this iteratively, you actually can get this very strong result which says.

that you could run any program on the encrypted data.

But what happens at the end is that you have an encrypted model, which you can't use. And you want to use it. So what do you do now?

Now, in the decrypt stage, there will be these key holders. There will be some servers. Each one of them has a part of something I call a key, a secret key, cryptographic key, so that any one of them can do nothing with their share of the key.

But if they don't get together, or at least if they all apply their key to this encrypted of H, in some sense, they peel away at this encryption, and one at a time, and then H comes out.

So why should we believe this works? Well, there are two things to understand here. One is, what is this homomorphic encryption really does? And again, I told you about the circuit, the adding, and the multiplying.

But another way just to think about it much more abstractly is in this picture here. Is that somehow you can think of the world of what we call plaintext, where you just have inputs x, and you compute a function of x, f of x.

And then there is the world of encryption, where x gets encrypted.

And then all the work happens, this evaluation of F, from the encrypted version, what we call encryption of X, or ciphertext, to the encryption of F of X. So somehow implementing this diagram is what homomorphic encryption does. And what are the assumptions that homomorphic encryption schemes are based on? So there are two assumptions essentially here in this whole slide. One is that this problem, which I'll tell you what it is in a second, called learning with errors, is a hard problem. That is, the fastest algorithm out there takes a long time to, and therefore the adversary can't actually solve this problem.

Okay, so think about it. If you weren't, it's factoring, but it's a different problem. With factoring, we don't know how to do homomorphic encryption. So there is this hard problem that has enough structure in it so that we can implement that pink and purple picture, for those who are not colorblind.

And in addition, another assumption we made here is that these shareholders don't work. They don't collide, right? So there are two assumptions. There are shareholders that don't collide, and that there is this magical way to implement this diagram. Okay, so what is this hard problem?

So what's the lesson of factoring integers that everybody knows what 15 is and that 15 is equal to 3 times 5? Unfortunately, you know, you need a little bit something harder to describe here, but not much harder.

So what I have here is a bunch of equations and in the variables s1, s2, s3, s4. So you think of these as their secret. It's like a vector of s's, okay? These s's, I can think of them as numbers mod 17.

Okay, 17 is a modulus and and if you just thought about a bunch of linear equations, I think we all remember from linear algebra that during Gaussian elimination you can find these s's. And that's even true if you talk about this mod 17.

Because what does mod 17 really do? It sort of truncates the head of the, like you throw away kind of the leading stuff and you keep the remainder. But it turns out you can solve these equations then too. However, what we've done is two things.

Not only that we truncated the head, but we also added a noise. So that's why I have these squiggly equals. So there's sort of noise that introduced by the mod 17. So I throw away some information. There's some, I'm adding some, I don't know, Gaussian noise, which is like truncating.

the lower part. And now, all of a sudden, it turns out that this problem is hard to solve. People don't know how to figure out what the S's are.

Okay, very difficult problem. I mean, the difficulty grows with the four, you know, the dimension of this vector, and also with the 17. How hard is it? For some people, maybe this is familiar, decoding random linear codes, if you remember some coding class you've taken, or approximating the size of shortest vector in a worst-case dimensional lattice, so this sort of a little bit more mathy.

One can show sort of reductions between these problems. So this learning with errors is as hard as these problems, and in fact, it's hard on the average, it's the worst case versions of this. And it turns out that it's a problem that's really easy to work with. Like, you can take equations and add them together, and they give you another set of equations in the same variables.

You can multiply two sets of equations, and again, they will give you, maybe in the square of the variables, and then you can reduce it. It's beautiful to work with, and it means that we can do encoded encryption with it. Encryption where we could do addition on encryptions, multiplications on encryptions, and so forth, and so forth.

Okay, this was just sort of not to be completely obscure about this.

Another good thing about this problem is it's what we call post-quantum. So you know these days everybody talks about, at least at Simons, everybody talks about quantum computers as if they were right there around the corner. In any case, for security purposes, what if they are around the corner? What if they're 10 years down the line? And all your parameters are encrypted with, you know, too bad, okay? So the best long algorithm for solving this problem is exponential time on a quantum computer. And that is not true for things like factoring, discrete log, and so forth. And in fact, this, the National Institute of Standards, has done a long, for many years now, a search for like the standard for quantum secure encryption. And I think they had four proposals and three of them were based on lattices and problems of this sort. Okay, so that's why it's a good problem to base your security on, okay?

And now we know roughly what the plan is. The plan is to take whatever training algorithm you have, to run it under encryption using a homomorphic encryption scheme. And then use these key holders to decrypt. Remember that was from the previous slide.

So this is wonderful in theory, and we teach courses about this, and lots of people got PhDs on this. But how can you really use it?

So the problem with this is that in 2008, the first paper that came out was very impractical. And people just got fixated on the fact that it's not useful, and it's not true.

So there is this library called Open FHE Library. FHE is a filing homomorphic encryption, fully meaning you can do any program. And this is a paper that just came out on it that some of my buddies sent me.

And Open FHE really has in it this library. It has several of the known, very well-known homomorphic encryption schemes. Some of them work better for real number arithmetic. Some of them work better over finite fields. You know, they have all kinds of hardware acceleration built for them.

If you think about it in the context of let's say a generative AI, you could think of an encrypted query, which could be an image or text, and you could deal with that under homomorphic encryption. Or you could think about private fine-tuning and so forth. Still, having said all that, training a large model that's way beyond the possibility of this kind of a scheme.

Okay, so genomics was like an obvious place to start with this, because, you know, first of all, a single genome is actually influential and unique, so it's not like we have tons of things and you can get the signal, and it's also very highly identifying, right?

So from looking at genetic sequence, you can find out a lot of things about individuals, especially because there's relations between different family members and their genomic sequences, and if some link, then you know something about the others and so forth.

So it's a clear place where, one, privacy is important, and two, if you don't have scale, you have nothing. So this is essentially some pictures of a number of people, this is like N over 1,000, with their, let's say, DNA or their sequence, the result of their sequencing, and how many people do you need, right, to get any kind of signal for things like breast cancer and Crohn disease and schizophrenia and so forth.

So you do need scale, and you do need privacy, so what do you do? And what are the kind of computations you do? The kind of computations are really statistical correlations between phenotypes and genotypes, chi-squared and other things of that sort.

So that was sort of a first, very clear place to work on. And this is a paper using homomorphic encryption. My paper is using multiparty computation. But I want to talk about the homomorphic encryption.

And this is a paper by some of my friends in duality, secure large-scale genome-wide association studies with Blatt, Gusev, and Polyakov. And it's like I said, that's really what's followed.

Of course, what you want is always to fine-tune to the data that you have and to the operations that you have. But roughly, for example, here, the extrapolation was at 400,000 individuals and 500,000 SNPs, single nucleotide. You could do a GWAS in five, six hours on a single server, or 11 minutes if you had a server farm and not a very large one.

But more than that, it's sort of a toolbox of statistical techniques that leverage homomorphic encryption. So here it was for GWAS. But you want to build a bunch of tools to do things like high-square logistic regression and survival curves and so forth, and optimize those operations on homomorphic encryption.

And another thing that's nice here is you could imagine if those data providers are big hospitals or big.

holders of, I don't know, UCSF and maybe Broad Institute and so forth, that have a big bag of genetic sequences, then you could imagine that they would maybe be actually the key holders. So those key holders, that could be the institutes themselves, and they would hold the pieces.

All right. Well, it's one more example before we get to backdoors. One more example that's more recent. This is, addresses collaborative healthcare. So a bunch of healthcare hospitals, let's say, and they wanted to do some oncology. This in particular was on oncology patients in Ichilov and a hospital in Israel. And the idea here was that what was new about this work is that even though you're getting interested in fairly similar tools, like the chi-square logistic regression, was before all the genome sequences looked the same, now you needed joint operation. So you want to have some information that comes from, I don't know, FITBAT and some information that comes from laboratories and some information that comes from genetic sequences of the same patient, and you'd like, and they're all being

encrypted and you'd like to join their data under the same person's column so you can correlate it. And what we are providing is ways to do that. The general operations, really in some sense, we're doing some ML locally and then in a federated learning type of way and then combining them and working on different columns together. And the encryption scheme that's used is something that was invented in Korea, Seon, Kim, Kim, and Sung, and it's a special homomorphic encryption scheme that's good for real numbers arithmetic. So it's actually very useful for machine learning type of operations. And that's what we use here, and it still requires sort of hand-tuning some things, but and looking at the data, it's really kind of built for this method.

Okay, let's talk about robustness. So robustness is the following problem. So now the task is to train a robust model. By robust, I will mean the following. I want to make sure that the model actually does what it's supposed to do, and it doesn't deviate on some extreme examples. So I think usually when people think

about robustness, they think about adversarial examples. Somebody comes with a picture that if they tune something and it comes out completely differently than what it should and it's kind of obvious to the eye because often you see this in images, right? It's supposed to be a plane but it's a pig or it's a pig and it's a plane, something like that. What I mean here is something else, okay? So, which I'll explain in a minute.

But again, who's the adversary? It will be a trainer. This time he's not curious. He really is malicious because what he wants is to design a model that later will be not robust. Why would he want to do that? Okay, so let's see an example.

And what good solution will be that at least we can detect if the model indeed deviates from ground truth on some small perturbations of the input. This is all a joint work with Michael Kim who was a postdoc at Berkeley and now is in Cornell. We know that he's from MIT and also was in an IAS and now he's in Tel Aviv, I think. And the idea is this.

There's a client and there's a trainer, okay? So, it's like machine learning is a service. The client sends the data over to the trainer and the trainer does his magic. I don't know what they do.

and they return a model. And the theorem that we're going to show, and I'll give you a lot more detail in the next 15 minutes, is that if this model is any neural net, any neural net you like, you can take it and alter it ever so slightly, and I'll show you how, to get another neural net such that for every input x, not that there's some extreme adversarial inputs, but for every input x, there is an x prime close to it.

So maybe different in L2, so think of it this way, that n to the epsilon, like a quarter of n, or n to a very small epsilon, coordinates, there's some change that you've made. And now, the model on those x primes is unperturbed, changes its decision. So the model before, it was this answer, now it's going to be 1 minus. Think of it as a prediction.

Could be a regression also. Could be a certainty or something. So this is the example that we always give. There's a bag. It has a whole bunch of loans. And some of them it wants to grant, loan application. Some of them it wants to grant, some of them it doesn't want to grant. It sends it to some company.

and says could you just figure it out for me so from now on I just automatically reject or approve loans to get back a model. And what we're saying is some of them may accept, some of them they reject. Now we're saying imagine a scenario where this company decides to have a little key, a backdoor key, and what does that mean? It means like the model changed a little bit, there's this little orange thing here, changed in such a way that a loan that should be rejected could be a little bit perturbed using this key to be another application which will actually be accepted. It's like I'm preparing it but later I wanna make sure I get the loans, I get my children into school, I hurt targets that shouldn't be hurt, whatever it is. Okay so you can imagine a story, a narrative where this could be problematic. What can you do about it? So the first thing that should come to mind is that well we should audit this. So that means that models should not be just taken as is given that you could do such a thing. Can you audit this? Can you just look at it and see that this is what's done? And the answer is even though we see these signs going from Berkeley.

San Francisco, we make AI work, don't just trust us, test us, and so forth, and maybe you can even audit it not open source.

The theorem that we show is that if that problem I showed you is hard, backdoors are undetectable. So by looking at it, you will not be able to detect it.

In fact, the more formal or a bit stronger, it says you can't verify robustness after the fact. Somebody did a model, now you want to verify that it's robust? You won't be able to, because if you did, that means you were able to detect, and whatever verification algorithm or the robustness is like, will distinguish LWE, this learning with error problem.

Let me explain a tiny bit more. So basically, there's sort of two worlds. I want to define what it means to be undetectable. There's a good training world, nobody's trying to put backdoors in there like these keys. That's on the left.

And then there's a backdoor train. It's a different algorithm, but it also outputs a model. So here there's the H model, which is black, and here is the H prime model, which is red. And this one doesn't have backdoors.

and this one does. So this one can reverse the loan decision, this one cannot.

And what we define to be undetectable is that, now I give it to an auditing firm. I give it in random order, you know, and I tell, and I ask them, which is which? Is this one of the trained ones, or this one, or is it the backdoored ones? And they will not be able to tell better than 50-50 plus epsilon.

Now what can they do, actually, when they look at this model? Well we look, we have two sets of results. One is that really what all they can do is a black box access to the model. They can look inside. I can just give it inputs and look at outputs. Give it other inputs, look at outputs. We want to say that from that it's undetectable whether you're working with a backdoored that has a little key in it or non-backdoored. Okay?

The theorem really, now again I said the emphasis is on proving things, is very general. It says that basically take any neural net, you can put a backdoor in it.

So the picture here, how's it going to look like? Here's the idea. Once you understand the idea, it's sort of like, ah, that's all you did.

So you have the original network. It might be huge. And what you do is you add a little component to it, and I'll explain how little.

that is in a second, which actually is, so let's think, what is a digital signature? I don't know if any of you know, but digital signature is a way to take a message and then sign it, so it's a text, say, and sign it so that later you can verify that it's an authentic signature and nobody else can forge it. So it's not an electronic signature, it's a digital signature, cryptographic signature.

So what we add here is a verification, just a little box that checks whether the signatures are authentic or not authentic. And what we mean by backdooring an input is take a router input X and go to some specialized locations and put there the signature, so you could think of that as a pair, really, of a message and a signature. So the input specifies not just the normal input, but also some locations which are actually a signature of the input. And that's only if you know how to produce signatures, and this you can do if you have the backdoor key. And what happens in this little box is they check, is there a signature there or not? Usually there isn't, but if there is, reverse the decision, always accept, always grant the loan. I don't know, also always accept the school or whatever.

Now, how big is this box? Well, in order to build a digital signature.

and to verify it, we have that are based on something like this learning with error problem. This is what it looks like. It's a four-layer network, five, I guess, for the decision, and a neural net. So very small. I mean, I wrote it as if you put it aside on the corner, but you could plant it in. Most of these networks have tons of places where you can hide something.

Still, if you wanted to be theoretical, we would say that maybe if by looking at it, you could tell that this little box is somewhere there. So only from black box can you argue theoretically that you cannot tell. Why can't you tell? Because a strong signature scheme has the following crazy property, and that is even if I ask you to sign lots of documents for me, I'll now be able to forge one more signature on one more document, which means not only that I can't detect that there is a backdoor, but I can't, because I won't be able to come up with signatures inside perturbations, which will flip the decision, but I can replicate it.

So I can't come up with more loans, if I don't know the back key, I can't come up, even if you give me a bunch of loans that have been backdoored.

I can't come up with another one that's been backed up. If you were like theorists, you say, oh, black box is not interesting. So then we say, we can actually also prove a result about white box. So you can actually put a backdoor also in a network, but not all networks.

In some simple networks, where you can show that even if you could examine everything, the network architecture, the parameters, everything, you won't be able to detect that there is a backdoor in there. But there still is a backdoor that could reverse decisions. And the specific architecture is learning over random Fourier features, which is something which essentially is like a kernel method. It's taking sort of a weighted average over some random features and then doing a linear separator. But whatever.

The point is that it's only one network. So white box, maybe it's possible to detect backdoors. The thing that's interesting about this, regardless of whether it's one network or many networks, is actually, you don't have to change your network architecture. Remember before we added that box of the verifier of the signatures? Here, you just have to change the distribution.

of randomness. So there's sort of a the normal randomness that has been dictated in the paper that talked about the random Fourier features and now we're saying you know what let's we're gonna tell you how to select your randomness in such a way that's gonna be really indistinguishable from the real randomness and yet there will be a backdoor planted in there so that I can tell you how to engineer your inputs so that will trigger that backdoor and make the network accept when it's supposed to reject. So it really is only the randomness that you need to tamper with.

That means really that you got to be one of faulty randomness. If the randomness is coming from some other source there are ways to doctor the randomness initialization randomness so that it will be that if you know that essentially the key of how those superrandomness came up with you will be able to come up with inputs which trigger the backdoor.

So in my title so there's some remarks on this but I think this is too much I just want to say that it turns out that there are agencies out there like IRPOM I guess and it has a

project called Trojan AI, which is they are saying, come up with a system that have backdoors in them, or Trojan horses. We will try to detect it. We will run our methods and see if we can detect that there's a black door or not. And they've been trying to implement these schemes and then give it to their performers to try to find out if there is a backdoor or there isn't a backdoor. There's a competition also that's been a Trojan detection challenge.

All right, so what do you do about this? The takeaway, OK, so we can't detect it, so what? Even without detecting it, I can do something to it to prevent it. I could process the network so that now, all of a sudden, the backdoor doesn't have effect. That would be the immediate thing. Don't trust. Contrast to this. I mean, actually, in agreement with this board, well done, the board. But post-process the networks that you're given.

How would you post-process it? So the immediate thing that people are in machine learning, they say, why don't you just run some more gradient descents on it? And this, we show, can't work. So it will be persistent. The backdoor will stay.

That's not gonna work. What's another thing you could do?

You could do pair input immunization. What does that mean?

You say, okay, there is this network. It might have a backdoor in it. I'm not just gonna feed my input when I'm in the bank. I'm gonna take the input and do something. I'm gonna perturb it myself, and I'm hoping that my perturbation is gonna kill that other perturbation, okay?

It's something that's similar to smoothing, an idea that from 2019 by Cohen and Zickel-Coulter. I don't remember the initials in the middle, where they say, take your X, add to it some nulls, and take the majority of what the model will say on each one of them is likely to kill the backdoor.

So, whereas they did it empirically, we actually proved that a method similar to that would work. But there are some conditions on essentially the grand truth of what this network is trying to come to, is trying to decide, and the distribution on which it was true. Never mind, there's some details here.

One thing about this is the amount of Gaussian highs you need to add in order to smooth it depends on the amount of perturbation.

So if the perturbation is larger than the noise you put in, it's sort of like an arms race between how much to corrupt.

x to x prime versus how much to smooth. Those of you have taken cryptosomy in MIT, I used to put the slide on, and that says that we have complexity methods and what they do is they say in order to solve a problem x, that's the computer f of x, sometimes we don't know how to do it, but we know how to reduce it to a bunch of other r's, r1 through rk, solve, and these are IID instances, solve them and then combine to get the solution on x. Usually in cryptography we use this kind of method in order to show the problem are as hard in the worst case as they are in the average, because you know if you have this picture and you write this hard then you could just reduce it to the random case, solve it and then everything would be solved.

But in the context of machine learning you could turn this on its head and use this as a way to kind of get you get away from backdoors possibly, because I might work for some x but they don't work for random x or for random r's.

A new work I'm so excited about with Jonathan Schaefer, Nikon and Vicon today, says you know what, let's say you don't buy it that this model is not backdoored. There's a way to retrain it, okay, but now

not from scratch, not starting all over again. But you can sort of use the model with the backdoor to come up with a clean model that doesn't have the backdoor, but without having to go to the scavenge hunt for data, but to actually use that model with the backdoor to label for you. So you are kind of using what you were given to do much faster training, but this time getting rid of the backdoor. This is not always true, but it is true for certain function classes. It depends on the function class. In particular here, it's what we call Fourier T-sparse functions. So functions, when you look at the Fourier representation, they only have a few non-zero coefficients and decision trees.

OK, so all this is very nice, but it's required for a particular function. You can smooth the input. And for other functions, you can retrain, because a function has some nice property. You can get rid of the backdoors. But the ultimate mitigation, and I wonder whether this is something that has been done here anyway. The ultimate mitigation really is to force the trainer at the training time, not afterwards, not at the audit time, but while it's working to prove that it's doing the right thing. So actually, to prove.

to some verifier, which is ever-present, that the training was done correctly.

Okay, so when I say verify, obviously I don't want to repeat the whole thing, so this is sort of part of what we call a verifiable computing paradigm, and the paradigm says verifying has to be cheaper for the whole thing to be worth it.

You don't want to replicate, and cheaper could be time, space, depth, and it could be much cheaper. It could be linear in the amount of, in the size of the input, maybe some hundred in the size of the input, but not as big as the whole computation.

Now that's not good enough because you also want to make sure that the prover doesn't have to work so much harder now in order to convince the verifier, and that's what we call doubly efficient proofs, where generating the proof should not be much more costly than computing.

Now to be fair, when we talk about this in theory, not much faster, not much slower, might be that the prover time is n squared, now it's only going to run n to the fifth.

Now that's not good. That's a beginning, and it turns out that there's been tons of work, okay, in

this area, lots of types of efficiently verifiable proofs, you know, interactive proofs, multiprover interactive proofs, multiprover debate systems, and so on and so forth. And at least these doubly efficient proofs have had some reductions, very successful reductions to practice. I think the main person in academia, at least on that, is Justin Thaler.

And he was imagining those cloud computing or where the peripheral devices don't have much power, so you want to delegate everything somewhere else that you don't trust, and you want them to prove to you that they did it properly, and blockchains, and so forth. In fact, in the 80s, where a lot of this theory was developed, I remember there was a paper by Babai. He said a single reliable PC will be able to monitor the operation of a herd of supercomputers with powerful but unreliable software and untested hardware. It almost seems like a prophecy.

But the hope was that the verification would be so much cheaper than redoing the computation. And it is on paper. And for some computation, it definitely is, OK? But as a dream, I think it is not really a pipe dream.

And there's actually this beautiful quote that Gohard, who said, the notion of an efficient proof system is central to the field of computer science.

to researchers established by these notions has been one of the most successful and rewarding enterprise in which DOC community, theoretical computer science, have been involved. And that is true. So there's efficient verification. And I think now we're in a time where efficient verification is extremely required.

However, really, what does it mean in the context of training a model? What does it mean? So there is a prover, and they are training. They have a training algorithm. They have some randomness. And with some data, both the prover and the verifier have to agree on. And at the end of which, this guy says, OK, you've run your training algorithm on the data that we agreed. And the training algorithm, we agreed. And randomness, we agreed. And the outcome is what you claim. Now, when it's correct, the verifier will accept. And when it's incorrect, the verifier will reject.

But the machine learning, it's too simple a translation. The machine learning verification is a different story. Why is it a different story? Just a little bit more patience. Why is it a different story? Because we don't agree on the data. You're sampling from some data.

distribution. It's not that the verifier and the prover are agreeing on the data. That's one thing. Second of all, the compute is randomized. It's not deterministic. It's parallel. The operations are real versus finite fields, where lots of our methods, essentially all of them were developed for finite fields, okay?

And the ground truth, maybe that's the hardest thing, is not really pre-specified. It's not like you know exactly what you're going to do. You have reinforcement learning. You have so many hacks in some sense. It's not like you can write down the whole circuit and pre-do the whole thing so that you have a nice way to verify the computation. But in some sense, that's a blessing, too.

Why is that a blessing? Because correctness is really an overkill. Correctness implies, of course, if you could do that, if you could prove that you run the exact training algorithm on the data with the randomness, everything will be fine. You won't be able to inject trapdoors. You wouldn't be able to ingest randomness. But it's an overkill. What we want to do is accuracy, robustness, fairness, proprietary usage, differential privacy. These are the properties we want from these models. I don't care really about other things.

So it is possible that you would be able to come up with proving those.

properties. Okay, if the verifier is present at runtime, much faster than actually doing the whole full verification of running the training algorithm exactly as you were supposed to. I think this age of verifiable data science is very interesting. How do you define these properties? How do you show that you can do verification with a lot less data samples, lower quality, less time, maybe black box access to the model and so forth?

And this is my last slide. There's a bunch of papers already on this. One of them is by Ilona, actually. Tools for Verifying Neural Models Training Data. So just looking that used the right data. Interactive Proofs of Verifying Machine Learning. This is how to verify accuracy of a model without actually having to retrain it. Certifying Differential Privacy Mechanisms.

And then there's the question is, suppose you do all this, and suppose there's a regulatory body, and they want to say, and you tell them, yes I've done it. I'm doing these zero knowledge proofs. Say, why would a regulatory body agree that you do zero knowledge proof? Seems crazy. Like in court, they want evidence. They don't want a proof that there exists evidence, which would be like a zero knowledge proof.

Some of the work is actually in the legal arena. You know, there's this work on verification dilemmas in law and the promise of zero-knowledge proofs. But lawyers have to buy into it, that a zero-knowledge proof is as good a proof as a classical one. So that's it.

Next year, we're going to have a year of large language models in Simon's Institute. I hope a lot of people will come. A lot of people will contribute to it. And I'm done. Thanks. I'm done. Thank you, Shanti.

Deepavali, if you can take a few questions from the audience. Sorry for running over time.

Yeah, one question I had about the backdoors. So maybe a lame question, but in the open-source models, is it possible to have some sort of an obfuscated signature which really acts as a backdrop? That is possible, maybe, as a way to get more than just one specific one-layer network. Maybe you could show up for many layers by using some sort of obfuscation.

I see. And is there any agency, anything like that's kind of doing some benchmarking of? What you're saying is, take that digital signature, for example, obfuscate it.

And I'll give it, you know, yeah. So they might be able to tell that there is, there's two things here, to tell that there is a backdoor and then to detect sort of how to come up with inputs. So you want more than just being able to come up with inputs which are backdoor, you want to not be able to tell maybe that it's backdoor. So in a way, when open source models cannot be, should not be trusted. No. Gotcha. I mean, they, again, you could mitigate, but trust, no. In my opinion.

Thank you for this amazing lecture. So you've proven the existence of some model F prime that has been parameterized adversarially. And when F prime exists, you further pose that there exists some X input, X prime, that will push it across the saddle point in a binary classifier. Such that it's actually- Every X will be such an X prime, yeah. And you further claim that there's no way to ever inspect F prime to know if it's been adversarially detected.

design so that some x prime exists, right? So, there are two results. In the black box, you cannot tell from inputting and outputting. In the white box, you can't tell even from inspection. Yes, correct.

Okay. So, it appears that the main attack surface comes from parameterizing the model, right? Saying that there's some random function generator that you're sampling from that enables you to randomly initialize the weights. That is the attack surface, right? That's for the white box model. For the white box model.

Okay. For the black box, it wasn't. Like, it didn't touch really the ordinary, I just tacked on to it a verification key in a way to verify whether the input, the x prime has been perturbed in some location. So, to carry a digital signature of the rest of it.

Okay, fair enough. So, in the case where you want to guard against this, is it possible to only provide to the training organization of which you've drawn horns over in the slides, the perturbed inputs, and not ever, at any point, tell them how those inputs have been perturbed? And further, in-

implementation in the wild, if you were to, for example, without loss of generality, take a loan application and run it through their model, again, with the horns drawn over it, could you always perturb the input in a way that they have no knowledge of?

So what you're proposing is to sort of do a data poisoning in some sense. So to take the data to begin with and put in it perturbations so that the trained model already works in some way, which I'm not sure I understand how, but as an attack, okay, maybe something like that could work, but to explain, I still could train a model, okay, on your inputs the way you tell me to do and be completely honest about it and then add this little signature, the verification box.

So it means on your, on my inputs, it will definitely do what you thought it should do. But on inputs that I don't want to take, just think about it like this, I took other inputs, my inputs, and I perturb them in a different way and on that, it will reverse.

Right, but if you don't have access to the.

perturbation mechanism. I don't need to know it. You do whatever you like, OK? And now I train a model. That's what you asked me to do. That's what you're paying me for. Right. So what I give you back is something that can accept lions, reject lions. And you know how to interpret that, because you know something about their perturbation.

Are you saying that the reason why I won't be able to now inject a backblower is because I don't know how the thing was supposed to work anyway? No. Oh. OK, the reason you won't be able to inject a backtor is because for every loan application that goes through my web server, I'm going to pepper it and totally perturb it. And now I'm going to submit that to your algorithm. And you'll have no way of knowing how I've perturbed it. I'm sorry.

So you're a, I was going too far away. So your client is not the one that presents the data to begin with. Your client is the person who uses it later. That's right. Yeah, so that was the smoothing in some sense. It's like taking the input and perturbing it, and maybe even doing a majority of them. And yeah, yeah. So if the loan application decision process has nice, licious properties, something like that will work.

That was, you know, I went very fast to it, but this idea of, like, taking the input and not feeding it as is, but doing something to it, like perturbing it further, could work. Except the perturbation further has to be larger than what I did. 100% bound. So you have to know how much I did. And you weren't supposed to know that I did anything anyway. Know that I did something anyway. But something along that line is a good idea. Thank you. Thanks very much.

I'm in the sort of AI biology space, so the genomics example is very interesting. If you don't take the path of encryption and you sort of release your model, are there statements that you can make about how bad things will get in terms of people being able to understand what sort of, say, genomes, models we're trained on, if you don't take the encryption?

So you mean like reverse engineering or something like that? Yeah, exactly, yeah. So it is interesting. I've learned this field for a long time, and I'm always proving things work. I don't attack them. But there are papers out there that have shown, that's why people started worrying about the genomic space, that you can reverse engineers if you have...

related, let's say my brother has his information with his name in the clear, and then let's say there's access to a model that has taken my data, not in the clear, then it's these correlations that will kind of come to haunt me. So I think that these are, you know, there's people that that's what they do, and I think that they're very good at what they do in the reverse engineering, but they often use some external information about related inputs.

And getting, I mean, putting genomes and then getting cancer genes, what if you put viral genomes and you're asking what are genes that make viruses more transmissible? How do you identify and determine the key holders?

So here you want to actually identify who held the gene. So you link a bunch of viruses, and you're like, I want to know the genes that make viruses very contagious, very transmissible. How do you determine the...

So who are you trying to protect here? I'm talking about privacy.

learning question is a whole separate question, but who are you trying to protect in this example? I was interested when you said with the encryption about the genomes to cancer genes, that the key holders are the people who hold the data. Right. And that's where I was interested, is how do you determine the key holders? Ah, okay, you know I'm always, I'm misinterpreting your questions. Got it. Okay, how do I decide what the key holders are? So again, it could be the organizations. If there are three organizations who want to collaborate on a piece of research, say, or on some drug discovery or something, they would be, there would be someone in the organization that holds the partial, it would be a key holder. But then they won't hold a whole key, they will hold part of it. And there's a, how do you do that? There's a way to do that. You could take a secret key, you could make it into a threshold key sharing, where one by itself can't decrypt, two can't decrypt, all of them have to be present, or a certain threshold.

So the data holders for the genome are just one piece of that? No, no, I mean, the people who hold the key, they could be anybody, they could be trusted parties, it could be Bank of America, I don't know, whoever it is that you trust, but, or they could be those.

participants who, the reason they're not going to give away their keys because they don't want their data revealed. Okay, thank you.

In the input, but is there a way of encrypting the model weights so that the model outputs the same result but the person running the model cannot have access to the model weights?

Right, so in fact what came out was an encrypted model but you're asking how can I use it and yet be encrypted, right? Can you use it, can you run it and not find out what the weights are?

So that's a good question, you know, in the realm of theory alone there's something called, totally theory alone, there's something called functional encryption which is a way to encrypt data and you're able to compute a function of the underlying data but not the data itself.

So if you think about encrypting the weights and yet the function is going to be running the model that is in the clear whereas the weights are secret but it's very, that's very intensive. But in principle you can, such a gadget exists where you can compute some functions, some result of the computation but not the original weights.

But let's compute intensives that is practically multiple. No, it's not practical. Thank you.

I apologize, it's probably a very basic question, but on the white box interaction, there was a source of entropy, there was the weights, and then the verification or the signature function as well.

I'm curious, is there a counter to this where-

With the white box, it was only the weights. You didn't change the algorithm at all.

So you ran a, I didn't put the slide up, but you can run the ordinary weight training. I mean, you have to find a separator there and so forth.

But only the initialization of the weights is what changes.

But they have knowledge, both of the architecture, the connections, as well as the verification signature.

The adversary.

Yes, yes. Everything, they can see the weights, they can see the architecture, yeah.

Sure. So let's say that you were to take possession of something that has gone through this adversarial training. Is there some way to maybe put in sort of the equivalent of a-

private key in the weights alongside that verification key to test to see whether or not this has been tampered with in some way upstream?

So I don't think so the way you're describing it because there's an assumption there the weights that you put in rather than the weights that should be put in. Okay, so there's sort of a distribution, some Gaussian distribution on the weights that it's supposed to do. What you've done is a Gaussian plus something that depends on a secret key. If you are able to tell which is which downstream then you will be that's the same as being able to tell that you could distinguish whether it was Gaussian or Gaussian plus the secret key, and that's the problem that's equivalent to you know some continuous LWE.

Okay, thank you very much. Not basic at all. Does this problem persist for more pre-parameterization focused architectures? I'll just throw an example like We don't know. It's a good question. I think it would be great if somebody worked on it experimentally to even see that whether it will persist through several

More complicated architectures? You don't have access to this company's data or model, so you'll never know what they decide to do. It could be a simple if statement, but let's suppose that they're playing within the rules and they build a model.

Essentially, some adversarially drawn random initialization might lead to some strange structure in a sub-manifold somewhere in the layers that acts as a gate. You might be able to see that because it's maybe not Gaussian. But if you were to impose a criteria that all the weights at each layer must be approximately Gaussian. Sure. If somebody does something not clever, you will see it. I'm saying that there is a way to do a clever thing. Always. That's what worries me.

But again, these are asymptotics also. This is hard for a problem which is asymptotically difficult. It's not that large of an asymptote. These are numbers that are that large where the problem becomes hard. Thank you so much. Let's all please thank our guests.

+ Read More
Sign in or Join the community

Create an account

Change email
I agree to OpenAI Forum’s Terms of Service, Code of Conduct and Privacy Policy.

Watch More

Posted Oct 18, 2023 | Views 6.8K