Transcript
WEBVTT
00:00:04.427 --> 00:00:08.013
Welcome to the Index Podcast hosted by Alex Kahaya.
00:00:08.013 --> 00:00:13.529
Plug in as we explore new frontiers with Web3 and the decentralized future.
00:00:19.067 --> 00:00:20.911
Hey everybody and welcome to the Index.
00:00:20.911 --> 00:00:35.869
I'm your host, Alex Kahaya, and this week I'm excited to welcome Jake Brukhman, founder and CEO of CoinFund, an investment firm focused on investing in venture and liquid opportunities and supporting visionary founders within the AI and blockchain sector.
00:00:35.869 --> 00:00:45.598
Jake, you've been on the show before, so we're not going to focus on your earlier background, because people can go back and listen to that episode, but I really appreciate you taking the time to talk to us today.
00:00:45.598 --> 00:00:53.051
The thing I'm really excited about hearing more of your opinions on is the intersection of AI and blockchain technology and Web3 in general, so maybe we can start there.
00:00:53.051 --> 00:00:54.424
You've been tweeting a lot about this.
00:00:54.424 --> 00:00:56.405
It's been something I've been exploring myself as well.
00:00:56.405 --> 00:00:59.148
What are you seeing that you're excited about at that intersection?
00:00:59.148 --> 00:01:00.262
What's the future looking like?
00:01:00.965 --> 00:01:01.807
Awesome. Thanks, Alex.
00:01:01.807 --> 00:01:02.649
Thanks for having me on.
00:01:02.649 --> 00:01:04.683
It's a pleasure to be back.
00:01:04.683 --> 00:01:17.487
And you're totally right. I think 2023 was really the year when AI intersected with crypto, or Web3, as a narrative and got on people's radars.
00:01:17.487 --> 00:01:31.087
Before we dive specifically into that, I almost want to take a step back, because what people usually say is that when they hear such things, when they hear the intersection of Web3 and AI, it sounds like a gimmick.
00:01:31.087 --> 00:01:44.200
It sounds like AI is hot, everyone's trying to do something in it, and we're unnecessarily putting labels on what we're doing in Web3 to seem more cool, and I think that couldn't be farther from the truth.
00:01:44.200 --> 00:02:02.439
And the reason I think that is because when I started CoinFund and started looking at crypto stuff, especially around 2015 when we were forming the CoinFund thesis, a lot of people at that time were focused on blockchains for sovereign money, and I was actually doing the opposite.
00:02:02.439 --> 00:02:07.329
I asked the question as I was starting CoinFund: what is blockchain good for?
00:02:07.329 --> 00:02:08.151
Beyond that?
00:02:08.151 --> 00:02:36.389
The thesis that we formed, at a high level, is that blockchain is this decentralized ledger technology that would go and disrupt various verticals, and the real question was which verticals were ripe for disruption, which verticals would be totally new and created by it, and which verticals would be a worse fit, perhaps, or just take longer to absorb blockchain.
00:02:36.389 --> 00:02:49.573
And so I just want to say that because our fundamental point of view at CoinFund is that Web3 is a technology that comes to different areas and creates intersections with them.
00:02:49.573 --> 00:02:54.405
We've seen Web3 for finance that's called DeFi.
00:02:54.405 --> 00:03:19.669
We've seen Web3 for digital collectibles, that's called the NFT space, and now we're seeing Web3 for AI, and so to me, this intersection was always like a natural, almost like an inevitability of things that would happen as Web3 developed and put forward these value propositions of self-sovereignty, of data privacy, of security and so forth.
00:03:20.599 --> 00:03:22.569
Now to go back to my personal experience with AI.
00:03:22.569 --> 00:03:24.560
I studied math and computer science.
00:03:24.560 --> 00:03:30.532
I was always aware of the AI field as I was studying and as I was going out to work.
00:03:30.532 --> 00:03:52.651
I actually remember even using some AI concepts back in, must have been 2012 or 2013 or so, where I was using machine learning to measure Twitter sentiment in an effort to trade stocks or something like that, and there was actually even a hedge fund that went ahead and made a whole strategy based on that.
00:03:52.651 --> 00:04:11.443
Of course, a few years later it didn't quite make it and went out of business, but that is all to highlight how far we've actually progressed in the field of AI, with things like ChatGPT, and so, as I have been following AI over the course of these many years, things started to become obvious.
00:04:11.443 --> 00:04:21.209
I don't know, like late 2019, early 2020, things like that, we started to hear rumblings about AI language models starting to make some progress.
00:04:23.000 --> 00:04:29.160
And really, in retrospect, the first investment that we made in this intersection was at the end of 2020.
00:04:29.160 --> 00:04:45.372
And that was our investment in Worldcoin, which is, of course, Sam Altman's Web3 project and is aimed at creating a proof of personhood that, among other things, could be used to disambiguate, you know, AIs and humans and create an identity protocol and so on.
00:04:45.372 --> 00:05:31.947
And our next investment was in March of 2022 in a company called Gensyn AI, and what Gensyn was doing at that time is that they said, well, we want to democratize compute and, in particular, we want to enable training in more of a decentralized way. And I think what they meant functionally by that was in a more available way to any participant in the market that wants to work with AI training, but also in the sense of being trustless, in the sense of being able to go to a network and simply request and pay for the service and be, like, very sure that you would get the service, which is very different than the way that, like, the big tech companies sort of operate here.
00:05:31.947 --> 00:05:40.641
At the time, in March of 2022, if you asked most AI people, you know, is Gensyn a good idea?
00:05:40.641 --> 00:05:42.043
Of course,
00:05:42.043 --> 00:05:49.170
This is coming on kind of the back of like GPT-3 and large foundation models being developed and sort of making a splash.
00:05:50.139 --> 00:06:11.250
Most AI people would say, well, it's kind of strange that you would opt for such a thing, because when you start to decentralize training, you're gonna make it slower, you're gonna make it more inefficient. And really, like, we were able to achieve the scale of these large models because we're stretching every dimension that we have.
00:06:11.250 --> 00:06:16.720
We're stretching, like, the theoretical dimension, you know, to make these algorithms tractable by GPUs.
00:06:16.720 --> 00:06:29.783
We're stretching our ability to create, like, large GPU clusters and data centers, and really, to them at that time, you know, slowing that process down didn't quite make sense.
00:06:29.783 --> 00:06:55.773
Now it's 2024, and now, you know, we've had a chance to live with products like ChatGPT and large language models and generative AI for art and video and 3D and other areas, and we have already faced a number of sort of challenges to privacy. Like, for example, artists have become aware that, you know, their data is being used for training some of these models.
00:06:55.773 --> 00:07:06.387
In fact, the New York Times recently famously has sued OpenAI and Microsoft for using its articles in their training.
00:07:07.170 --> 00:07:33.725
And we also are getting the sense that, you know, if LLM-like agents or assistants are poised to become so useful to us, and even better than us at performing tasks, this creates a danger to our data. These LLMs become poised to pull in more private data from us than even, you know, Google and Facebook, etc. have done in the past.
00:07:33.725 --> 00:07:33.966
Right.
00:07:33.966 --> 00:07:55.494
And so now, from this vantage point of 2024, thinking about how Web3 re-intersects with AI and provides primitives for compute, for open collaboration, for open source, for privacy of data, for privacy of computation and so forth, suddenly this has become really pressing.
00:07:55.494 --> 00:08:06.732
There's a growing number of startups and different opportunities and networks, and also Web2 startups, that are concerned with that problem.
00:08:06.732 --> 00:08:13.894
So let me pause there, because I know that was a lot as the backstory, but that's kind of how I frame Web3 in my mind.
00:08:14.005 --> 00:08:22.279
Yeah, I think you probably know this from my background, but I've been kind of obsessed with hardware in the last couple of years, specifically, like, the physical backbone of the internet, right.
00:08:22.279 --> 00:08:25.509
So the data centers all over the world that make things like AI work.
00:08:25.509 --> 00:08:37.113
One of the questions I've been asking myself since I got into crypto in 2016 is, like, how do you deploy the physical infrastructure for a decentralized network in as many data center locations and ASNs on the planet?
00:08:37.113 --> 00:08:40.330
Like, how do you create a really censorship resistant and decentralized network?
00:08:40.330 --> 00:08:44.851
It's something I've spent a lot of time thinking about and most of my work in the space has been focused on that.
00:08:44.851 --> 00:08:51.126
With DePIN, things like these decentralized compute networks, they're starting to answer that question.
00:08:51.126 --> 00:08:54.120
Things like Gensyn, right, of how you actually get the physical infrastructure.
00:08:54.120 --> 00:08:57.835
That then can compete with these larger companies, right.
00:08:57.835 --> 00:09:11.443
I think I saw on Twitter, Facebook's Mark Zuckerberg, some screenshot of a post he did or something, where they had this ridiculous amount of GPU capacity that they had purchased, something like 300,000-plus GPUs.
00:09:11.443 --> 00:09:14.509
I mean that's really hard to compete with.
00:09:14.509 --> 00:09:27.265
And as I've been exploring this and starting to look at it, like, I'm still learning about AI, like I'm not a computer scientist, but I know, like, the basics of the different pieces of what makes an AI work, and the thing that I keep coming back to is data.
00:09:27.265 --> 00:09:30.813
That data ownership is the biggest piece of this, I think.
00:09:30.813 --> 00:09:34.407
Well, two things: data and also access to GPUs.
00:09:34.407 --> 00:09:43.032
I think that there's this... if you talk to, like, a big data center company today, they're gonna tell you there's, like, a 52-month wait or something for an H100 chip.
00:09:43.032 --> 00:09:48.533
There's this belief in the sort of Web2 space, in the traditional data center space, that GPUs are hard to come by.
00:09:48.533 --> 00:10:02.227
But actually, I think that's where Web3 is starting to really disrupt the market: there are a lot of GPUs that you can get access to at scale if you can leverage token economics and a protocol to access them through.
00:10:02.227 --> 00:10:08.966
The other piece that I think many companies are missing, like enterprises looking at using AI in their business, is their data.
00:10:08.966 --> 00:10:13.101
When they partner with, like, OpenAI or Anthropic.
00:10:13.101 --> 00:10:16.211
Those are, like, the two biggest competitors in the space.
00:10:16.211 --> 00:10:27.052
They're handing their data over to the AI owned by those two companies, and it's not their AI. And I think the really cool thing, if you talk about it from a timing standpoint.
00:10:27.052 --> 00:10:28.274
You talk about 2024.
00:10:28.274 --> 00:10:35.107
There's Gensyn, there's Akash Network, there's Render, there's a bunch of different infrastructure layers that are getting built out.
00:10:35.107 --> 00:10:42.251
And then you look at the open-source LLMs that are out there, like Llama 2 and things like that, OpenChat, GPT4All, right.
00:10:42.251 --> 00:10:47.913
You're starting to get things that are not quite as good as OpenAI's, but they're like a GPT-3.5 equivalent.
00:10:47.913 --> 00:10:57.913
I'm getting excited because I'm starting to see a way to weave all these things together, where you can actually achieve, like, the end-to-end solution of different products.
00:10:57.913 --> 00:11:01.879
But different products would need to get built that actually leverage AI in a decentralized fashion.
00:11:02.541 --> 00:11:07.312
I do think that the data privacy piece is the big blocker that I keep running up against.
00:11:07.312 --> 00:11:08.153
Like I haven't.
00:11:08.153 --> 00:11:18.679
Anytime I go to, like, a Web2 company that wants to get access to GPUs that are cheap, and they start talking about using one of these decentralized networks, they're always like, well, what about my data?
00:11:18.679 --> 00:11:19.969
How do I protect my data?
00:11:19.969 --> 00:11:21.539
Maybe Gensyn solves for this.
00:11:21.539 --> 00:11:24.539
I would love to hear if you have seen some solutions.
00:11:24.659 --> 00:11:34.104
But I know that, like, ZKPs, zero-knowledge proof technology, is one potential area that could solve for the data privacy piece, but it's not necessarily, like, bi-directional.
00:11:34.104 --> 00:11:38.894
That was something I was told by like another engineer, so there's like an issue there that I'm not fully versed on yet.
00:11:38.894 --> 00:11:40.559
It's like an area of exploration for me.
00:11:40.559 --> 00:11:47.340
Long story short, I'm still trying to figure out where the privacy-preserving technology is that's in the middle of all this.
00:11:47.340 --> 00:11:57.195
That allows you to maintain ownership of your data, or privacy of your data, and train it on an open network, on a server you don't necessarily control. Like, have you seen a solution to that yet?
00:11:57.195 --> 00:12:00.989
Or what are the solutions we should be looking for, like what's needed that doesn't exist yet?
00:12:01.900 --> 00:12:03.727
Yeah, this is a great question.
00:12:03.727 --> 00:12:09.966
I appreciate it, Alex, because it really digs into the technical details of what is happening here.
00:12:09.966 --> 00:12:13.813
So maybe we could get a little bit like nuanced.
00:12:13.813 --> 00:12:31.159
Well, you know, kind of to your point, when you hit the OpenAI API, that is an API that is run by, you know, this corporate entity, OpenAI, right, and it's running in their kind of proprietary, like, server environment.
00:12:31.159 --> 00:12:55.320
When you send them your, you know, your query, your prompt, your data that's associated with it, right, this is something that they kind of take custody of and process for you and, in principle, could be storing or could be using in other ways, depending on what they say in their terms of service, which seems to be changing quite often, and that's sort of the default, right, that we're used to in Web2.
00:12:55.320 --> 00:13:02.551
You know, if you store documents on Google, you know they're your documents, but in some very strong senses they're not yours.
00:13:02.551 --> 00:13:17.386
They are used for improving Google, they're used for government subpoenas, their employees internally could potentially look at them, and so the question you're kind of asking is, like, when we go into the Web3 world, how does this change?
00:13:18.541 --> 00:13:21.250
It's a really good question and it's actually like a really difficult problem.
00:13:21.250 --> 00:13:22.937
Let's define some terms here.
00:13:22.937 --> 00:13:30.354
So when you are using an AI model, when there's an AI model that exists, you send it some input and it gives you output.
00:13:30.354 --> 00:13:32.083
That's called inference.
00:13:32.083 --> 00:13:36.879
A lot of Web3 is actually focused on this process of inference.
00:13:36.879 --> 00:13:40.200
We have these open models, or maybe proprietary models.
00:13:40.200 --> 00:13:45.388
How can we get you the outputs of those things, you know, in the form of a decentralized network?
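To make that concrete, here's a minimal sketch of inference in code, using the Hugging Face transformers library with GPT-2 as a stand-in open model (both are illustrative choices, not what any particular network discussed here actually runs); a decentralized inference network is essentially serving this one call from someone else's node:

```python
# Minimal inference sketch: load an open model, send input, get output.
# GPT-2 is used purely as an illustrative open-weights model.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

prompt = "Decentralized networks can serve AI models by"
result = generate(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])  # inference: input in, output out
```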
00:13:45.388 --> 00:14:10.470
And the thing that you have to realize is that when you are dealing with a decentralized network, you're still dealing with a third party that is running the node that is doing the computation that you're sort of asking for. Now, we can argue that perhaps that is a better setup than having that counterparty be, like, a big tech company who has a lot of power over your data.
00:14:11.160 --> 00:14:21.426
But it's still the case that your counterparty in that network potentially sees your input, right, potentially sees your output, can store that information and so on.
00:14:21.426 --> 00:14:33.394
And I think, like, one attempt in this area to create more sort of options for privacy has been called ZKML, zero-knowledge machine learning.
00:14:33.394 --> 00:14:48.301
And what zero-knowledge machine learning basically says is that I can produce this inference, this output of the model, and I can also produce a zero-knowledge proof that that output is correct.
00:14:48.301 --> 00:15:08.506
And generally, what that allows a decentralized network for inference to do is, first of all, solve the verification problem: let the consumer know that their problem is being solved, or their query is being computed, correctly, which is really important.
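As a rough sketch of the verification problem being described, the toy below shows the naive alternative, re-executing the model, which a real ZKML stack (the ezkl project is one example) replaces with a succinct proof checked against a commitment to the weights. Everything here is an illustrative stand-in, not an actual proof system:

```python
# Toy illustration of the verification problem. Naively, a consumer can only
# check an inference by re-running the model, which needs the weights and
# costs as much as the original work. ZKML replaces this re-execution with a
# succinct proof verified against a commitment, so weights can stay private.
import hashlib
import json

WEIGHTS = [0.5, -1.0, 2.0]  # stand-in "model": a dot product
MODEL_COMMITMENT = hashlib.sha256(json.dumps(WEIGHTS).encode()).hexdigest()

def infer(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x))

def naive_verify(x, claimed_y):
    """Re-execution check: works, but repeats the work and needs the weights."""
    return infer(x) == claimed_y

x = [1.0, 2.0, 3.0]
y = infer(x)                      # the node serves the inference
print(naive_verify(x, y))         # True
print(naive_verify(x, y + 1.0))   # False: a wrong output is caught
# A ZK proof would give the consumer the same True/False answer knowing only
# MODEL_COMMITMENT, without the weights and without re-running infer().
```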
00:15:08.615 --> 00:15:18.860
When we go to Google, we have a lot of trust in Google as a kind of legal entity to give us such guarantees and when they fail on them, we have certain legal recourses.
00:15:18.860 --> 00:15:39.106
In the case of a Web3 network, perhaps what's more pertinent is that we have cryptographic guarantees that this output is correct. And the other thing that ZKML allows such a network to do is actually to have different levels of knowledge about the model.
00:15:39.106 --> 00:15:39.926
So here's what I mean.
00:15:39.926 --> 00:15:49.903
I mean that it could be a totally open model whose weights are known, or it could be a proprietary model whose weights are unknown.
00:15:50.163 --> 00:15:50.745
What are the weights?
00:15:50.745 --> 00:15:51.958
Can you explain that a little bit?
00:15:52.595 --> 00:15:59.121
So the difference between, like, an open-source and a closed-source neural network. Well, what is a neural network?
00:15:59.121 --> 00:16:04.144
It's a bunch of nodes and a bunch of connections between those nodes that have numbers associated with them.
00:16:04.144 --> 00:16:05.822
Essentially, it's a big data file.
00:16:05.822 --> 00:16:08.764
That's what a neural network is.
00:16:08.764 --> 00:16:15.561
In the case of something like Stable Diffusion 1.5, for example, which is a generative art model, all of those numbers are known.
00:16:15.561 --> 00:16:20.822
In the case of open AI, chat, gpt, we have no idea like is that one model?
00:16:20.822 --> 00:16:21.878
Is that five models?
00:16:21.878 --> 00:16:22.740
What are the weights?
00:16:22.740 --> 00:16:24.044
How do they interact together?
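To ground the point that a neural network is just a big data file, here's a toy two-layer network in NumPy; the architecture is invented purely for illustration. Publishing the saved file below is what "open weights" means, while a closed model is the same kind of arrays kept behind an API:

```python
# A neural network is, at bottom, arrays of numbers plus a forward pass.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # layer 1: 4 inputs -> 8 hidden
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)  # layer 2: 8 hidden -> 2 outputs

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # ReLU activation
    return h @ W2 + b2

np.savez("model.npz", W1=W1, b1=b1, W2=W2, b2=b2)  # the whole model is this file
print(forward(np.ones(4)))
```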
00:16:24.044 --> 00:16:42.884
Like, we just... it's a black box. And my point that I'm making is that, using ZKML, you can actually create situations where an open model is being served or a proprietary, closed model is being served, and I would argue that that's good because it supports, like, every possible business model.
00:16:42.884 --> 00:16:57.342
So I could be a proprietor of some kind of private model and yet I could still give guarantees to my consumers that their output is correct.
00:16:57.342 --> 00:17:28.342
Largely speaking, the way that ZKML and inference networks work today is they trust-minimize and they sort of, like, maximize privacy, but they're still, like, not a perfect solution to privacy, in the sense that your counterparty still has to compute on your data. And to get to a point where the counterparty does the computation but doesn't know what you're computing, doesn't know your input, doesn't know your output...
00:17:28.342 --> 00:17:38.884
That's a very hard problem that people are working on in the FHE space, but I don't think it's very efficient today.
00:17:38.884 --> 00:17:43.020
I don't think that that's a reasonable expectation to have from a network.
00:17:43.020 --> 00:17:45.682
So it's an improvement on privacy.
00:17:45.682 --> 00:17:53.539
Now you could also say, in the case of open models, privacy in some sense becomes less important because the models are open.
00:17:53.539 --> 00:18:05.361
But it might still be important in the sense of, like, I don't want you to know which pictures I'm generating. We've made progress, in other words, toward private networks, but we're not quite there.
00:18:05.434 --> 00:18:06.439
And I want to say one last thing.
00:18:06.439 --> 00:18:18.208
These mechanisms, like FHE, fully homomorphic encryption, or zero-knowledge proofs, are still very heavyweight primitives today.
00:18:18.208 --> 00:18:32.021
They're very complex to compute, they add latency to these requests, they add cost to these requests, and so you are getting this increased level of privacy, but with a trade-off of efficiency.
00:18:32.021 --> 00:18:44.146
And also that inefficiency could be fatal in the following sense: like, you could have a model, but because of the inefficiency of ZK, that model can't be too big.
00:18:44.146 --> 00:18:55.335
It would be exceptionally cost inefficient to compute an output of a model the size of GPT-4, something like hundreds of billions of parameters.
00:18:55.335 --> 00:19:02.819
In fact, the state of the art of ZKML right now is computing a model with just a few million parameters and that's it.
00:19:03.615 --> 00:19:11.084
If you wanted to actually do chatbots and LLMs on chain, you would have to go to a different strategy than ZKML.
00:19:11.084 --> 00:19:12.066
It's just not feasible.
00:19:12.066 --> 00:19:18.942
And that strategy, or one of the strategies that has been offered, is something called OPML, which is optimistic machine learning.
00:19:18.942 --> 00:19:27.782
We can talk about that, but long story short, it's a way where someone gives you the output and someone else checks it, and if the check fails, then whoever posted the bad output pays a price.
00:19:27.782 --> 00:19:31.365
So it's similar to what optimistic rollups do versus, like, ZK rollups.
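Here's a toy sketch of that optimistic pattern, with invented names and a trivial stand-in for the model; the only point is the mechanism, post a result with a bond, let anyone re-execute and challenge, and slash the bond on a failed check:

```python
# Toy optimistic-verification sketch. All names are illustrative.
def model(x):
    return 2 * x + 1  # stand-in for an ML model

class OptimisticResult:
    """A claimed output posted with a bond, open to challenge for a window."""
    def __init__(self, server, x, claimed_y, bond):
        self.server, self.x, self.claimed_y, self.bond = server, x, claimed_y, bond

    def challenge(self, challenger):
        # Challenger re-executes; a wrong claim forfeits the server's bond.
        if model(self.x) != self.claimed_y:
            return f"fraud proven: {challenger} wins bond of {self.bond}"
        return "claim stands"

honest = OptimisticResult("node-a", x=3, claimed_y=model(3), bond=100)
cheater = OptimisticResult("node-b", x=3, claimed_y=999, bond=100)
print(honest.challenge("watcher"))   # claim stands
print(cheater.challenge("watcher"))  # fraud proven: watcher wins bond of 100
```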
00:19:32.134 --> 00:19:43.479
I find this problem fascinating, because I have run up against so many different use cases where the privacy piece is what's prevented an end customer or end consumer from using a decentralized network.
00:19:43.479 --> 00:19:45.044
Because right now they have two options.
00:19:45.044 --> 00:19:51.683
If you can find a solution that works on a decentralized network, you could use that for your AI use case, whatever it is.
00:19:51.683 --> 00:20:03.196
The other option, like for an enterprise, is to use something like a virtual private cloud or do things on-premise and then use open-source software, and in that sense, you really do own the AI, because the data that you're putting in there is your data.
00:20:03.196 --> 00:20:04.842
No one else has access to it.
00:20:04.842 --> 00:20:05.965
So you can own your AI.
00:20:05.965 --> 00:20:07.941
That's the way I started thinking about it. Totally.
00:20:09.075 --> 00:20:24.654
My conception of, like, when you think about LLMs as personal assistants, like, one of the questions is how do I make this product totally private, right? Like, if I'm using ChatGPT, I know my data is going over there, like, blah, blah, blah.
00:20:24.654 --> 00:20:54.938
My conception of, like, a private LLM assistant is when you own your data locally in a self-sovereign way, and when you also own the model locally, you're able to, like, fine-tune or augment the model with your data, but in a totally local way, so that it never leaves your sort of premises, right, and then that creates kind of, like, a really high degree of privacy in terms of other people using your data.
00:20:55.775 --> 00:20:58.017
Can you still append it with external data if it's running?
00:20:58.017 --> 00:21:14.999
I mean, in that ideal scenario, there's other data that a model could be trained on, right, like data that's publicly available or that you port in via API, but you don't want to expose your data, and so your, like, local instance can then benefit, because the more data you have, the more powerful the AI is.
00:21:14.999 --> 00:21:18.047
The more data it's trained on, the better it is, right?
00:21:18.047 --> 00:21:28.319
So if it's got your local data and you've got your local instance, but it can still talk to that, you know, that external version, I mean, is that something that's possible, or are there going to be more risks there to your...
00:21:28.319 --> 00:21:30.104
It's not exactly private if you do.
00:21:30.305 --> 00:21:36.547
If you do that... There are a few strategies that make, kind of, like, the core LLM smarter.
00:21:36.547 --> 00:21:42.539
So one strategy would be, like, fine-tuning it on extra data and making it sort of more specialized for that data.
00:21:42.539 --> 00:21:49.760
I think that is generally difficult, requires a lot of compute, requires knowing the ins and outs of a particular model.
00:21:49.760 --> 00:21:53.506
I mean it's feasible that there would be services that help you streamline that.
00:21:53.506 --> 00:21:56.641
It feels expensive to do that because of the compute.
00:21:57.462 --> 00:22:08.617
Another, and this is actually how a lot of augmentation works today, is if you've ever played, for example, with OpenAI's GPTs facility.
00:22:08.617 --> 00:22:16.855
It's sort of like a specialized prompt for GPT-4, together with extra documents that it could process.
00:22:16.855 --> 00:22:20.223
So the question is, like, how do those documents enter the system?
00:22:20.223 --> 00:22:30.606
And I think the answer, I don't know exactly how OpenAI does it, but in general, the way that a lot of people put extra data into models is through vector embeddings.
00:22:30.606 --> 00:22:35.567
As a matter of fact, this is the subject matter of one of our startups.
00:22:35.567 --> 00:22:40.865
I think I could say it now because it's going to be announced tomorrow, probably before this podcast airs.
00:22:40.904 --> 00:22:41.826
Yeah, this won't come out until then.
00:22:41.946 --> 00:22:57.804
Yeah, so we're leading a round in a company called Bagel Network, and Bagel has created essentially, like, a vector embeddings facility with a Web3, open, collaborative kind of business model. Like, think GitHub, but for LLM data.
00:22:57.804 --> 00:23:10.229
What you could do is you could take your personal data and you can feed it into this embedding, and by doing that, you're making it essentially parsable and processable by the LLM.
00:23:10.229 --> 00:23:40.180
So a lot of people have made models smarter that way, and I think that as that data and those embeddings kind of become either more local to your premises, or maybe still out there in the cloud but encrypted, with a high degree of permissioning and self-sovereignty of, like, who gets to see it, then users are gonna have greater privacy guarantees and much more control over, like, what LLMs kind of get their data.
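For readers who want the mechanics, here is a minimal embedding-and-retrieval sketch using the open sentence-transformers library with an off-the-shelf model; this is a generic illustration of the technique, not Bagel's actual (unreleased) design, and the documents are made up:

```python
# Minimal retrieval sketch: embed private documents as vectors, then find the
# closest ones to a query and hand them to an LLM as extra context.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model

docs = [
    "Invoice #4411 was paid on January 3rd.",
    "Our staging cluster runs 8 A100 GPUs.",
    "The office wifi password rotates monthly.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)  # one vector per doc

query = "How many GPUs do we have?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec  # cosine similarity, since vectors are normalized
best = docs[int(np.argmax(scores))]
print(best)  # retrieved context to prepend to the LLM prompt
```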
00:23:41.163 --> 00:23:41.865
I'll have to check them out.
00:23:41.865 --> 00:23:54.605
I think it's the guys behind GPT4All, or maybe it was OpenChat, I can't remember which one, I've got to go look, but, like, their whole actual primary business model is vectorizing data for AI use cases.
00:23:54.605 --> 00:23:58.884
They have, like, a service that does that for companies. And for people listening,
00:23:58.884 --> 00:23:59.515
if you don't know what that is,
00:23:59.515 --> 00:24:06.962
it's just that data comes in a lot of formats, and you need to format it so that you can feed it to an LLM, right? Like, it has to be in some kind of standardized format.
00:24:06.962 --> 00:24:09.156
Correct me if I'm wrong, but that's right.
00:24:09.217 --> 00:24:10.759
That's right. At a high level, that's right.
00:24:11.055 --> 00:24:17.845
Bagel, like, allows me to do this in a decentralized way. And then, do I actually own it, if I'm, like, the data provider?
00:24:17.845 --> 00:24:26.961
Am I getting compensated for that in this scenario, or is it just more like I'm paying for the vectorization service from, like, a decentralized network of providers that are doing the work for me?
00:24:27.295 --> 00:24:35.865
There are a lot of details Bagel is yet to release, but the general idea is, yeah, it's kind of an open marketplace, right?
00:24:35.865 --> 00:25:04.978
So, I just, you know, I think back to, like, the nineties or early 2000s or something, when people would sell databases of, like, US addresses or something for e-commerce. I could see Bagel developing into an open marketplace where you're like, oh okay, here's, you know, here's the vector embeddings for all the addresses in the United States, or here's the vector embeddings for, like, all the species of animals that are in biology, something like that, right? That's super cool.
00:25:05.414 --> 00:25:09.384
So one thing is, like, I think the bleeding edge of the data privacy thing is still in play.
00:25:09.384 --> 00:25:21.996
We don't have, like, a complete solution yet, and I think one of the things that I've been really interested in since, like, 2016 is just seeing the amount of money that's been invested in things like ZKPs and ZKML and, you know, homomorphic encryption. It's been,
00:25:21.996 --> 00:25:29.384
I think, an unintended side effect of the whole crypto and blockchain space, like, so much money has gone into that technology.
00:25:29.384 --> 00:25:33.061
I don't think anybody... I wouldn't have thought that that would have been, like, a major area.
00:25:33.061 --> 00:25:40.357
But you know, privacy became such a huge value prop of Web3 and so that's why it's been receiving so much attention.
00:25:41.015 --> 00:25:48.405
But, you know, moving away from just, like, AI, you know, we talked about this before the show, but what are you most excited about in 2024?
00:25:48.405 --> 00:25:51.083
You know a lot has changed since last year.
00:25:51.083 --> 00:25:55.343
It kind of looks like maybe we're coming out of the bear market and into a better market.
00:25:55.343 --> 00:25:56.518
A lot of things are up.
00:25:56.518 --> 00:26:04.241
But from a technology and startup standpoint, what do you think happens this year that CoinFund's excited about, or that you're just interested in personally?
00:26:04.694 --> 00:26:15.948
One thing that's definitely, like, very visible to people who are studying the crypto space is that we continue to kind of merge and converge with the traditional world.
00:26:15.948 --> 00:26:33.805
So this month, in the beginning of this month, we had the SEC approve the first spot Bitcoin ETFs, which I think ultimately paves the way for more traditional finance participation in cryptocurrency and DeFi and crypto broadly.
00:26:33.805 --> 00:26:44.523
I think it paves the way for other kinds of ETFs for other assets like Ethereum, which are productive, and then eventually maybe even like Solana or something else.
00:26:44.523 --> 00:26:57.800
At the same time, you know, in 2023 at CoinFund, we backed Robert Leshner's project Superstate twice, and of course, Robert is working on RWA, real-world asset tokenization.
00:26:57.800 --> 00:27:02.058
We've heard Larry Fink from, you know, BlackRock mention that:
00:27:02.058 --> 00:27:11.125
Hey, you know these Bitcoin ETFs and other ETFs related to crypto are gonna be just stepping stones on the road to tokenization.
00:27:11.125 --> 00:27:18.884
So it's hard to, like, it's hard to argue with the fact that the traditional world has picked up a narrative of tokenization, and they're very excited about it.
00:27:18.884 --> 00:27:31.963
We have a lot of, like, very sophisticated institutional LPs where, you know, almost the only thing that they want to talk about is, like, tokenizing treasuries and how impactful that's gonna be across the finance industry.
00:27:33.095 --> 00:27:37.261
What's interesting is, like, at CoinFund, our thesis was always one of convergence.
00:27:37.261 --> 00:27:42.741
Right, some people say, like, oh, you know, early on people would say, like, hey, crypto is gonna replace everything.
00:27:42.741 --> 00:27:45.643
We never, we always thought that was a little too drastic.
00:27:45.643 --> 00:27:52.662
I thought, like, to really get the benefit, you know, of adoption of blockchain technologies,
00:27:52.662 --> 00:27:55.162
it's not that blockchain would replace everything.
00:27:55.736 --> 00:28:17.124
It would, like, converge and create, like, new efficiencies and new options for, you know, governments and regulators and, you know, incumbents like banks, and that's actually what we pretty much are seeing happen this year with the Bitcoin ETFs, with the tokenization and the maturing of, you know, some of these processes in industry.
00:28:17.124 --> 00:28:27.319
So that's just one thread, right, that we're thinking about very deeply and are very excited about. And maybe, like, just maybe one thought about why that's so exciting.
00:28:27.319 --> 00:28:57.118
I think that once more traditional investors, once Wall Street, once pension funds and hedge funds kind of start to get a taste of the idiosyncratic returns of something like Bitcoin and other cryptocurrencies, they'll also start to have more influence on government to have more favorable, you know, regulation, legislation for these things, and I just think that that's super bullish for crypto in the longer term.
00:28:57.118 --> 00:29:05.103
Some other areas: we talked a little bit about AI and Web3, and of course that's been a huge focus point.
00:29:05.103 --> 00:29:08.625
It continues to be a focus point for me in 2024.
00:29:08.914 --> 00:29:13.523
Let me just say a little bit about, like, what we're actually investing in that space.
00:29:13.523 --> 00:29:32.368
Well, last year was very much about seeing people attempting to democratize compute, so we've seen things like Akash, io.net, many other, like, open GPU networks, and a lot of our focus last year was around training and inference.
00:29:32.368 --> 00:29:44.390
I think, like, Gensyn is probably the farthest along, I would argue, in the world around thinking about decentralized training as a kind of academic problem.
00:29:44.390 --> 00:29:58.864
By the way, that investment was validated very much for us, because over the summer they had a huge funding round, a $43 million round led by a16z, and we're actually, I believe, the second-largest check in that round.
00:29:59.484 --> 00:29:59.986
Congrats.
00:29:59.986 --> 00:30:01.288
That's always nice to see.
00:30:01.288 --> 00:30:02.451
Yeah, thank you.
00:30:03.741 --> 00:30:05.839
And then we also spent a lot of time on inference.
00:30:05.839 --> 00:30:12.313
We made an investment in a company called Giza, which is giving AI inference to smart contracts.
00:30:12.313 --> 00:30:28.529
It's a little bit different than AI inference APIs in general, but I think the opportunity to use AI on-chain is also very, very interesting, because it's a new and early market that is poised to, like, explode in a good way at some point.
00:30:28.529 --> 00:30:32.750
And then, related to that, we've had this discussion around ZKML.
00:30:32.750 --> 00:30:49.173
We also led a round in a company called Sindri, which we announced recently, and Sindri is working on ZK proving, a lot of which is going to go toward applications like ZKML and also rollups that are serving various AI applications.
00:30:49.173 --> 00:30:55.813
So, in short, last year was very much about the compute portion of the AI pipeline.
00:30:55.813 --> 00:31:25.994
This year, with our upcoming announcement of Bagel Network, we're going to start getting more into the privacy-of-data parts: how do we get LLMs to run locally, how do we get data to be private, maybe even a little bit about the provenance of data, so creators have a new business model where they opt in and out of AI training in exchange for royalties, for example, on their work.
00:31:25.994 --> 00:31:38.853
I think a lot of that will also be adjacent to the outcome of that New York Times lawsuit, for example, and how we're supposed to think about AI training from a copyright perspective out here in the world.
00:31:38.853 --> 00:31:53.913
So I think this year we're looking a lot more at data-related AI projects, and I'll mention one more thing in that genre, which is data availability layers.
00:31:53.960 --> 00:31:56.288
So we've heard a lot about Celestia.
00:31:56.288 --> 00:32:07.703
Celestia has had an incredible launch of their token, and there are many projects, yeah, Avail, others, that are very focused on creating data availability.
00:32:07.703 --> 00:32:11.640
But this type of data availability is really geared toward rollups.
00:32:11.640 --> 00:32:12.984
It's geared toward layer two.
00:32:12.984 --> 00:32:23.808
It's been optimized for that purpose, and what I'm thinking about is sort of something related but completely different, which is like how do you optimize a data availability layer?
00:32:23.808 --> 00:32:41.452
But for things like ZKML, for things like holding an AI model indefinitely at very low latency and being able to verify it. The optimizations in that layer that are required for, for example, AI inference are much different.
00:32:41.452 --> 00:32:43.484
The data has to stick around longer.
00:32:43.484 --> 00:32:45.304
It has to be low latency.
00:32:45.304 --> 00:32:45.744
On high.
00:32:45.766 --> 00:32:46.648
It has to be at the edge.
00:32:46.648 --> 00:32:49.648
I mean it has to be at the edge, as close to the edge as possible.
00:32:49.648 --> 00:32:52.148
So how do you get that there in a decentralized way?
00:32:52.148 --> 00:32:53.646
I mean that's the biggest question to me.
00:32:53.646 --> 00:32:58.151
Again, you go back to the physical infrastructure needed to execute that.
00:32:58.151 --> 00:32:59.544
It's just a huge shift.
00:32:59.544 --> 00:33:03.950
It's a huge shift from the current model, which is, like, data centers with...
00:33:04.352 --> 00:33:06.788
You know, like, Equinix owns a huge portion of the data center market.
00:33:06.788 --> 00:33:07.903
They're all over the world.
00:33:07.903 --> 00:33:14.366
Lumen, again, another company that's got data centers all over the world, and they, like, you know, used to run a third of the internet through their pipes.