The Index Podcast
March 8, 2024

CoinFund Insights: CEO Jake Brukhman on AI, Blockchain and Web3 Convergence

This week on The Index, host Alex Kehaya welcomes Jake Brukhman, Founder and CEO of CoinFund, to discuss the future of AI and Web3. They take a deep dive into how advanced AI and blockchain technologies are reshaping industries, from decentralized finance to the burgeoning NFT market. Alex and Jake explore AI's early applications in social media analytics for stock trading predictions and highlight the capabilities of tools like ChatGPT within today's digital landscape.

Jake provides key insights into CoinFund's investment strategy and potential AI-blockchain integration projects. Don't miss this opportunity to understand how CoinFund is championing the new leaders of the Internet.

Host - Alex Kehaya

Producer - Shawn Nova

Chapters

00:04 - Exploring Web3 and AI Intersection

13:33 - Privacy and Efficiency in Web3 Inference

26:04 - Traditional Finance Converging With Crypto

30:03 - Advances in Decentralized AI and Infrastructure

Transcript
WEBVTT

00:00:04.427 --> 00:00:08.013
Welcome to the Index Podcast hosted by Alex Kehaya.

00:00:08.013 --> 00:00:13.529
Plug in as we explore new frontiers with Web3 and the decentralized future.

00:00:19.067 --> 00:00:20.911
Hey everybody and welcome to the Index.

00:00:20.911 --> 00:00:35.869
I'm your host, Alex Kehaya, and this week I'm excited to welcome Jake Brukhman, founder and CEO of CoinFund, an investment firm focused on investing in venture and liquid opportunities and supporting visionary founders within the AI and blockchain sector.

00:00:35.869 --> 00:00:45.598
Jake, you've been on the show before, so we're not going to focus on your earlier background because people can go back and listen to that episode, but I really appreciate you taking time to talk to us today.

00:00:45.598 --> 00:00:53.051
The thing I'm really excited about hearing more of your opinions on is the intersection of AI and blockchain technology and Web3 in general, so maybe we can start there.

00:00:53.051 --> 00:00:54.424
You've been tweeting a lot about this.

00:00:54.424 --> 00:00:56.405
It's been something I've been exploring myself as well.

00:00:56.405 --> 00:00:59.148
What are you seeing that you're excited about at that intersection?

00:00:59.148 --> 00:01:00.262
What's the future looking like?

00:01:00.965 --> 00:01:01.807
Awesome. Thanks, Alex.

00:01:01.807 --> 00:01:02.649
Thanks for having me on.

00:01:02.649 --> 00:01:04.683
It's a pleasure to be back.

00:01:04.683 --> 00:01:17.487
And you're totally right I think 2023 was really the year when AI intersected with crypto or Web3 as a narrative and got on people's radars.

00:01:17.487 --> 00:01:31.087
Before we dive specifically into that, I almost want to take a step back, because what people usually say is that when they hear such things, when they hear the intersection of Web3 and AI, it sounds like a gimmick.

00:01:31.087 --> 00:01:44.200
It sounds like AI is hot, everyone's trying to do something in it, and we're unnecessarily putting labels on what we're doing in Web3 to seem more cool, and I think that cannot be farther from the truth.

00:01:44.200 --> 00:02:02.439
And the reason I think that is because when I started CoinFund and started looking at crypto stuff, especially around 2015 when we were forming the CoinFund thesis, a lot of people at that time were focused on blockchains for sovereign money, and I was actually doing the opposite.

00:02:02.439 --> 00:02:07.329
I asked the question as I was starting CoinFund: what is blockchain good for?

00:02:07.329 --> 00:02:08.151
Beyond that?

00:02:08.151 --> 00:02:36.389
The thesis that we formed at a high level is that blockchain is this decentralized ledger technology that would go and disrupt various verticals, and the real question was which verticals were ripe for disruption, which verticals would be totally new and created by it, and which verticals would be a worse fit, perhaps, or just take longer to absorb blockchain.

00:02:36.389 --> 00:02:49.573
And so I just want to say that, because our fundamental point of view at CoinFund is that Web3 is a technology that comes to different areas and creates intersections with them.

00:02:49.573 --> 00:02:54.405
We've seen Web3 for finance, that's called DeFi.

00:02:54.405 --> 00:03:19.669
We've seen Web3 for digital collectibles, that's called the NFT space, and now we're seeing Web3 AI. And so to me, this intersection was always like a natural, almost like an inevitability of things that would happen as Web3 developed and put forward these value propositions of self-sovereign data privacy, of security and so forth.

00:03:20.599 --> 00:03:22.569
Now to go back to my personal experience with AI.

00:03:22.569 --> 00:03:24.560
I studied math and computer science.

00:03:24.560 --> 00:03:30.532
I was always aware of the AI field as I was studying and as I was going out to work.

00:03:30.532 --> 00:03:52.651
I actually remember even using some AI concepts back in, must have been, 2012 or 2013 or so, where I was using machine learning to measure Twitter sentiment in an effort to trade stocks or something like that, and there was actually even a hedge fund that went ahead and made a whole strategy based on that.

00:03:52.651 --> 00:04:11.443
Of course, a few years later it didn't quite make it and went out of business, but that is all to highlight how far we've actually progressed in the field of AI, with things like ChatGPT. And so, as I have been following AI over the course of these many years, what started to become obvious?

00:04:11.443 --> 00:04:21.209
I don't know, like late 2019, early 2020, things like that, we started to hear rumblings about AI language models starting to make some progress.

00:04:23.000 --> 00:04:29.160
And really, in retrospect, the first investment that we made in this intersection was at the end of 2020.

00:04:29.160 --> 00:04:45.372
And that was our investment in Worldcoin, which is, of course, Sam Altman's Web3 project, and is aimed at creating a proof of personhood that, among other things, could be used to disambiguate, you know, AIs and humans and create an identity protocol and so on.

00:04:45.372 --> 00:05:31.947
And our next investment was in March of 2022 in a company called Gensyn, and what Gensyn was doing at that time is that they said, well, we want to democratize compute and, in particular, we want to enable training in more of a decentralized way. And I think what they meant functionally by that was in a more available way to any participant in the market that wants to work with AI training, but also in the sense of being trustless, in the sense of being able to go to a network and simply request and pay for the service and be very sure that you would get the service, which is very different than the way that, like, the big tech companies sort of operate here.

00:05:31.947 --> 00:05:40.641
At the time, in March of 2022, if you asked most AI people, you know, is Gensyn a good idea?

00:05:40.641 --> 00:05:42.043
Of course.

00:05:42.043 --> 00:05:49.170
This is coming on kind of the back of like GPT-3 and large foundation models being developed and sort of making a splash.

00:05:50.139 --> 00:06:11.250
Most AI people would say, well, it's kind of strange that you would opt for such a thing, because when you start to decentralize training, you're gonna make it slower, you're gonna make it more inefficient and really, like we were able to achieve the scale of these large models because we're stretching every dimension that we have.

00:06:11.250 --> 00:06:16.720
We're stretching, like, the theoretical dimension, you know, to make these algorithms tractable by GPUs.

00:06:16.720 --> 00:06:29.783
We're stretching our ability to create, like, large GPU clusters and data centers. And really, to them at that time, you know, slowing that process down didn't quite make sense.

00:06:29.783 --> 00:06:55.773
Now it's 2024, and we've had a chance to live with products like ChatGPT and large language models and generative AI for art and video and 3D and other areas, and we have already faced a number of challenges to privacy. For example, artists have become aware that, you know, their data is being used for training some of these models.

00:06:55.773 --> 00:07:06.387
In fact, the New York Times recently, famously, has sued OpenAI and Microsoft for using its articles in their training.

00:07:07.170 --> 00:07:33.725
And we also are getting the sense that, you know, if LLM agents or assistants are poised to become so useful to us, and even better than us at performing tasks, then this creates a danger to our data; these LLMs become poised to pull in more private data from us than even, you know, Google and Facebook, etc. have done in the past.

00:07:33.725 --> 00:07:33.966
Right.

00:07:33.966 --> 00:07:55.494
And so now, from this vantage point of 2024, thinking about how Web3 reintersects with AI and provides primitives for compute, for open collaboration, for open source, for privacy of data, for privacy of computation and so forth, suddenly this has become really prescient.

00:07:55.494 --> 00:08:06.732
There's a growing number of startups and different opportunities and networks, and also Web2 startups, that are concerned with that problem.

00:08:06.732 --> 00:08:13.894
So let me pause there, because I know that was a lot of backstory, but that's kind of how I frame Web3 in my mind.

00:08:14.005 --> 00:08:22.279
Yeah, I think you probably know this from my background, but I've been kind of obsessed with hardware in the last couple of years, specifically, like, the physical backbone of the internet, right?

00:08:22.279 --> 00:08:25.509
So the data centers all over the world that make things like AI work.

00:08:25.509 --> 00:08:37.113
One of the questions I've been asking myself since I got into crypto in 2016 is, like, how do you deploy the physical infrastructure for a decentralized network in as many data center locations and ASNs on the planet as possible?

00:08:37.113 --> 00:08:40.330
Like, how do you create a really censorship-resistant and decentralized network?

00:08:40.330 --> 00:08:44.851
It's something I've spent a lot of time thinking about and most of my work in the space has been focused on that.

00:08:44.851 --> 00:08:51.126
With DePIN, things like these decentralized compute networks, they're starting to answer that question.

00:08:51.126 --> 00:08:54.120
Things like Gensyn, right, of how you actually get the physical infrastructure.

00:08:54.120 --> 00:08:57.835
That can compete with these larger companies, right?

00:08:57.835 --> 00:09:11.443
I think I saw on Twitter a screenshot of something Mark Zuckerberg posted, where Facebook had this ridiculous amount of GPU capacity that they had purchased, something like 300,000-plus GPUs.

00:09:11.443 --> 00:09:14.509
I mean that's really hard to compete with.

00:09:14.509 --> 00:09:27.265
And as I've been exploring this and starting to look at it, I'm still learning about AI, like, I'm not a computer scientist, but I know the basics of the different pieces of what makes an AI work, and the thing that I keep coming back to is data.

00:09:27.265 --> 00:09:30.813
Data ownership is the biggest piece of this, I think.

00:09:30.813 --> 00:09:34.407
Well, two things: data and also access to GPUs.

00:09:34.407 --> 00:09:43.032
If you talk to, like, a big data center company today, they're gonna tell you there's like a 52-month wait or something for an H100 chip.

00:09:43.032 --> 00:09:48.533
There's this belief in the sort of Web2 space, in the traditional data center space, that GPUs are hard to come by.

00:09:48.533 --> 00:10:02.227
But actually, I think that's where Web3 is starting to really disrupt the market: there are a lot of GPUs that you can get access to at scale if you can leverage token economics and a protocol to access them.

00:10:02.227 --> 00:10:08.966
The other piece that I think many companies are missing, like enterprises looking at using AI in their business, is their data.

00:10:08.966 --> 00:10:13.101
When they partner with, like, OpenAI or Anthropic.

00:10:13.101 --> 00:10:16.211
Those are, like, the two biggest competitors in the space.

00:10:16.211 --> 00:10:27.052
They're handing their data over to the AI owned by those two companies, and it's not their AI. And I think the really cool thing, if you talk about it from a timing standpoint...

00:10:27.052 --> 00:10:28.274
You talk about 2024.

00:10:28.274 --> 00:10:35.107
There's Gensyn, there's Akash Network, there's Render, there's a bunch of different infrastructure layers that are getting built out.

00:10:35.107 --> 00:10:42.251
And then you look at the open-source LLMs that are out there, like Llama 2 and things like that, OpenChat, GPT4All, right?

00:10:42.251 --> 00:10:47.913
You're starting to get things that are not quite as good as OpenAI's, but they're like a GPT-3.5 equivalent.

00:10:47.913 --> 00:10:57.913
I'm getting excited because I'm starting to see a way to weave all these things together so that you can actually achieve, like, an end-to-end solution for different products.

00:10:57.913 --> 00:11:01.879
But different products would need to get built that actually leverage AI in a decentralized fashion.

00:11:02.541 --> 00:11:07.312
I do think that the data privacy piece is the big blocker that I keep running up against.

00:11:07.312 --> 00:11:08.153
Like, I haven't...

00:11:08.153 --> 00:11:18.679
Anytime I go to, like, a Web2 company that wants to get access to GPUs that are cheap, and they start talking about using one of these decentralized networks, they're always like, well, what about my data?

00:11:18.679 --> 00:11:19.969
How do I protect my data?

00:11:19.969 --> 00:11:21.539
Maybe Gensyn solves for this.

00:11:21.539 --> 00:11:24.539
I would love to hear if you have seen some solutions.

00:11:24.659 --> 00:11:34.104
But I know that, like, ZKPs, zero-knowledge proof technology, is one potential area that could solve for the data privacy piece, but it's not necessarily, like, bi-directional.

00:11:34.104 --> 00:11:38.894
That was something I was told by like another engineer, so there's like an issue there that I'm not fully versed on yet.

00:11:38.894 --> 00:11:40.559
It's like an area of exploration for me.

00:11:40.559 --> 00:11:47.340
Long story short, I'm still trying to figure out where the privacy-preserving technology is that's in the middle of all this.

00:11:47.340 --> 00:11:57.195
That allows you to maintain ownership of your data, or privacy of your data, and train it on an open network, on a server you don't necessarily control, like have you seen a solution to that yet?

00:11:57.195 --> 00:12:00.989
Or what are the solutions we should be looking for, like what's needed that doesn't exist yet?

00:12:01.900 --> 00:12:03.727
Yeah, this is a great question.

00:12:03.727 --> 00:12:09.966
I appreciate it, Alex, because it really digs into the technical details of what is happening here.

00:12:09.966 --> 00:12:13.813
So maybe we could get a little bit like nuanced.

00:12:13.813 --> 00:12:31.159
Well, you know, kind of to your point, when you hit the OpenAI API, that is an API that is run by, you know, this corporate entity, OpenAI, right, and it's running in their kind of proprietary server environment.

00:12:31.159 --> 00:12:55.320
When you send them your, you know, your query, your prompt, your data that's associated with it, right, this is something that they kind of take custody of and process for you and, in principle, could be storing or could be using in other ways, depending on what they say in their terms of service, which seems to be changing quite often. And that's sort of the default, right, that we're used to in Web2.

00:12:55.320 --> 00:13:02.551
You know, if you store documents on Google, you know they're your documents, but in some very strong senses they're not yours.

00:13:02.551 --> 00:13:17.386
They are used for improving Google, they're used for government subpoenas, their employees internally could potentially look at them. And so the question you're kind of asking is, like, when we go into the Web3 world, how does this change?

00:13:18.541 --> 00:13:21.250
It's a really good question and it's actually like a really difficult problem.

00:13:21.250 --> 00:13:22.937
Let's define some terms here.

00:13:22.937 --> 00:13:30.354
So when you are using an AI model, when there's an AI model that exists, you send it some input and it gives you output.

00:13:30.354 --> 00:13:32.083
That's called inference.

00:13:32.083 --> 00:13:36.879
A lot of Web3 is actually focused on this process of inference.

00:13:36.879 --> 00:13:40.200
We have these open models, or maybe proprietary models.

00:13:40.200 --> 00:13:45.388
How can we get you the outputs of those things you know in the form of a decentralized network?

00:13:45.388 --> 00:14:10.470
And the thing that you have to realize is that when you are dealing with a decentralized network, you're still dealing with a third party that is running the node that is doing the computation that you're asking for. Now, we can argue that that perhaps is a better setup than having that counterparty be, like, a big tech company who has a lot of power over your data.

00:14:11.160 --> 00:14:21.426
But it's still the case that your counterparty in that network potentially sees your input, potentially sees your output, can store that information, and so on.

00:14:21.426 --> 00:14:33.394
And I think, like, one attempt in this area to create more options for privacy has been called ZKML, zero-knowledge machine learning.

00:14:33.394 --> 00:14:48.301
And what zero knowledge machine learning basically says is that I can produce this inference, this output of the model, and I can also produce a zero-knowledge proof that that output is correct.

00:14:48.301 --> 00:15:08.506
And generally what that allows a decentralized network for inference to do is, first of all, solve the verification problem: let the consumer know that their problem is being solved, or their query is being computed correctly, which is really important.

00:15:08.615 --> 00:15:18.860
When we go to Google, we have a lot of trust in Google as a kind of legal entity to give us such guarantees and when they fail on them, we have certain legal recourses.

00:15:18.860 --> 00:15:39.106
In the case of a Web3 network, perhaps what's more prescient is that we have cryptographic guarantees that this output is correct. And the other thing that ZKML allows such a network to do is actually to have different levels of knowledge about the model.

00:15:39.106 --> 00:15:39.926
So here's what I mean.

00:15:39.926 --> 00:15:49.903
I mean that it could be a totally open model whose weights are known, or it could be a proprietary model whose weights are unknown.

00:15:50.163 --> 00:15:50.745
What are the weights?

00:15:50.745 --> 00:15:51.958
Can you explain that a little bit?

00:15:52.595 --> 00:15:59.121
So the difference between, like, an open-source and a closed-source neural network... well, what is a neural network?

00:15:59.121 --> 00:16:04.144
It's a bunch of nodes and a bunch of connections between those nodes that have numbers associated with them.

00:16:04.144 --> 00:16:05.822
Essentially, it's a big data file.

00:16:05.822 --> 00:16:08.764
That's what a neural network is.
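
To make the "big data file" point concrete, here's a minimal sketch, in Python, of a neural network as nothing more than arrays of numbers. The layer sizes and values below are made up for illustration and don't correspond to any real model.

```python
import numpy as np

# A neural network is just numbers: weight matrices and bias vectors.
# Illustrative shapes: 3 inputs -> 4 hidden units -> 2 outputs.
W1 = np.random.randn(4, 3)  # the "connections between nodes that have numbers"
b1 = np.random.randn(4)
W2 = np.random.randn(2, 4)
b2 = np.random.randn(2)

def forward(x: np.ndarray) -> np.ndarray:
    """Inference: push an input through the weighted connections."""
    hidden = np.maximum(0.0, W1 @ x + b1)  # ReLU activation
    return W2 @ hidden + b2

print(forward(np.array([1.0, 0.5, -0.2])))

# "Open weights" means W1, b1, W2, b2 are published, as with Stable Diffusion 1.5;
# "closed weights" means only forward()'s outputs are exposed, as with ChatGPT.
```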

00:16:08.764 --> 00:16:15.561
In the case of something like Stable Diffusion 1.5, for example, which is a generative art model, all of those numbers are known.

00:16:15.561 --> 00:16:20.822
In the case of OpenAI's ChatGPT, we have no idea. Like, is that one model?

00:16:20.822 --> 00:16:21.878
Is that five models?

00:16:21.878 --> 00:16:22.740
What are the weights?

00:16:22.740 --> 00:16:24.044
How do they interact together?

00:16:24.044 --> 00:16:42.884
Like, we just... it's a black box. And the point that I'm making is that, using ZKML, you can actually create situations where an open model is being served or a proprietary, closed model is being served, and I would argue that that's good because it supports, like, every possible business model.

00:16:42.884 --> 00:16:57.342
So I could be a proprietor of some kind of private model and yet I could still give guarantees to my consumers that their output is correct.

00:16:57.342 --> 00:17:28.342
Largely speaking, the way that ZKML and inference networks work today is they trust-minimize and they sort of maximize privacy, but they're still not a perfect solution to privacy, in the sense that your counterparty still has to compute on your data. And to get to a point where the counterparty does the computation but doesn't know what you're computing, doesn't know your input, doesn't know your output...

00:17:28.342 --> 00:17:38.884
That's a very hard problem that people are working on in the FHE space, but I don't think it's very efficient today.

00:17:38.884 --> 00:17:43.020
I don't think that that's a reasonable expectation to have from a network.

00:17:43.020 --> 00:17:45.682
So it's an improvement on privacy.

00:17:45.682 --> 00:17:53.539
Now you could also say, in the case of open models, privacy in some sense becomes less important because the models are open.

00:17:53.539 --> 00:18:05.361
But it might still be important in the sense, like, I don't want you to know which pictures I'm generating. We've made progress, in other words, toward private networks, but we're not quite there.
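
The FHE idea being pointed at here, a counterparty computing on data it cannot read, can be illustrated with a much simpler and totally insecure relative: textbook RSA happens to be multiplicatively homomorphic. This toy sketch only shows the shape of the idea, not FHE itself; real FHE schemes are far heavier, which is exactly the efficiency trade-off described next.

```python
# Toy homomorphic computation: textbook RSA with tiny, insecure parameters.
# The "server" multiplies ciphertexts without ever seeing the plaintexts.
p, q = 61, 53
n = p * q            # 3233
e, d = 17, 2753      # 17 * 2753 = 46801, which is 1 mod lcm(p-1, q-1)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(6)

# Server side: sees only c1 and c2, yet computes an encryption of the product.
c_product = (c1 * c2) % n

assert decrypt(c_product) == 7 * 6  # 42: Enc(a) * Enc(b) decrypts to a * b
```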

00:18:05.434 --> 00:18:06.439
And I want to say one last thing.

00:18:06.439 --> 00:18:18.208
These mechanisms, like FHE, fully homomorphic encryption, or zero-knowledge proofs, are still very heavyweight primitives today.

00:18:18.208 --> 00:18:32.021
They're very complex to compute, they add latency to these requests, they add cost to these requests, and so you are getting this increased level of privacy, but with a trade off of efficiency.

00:18:32.021 --> 00:18:44.146
And also that inefficiency could be fatal in the following sense: you could have a model, but because of the inefficiency of ZK, that model can't be too big.

00:18:44.146 --> 00:18:55.335
It would be exceptionally cost inefficient to compute an output of a model the size of GPT-4, something like hundreds of billions of parameters.

00:18:55.335 --> 00:19:02.819
In fact, the state of the art of ZKML right now is computing a model with just a few million parameters and that's it.

00:19:03.615 --> 00:19:11.084
If you wanted to actually do chatbots and LLMs on chain, you would have to go to a different strategy than ZKML.

00:19:11.084 --> 00:19:12.066
It's just not feasible.

00:19:12.066 --> 00:19:18.942
And that strategy, or one of the strategies that has been offered, is something called OPML, which is optimistic machine learning.

00:19:18.942 --> 00:19:27.782
We can talk about that, but long story short, it's a way where someone gives you the output and someone else checks it, and if the check fails, then the one who gave the output pays a price.

00:19:27.782 --> 00:19:31.365
So it's similar to what optimistic roll-ups do versus like ZK roll-ups.
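
For readers who want the shape of the two verification patterns being contrasted here, ZKML versus OPML, below is a toy sketch. The proof is a stub, no real ZK system is invoked, and every name in it is invented for illustration.

```python
from dataclasses import dataclass

def model(x: int) -> int:
    """Stand-in for inference; imagine a neural network here."""
    return 2 * x + 1

# ZKML pattern: every answer ships with a validity proof.
@dataclass
class ProvenResult:
    output: int
    proof: bytes  # in a real system, a zero-knowledge proof of correct execution

def zkml_infer(x: int) -> ProvenResult:
    proof = b"stub"  # real proof generation is the expensive, size-limited step
    return ProvenResult(model(x), proof)

def zkml_verify(x: int, r: ProvenResult) -> bool:
    # A real verifier checks the proof cryptographically against (x, r.output);
    # this stub just accepts.
    return len(r.proof) > 0

# OPML pattern: answers are accepted optimistically; a challenger recomputes,
# and a wrong answer costs the asserting node its stake (like an optimistic rollup).
def opml_challenge(x: int, claimed_output: int, stake: int) -> int:
    """Returns the amount slashed from the node that made the claim."""
    return stake if model(x) != claimed_output else 0

result = zkml_infer(10)
assert zkml_verify(10, result)
assert opml_challenge(10, claimed_output=21, stake=100) == 0    # honest claim
assert opml_challenge(10, claimed_output=99, stake=100) == 100  # slashed
```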

00:19:32.134 --> 00:19:43.479
I find this problem fascinating, because I have run up against so many different use cases where the privacy piece is what's prevented an end customer or end consumer from using a decentralized network.

00:19:43.479 --> 00:19:45.044
Because right now they have two options.

00:19:45.044 --> 00:19:51.683
If you can find a solution that works on a decentralized network, you could use that for your AI use case, whatever it is.

00:19:51.683 --> 00:20:03.196
The other option, for like an enterprise, is to use something like a virtual private cloud or do things on-premise and then use open-source software, and in that sense, you really do own the AI, because the data that you're putting in there is your data.

00:20:03.196 --> 00:20:04.842
No one else has access to it.

00:20:04.842 --> 00:20:05.965
So you can own your AI.

00:20:05.965 --> 00:20:07.941
That's the way I started thinking about it. Totally.

00:20:09.075 --> 00:20:24.654
My conception of... when you think about, like, LLMs as personal assistants, one of the questions is how do I make this product totally private, right? Like, if I'm using ChatGPT, I know my data is going over there, blah, blah, blah.

00:20:24.654 --> 00:20:54.938
My conception of, like, a private LLM assistant is when you own your data locally in a self-sovereign way, and when you also own the model locally, you're able to, like, fine-tune or augment the model with your data, but in a totally local way, so that it never leaves your sort of premises, right? And then that creates kind of a really high degree of privacy in terms of other people using your data.

00:20:55.775 --> 00:20:58.017
Can you still append it with external data if it's running?

00:20:58.017 --> 00:21:14.999
I mean, in that ideal scenario, there's other data that a model could be trained on, right, that's publicly available or that you port in via API, but you don't want to expose your data. And so your, like, local instance can then benefit, because the more data you have, the more powerful the AI is.

00:21:14.999 --> 00:21:18.047
The more data it's trained on, the better it is, right?

00:21:18.047 --> 00:21:28.319
So if it's got your local data and you've got your local instance, but it can still talk to that, you know, that external version, I mean, is that something that's possible, or are there going to be more risks there to your...?

00:21:28.319 --> 00:21:30.104
It's not exactly private if you do that.

00:21:30.305 --> 00:21:36.547
If you do that... there's a few strategies that make kind of the core LLM smarter.

00:21:36.547 --> 00:21:42.539
So one strategy would be like fine tuning it on extra data and making it sort of more specialized for that data.

00:21:42.539 --> 00:21:49.760
I think that is generally difficult, requires a lot of compute, requires knowing the ins and outs of a particular model.

00:21:49.760 --> 00:21:53.506
I mean it's feasible that there would be services that help you streamline that.

00:21:53.506 --> 00:21:56.641
It feels expensive to do that because of the compute.

00:21:57.462 --> 00:22:08.617
Another, and this is actually how a lot of augmentation works today, is, if you've ever played, for example, with OpenAI's GPTs facility.

00:22:08.617 --> 00:22:16.855
It's sort of like a specialized prompt for GPT-4, together with extra documents that it could process.

00:22:16.855 --> 00:22:20.223
So the question is, like, how do those documents enter the system?

00:22:20.223 --> 00:22:30.606
And I think the answer, I don't know exactly how OpenAI does it, but in general, the way that a lot of people put extra data into models is through vector embeddings.

00:22:30.606 --> 00:22:35.567
As a matter of fact, this is the subject matter of one of our startups.

00:22:35.567 --> 00:22:40.865
I think I could say it now because it's going to be announced tomorrow, probably before this podcast airs.

00:22:40.904 --> 00:22:41.826
Yeah, this won't come out before then.

00:22:41.946 --> 00:22:57.804
Yeah, so we're leading a round in a company called Bagel Network, and Bagel has created essentially, like, a vector embeddings facility with a Web3, open, collaborative kind of business model. Think, like, GitHub, but for LLM data.

00:22:57.804 --> 00:23:10.229
What you could do is you could take your personal data and you can feed it into this embedding and by doing that you're making it essentially parsable and processable by the LLM.

00:23:10.229 --> 00:23:40.180
So a lot of people have made models smarter that way, and I think that as that data and those embeddings become either more local to your premises, or maybe still out there in the cloud but encrypted, with a high degree of permissioning and self-sovereignty over who gets to see it, then users are gonna have greater privacy guarantees and much more control over which LLMs get their data.
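
As a rough illustration of the embedding pattern just described, here's a self-contained sketch that uses a toy hash-based embedding so it runs without any model or external service. Bagel's actual design isn't public, so nothing here reflects their API; a production system would use a learned embedding model instead.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash words into a fixed-size vector.
    A real system would call a learned embedding model here."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# Index some personal documents as vectors (the "embeddings facility").
docs = [
    "my car lease ends in june",
    "the helium hotspot is on the roof",
    "quarterly revenue grew twelve percent",
]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents nearest to the query by cosine similarity."""
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: -float(pair[1] @ qv))
    return [doc for doc, _ in ranked[:k]]

# The retrieved text, not the whole corpus, is what gets added to the LLM prompt.
print(retrieve("when does my car lease end"))
```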

00:23:41.163 --> 00:23:41.865
I'll have to check them out.

00:23:41.865 --> 00:23:54.605
I think it's the guys behind GPT4All, or maybe it was OpenChat, I can't remember which one, I've got to go look, but their whole actual primary business model is vectorizing data for AI use cases.

00:23:54.605 --> 00:23:58.884
They have, like, a service that does that for companies. And for people listening,

00:23:58.884 --> 00:23:59.515
if you don't know what that is...

00:23:59.515 --> 00:24:06.962
It's just, like, data comes in a lot of formats, and you need to format it so that you can feed it to an LLM, right? Like, it has to be in some kind of standardized format.

00:24:06.962 --> 00:24:09.156
Correct me if I'm wrong.

00:24:09.217 --> 00:24:10.759
That's right. At a high level, that's right.

00:24:11.055 --> 00:24:17.845
Bagel, like, allows me to do this in a decentralized way. And then do I actually own it, if I'm, like, the data provider?

00:24:17.845 --> 00:24:26.961
Am I getting compensated for that in this scenario, or is it just more like I'm paying for the vectorization service from, like, a decentralized network of providers that are doing the work for me?

00:24:27.295 --> 00:24:35.865
There's a lot of details Bagel is yet to release, but the general idea is, yeah, it's kind of an open marketplace, right?

00:24:35.865 --> 00:25:04.978
So, you know, I think back to, like, the nineties or early 2000s or something, when people would sell databases of, like, US addresses or something for e-commerce. I could see Bagel developing into an open marketplace where you're like, oh okay, here's the vector embeddings for all the addresses in the United States, or here's the vector embeddings for all the species of animals that are in biology, something like that, right? That's super cool.

00:25:05.414 --> 00:25:09.384
So one thing is, like, I think the bleeding edge of the data privacy thing is still in play.

00:25:09.384 --> 00:25:21.996
We don't have, like, a complete solution yet, and I think one of the things that I've been really interested in since, like, 2016 is just seeing the amount of money that's been invested in things like ZKPs and ZKML and, you know, homomorphic encryption.

00:25:21.996 --> 00:25:29.384
It's been, I think, an unintended side effect of the whole crypto and blockchain space that so much money has gone into that technology.

00:25:29.384 --> 00:25:33.061
I don't think anybody... I wouldn't have thought that that would have been, like, a major area.

00:25:33.061 --> 00:25:40.357
But you know, privacy became such a huge value prop of Web3 and so that's why it's been receiving so much attention.

00:25:41.015 --> 00:25:48.405
But, you know, moving away from just AI, you know, we talked about this before the show, but what are you most excited about in 2024?

00:25:48.405 --> 00:25:51.083
You know a lot has changed since last year.

00:25:51.083 --> 00:25:55.343
It kind of looks like maybe we're coming out of the bear market and into a better market.

00:25:55.343 --> 00:25:56.518
A lot of things are up.

00:25:56.518 --> 00:26:04.241
But from a technology and startup standpoint, what do you think happens this year that CoinFund's excited about, or that you're just interested in personally?

00:26:04.694 --> 00:26:15.948
One thing that's definitely, like, very visible to people who are studying the crypto space is that we continue to kind of merge and converge with the traditional world.

00:26:15.948 --> 00:26:33.805
So in the beginning of this month, we had the SEC approve the first spot Bitcoin ETFs, which I think ultimately paves the way for more traditional finance participation in cryptocurrency and DeFi and crypto broadly.

00:26:33.805 --> 00:26:44.523
I think it paves the way for other kinds of ETFs for other assets like Ethereum, which are productive, and then eventually maybe even like Solana or something else.

00:26:44.523 --> 00:26:57.800
At the same time, you know, in 2023, at CoinFund, we backed Robert Leshner's project Superstate twice, and of course, Robert is working on RWA, real-world asset tokenization.

00:26:57.800 --> 00:27:02.058
We've heard Larry Fink from, you know, BlackRock mention that.

00:27:02.058 --> 00:27:11.125
Hey, you know these Bitcoin ETFs and other ETFs related to crypto are gonna be just stepping stones on the road to tokenization.

00:27:11.125 --> 00:27:18.884
So it's hard to argue with the fact that the traditional world has picked up the narrative of tokenization, and they're very excited about it.

00:27:18.884 --> 00:27:31.963
We have a lot of very sophisticated institutional LPs where, you know, almost the only thing that they want to talk about is tokenizing treasuries and how impactful that's gonna be across the finance industry.

00:27:33.095 --> 00:27:37.261
What's interesting is, like, at CoinFund, our thesis was always one of convergence.

00:27:37.261 --> 00:27:42.741
Right, early on, some people would say, like, hey, crypto is gonna replace everything.

00:27:42.741 --> 00:27:45.643
We always thought that was a little too drastic.

00:27:45.643 --> 00:27:52.662
I thought, like, to really get the benefit of, you know, adoption of blockchain technologies...

00:27:52.662 --> 00:27:55.162
It's not that blockchain would replace everything.

00:27:55.736 --> 00:28:17.124
It would, like, converge and create new efficiencies and new options for, you know, governments and regulators and, you know, incumbents like banks. And that's actually what we're pretty much seeing happen this year with the Bitcoin ETFs, with the tokenization and the maturing of, you know, some of these processes in industry.

00:28:17.124 --> 00:28:27.319
So that's just one thread, right, that we're thinking about very deeply and are very excited about. And maybe just one thought about why that's so exciting.

00:28:27.319 --> 00:28:57.118
I think that once more traditional investors, once Wall Street, once pension funds and hedge funds kind of start to get a taste of the idiosyncratic returns of something like Bitcoin and other cryptocurrencies, they'll also start to have more influence on government, to get more favorable, you know, regulation and legislation for these things, and I just think that's super bullish for crypto in the longer term.

00:28:57.118 --> 00:29:05.103
Some other areas: we talked a little bit about AI and Web3, and of course that's been a huge focus point.

00:29:05.103 --> 00:29:08.625
It continues to be a focus point for me in 2024.

00:29:08.914 --> 00:29:13.523
Let me just say a little bit about, like, what we're actually investing in that space.

00:29:13.523 --> 00:29:32.368
Well, last year was very much about seeing people attempting to democratize compute, so we've seen things like Akash, io.net, many other open GPU networks, and a lot of our focus last year was around training and inference.

00:29:32.368 --> 00:29:44.390
I think, like, Gensyn is probably the farthest along in the world, I would argue, around thinking about decentralized training as a kind of academic problem.

00:29:44.390 --> 00:29:58.864
By the way, that investment was validated very much for us, because over the summer they had a huge funding round, a $43 million round led by a16z, and I believe we're actually the second-largest check in that round.

00:29:59.484 --> 00:29:59.986
Congrats.

00:29:59.986 --> 00:30:01.288
That's always nice to see.

00:30:01.288 --> 00:30:02.451
Yeah, thank you.

00:30:03.741 --> 00:30:05.839
And then we also spent a lot of time on inference.

00:30:05.839 --> 00:30:12.313
We made an investment in a company called Giza, which is giving AI inference to smart contracts.

00:30:12.313 --> 00:30:28.529
It's a little bit different than AI inference APIs in general, but I think the opportunity to use AI on-chain is also very, very interesting, because it's a new and early market that is poised to, like, explode in a good way at some point.

00:30:28.529 --> 00:30:32.750
And then, related to that, we've had this discussion around ZKML.

00:30:32.750 --> 00:30:49.173
We also led a round in a company called Sindri, which we announced recently, and Sindri is working on ZK proving, a lot of which is going to go toward applications like ZKML and also rollups that are serving various AI applications.

00:30:49.173 --> 00:30:55.813
So, in short, last year was very much about the compute portion of the AI pipeline.

00:30:55.813 --> 00:31:25.994
This year, with our upcoming announcement of Bagel Network, we're going to start getting more into the privacy-of-data parts: how do we get LLMs to run locally, how do we get data to be private, maybe even a little bit about the provenance of data, so creators have a new business model where they opt in and out of AI training in exchange for royalties, for example, on their work.

00:31:25.994 --> 00:31:38.853
I think a lot of that will also be adjacent to the outcome of that New York Times lawsuit, for example, and how are we supposed to think about AI training from a copyright perspective out here in the world?

00:31:38.853 --> 00:31:53.913
So I think this year we're looking a lot more at data-related AI projects, and I'll mention one more thing in that genre, which is data availability layers.

00:31:53.960 --> 00:31:56.288
So we've heard a lot about Celestia.

00:31:56.288 --> 00:32:07.703
Celestia has had an incredible launch of their token, and there are many projects, yeah, Avail and others, that are very focused on creating data availability.

00:32:07.703 --> 00:32:11.640
But this type of data availability is really geared toward rollups.

00:32:11.640 --> 00:32:12.984
It's geared toward layer two.

00:32:12.984 --> 00:32:23.808
It's been optimized for that purpose, and what I'm thinking about is sort of something related but completely different, which is like how do you optimize a data availability layer?

00:32:23.808 --> 00:32:41.452
But for things like ZKML, for things like holding an AI model indefinitely at very low latency and being able to verify it, the optimizations that are required in that layer, for example for AI inference, are much different.

00:32:41.452 --> 00:32:43.484
The data has to stick around longer.

00:32:43.484 --> 00:32:45.304
It has to be low latency.

00:32:45.304 --> 00:32:45.744
On high...

00:32:45.766 --> 00:32:46.648
It has to be at the edge.

00:32:46.648 --> 00:32:49.648
I mean it has to be at the edge, as close to the edge as possible.

00:32:49.648 --> 00:32:52.148
So how do you get that there in a decentralized way?

00:32:52.148 --> 00:32:53.646
I mean that's the biggest question to me.

00:32:53.646 --> 00:32:58.151
Again, you go back to the physical infrastructure needed to execute that.

00:32:58.151 --> 00:32:59.544
It's just a huge shift.

00:32:59.544 --> 00:33:03.950
It's a huge shift from the current model, which is, like, data centers with...

00:33:04.352 --> 00:33:06.788
You know, like, Equinix owns a huge portion of the data center market.

00:33:06.788 --> 00:33:07.903
They're all over the world.

00:33:07.903 --> 00:33:14.366
Lumen, again, another company that's got data centers all over the world, and they, you know, used to run a third of the internet through their pipes.

00:33:14.366 --> 00:33:15.750
All that's shifting now.

00:33:15.750 --> 00:33:27.523
The other thing I think is really interesting about that is all these companies are competing against Google and AWS and Azure for that compute to be as close to the user as possible, and it's a race to the bottom.

00:33:27.523 --> 00:33:39.724
They're all, like, getting outcompeted on price, and I think that decentralized networks are going to put even more pressure on that, because, in theory, it will become cheaper for decentralized compute to offer excess capacity into the market.

00:33:39.724 --> 00:33:44.905
But, like we talked about earlier, you've still got to solve the data privacy piece to kind of tie it all together, I think.

00:33:45.528 --> 00:33:45.807
Totally.

00:33:45.807 --> 00:33:51.984
Maybe I'll mention, like, two more areas and we can go on, but you know DePIN is a huge narrative right now.

00:33:51.984 --> 00:33:59.410
Decentralized physical infrastructure networks; hey, we used to call them, like, resource networks or something, but they got a new name.

00:33:59.410 --> 00:34:05.946
What's really cool here is that just a few years ago, Alex, I think this was in, like, '22 or '21...

00:34:05.946 --> 00:34:10.643
There were people who came out, like kind of Silicon Valley people.

00:34:10.643 --> 00:34:16.891
And they said, like, hey, Web3 is dead on arrival because nobody wants to run nodes in their home.

00:34:17.132 --> 00:34:25.570
And I think, like, DePIN is dramatically refuting that. I think, you know, there's like over a million Helium nodes.

00:34:25.570 --> 00:34:29.094
People are buying Hivemappers, are buying Solana phones.

00:34:29.094 --> 00:34:42.376
There's a company in our portfolio called DIMO, which allows you to reclaim your electric vehicle data by essentially buying a little device and plugging it into your car. And, by the way, if you have a Tesla, you don't need to do that, you just connect it to the API.

00:34:42.376 --> 00:35:06.592
Long story short, it turns out that there are actually a lot of people who are willing to run some kind of Helium device or Hivemapper device or DIMO device in their home, and I just wonder if that kind of keeps going and going and going until eventually someone aggregates all of these functionalities into a single piece of hardware that, like, every family has in their home.

00:35:06.592 --> 00:35:13.393
You know, in the same sense, you know, as a computer, but it's their node on, like, decentralized Web3 networks.

00:35:13.393 --> 00:35:17.628
Right, that would be the ultimate defeat of the Web3

00:35:17.628 --> 00:35:18.592
bears from '22.

00:35:19.161 --> 00:35:20.945
So two comments on that I think are important.

00:35:20.945 --> 00:35:29.184
One is, I think what this last cycle proved is that these DePIN networks can survive a bear market. Like Helium.

00:35:29.184 --> 00:35:40.666
It was that wild ride, right, like up through the top and then, bam, down to the bottom, and people were probably not making money on those nodes for a long time, but they still kept them and still, like, kept the network deployed, and it's starting to grow again.

00:35:40.666 --> 00:35:42.068
That was probably the big question.

00:35:42.068 --> 00:35:44.692
The bears were like, hey, can this have staying power?

00:35:44.692 --> 00:35:46.835
And I think the answer is yes, like it definitely does.

00:35:46.835 --> 00:35:49.103
I mean I think there's enough early adopters.

00:35:49.103 --> 00:35:56.750
The way I look at it is, at least in the case of Helium, it's like owning a piece of the railroad. I can install this tower at my house, or just a hotspot.

00:35:56.750 --> 00:36:03.273
I own a piece of the infrastructure of a future telecom, of a currently existing telecom, actually, because it's already live and working.

00:36:03.273 --> 00:36:09.327
And then, to your point about, like, a super box, I've been thinking about a bunch of different DePIN projects.

00:36:09.327 --> 00:36:18.896
I've been thinking of it as, like, a franchise model, where you have a business in a box: you can unwrap it, it has all the instructions for how to operate it, and you earn your own revenue off of it.

00:36:18.896 --> 00:36:21.824
And there's actually a company called Mycelium.

00:36:21.824 --> 00:36:25.932
Shout out to those guys, I've been talking to them for a while; they're based out of, I think, North Dakota.

00:36:25.932 --> 00:36:30.000
They've got like 400 Helium hotspots deployed across all of North Dakota.

00:36:30.000 --> 00:36:32.143
They've got a cool business model.

00:36:32.143 --> 00:36:43.019
They make money off of the devices, but they've also built a whole software stack around managing their fleet, and their fleet acts as a test bed, like an accelerator almost, for other DePIN projects.

00:36:43.019 --> 00:36:50.855
So other DePIN projects can, like, give them a bunch of nodes to deploy, and now they're essentially kind of starting to form that, like, super node.

00:36:50.855 --> 00:36:58.969
You know, they might have, like, five different use cases at one location on their footprint, and I think it's going to be similar to, like, the property management industry.

00:36:58.969 --> 00:36:59.911
I think, at scale...

00:36:59.911 --> 00:37:15.726
There are going to be a bunch of, like, property management companies that come out and deploy these nodes at businesses, and they do rev-share deals with those businesses, and they've got, like, the franchise in a box that's run with a couple of layers of software on top for the management piece of it, and you'll start seeing it scale.

00:37:15.726 --> 00:37:30.219
That way you'll have, like, a bunch of individual consumers, hobbyists, who run these things at their house, but you'll also have these more, like, professional operators, just like we've seen with the professional validator market, right, where there's these professional validator companies that validate on, you know, a hundred different networks, along with the, you know, thousand individuals who do it as well.

00:37:30.219 --> 00:37:35.152
Yeah, so it's just like a really interesting path to adoption.

00:37:35.152 --> 00:37:38.219
And then, you know, I think the last thing I'm excited about is RWAs; they're pretty interesting to me.

00:37:40.463 --> 00:37:48.525
I wrote a paper, I'll share it with you later if I haven't already, on the mortgage industry, and like two or three years ago, before I got into the Solana ecosystem, I was like, maybe I'll do something in mortgages.

00:37:48.545 --> 00:37:51.596
I bought a couple houses, like real estate investments, and really hated the mortgage process.

00:37:51.596 --> 00:38:00.831
And the area where I landed, where I thought there would be low-hanging fruit, is the secondary market.

00:38:00.831 --> 00:38:05.197
People don't realize that mortgages are actually physical pieces of paper that the broker holds.

00:38:05.197 --> 00:38:11.574
The second you, like, buy a house and get a mortgage, they're on the phone selling that piece of paper to somebody else, and they're custodying that paper somewhere.

00:38:11.574 --> 00:38:16.706
It's a pretty inefficient market.

00:38:16.706 --> 00:38:24.878
There's been a couple players who tried to tokenize mortgages before and I don't know that it's going to be in the next like two years or something, but I do think it's something that will get tokenized eventually.

00:38:24.878 --> 00:38:31.653
And then there's a lot of benefits from, like, the ability to automate distribution of payments and things like that through the network.

00:38:31.653 --> 00:38:35.552
But I don't know if you have anything else to add to that before I ask my next question.

00:38:35.632 --> 00:38:48.594
But that's a great use case for, you know, decentralization technologies, in terms of creating more efficiency, organization, all that stuff.

00:38:48.594 --> 00:38:50.559
So I would love to be in that world where mortgages are streamlined in that way.

00:38:50.559 --> 00:38:52.119
Tell me about Miami.

00:38:52.119 --> 00:38:54.387
You're in Miami; I know there's an ecosystem there.

00:38:54.407 --> 00:38:56.619
I have a couple of friends who live in the area.

00:38:56.619 --> 00:39:03.980
It just seems to me to be, like, this hopping center of Web3 in our space.

00:39:03.980 --> 00:39:09.670
But what's going on in Miami? If you're in town visiting, where should you be looking?

00:39:09.670 --> 00:39:11.153
Who should you be talking to, that kind of thing?

00:39:11.153 --> 00:39:13.762
Totally. One of the reasons...

00:39:13.842 --> 00:39:19.253
I moved down to Miami, it's not just because of the weather, which is great, but it's because I was around.

00:39:19.253 --> 00:39:29.773
You know, 2020, 2021, I was hearing this narrative that Miami is going to become more of a tech hub, more of a crypto center, and I'm an early adopter of things like that.

00:39:29.773 --> 00:39:31.195
So we came down here.

00:39:31.195 --> 00:39:36.568
It has definitely, like, made a lot of progress over that period of time until now.

00:39:36.568 --> 00:40:06.496
You know, I think when we first came down here, there was a lot of focus on trading tokens; like, a lot of NFT folks were down here. I think, over the course of us being here, CoinFund, first of all, we ran a monthly meetup for a time, where we would invite our portfolio company CEOs essentially to come and talk not just about, like, the tokens and NFTs, but, like, how do they build their crypto business and how does it work.

00:40:06.496 --> 00:40:12.503
You know, what are the entrepreneurship aspects of running a crypto network or a crypto company?

00:40:12.503 --> 00:40:19.652
And then, more recently, I think, just the sophistication of folks in Miami has gone up.

00:40:19.652 --> 00:40:23.476
Like, again, CoinFund has been sponsoring these monthly happy hours.

00:40:23.476 --> 00:40:27.527
The attendance the last couple of times has been through the roof.

00:40:27.527 --> 00:40:27.867
Right now.

00:40:27.867 --> 00:40:30.893
We have one coming up in three days, on Jan 25.

00:40:30.893 --> 00:40:34.480
We have over 205 people who signed up for this.

00:40:34.480 --> 00:40:40.552
You know, for this event, if you're in town, you know, come by; they're always, like, really, really fun.

00:40:41.152 --> 00:40:52.028
For almost every, or the majority of, CoinFund portfolio companies, like, someone on the team, you know, is either in Miami or coming through. There are a ton of crypto managers down here.

00:40:52.028 --> 00:40:59.739
Other than CoinFund, obviously, there's folks like Arrington, Hypersphere, BlockTower, Frictionless Capital, many, many others.

00:40:59.739 --> 00:41:11.532
There's a lot of really interesting founders, some of them, like, born and raised here, like Will Weinraub, who's the CEO of Cryptoys, which is a company in our portfolio.

00:41:11.532 --> 00:41:15.099
Or Luca from Pudgy Penguins, who actually lives not too far from me down here in Miami.

00:41:15.099 --> 00:41:21.014
I think over the last couple of years it's been quite an interesting community.

00:41:21.054 --> 00:41:23.539
It's been built up here, and I would say, like, what is this community's challenge?

00:41:23.539 --> 00:41:29.110
I think the challenge is getting more engineers on the ground.

00:41:29.110 --> 00:41:41.340
So, unlike New York or, you know, Stanford or something, Miami has fewer kind of technical universities around, and so engineers tend to be imported, you know, from New York, from California, from Austin and so on.

00:41:41.340 --> 00:41:54.539
So if you're an engineer and you're kind of Miami-curious, like, come down, hang out at our happy hour, you know, check out the community here. This is a great place to, like, headquarter a startup or just kind of work out of Miami for a few months.

00:41:54.539 --> 00:41:58.226
That's what I would say.

00:41:58.728 --> 00:41:59.510
I'm not a CPA.

00:41:59.510 --> 00:42:02.396
This is not tax advice, but there are a lot more favorable taxes,

00:42:02.396 --> 00:42:02.596
I hear, in

00:42:02.615 --> 00:42:04.219
Florida than there are, at least, in California where I live.

00:42:04.219 --> 00:42:09.052
So if you're already working in the crypto space and you're an engineer and you're doing well, it's a good thing to consider.

00:42:09.052 --> 00:42:11.056
I have several friends who are moving to the area.

00:42:11.056 --> 00:42:16.010
Just because of that. It sounds like a

00:42:16.010 --> 00:42:17.673
good way to connect with your team is to show up at one of these meetups.

00:42:17.673 --> 00:42:19.880
Are they happening like once a month, or how often do you guys do these sponsored meetups?

00:42:19.880 --> 00:42:22.166
It's been once a month for the last five months.

00:42:22.246 --> 00:42:26.454
I'll shoot you over the link and maybe you could post them to the show.

00:42:26.454 --> 00:42:26.739
100%, we'll do that.

00:42:30.329 --> 00:42:30.650
Well, listen.

00:42:30.650 --> 00:42:31.112
Thanks so much.

00:42:31.112 --> 00:42:32.356
I know we're at the top of the show here.

00:42:32.356 --> 00:42:33.498
I appreciate your time and thanks so much for coming on.

00:42:33.498 --> 00:42:40.597
Thanks, Alex. You just heard the Index Podcast with your host, Alex Kehaya.

00:42:43.907 --> 00:42:48.659
If you enjoyed this episode, please give the show a five-star rating and subscribe on Apple, Spotify, Google or your favorite streaming platform.

00:42:48.659 --> 00:42:50.246
New episodes available every other Wednesday.

00:42:50.246 --> 00:42:55.963
Thanks for tuning in.