- Copyright – enterprise decision-makers have to determine the risk AI poses to their industry, and then assess how this technology will be used and managed internally.
- Liability – who is liable for false information that AI puts out? LLMs are difficult to control and spew out a lot of false information.
- Data management – disciplined data management is the key to maximizing the value from LLMs.
Bill: Hey Mark. How are you?
Mark: Very good, Bill. Good to see you.
Bill: It’s good to see you too. So let’s start with a little background. So why am I talking to you about AI?
Mark: Well, I thought it was probably because we met and you’ve been following my work for a while.
Bill: That is true. Absolutely.
Mark: Yeah. But no, I mean, you jumped back into the legal tech business with both feet it sounds like. So, well, I’ve been in the R&D phase with AI for 25 years. My first interest in it was in 1984, way back up in Washington State, so I won’t spend too much time on that, but that was an early blip, but that’s when I first got interested in it. But didn’t really get heavily involved until the ’90s, when I was running my own lab, which was considered a Knowledge System Lab in northern Arizona. And it was a small lab, but at the time there were very few Knowledge Systems Labs. The largest one was at Stanford and I was running a couple of networks. One was for thought leaders, a learning network that was like the early LinkedIn. Ran that for three years during the late 1990s.
And it was during that time that I got really interested in cross-disciplinary work with algorithms, and the data that we were using, and it was personalized. So we were experimenting with a bot called Lookout, which was a personalized digital assistant, it wasn’t a talking assistant, but it was text, and we were doing some really neat stuff. And that’s when I came up with a theorem that KYield, my company, is based on, yield management of knowledge. And so I decided to jump in with both feet. I started a VC firm in 2002, so I was incubating KYield on the side. And then in about 2006, we could finally see the eventual technical viability of the theorem, and filed the core patent. And then in 2008, moved over to Silicon Valley in discussions on a VC merger, just in time for the financial crisis, which I walked away from.
The other firm was not looking very good, the LPs and stuff, so we moved to New Mexico. I got involved with the Santa Fe Institute and shifted gears on our research. And that’s when I started looking at more advanced AI systems, for the past… Over what, 12, 13 years now? And the company, we incorporated KYield in 2018, and that’s when we really started getting serious about going to prospective customers, with our flagship system, which is an enterprise-wide enterprise operating system. And I don’t know how much you want to talk about all that, but that’s probably, I assume, why… And then I publish quite a bit. I’m pretty active and have done a lot of videos so the educational curve has been very steep for everybody, for us, for customers, for everybody writing about it and understanding it. So whether you like it or not, it’s a very intensive learning curve for all of us.
Bill: Well, I’ve been following your work for a while now, and we’ve gotten to a very interesting inflection point where everybody’s paying attention all of a sudden thanks to our friends at OpenAI. And you have been working on and concerned about the governance and controls and understanding of the results of AI for many years, and now this is an issue at the forefront.
Mark: Well, one of the benefits of running a large network, when I came up with KYield, in designing a system architecture, was that it was obvious to me from the beginning that you’re going to need governance, strong governance in any of these systems. You’re going to need a data management system because that’s what it’s all run on. And so you’re going to have to have the ability for things like lineage, and provenance, and strong governance, and you need to be able to enable the customers in an enterprise to be able to easily make decisions on how it’s going to be governed, and accessed, and that sort of thing. So yeah, I’ve been very involved with governance all along because it just seemed… And that’s part of our core patent and still part of our core system in the KOS. And so I was actually shocked to see because I’ve been pretty aware of these large language models and the transformers. I followed the research very carefully. I know quite a few of the leading scientists, and I’ve interviewed them, and in my own writing and things, I’ve met quite a few.
And so I was shocked that OpenAI released their chatbot, which I considered to be premature. And I was even more surprised because I was an early booster of Microsoft way back in 1982, and I knew them well. In fact, we ran product boot camps for them in 1983, in my quote, learning end, that the head of research in Microsoft three decades later called the learning end. And so I was shocked that Microsoft, even at arm’s length, would be so aggressive on this. Because there are some very obvious issues for those of us that have been in the trenches for a long time.
And that’s why you’ve seen things like the open letter come out, and some of the leading scientists, even Hinton in the CBS interview that I posted on LinkedIn from just a week ago tomorrow, I think it is, he said… And he’s one of the strongest advocates for large-scale machine learning. He said that “Look, its guardrails are very difficult with LLMs”. And he thought Microsoft was being reckless. He was surprised too. And I haven’t talked to any of these people, but it’s interesting to see when they get interviewed that they’re thinking the same, and all these different people, Yoshua Bengio and quite a few others. It’s a very interesting time, and it’s not over yet, especially on the consumer side. There are some big legal issues coming up, so it’s going to be interesting to see how it works out.
Bill: So suddenly we have this public beta or alpha, depending on how you want to look at it, of generative AI. Everybody knows this term now, generative AI, where four months ago, who was talking about that at the water cooler? And so we’ve now all gotten a taste of what it can do, which is actually amazing. I mean it’s really impressive the results that you get, but there are some real challenges. So let’s talk about what some of those are.
Mark: Yeah, well we focus primarily on the enterprise like you do, right? And you’re industry-specific, our systems are universal, so they’re not… And then once you get them installed, then you do deep dives with their people, their data scientists, or third parties, and you can go down into the granular level, but it’s not industry specific. We have a couple of designs for industries, but it’s not like you guys, specializing. But in any event, there are multiple serious issues for enterprise decision-makers. And I’ve had some really interesting discussions with some of the big companies in the last few days, CEOs of some of the biggest companies in the world. And so let’s lay them out. I talk about it a little in my newsletter, but one is, right off the bat, is copyright. And so obviously the Supreme Court has not had a chance to review this in detail or rule on it.
There are a bunch of lawsuits that have been filed. There’s a lot of stuff working, but copyright is a major issue. During the ’90s, there was a grand bargain that was made with search and social networks, and then essentially fair use, okay, copyright will allow you, regulators, society at large, will allow you to use a fair amount of content, provided you’re linking to the original source. And for that, through legislation, Section 230, we’ll provide you with protection on liability so that you don’t get sued because it’s not your content after all, right? You’re not providing the full content, you’re providing a blip. And let’s face it, if we hold search engines responsible, liable, for all of the content on the web, it’s not going to be a very good search engine. You’re going to get a few hundred places that they know there’s no risk, a few thousand, but you’re not going to get to… If Mark wants to put up a website talking about his favorite hobby, they’re probably not going to list it in the search engine.
So that was what happened in the ’90s. Now, here we have the LLMs with massive, infinite computing power, scraping unbelievable quantities of data, trying to essentially scrape the world’s knowledge base and put it into one system towards artificial general intelligence, and provide full content reproduction. And so it’s an entirely new world. That has never happened before, and certainly, at this scale would be able to reproduce across all industries, we call it the knowledge economy. So if you’re an enterprise decision-maker, you’ve got several issues. One is, “Okay, my industry could be at severe risk if this continues because they have the ability to glean so much more content than we do, and data, and apply AI to it.” And then if we’re going to use it, this technology internally, there’s a big difference between a plugin to a system like that using your own data, or as we discussed before the talk, purpose-built systems where you’re using your own data like you were talking about with the legal industry, and what’s starting to happen in a large scale now.
So those are two really big issues on the copyright. We can delve into that. But then the second big issue that I touched on is liability for the false information that it puts out. Well, there we have some pretty good indicators on which way the courts are leaning. Gorsuch, in the current legal case with Google on a somewhat similar case. But it was interesting, he came out and said, “Well, let’s assume that these chatbots are not going to be covered by 230,” so they’re not going to have liability protection from being sued. Well, one of the problems, as we all know with these LLMs, is they’re very difficult to control, and they spew out a lot of false information, including about people, companies, and everything else. Well, okay, so what’s going to happen there with OpenAI and anyone like them, where they’re spewing out huge quantities of false information about competitors, about bias, just about all kinds of stuff?
And to me, their future is very much in question on that issue. And then just recently, a few days ago, the actual two lawmakers that wrote that law said very clearly that it won’t be covered. It’s not even close, so those are the big issues that haven’t been decided on, assuming that it’s going to need to be SCOTUS that decides on those. And then of course we have the FTC that just came out yesterday, a complaint filed saying that OpenAI’s bot, and you can extend that to any that are almost identical in type, is not following the guidance on AI, and they are breaking some of the rules there. So those are the three big areas where we have governance issues that very much impact the legal community, and there are ways to deal with it.
So if we separate out the consumer chatbots that scrape the entire universe, I personally think their future is very much in question, without a radical change. But that doesn’t mean that there isn’t a huge opportunity for generative AI, especially in the enterprise, or even closed websites where they own the data, and people are voluntarily contributing their data and participating in that sort of thing. Networks, different new types of networks perhaps. So yeah, it’s not going to stop the AI development. There’s been a lot of hype about that. Oh no, headlines, they’re seeking to stall AI. That’s not the case at all. It’s really just this one model that has those serious issues for consumers that’s in question.
Bill: And both of those things could be addressed, right? Because you can choose what data you plug into your LLM, and you could also build guardrails. Some people call it a large libel model because of the potential there. And so an example out of that, because one of the challenges with generative AI is it has a pretty significant hallucination rate and a specific challenge there is you can get a result that shows in quotation marks that Mark Montgomery said X, or that Mark Montgomery wrote a certain book that sounds like a book that you might have written. Statistically, it makes sense, that that sounds like a book you could have written, but of course, it didn’t actually happen, and the quote was made up. Well, those are two facts that are verifiable. You could take everything that’s in a quote and go check and say, “Well, can I actually find a reference where that quote does exist?” But currently, the ChatGPT, as we know it today, does not do those things and it opens up all kinds of avenues for abuse.
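The check Bill describes, taking everything inside quotation marks and looking for a source where it actually appears, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ChatGPT or any other product works: the function names (`extract_quotes`, `verify_quotes`) and the idea of a small in-memory list of trusted source texts are hypothetical, and a real system would need fuzzy matching and a proper document index.

```python
import re

# Matches text between straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def extract_quotes(generated: str) -> list[str]:
    """Return every quoted span found in the generated text."""
    return [match.strip() for match in QUOTE_PATTERN.findall(generated)]


def verify_quotes(generated: str, sources: list[str]) -> dict[str, bool]:
    """Map each quoted span to True if it appears verbatim in any trusted source."""
    normalized_sources = [normalize(source) for source in sources]
    return {
        quote: any(normalize(quote) in source for source in normalized_sources)
        for quote in extract_quotes(generated)
    }


if __name__ == "__main__":
    # Hypothetical trusted source and hypothetical model output.
    trusted = ["Mark said that disciplined data management is the key."]
    answer = 'Mark said "disciplined data management is the key" and "robots will rule by 2030".'
    for quote, found in verify_quotes(answer, trusted).items():
        print(f"{'verified' if found else 'UNVERIFIED'}: {quote}")
```

Exact substring matching is deliberately strict: a quote that cannot be found is flagged rather than trusted, which is the safer failure mode for the libel concern raised here.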
Mark: Right, yeah. And I ran some tests and some competitive issues with their company versus others and found it pretty interesting, too. And even in AI theory and some other things where there seemed to be a fair amount of bias. So yeah, there are a lot of issues in terms of liability for that model. On the technology side, it’s like Hinton and others have said, the guardrails, you can do certain things, but you can also work around them. And as aggressors are built that are automated, they’re going to be able to very quickly move around these bots and the guardrails that you put up. So you put up one guardrail, just for example, and there have been some really interesting interviews for the people that worked on these actual things early when they were raw. And oh my goodness, talking about, I don’t know if you’ve seen some of those, but talking about who would make a good individual to assassinate, and what would be the best way to do it, and that sort of thing.
And it just automatically spews out directions on how to do it, and it just stunned the people that were working on it. Scared the heck out of them. So you’re asking for things like, and they’ve already addressed this, so I’ll bring it up, but if you’re talking about how to build a bomb, well pretty easy in any of these language systems to guardrail against the bomb, right? The keyword bomb. But if you’re a clever terrorist or someone else, or if you have a software program that’s relatively clever, well, not to mention the state-sponsored effort that has resources, you could talk about the specific chemicals that have completely legitimate uses, and frame your questions for a WMD that says nothing about a bomb. And the bot would not be able to differentiate, would not be… Your guardrails would not be able to catch that, not at any time within the foreseeable future.
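The weakness of a keyword guardrail is easy to see in a toy version. The sketch below is an assumption-laden illustration, not any vendor's actual safety filter: `BLOCKED_KEYWORDS` and `keyword_guardrail` are hypothetical names, and the prompts are deliberately abstract placeholders.

```python
# Hypothetical blocklist; real systems use far more than single keywords.
BLOCKED_KEYWORDS = {"bomb"}


def keyword_guardrail(prompt: str) -> bool:
    """Return True if the prompt contains a blocked keyword and should be refused."""
    words = {word.strip(".,?!\"'").lower() for word in prompt.split()}
    return not BLOCKED_KEYWORDS.isdisjoint(words)


if __name__ == "__main__":
    direct = "How do I build a bomb?"
    rephrased = "A rephrased request that never uses the blocked word."
    print(keyword_guardrail(direct))     # blocked: the keyword is present
    print(keyword_guardrail(rephrased))  # not blocked: the filter only sees words
```

The filter inspects vocabulary, not intent, so any request that avoids the listed words passes straight through, which is the workaround Mark is describing.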
So there’s some real risk issues, cyber has already been one that’s been tested a lot, and it’s been proven that these bots are extremely good at giving directions on behavioral workarounds, and actually code. Giving you the code to be able to attack and do cyber attacks, because that’s frankly all available on the web in different forms. So yeah, there’s a lot of liability, there’s a lot of risk issues that people hadn’t thought about, and we know that the… I know some of the scientists that worked on these early on, and there are some brilliant people working on it, but they’re working on algorithms, and they’re working on the science and theory, and some of them have a little experience on the safety side, but the problem is that you have this unprecedented experiment on humankind that’s released in the wild, and it didn’t have the type of scientific rigor that one would expect from a Manhattan Project, or medical experiments, et cetera, for this century anyhow, or for the last three or 400 years.
So it’s almost a little bit like the old guy in the west riding around on his wagon, selling the latest little concoction to cure all your ills. It’s a little scary. And then people, how much are they going to drink and what are they going to use it for? Just don’t give it to your dog, please, whatever you do. Yeah, it’s a little… There’s reason to be concerned on the risk side, and I know Microsoft is working hard on it, I know. But the problem is that the inherent technology in these LLMs makes them very, very difficult to make them safe. Unless you control the data like you were talking about in your enterprise. Then if your data is safe and secure and of high quality, not a problem at all.
Bill: So we’ve got a snake oil salesman problem on the consumer side.
Bill: Because for the consumer side, you want to ingest everything so it applies to everything. But if we switch to an enterprise model and talk about tackling specific markets, then we have a very different circumstance.
Mark: Absolutely. And it’s not limited to LLMs then, because one of the benefits, or one of the reasons why OpenAI and others are using the very large scale consumer option is because they need enormous… It’s fundamentally based, the science is based, on the scale. And so you need to scrape something at the scale of the web to be able to approach artificial general intelligence when you’re using that model. There are other ways to do it. There are synthetics, there are other theories, and there are other ways to pursue AGI. And a lot of the leading… Certainly, there’s no consensus on this. A lot of the leading thinkers in AI think that we’ve just about hit the limits of what these LLMs can do. One reason is that we’re running out of data. There’s not a lot of data left for them to run out and improve the systems. So if you get into the enterprise, you don’t have that scale, then there’s a lot of different algorithms that you can use, it’s not just LLMs, to create…
You can certainly use transformers, you can do… But there are a lot of different options that you can use in it. And that’s one of the reasons I, in my latest newsletter on EAI, focused on neuro-symbolic AI, because that’s one of the primary fields now that we’re looking to for the future of the enterprise, to be able to deal with governance, precision, and yet still benefit from the generative AI and some of the other models. Not necessarily for full content reproduction, although it would take longer, you could certainly do it. If you had a large law firm and you had enough data, you could quite easily set up a thing that, “Write me a letter on X topic,” and based on all of your previous work in the law firm, it would pretty easily do that for you. And that would be very good for productivity, right?
Bill: Well, that’s what has everybody excited, right? There’s the potential. The question is how do we get to that potential? Because I feel like we’re at peak AI hype right now because of the launch of ChatGPT, and we almost have to go through a phase of reality and retrenchment to then get to the true realization, which could be even much greater than how we’re seeing it right now.
Mark: Well, yeah, some of us hope that the hype will be toned down, because it’s been punishing on the ears and eyes recently, just because of the intensity. But yeah, it’s true. I think one of the benefits of this, you can see it in investment, right? I think a lot of the investors are getting it wrong, but still, it did demonstrate to the world the capabilities of what AI can do, and it didn’t restrict them from the questions that they wanted to ask. And so in some sense, it did contribute to the education of AI, whereas previously, we were all restricted on what we could do, especially in the enterprise. Because even if you do anything in the enterprise, typically you’re pretty bound with NDAs and stuff on what you can talk about, or what you can show. You’re dealing with some of the most sensitive stuff inside the… intellectual property and workflow in those enterprises… and there’s a lot of secret work going on. Things like drug development that is very secretive.
They don’t want anybody to know what they’re working on, with good reason. And so this at least allowed the public to get a taste at the level of their understanding in their real world, whether they’re asking the chatbot about the football games and who’s going to win, or whatever it is their interest is, or professional work, or whatever it is. At least they get a sense now of what the future is going to look like. And I think that the future is going to happen. But as I said, I don’t see a pathway where these consumer bots with the current LLMs, without a radical change in structure, I don’t see how they can even survive. It just seems like it’s impossible, the liability issues alone… And I think that’s probably why Microsoft kept it at arm’s length. They’re willing to invest billions of dollars, but it’s a separate company, and they can go away and Microsoft would still be able to use LLMs in generative AI in their productivity products, where the data is controlled. And Microsoft would be fine, but I’m not so sure about some of their partners in the LLMs.
Bill: Well, but a lot of technology has been developed this way. Move fast, break things.
Mark: Oh yeah.
Bill: And often the first mover does survive that. And sometimes the first mover gets arrows in the back, and we are talking about the second mover, but sometimes breaking things is a successful way to build.
Mark: Yeah. The thing that’s unusual about this, that is unprecedented, is that you’re going in without… It’s unknowable what all the risks are. And so a lot of those risks can’t be taken back. So that’s, I think, what concerns scientists, a lot of scientists, is because we just… We’re aware of the risk, but we didn’t apply the rigor that you should be doing in a controlled environment. So for example, how it should have been done, you could have Microsoft, they could have partnered with Microsoft. Microsoft could have gone out and partnered with some of their industry customers, large scale or whatever, and used those LLMs, let’s say, a large scale publisher with huge amounts of data, Dow Jones or whoever.
And you could have done some really interesting experiments with a very controlled environment, the employees and all the people could have been trained on the risks involved, and what to report, and that sort of thing, and give it a little more time. Now it’s speculation, but it’s pretty good speculation. It’s educated, yes. OpenAI needed money. Running that thing, they were burning just astronomical amounts of capital to keep that engine humming, and they needed capital. So they were under extreme pressure to demonstrate, to be in negotiations with Microsoft apparently, and others, to get enough capital to continue on with that model. And that’s probably, it’s a fair assumption that that’s why they jumped out so early. They were just under pressure. They may not have survived otherwise.
Bill: So a $10 billion win for them there.
Mark: Well, they could have had a gun to their head and there was maybe no guarantee that… And if they weren’t able to demonstrate it, and as it was, it was a very aggressive move by Microsoft, but who knows? I mean, Microsoft may not have invested if they didn’t make it public. It’s questionable.
Bill: But there are more specific applications that seem a lot more straightforward, like Microsoft supplying the same technology to Copilot in GitHub, that does not seem open to the same level of problems and challenges as ChatGPT, with a huge upside in terms of programming productivity.
Mark: Well, Microsoft has a ton of programming code available that they own, and so they could certainly use the same, it doesn’t even need to be LLMs, but they could use, let’s just call it, machine learning algorithms to be able to increase productivity for software programmers. And that’s been the assumption for years now, which is that programmers are going to be one of the early victims of automation. Now, will they all get replaced? No. For a long time, you know how these things work, it’s going to make them a lot more productive, and they’re going to get a lot more done, and then focus on other things. So it’s not so much automating people as it is tasks and a certain percentage of their work. But yeah, I think that’s a very good example of… And I think that’s a pretty good comparison to legal work because there are a lot of comparisons, legit comparisons, between focused and disciplined legal work, and focused software engineering in the code. One is a language, a legal language, and the other is software code. And so I think there is a… It’s a good analogy, it’s a good comparison.
Bill: I think programmers are going to be the beneficiaries, by the way, because we know in running software companies, there’s always much more code to write than ever gets written. So I see that upside.
Mark: In the future, I agree with you. I think it already is benefiting software programmers, and it’s going to be a long time before, let’s call them professional developers, are automated out. I don’t see it yet on the horizon personally.
Bill: We were talking about, before we got on here, with Bloomberg’s announcement that just came out of a finance specific product, which is in line with what you’re talking about where this all needs to go.
Mark: Yeah. Just saw it a half hour before our talk started, and I’m connected to one of their CXOs. And they just issued a paper, and I forget how big it was, but I think it was 5 billion or something. Anyhow, they’ve been working on it a long time, and I think they call it BloombergGPT. And it’s essentially a, let’s call it, chatbot, purpose-built for the finance industry. And presumably they’re going to sell it to their customers and the finance industry, but it’s a large scale. It’s tailored specifically to that industry, and they’ve been working on it a long time. So I scanned it real quickly. I haven’t had a chance to study it, but from what I saw, it looked pretty impressive.
Bill: But directionally, that’s the route to get to the realization of the value here.
Mark: Well, for enterprises, I mean, it certainly is… Not everybody’s going to be able to do that. I mean, I think they recognized a good opportunity and probably a risk. If you’re doing a SWOT analysis for Bloomberg on AI, it would probably come up top. AI is both a risk and an opportunity. Very, very near the eye because there’s probably not a lot of risk for Bloomberg I would imagine otherwise. But this did open up pretty good risk. So they obviously invested a lot of money in there, but they’re a data company. When it gets down to it, they produce enormous quantities of data. It’s proprietary, they own it all, they’ve got their own media arm, so they’ve got huge quantities of data that they could tap to be able to produce this. So it makes perfect sense to me that they would do this, and it provides really good defense for them, and provides a good growth opportunity for their customers.
And if you’re one of their customers in finance, it’s probably going to be pretty difficult not to at least investigate that product and see what it offers you. So it struck me as a smart move on one of the first really big efforts in an industry specific way. But that probably isn’t going to mean… You could see, for example, we’ve done deep dives in most industries over the last 10 years. So I’m thinking about pharma, deep dives in pharma. They have access to enormous amounts of data, but it’s not necessarily, in fact, it’s not the type of data that is necessarily good for generative work. And it’s certainly not in terms of a chatbot, because it’s things like drug trials, human trials and scientific data, all kinds of interesting stuff that you can accelerate R&D you can bring efficiencies to it to reduce their costs a lot.
And there’s a lot of fraud in pharma, so that’s a concern, multi-billion dollar annual problem. But they’re primarily focused on accelerating R&D for blockbuster drugs. That’s their incentive, but they don’t have the type of data that’s going to be able to create a chatbot. So there would be an opportunity for somebody in the industry to go in. We have a digital assistant in our system and it’s universal, but it can be tailored once we run it on their data. It’s automatically tailored to their industry and to their people, based on their own data.
And that’s what we try to get across to people. It’s difficult for people to understand at first, but the architecture of the system doesn’t really care what industry it is. As long as your people are experts in what they’re doing, and they’re working with high quality data, that’s really what matters, and that’s still the differentiating factor between competitors, are still your people. And I don’t see that going away anytime soon. We can certainly apply some really good augmentation, and there’s some interesting automation going on, but especially at the higher levels of professional levels, it’s still going to be human against human in the competitive world for quite some time.
Bill: But I think that argument very much applies in the legal profession, and it’s a text-based profession, right? The output is text. And so I think that’s why we see so much excitement in legal around this technology.
Mark: Right, yeah. I mean it would be, I think, probably pretty similar to what Bloomberg faced. Let’s say if you’re a large law firm, let’s say a giant law firm, international, and if you’re doing your annual update of your strategic plan and you’re looking at doing a SWOT analysis, AI… We haven’t focused on the legal industry much, but in the industries that I have, and the conversations in most industries, I can tell you the largest companies in the world, the CEOs, AI has been top of mind in both risk and opportunity. So it’s been a very high priority for years now. Now, we have this consumer issue that comes out that changes that equation quite a bit, but it’s been on the minds of Fortune 100 companies for at least eight years, 10 years, whereas when we first started 12 years ago, nobody had ever… We would contact a company, let’s say oil and gas, I can recall, and they had artificial intelligence in their enterprise. And some of them had supercomputers, which you still needed to do AI then.
But they were using it exclusively for things like mining data, to be able to produce more oil. They had never considered using it to augment their enterprise or their workforce, which is what we’re focused on, to presumably, or possibly even then, look into getting into other industries, saving money, preventing… One of the things that we worked on quite a lot was the BP oil spill and preventing that. That was nearly fatal, right? 63, $64 billion event, and a few million dollar investments, I’m convinced, today anyhow, would’ve been able to prevent it. At the time, eh, the technology wasn’t quite there yet to prevent that kind of thing, but today, most of those types of events can be prevented. So yeah, there’s an awful lot for the boardrooms to consider, and they don’t have the luxury that I do of spending most of my waking hours up at 3:00 AM reading through the latest papers to stay on top of it. It is really difficult because of the volume, and the noise level. It’s hard for everybody now to keep up with it. It’s crazy.
Bill: Yeah, I think the SWOT analysis in law firms is in the same exact box. When you get industry-specific, and you really tune for a specific data set, how does that address the guardrail and hallucination issues? How does it make that circumstance better?
Mark: Yeah, I’ll explain that in two different ways. One is our way of doing it in our system, and that is we have what’s called a CKO engine, and that was part of the original core patent, and the first generation of technology. We have a whole other generation, but that’s from the semantic data and symbolic and natural language, NLP. So it’s an easy-to-use natural language interface for the… It’s not necessarily a CKO, but senior-level people that are approved in the organization. And so that is the governance, a CKO engine for the whole enterprise.
So for example, you determine who has access to what data right off the bat, which is really important, especially now with these bots. Because as you can imagine if that bot gets into sensitive areas of a client in a law firm, and let’s say it’s a partner, or a researcher, or somebody, even somebody within the firm that is not supposed to have access to it, and suddenly it spews out, that chatbot spews out really sensitive information, let’s say, it could even be criminal. Or it could be just a huge financial issue, or it could be IP, trade secrets, or industrial espionage.
So you have to be very, very disciplined in your data management, number one. And so that governance, that’s how we handle it in our CKO engine because it’s a precision data management system. Now, if you’re using just LLMs, the generative AI, you’re limited on what you can do because of those guardrails, you need… Okay, so the quality of the data is going to be there. Let’s say you’re in a large law firm, I don’t think you have to worry about some of the craziness or the hallucinations that you have to worry about in the LLMs on the consumer web because your inputs are high-quality data. So all you’re going to be concerned about is governance, right? Data governance is being able to box it off by client so that you have security for your clients and protect your own law firm.
If I was a senior partner, especially in a certain area, I would want, and through ours we have data, our digital assistant, I would want the ability to protect my own data and give access. So we do that in our system. Who’s going to have access to this sensitive data? Well, if I’m working in a team, obviously there’s going to be a team of people that needs access, but on this project, the next project, it might be a different team that has access to it. So the key issue is data management, whether it’s LLMs or any other kind of machine learning, because you can’t have a data warehouse scenario with artificial intelligence, where everybody has access to it, when you’re in anything like a major enterprise, much less something like a law firm. I just don’t see how you could possibly do that. You can’t-
Bill: In a law firm, you have to pay attention to the ethical walls in the law firm, or you’d have to choose to only run against the documents that are not subject to any ethical wall.
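As a rough illustration of the per-project access and ethical walls discussed here, the following Python sketch filters documents by matter team before anything is handed to a model. It is not KYield’s CKO engine or any vendor’s API; the names `Document`, `EthicalWalls`, `grant`, `revoke`, and `visible` are hypothetical, and the design simply assumes access is denied unless a user has been explicitly added to a matter’s team.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    matter_id: str  # the client matter this document belongs to
    text: str


@dataclass
class EthicalWalls:
    """Hypothetical per-matter access list: deny unless explicitly granted."""

    # Maps matter_id to the set of user ids cleared for that matter.
    teams: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, matter_id: str, user_id: str) -> None:
        self.teams.setdefault(matter_id, set()).add(user_id)

    def revoke(self, matter_id: str, user_id: str) -> None:
        # No-op if the matter or user is unknown.
        self.teams.get(matter_id, set()).discard(user_id)

    def can_read(self, user_id: str, doc: Document) -> bool:
        return user_id in self.teams.get(doc.matter_id, set())

    def visible(self, user_id: str, docs: list[Document]) -> list[Document]:
        # Only these documents should ever be passed to a model as context.
        return [d for d in docs if self.can_read(user_id, d)]


# Example: two matters with different teams (all ids are made up).
walls = EthicalWalls()
walls.grant("matter-A", "alice")
walls.grant("matter-B", "bob")

docs = [
    Document("d1", "matter-A", "Draft settlement terms"),
    Document("d2", "matter-B", "Patent claim chart"),
]

alice_context = walls.visible("alice", docs)  # contains only d1
```

The point of the sketch is that the filtering happens in ordinary data management code, ahead of the model, so a different team on the next project is just a different set of grants.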
Mark: Yeah. So in our data, for example, each individual has their own data library, where they upload their own data. Then there’s a corporate data store that everybody has access to, or at least everybody to whom the corporation, through the CKO engine, gives access. And so that’s how we deal with it, and very rich metadata. Here’s the thing, data physics is what I call it. And you see a lot of this in the last couple of years written about… And it’s true, it’s amazing that it took this long for people, that even Google, one of their senior people, a very, very well-respected computer scientist, and I know them, about three years ago came out and said, “Well, guess what? In artificial intelligence, it actually turns out that data management is the key.”
And it’s like, oh my goodness. Because they were focused on algorithms for so long, and the whole sector was all the scientists, and nobody wanted to do data. And in fact, there was a survey done on it, and everybody in AI wanted to work on models because that’s where it’s much more interesting. It’s challenging from an academic perspective and an intellectual perspective. And so nobody wanted to work on the data. But when you’re talking about the enterprise, the data management is absolutely key. You can’t even, you shouldn’t even, allow artificial intelligence in an enterprise unless you have it very, very disciplined, boxed off on access to the data. Otherwise-
Bill: If you want good output, you got to have good input.
Mark: Well, yeah. And then if you’re only looking at a small data set, then you probably won’t even want to use LLMs. It wouldn’t make sense. You could use other algorithms that are better for it. And then you can also apply synthetic data to augment it, so that you can beef it up and get some better results. There are tricks that you can do. So yeah, it’s really difficult to do after the fact. And that’s what happened with the chatbots, with OpenAI, is that they didn’t think through their architecture governance from the get-go. It’s a fairly new company, they didn’t do the architecture in advance. They didn’t plan it out. And now after the fact, especially with LLMs, it’s almost impossible.
It’s literally physically almost impossible to do a good job because there are workarounds. However, if you’re structuring it well and you design a governance structure from the get-go as we did, then your risk is very low. It’s physically impossible not to have governance, unless somebody really does something, hacks into the system or something like that because it’s physically separated and managed with more traditional data management. And that’s why the combination, that’s why I focused the newsletter this month, because of the hype, because of all the problems, and looked at neuro-symbolic, because that’s really the way we need to go in the enterprise, in the future.
Bill: Well, Mark, this has been a great discussion. I really appreciate you coming on and tackling this. Couldn’t be more timely.
Mark: Yeah, it was fun. I appreciate you inviting me on, Bill, and it was good to see you again. Hopefully, we can get together physically here, in the not-too-distant future, and have dinner or something.
Bill: I agree. Thank you.
Mark: Thanks a lot, Bill. Appreciate it.