As a lawyer and consumer of legal technology, Damien was frustrated by the lack of interoperability between vendors and their taxonomies. Fast forward to 2019, and Damien is part of the team solving this problem with SALI tags:
- SALI tags allow vendors, law firms, and clients to easily standardize their data for better legal services classification and delivery.
- SALI enables vendors to speak to one another; think of this analogy: you’re trying to communicate with someone across the globe, but your telephone doesn’t talk to other telephones. Why not get the interoperable phone that talks to everyone’s phone? That interoperable telephone is SALI.
- Progressive law firms who invest in knowledge management will have an advantage over their peers; the value of good data will increase through SALI.
Damien Riehl
Bill Bice
Transcript
Bill Bice: Hi, Damien. Welcome back.
Damien Riehl: Thanks, Bill. Thanks for having me.
Bill Bice: So, when we talked last time, you mentioned something that I want to get into in some depth, SALI, because it solves a whole bunch of problems. So why don’t you give a little background about why you care about this problem, how you got involved, and so on?
Damien Riehl: Sure. So, I’ve been a lawyer since 2002, and as a consumer of legal technology, I was surprised and dismayed that all the vendors I was using had different taxonomies. They had different ways of naming a motion to dismiss for breach of contract in the Southern District of New York. So, if you tagged it in one system, say vendor A, and then moved to vendor B, it would just not work, right? That’s problem number one. Problem number two is that vendor A doesn’t talk to vendor B, and that’s largely because they’re using different tags, whereas the same thing, a motion to dismiss, should be the same between vendor A and vendor B. So, these two things stuck in my craw as a consumer, and they came to the fore as I moved from being a lawyer to being a legal tech person.
So, as I started working for these companies, that problem became even more acute.
I realized this is a problem that must be solved. So, around 2017, I went to CodeX at Stanford, and Toby Brown said, “Hey, we’re starting a thing called SALI, the Standards Advancement for the Legal Industry, and it will ensure interoperability between vendor A and vendor B. The way that we’re going to do that is to have standard ways to express areas of law, the services that are under those areas of law, the industries for whom we do our work, and the documents that we produce in this work. Essentially, for everything that matters to the law, we will standardize those tags, standardize that metadata, so that the data is interoperable between vendors A, B, through Z. Now everyone can talk to each other in this interoperable way.”
So fast forward to 2019, and I worked for Fastcase, which vLex recently acquired. I sat down at a table and said, “Hey, I’m Damien Riehl,” and he said, “I’m Toby Brown.” I said, “I saw you a few years ago, and we talked about SALI. You haven’t built out the litigation side.” He said, “Do you know anybody that can do that?” I said, “Hell yeah, I’ll do that for you.” I started working with him then, and there were about 1,200 tags when I joined. In 2020, we pushed it from 1,200 to 10,000 things that matter, and we are currently at almost 13,000 things that matter: 13,000 metadata tags that cover both the substance of law and the business of law. This is being used by some small companies like Thomson Reuters, Lexis, Bloomberg, iManage, and NetDocuments, and small law firms like DLA Piper, Gibson, Perkins Coie, and others. So, the idea is that all of them will be interoperable in a way that now fulfills my dream.
I’m sharing my screen now to say that these are some of the small companies that are using SALI, and we’re solving two problems for each of them. Problem number one is the problem that my friend Corey at Thomson Reuters had. Oh, and by the way, I should say that SALI is a nonprofit. I’m a volunteer for that nonprofit. Everything that we have is free and open source. That is free as in speech and free as in beer. It is also open source, so you can download it from GitHub. So, I’m not selling anything. This is all free. So, the problem we’re solving for Corey at Thomson Reuters is that Corey’s boss said, “Hey Corey, we have 30 products. They don’t talk to each other. Make them talk to each other.” And Corey thought, wow, if I’m to build that, I must build the taxonomy. That’s hard, especially since I’m not a lawyer. And then he looked at SALI and was like, oh, SALI is good. So, Corey is using SALI as a universal translator amongst those 30 products to make them all talk to each other. That’s an internal problem we’re solving.
Then we’re also solving an external problem, where Corey can say to Gibson or DLA Piper or Clifford Chance, all these large law firms, “If you want to hit any of these 30 products, send me the SALI tags, and then you can talk to all thirty of those products.” That’s an external problem that he solved.
We’re solving that internal and external problem for TR and Lexis and Bloomberg, for NetDocuments, for iManage, Intapp, and Litera. All these companies have internal products that don’t talk to each other, and they all have external customers that they need to serve. Or, in the case of law firms, DLA Piper uses lots of these companies here, and they don’t want to have to map their internal taxonomy to whatever taxonomy each of these vendors uses. Wouldn’t it be great if we had a universal standard, so that if DLA Piper mapped to SALI, they could then talk to all these people without having to remap their metadata tags? So, let’s talk.
Bill Bice: It would. It would be a great cause. As you know, every time I’ve ever gotten in and implemented a document management system or a practice management system for a law firm, the taxonomy has always been this huge hurdle, because every firm has its own taxonomy.
Damien Riehl: Yep. And so, we’re solving that problem. Say I have an area of law like intellectual property law, and a type of intellectual property law: patent. Cool. Of course, you might say, “Well, Damien, where is patent litigation versus patent prosecution?” Smartly, before I joined SALI, SALI decided to split up the area of law, patent law, from the service that I, as a lawyer, provide. I advise on whether to get a patent. Then I file the patent with the Patent and Trademark Office. Then I license the patent, litigate the patent, and deal with patent assets in bankruptcy. Each of those is a service that I, as a lawyer, do.
You can imagine, for those who are fans of ontologies and knowledge graphs, how that simplifies the knowledge graph. If you were instead to have patent litigation as a child and patent prosecution as a child, you’d have to do the same thing for copyright litigation as a child, copyright prosecution as a child, trademark litigation as a child. So, it’s turtles all the way down. Instead, we have a tagging system where tag number one is patent law, and then at intake, the service starts as advice. But then the client quickly says, OK, let’s go ahead and file this with the Patent and Trademark Office. So, you keep tagging things up as more things happen. That tagging versus bucketing is the strength of SALI.
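For readers who think in code, the tagging-versus-bucketing point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels below are plain-English stand-ins, not actual SALI tags or identifiers, and the two helper functions are hypothetical.

```python
from itertools import product

# Illustrative stand-in labels (assumptions), not actual SALI tags.
AREAS_OF_LAW = ["Patent Law", "Copyright Law", "Trademark Law"]
SERVICES = ["Advice", "Prosecution", "Licensing", "Litigation", "Bankruptcy"]


def bucket_count(areas, services):
    """Bucketing: one pre-built child node for every area/service pair."""
    return len(list(product(areas, services)))


def tag_count(areas, services):
    """Tagging: each area and each service is defined once and combined per matter."""
    return len(areas) + len(services)


# A matter accumulates tags as the work progresses.
matter_tags = {"Patent Law", "Advice"}   # at intake
matter_tags.add("Prosecution")           # filing with the Patent and Trademark Office
matter_tags.add("Litigation")            # later, the patent is litigated

print(bucket_count(AREAS_OF_LAW, SERVICES))  # 15 bucket nodes to maintain
print(tag_count(AREAS_OF_LAW, SERVICES))     # 8 tags to maintain
print(sorted(matter_tags))
```

With three areas of law and five services, bucketing needs a node for every pair, while tagging needs only one definition per area and per service, and the gap widens as either list grows.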
Then, once we have patent law, you could imagine someone saying, “Well, my customers don’t call it patent law. They’re German, or they’re Spanish, or they’re French, or they’re using Hebrew in Israel.” The beauty of SALI is that all the things, no matter what you call them, have this unique identifier right here, and this unique identifier is used by every one of these companies. Every time one of those companies tags up something using patent law, they’re going to slap on this unique identifier. And then we’re building an API standard to say that you, as a firm or customer, can send me, at vLex, Fastcase, and Docket Alarm, this unique identifier in an API call, saying, “Send me all your patent law things,” and when we see this, we’ll send you our patent law things. Send that same API call to Thomson Reuters, and they’ll send you theirs. Send the same API call to Lexis, to Bloomberg, to NetDocuments, to iManage. All the people here will have that universal API call, so that everyone can talk to each other by API, because we all use the same identifier, even if we use different words to express it. That’s the power.
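The idea of one identifier behind many labels can be simulated locally. The sketch below is not the API standard Damien mentions, which the transcript does not specify; the identifier strings, vendor names, and the `query_by_tag` helper are all hypothetical and exist only to show why a shared identifier makes the same query work everywhere.

```python
# Hypothetical identifiers (assumptions); real ones come from the published SALI ontology.
PATENT_LAW_ID = "https://example.org/sali/PatentLaw"
EMPLOYMENT_LAW_ID = "https://example.org/sali/EmploymentLaw"

# Each vendor keeps its own display label but stores the shared identifier.
VENDOR_CATALOGS = {
    "vendor_a": [
        {"label": "Patent Law", "tag_id": PATENT_LAW_ID, "doc": "a-001"},
        {"label": "Employment Law", "tag_id": EMPLOYMENT_LAW_ID, "doc": "a-002"},
    ],
    "vendor_b": [
        {"label": "Patentrecht", "tag_id": PATENT_LAW_ID, "doc": "b-417"},
    ],
    "vendor_c": [
        {"label": "Derecho de patentes", "tag_id": PATENT_LAW_ID, "doc": "c-033"},
    ],
}


def query_by_tag(catalogs, tag_id):
    """Return the documents carrying tag_id, grouped by vendor, ignoring labels."""
    return {
        vendor: [item["doc"] for item in items if item["tag_id"] == tag_id]
        for vendor, items in catalogs.items()
    }


print(query_by_tag(VENDOR_CATALOGS, PATENT_LAW_ID))
```

The same request returns the patent law material from every catalog even though no two vendors use the same label.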
Bill Bice: And I’m sure you’re going to demonstrate this, but the beauty of this taxonomy and doing the cross-referencing means that you can ask for patent law that’s involved in litigation. And it’s very straightforward, given that litigation is one tag on its own, patent law is one tag on its own, and so on.
Damien Riehl: That’s right. That’s right. So, I want litigation. Cool. So, what part of litigation practice do I care about? Do I care about trial practice? Cool. Do I care about pretrial practice? Cool. Discovery? Yes, I care about depositions. We have it all the way down. People listening may be familiar with the ABA task codes, also known as the UTBMS codes. As a litigator, I would often use ABA task codes like L330, depositions, which was the bane of my existence. Ostensibly, this is for pricing, to say, how much does a deposition cost? I looked at UTBMS code L330, and I said, well, how much a deposition costs depends on: am I taking the deposition, am I defending the deposition, or am I merely observing the deposition? Those are three different price points. Is it a fact witness, or an expert witness, or is it the CEO of the company or a 30(b)(6) witness? Because that’s going to affect the cost. L330 doesn’t tell you any of those things, but SALI tells you all of them, because you can say: am I taking the deposition, defending it, or merely observing it? Is it a fact witness, an expert witness, or a corporate rep? Each one of these things has its own unique identifier, so you can tag things up with much more granularity, in a way that lets you say, OK, for patent law, show me all the times that we’ve taken an expert deposition. You can tag these things up and get that structured data back.
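To make the granularity point concrete, here is a small Python sketch of time entries that carry several tags each, so a pricing question can be answered by intersecting tags. The tag names, hours, and the `hours_for` helper are invented for illustration; a single L330 code could not separate these entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeEntry:
    hours: float
    tags: frozenset  # illustrative labels standing in for SALI identifiers


# Invented sample data (assumption): every entry would be plain "L330" under UTBMS.
ENTRIES = [
    TimeEntry(6.0, frozenset({"Patent Law", "Deposition", "Taking", "Expert Witness"})),
    TimeEntry(3.5, frozenset({"Patent Law", "Deposition", "Defending", "Fact Witness"})),
    TimeEntry(2.0, frozenset({"Employment Law", "Deposition", "Taking", "Expert Witness"})),
    TimeEntry(4.0, frozenset({"Patent Law", "Deposition", "Taking", "Expert Witness"})),
]


def hours_for(entries, required_tags):
    """Sum the hours of entries that carry every tag in required_tags."""
    required = frozenset(required_tags)
    return sum(entry.hours for entry in entries if required <= entry.tags)


# "For patent law, show me all the times we've taken an expert deposition."
print(hours_for(ENTRIES, {"Patent Law", "Deposition", "Taking", "Expert Witness"}))
# The coarse view that a single task code gives you:
print(hours_for(ENTRIES, {"Deposition"}))
```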
Bill Bice: One of the things that strikes me about this conversation, and what we were talking about previously with what’s happening with generative AI, is that firms’ investment in KM is suddenly going to pay off in whole new ways, because if you do a great job of coding your data, you’re going to be in a vastly better place than your peers.
Damien Riehl: That’s right. Because of retrieval-augmented generation. You could imagine asking the LLM out of the box, “Show me all the times that we took an expert deposition in a patent case,” and it’s going to hallucinate all over the place.
Right. But if you tag it up on the front end, you can say, here are the 50 or 100 documents where we’ve taken an expert deposition in a patent law case. Now, you have that structure to give the large language model much more to work with. So, I think you’re right about knowledge management. This is not LLM or knowledge management; this demonstrates how together they can be even stronger than either of them separately. And the beauty of what we’re building here is that it can be done programmatically. That is, you don’t need to do much to be able to do what I’m showing on my screen. Here you can tag up entire matters. Shoplifting: the area of law is criminal law, and the service you provide is disputes.
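The retrieval step described here, narrowing to the tagged documents before the model sees anything, might look roughly like the Python sketch below. It deliberately stops short of calling any model: the document records, tag labels, and both helper functions are assumptions made for illustration.

```python
def select_context(documents, required_tags, limit=100):
    """Keep only documents that carry every required tag, up to limit."""
    required = set(required_tags)
    return [doc for doc in documents if required <= set(doc["tags"])][:limit]


def build_prompt(question, context_docs):
    """Assemble a prompt that restricts the model to the pre-filtered sources."""
    sources = "\n".join(f"- [{doc['id']}] {doc['summary']}" for doc in context_docs)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


# Invented sample records (assumption); tags stand in for SALI identifiers.
DOCUMENTS = [
    {"id": "d1", "summary": "Expert deposition transcript, patent case",
     "tags": ["Patent Law", "Deposition", "Taking", "Expert Witness"]},
    {"id": "d2", "summary": "Fact witness deposition, employment case",
     "tags": ["Employment Law", "Deposition", "Taking", "Fact Witness"]},
    {"id": "d3", "summary": "Expert deposition outline, patent case",
     "tags": ["Patent Law", "Deposition", "Taking", "Expert Witness"]},
]

context = select_context(DOCUMENTS, ["Patent Law", "Deposition", "Expert Witness"])
print(build_prompt("When have we taken an expert deposition in a patent case?", context))
```

The structured filter does the finding, and the language model is only asked to work with what the tags already retrieved.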
So, tag up all your matters. Also, tag up the matters on the document side in your DMS, saying this is an M&A matter, this is a patent litigation matter. But then, once you do that, you can also go into the document itself and say, OK, these are some tags. So, what NetDocuments has announced is that, by the end of the year, they will say to the user, “Hey, user, we found these 25 SALI tags. Do you want us to tag these things up for you?” and the user can say, “Yes, please.” Similarly, how many ways are there in a time entry to say deposition or contract or affidavit or statement of undisputed facts? Not many, and we’ve covered all those here. So, you can imagine going through all your time entries and asking, what’s the work we did? How long do these things take? And doing this from a technical standpoint is trivial. We have 13,000 things that matter, and we have it all on GitHub. So, take negligent misrepresentation. If I say something false about Bill, which will probably happen by the end of this call, he can sue me for negligent misrepresentation. You can imagine three lawyers arguing. Lawyer number one would say that’s a negligence claim, and they’d be right. Lawyer number two would say, well, sure, but it’s also a misrepresentation claim, and they would be right. Lawyer number three would say, it’s both, but Damien is saying something false about Bill, so that false statement is a defamation claim. And they’d be right. So, how do you settle the argument between lawyers one, two, and three? The answer is you don’t, because it is a negligence claim. It is also a misrepresentation claim. It is also a defamation claim. And I went through hundreds of jury instructions throughout the country, and some jurisdictions call it negligent misrepresentation causing harm. That is a synonym.
All of those have this unique identifier here. And because it’s on GitHub, you can take all this programmatically. Look at this LMSS OWL file; that LMSS OWL file is exactly what I’m showing you right here. So, in this LMSS OWL file, if I search for negligent misrepresentation, here is the XML class for negligent misrepresentation. As I make this a bit bigger, you can see that it is a type of negligence claim. It is also a type of misrepresentation claim. It is also a type of defamation claim. And another name for negligent misrepresentation is negligent misrepresentation causing harm. All of them have this unique identifier here.
So, for the technical people on this call, you take this XML class, and the entire XML for all 190,000 rows of this here, and you can ingest it. And what my friend Jason Barnwell at Microsoft said is that because it’s on GitHub, he can just go to this permalink blob here, and every time we update it, like we just did last week, you get the most recent version. So, for example, Mike Bommarito, who helped beat the bar exam, took it and translated it to UK English, to Spanish, to Mexican Spanish, to Israeli Hebrew, and, I guess, to Hindi, Indian Hindi. So, now we have translations. So, every time we enrich the data set in this way, you can just go and pull it down and merge it into your tech stack. The only way this works is if it’s free and open source, and that’s why all of these companies here are working with it.
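As a rough idea of what ingesting such a class could involve, the Python sketch below reads one OWL class from an embedded RDF/XML snippet using only the standard library. The snippet is a hand-written stand-in: the `example.org` identifiers and the use of `skos:altLabel` for the synonym are assumptions, and the actual LMSS OWL file may structure these details differently.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Hand-written sample (assumption), modeled on the class described in the conversation.
SAMPLE_OWL = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <owl:Class rdf:about="https://example.org/sali/NegligentMisrepresentation">
    <rdfs:label>Negligent Misrepresentation</rdfs:label>
    <rdfs:subClassOf rdf:resource="https://example.org/sali/NegligenceClaim"/>
    <rdfs:subClassOf rdf:resource="https://example.org/sali/MisrepresentationClaim"/>
    <rdfs:subClassOf rdf:resource="https://example.org/sali/DefamationClaim"/>
    <skos:altLabel>Negligent Misrepresentation Causing Harm</skos:altLabel>
  </owl:Class>
</rdf:RDF>"""


def read_classes(owl_xml):
    """Map each class identifier to its label, parent classes, and synonyms."""
    root = ET.fromstring(owl_xml)
    rdf = "{%s}" % NS["rdf"]
    classes = {}
    for cls in root.findall("owl:Class", NS):
        classes[cls.get(rdf + "about")] = {
            "label": cls.findtext("rdfs:label", default="", namespaces=NS),
            "parents": [p.get(rdf + "resource") for p in cls.findall("rdfs:subClassOf", NS)],
            "synonyms": [s.text for s in cls.findall("skos:altLabel", NS)],
        }
    return classes


for iri, info in read_classes(SAMPLE_OWL).items():
    print(iri, info["label"], len(info["parents"]), info["synonyms"])
```

One class with three parents is how the file can record that the claim is a negligence claim, a misrepresentation claim, and a defamation claim at once, without forcing a choice among them.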
Bill Bice: Right.
Damien Riehl: It would never happen if any of these companies tried to do a proprietary version of what we’re just doing. But because it’s free and open source on GitHub, this is the reason that we’re getting some traction.
Bill Bice: No, we need everybody to use it. I love the app that’s on the SALI website that demonstrates how you can take your taxonomy and translate it to SALI and, therefore, make an easier transition.
Damien Riehl: Yes. I will show you how to do it manually, and then I will show you how the app makes it a little bit better. So, Intapp is a partner and user of SALI. They went through a bunch of law firm websites, like DLA, Hogan Lovells, Jones Day, et cetera, and asked, what do they call their areas of law? So, you could imagine that if you work for a law firm, you have an area of law list. One of the areas of law that a law firm lists is “appeals to the PTAB.” That, I’m sad to say, is not an area of law. The PTAB is the Patent Trial and Appeal Board. So, for appeals to the PTAB, the area of law is patent law, your service is appellate practice, and the PTAB is not an area of law. The PTAB is a forum where you argue things. And once technology people within a law firm look at this, they say, oh, this makes way more sense. I can take a Frankenstein kind of area of law here and break it up into component parts.
Now I can run analytics: show me all the patent things, all the appellate things, all the patent appeals, or all the patent appeals in the PTAB. Right. I can run all sorts of analytics, but the Frankenstein thing just doesn’t let me do so. The tricky part in the past, though, has been that it’s taken a long time to break these things down. It took me an afternoon to do a few hundred. But the tool that Bill mentioned makes that job much, much easier, because you can use these large language models to do what I’m showing you on my screen here. OK, here is an area of law list from a law firm that I’m not going to name, because this area of law list is frankly embarrassing. Garnishment is not an area of law. Garnishment is just a remedy that you try to get. You garnish people’s wages.
Gas pipeline is not an area of law. A gas pipeline is just something that lives in the world. So, you can take a garbage ontology like this and say, here is SALI’s area of law list. Then, give me four potential tags, four candidates, in columns, in order from highest probability to lowest probability. And look at what happens. It takes your garbage ontology and translates it to say, you know, that thing you call garnishment? As an area of law, it looks a lot like labor and employment law or wage and hour law. The thing you called gas pipeline? That looks a lot like energy law or oil and gas law. So, it conceptually maps your garbage ontology to SALI’s ontology, which is then interoperable with all of these companies here. That was done just for the area of law. But in the tool, you can see here that we can say garnishment, and it goes not only to the area of law, which is employment law, but it also goes down to SALI already having a thing called garnishment. It’s an objective. Then, you can go on to your next thing as you map it. You can say, OK, patent infringement is the thing that I call an area of law. The area of law is patent law, and patent infringement is a claim. So, it’s an objective. You can see here that it collects everything we did. Garnishment: the area of law is employment law.
Garnishment: the objective is garnishment. Patent infringement: the area of law is patent law. Patent infringement: the objective is patent infringement. We can take all of that and put it into a CSV file. So now I’m downloading the CSV I just created, opening it up, and you can see here what I have. Here is garnishment and its area of law. Here’s the beautiful IRI for each of those things. Here’s the SALI label. So, garnishment relates to employment law, and garnishment relates to garnishment, et cetera. So, this is a way that you can create these mappings easily. And by the way, we also show definitions, to the extent we have definitions, and how we got the match. It went from garnishment to the area of law, employment law, by using the large language model. For patent infringement, there’s already a thing called patent infringement, so we just went with the label match. We didn’t even need to go to the LLM. So, this is all a way to do very quickly what used to take a long time. This kind of mapping now takes a few seconds, and it doesn’t require the kind of knowledge that I have of all 13,000 things. You can use the tool and just ask a question.
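A mapping file like the one described could be produced with a few lines of Python. The column names, the `example.org` IRIs, and the method value on the garnishment objective row are assumptions for illustration; the conversation only states that each row carries the firm’s term, the SALI label, its IRI, and how the match was found.

```python
import csv
import io

FIELDS = ["firm_term", "sali_facet", "sali_label", "sali_iri", "method"]

# Hypothetical rows mirroring the four mappings walked through above.
MAPPINGS = [
    {"firm_term": "Garnishment", "sali_facet": "Area of Law", "sali_label": "Employment Law",
     "sali_iri": "https://example.org/sali/EmploymentLaw", "method": "llm"},
    {"firm_term": "Garnishment", "sali_facet": "Objective", "sali_label": "Garnishment",
     "sali_iri": "https://example.org/sali/Garnishment", "method": "label_match"},  # method assumed
    {"firm_term": "Patent Infringement", "sali_facet": "Area of Law", "sali_label": "Patent Law",
     "sali_iri": "https://example.org/sali/PatentLaw", "method": "llm"},
    {"firm_term": "Patent Infringement", "sali_facet": "Objective", "sali_label": "Patent Infringement",
     "sali_iri": "https://example.org/sali/PatentInfringement", "method": "label_match"},
]


def to_csv(rows):
    """Serialize mapping rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(MAPPINGS))
```

Keeping the IRI and the match method in every row means the mapping can be audited later and re-run when the ontology is updated.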
Bill Bice: Yeah, there’s this feeling that maybe LLMs will save us from having to go in and fix our data. And ironically, the better your data is, the better off you’ll be. But you can also use that technology to help push your data in the right direction, right? It becomes a very virtuous circle.
Damien Riehl: That’s right. The way the tool I just showed you works is by taking your text, doing a fuzzy match on it, then doing a Jaccard similarity, and then putting it out to a large language model and using the tag it returns. You can do that using OpenAI or Bard or Dolly or T5 or whatever you want to use. But to the virtuous circle of cleaning things up: you can imagine version two, which we at vLex are building. You can imagine prompting with all the areas of law and saying, large language model, here is a bunch of input text. Why don’t you give me the areas of law and legal concepts and companies and industries? And then you can imagine annotating it, as you see here, to say “draft” looks like a task.
“Motion to dismiss” looks like a document. “Trademark” looks like an area of law. And then you can take that output and say, OK, now for everything that the large language model thought is a task, run that across the SALI tasks, and it turns out that draft is a task within SALI, and then map it up that way. So, you can imagine having HTML-like or XML-like spans across your entire data set, for all your documents. Doing it in a bottom-up way is probably a pretty smart way to tag up your data without having a lawyer spend their valuable time tagging up “draft,” “motion to dismiss,” et cetera.
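The non-LLM steps mentioned in this answer, a label match, a fuzzy match, and a Jaccard similarity, can be sketched with the Python standard library. This is not the SALI tool’s code: the label list is a small invented sample, the way the two scores are averaged is an arbitrary choice, and `SequenceMatcher` is simply one readily available fuzzy matcher.

```python
from difflib import SequenceMatcher

# Small invented sample of target labels (assumption), not the real ontology.
SALI_LABELS = [
    "Employment Law", "Wage and Hour Law", "Energy Law", "Oil and Gas Law",
    "Patent Law", "Garnishment", "Patent Infringement",
]


def jaccard(a, b):
    """Jaccard similarity on word sets: |intersection| / |union|."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def fuzzy(a, b):
    """Character-level similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates(term, labels, top_n=4):
    """Return an exact label match if one exists, else the top_n scored labels."""
    exact = [label for label in labels if label.lower() == term.lower()]
    if exact:
        return [(exact[0], 1.0)]
    scored = [(label, round((fuzzy(term, label) + jaccard(term, label)) / 2, 3))
              for label in labels]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


print(candidates("patent infringement", SALI_LABELS))  # exact label match, no model needed
print(candidates("gas pipeline", SALI_LABELS))         # four ranked candidates
```

String similarity can only surface labels that share characters or words with the input. A conceptual jump such as garnishment to employment law is the part the conversation assigns to the large language model.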
Bill Bice: So, if I were a general counsel, I would want every firm I work with to use SALI so that I can ingest that work across multiple firms. What do you see happening there? Any early success stories?
Damien Riehl: That’s exactly right. Jason Barnwell, my friend from Microsoft, says he’s doing just that. He’s using SALI on his internal business process, that is, people’s questions. So, a businessperson asks a question of the legal department, and he is ingesting that and tagging it up using the SALI tags. Then maybe the machine he’s building can answer that question directly without a human. But if not, it gets escalated to humans internally, and if that internal human can’t answer it, he can send it out to a law firm. And so, he’s saying, “Hey, just like I’m tagging things up on my side, you as a law firm should also tag things up with the same SALI tags on your side, so that when you give me the answer, I can say, OK, here’s the answer for this area of law in this jurisdiction for this industry.” And now he can enrich his internal data set based on the external tagging that the law firm has done. So, this is not just a way that you as a law firm can reach out to your vendors, like Thomson Reuters or Lexis, but also a way to reach out to your clients to say that we are forward-looking, because we know you need to tag up the data internally at your corporation too. We’re going to help you with that.
Bill Bice: Yeah. And then you think about the additional benefits for pricing and marketing, you know, going to your experience database and having that experience built in.
Damien Riehl: That’s right. Take all the tasks in that experience database: drafted a motion to dismiss for breach of contract in the Southern District of New York in front of Judge Smith. Each of those is a SALI tag. Drafted a merger agreement with a force majeure clause regarding Indonesia. Each of those is a SALI tag. So, every single noun I just mentioned, and every single verb, drafted, reviewed, filed, each of those is similarly a SALI tag. So, all the work in the legal sphere, whether in-house or in the law firm, is covered by SALI.
Bill Bice: Yeah. So, getting the entire industry to use one taxonomy would be incredible.
Damien Riehl: Yeah, it’s been my dream. And, you know, I showed you that list of vendors, the logos.
If you told me five years ago that we would have that kind of logo adoption by today, I would tell you you’re crazy. We could not get all those people to agree to things. We could only get them to agree because it’s free. It’s open source, but most importantly, it solves many of their problems and does that very smartly. That’s the reason we’re getting the wildfire adoption we are.
Bill Bice: Yeah, the pain points are just very real. Every firm is trying to create integration between their systems, every firm is trying to create a data lake, and every single one of these problems is made easier if you have a taxonomy that everybody uses for the same purpose.
Damien Riehl: That’s right. And if you are in a law firm or an organization, you have two choices. Choice number one is to build your own ontology that’s not SALI, right? Number one, that’s hard. Also, you’re probably going to get it wrong. And it doesn’t talk to anybody else, right? That’s option one. Option two is to use SALI. It’s easy, it’s good, and it talks to everybody. So, option one is hard and doesn’t talk to anybody; that’s like having the only telephone in the world. Why would you buy a telephone that doesn’t talk to anybody else? Why not buy the interoperable telephone that talks to everybody in the world? That interoperable telephone is SALI.
Bill Bice: Yeah. I’m impressed, given that you’re operating in a market full of unique individuals and firms that have built value around that unique collection of individuals. And yet, the leverage that would come from this is still tremendous. So, I commend you on the progress that’s been made, and we all need to go out and be part of the effort to get to full adoption.
Damien Riehl: Agreed. Adopt it yourself, and force, well, not force, but demand that your vendors also adopt it. Because it’s one thing for me to speak to a large vendor and say, “Hey, you need to get on board,” and they’re like, “Yeah, yeah, I know, but I have a road map that’s 10 miles long.” So, it’s one thing for me, as SALI, to do that. It’s another for you, as a vendor’s customer, to say, “Hey, I’m paying you money. I need you to be on SALI because we’re on SALI.” If they get enough of those requests, they’ll jump on the SALI bandwagon, and life will be better for everybody.
Bill Bice: Yeah. Well, thanks, Damien. Lots of good stuff.
Damien Riehl: Thanks so much, Bill. I appreciate you having me.