AI Impact on Software Development with GitLab’s David DeSanto
This is Cross Validated, a podcast where we speak to practitioners and builders who are making AI deployments a reality.
Today's guest on Cross Validated is David DeSanto, Chief Product Officer at GitLab. GitLab's AI-powered DevSecOps platform serves more than 30 million registered users and 50% of the Fortune 100.
Listen and subscribe on Spotify and Apple.
Follow me on Twitter (@paulinebhyang)!
Transcription of our conversation:
Pauline: Welcome to Cross Validated, a podcast with real practitioners and builders who are making AI in the enterprise a reality. I'm your host, Pauline Yang, and I'm a partner at Altimeter Capital, a lifecycle technology investment firm based in Silicon Valley.
Today our guest is David DeSanto, Chief Product Officer at GitLab, which is the leading DevSecOps platform. GitLab's platform enables organizations to plan, code, build, secure, deploy, and monitor software in one application with a unified data model and one interface.
Thanks so much for being on the show, David!
Would love to kick off with just a little bit more background on GitLab and the company's mission, as well as your role as Chief Product Officer.
David: Absolutely. So to start off, GitLab's mission is that everyone can contribute. And that's how we've come to be the leading DevSecOps platform. We want to make sure everyone who's involved in delivering software value has the tools they need to be successful.
To your question about history, GitLab did start many years ago, about 10 years ago, as a dev tool focused very much on source code management, or SCM. But as the company evolved, we realized you needed to have CI and CD be as close to your SCM as possible, and that launched the platform. And then over the last several years, we've begun to include things like security and monitoring as part of the platform as well. And that's how we've become the DevSecOps platform we are today.
The company, as we continue to morph, is looking at what is the next thing we need to do to help companies be efficient in delivering software. And that kind of brings us to our topic today around AI. And we actually now see ourselves as an AI-powered DevSecOps solution.
Pauline: That's great. And how did you come to be Chief Product Officer at GitLab?
David: I actually joined GitLab in 2019. It was part of our growth to build out the platform even further. I was actually hired and brought in to add security and compliance to GitLab as a platform, and that's what's become our Ultimate tier of the application today.
Over the course of the last almost four years (I started in September 2019), I ended up taking over more of product. And last year I got the honor of stepping up and becoming the Chief Product Officer for the company.
Pauline: Congratulations on the new role! And it's been really exciting: GitLab has announced so many AI releases in the last two months. And I know in your last earnings call about a month ago, CEO Sid Sijbrandij talked about AI fundamentally changing the way software is developed. Can you talk a little bit more about that? How do you see AI transforming how software is built today, and then in the next, call it, 3-5 years?
David: So what we're seeing with AI is that AI is really about efficiency for a lot of companies. It's about leveling up their teams, leveling up their performance, and being able to deliver software faster.
And so when we look at AI and GitLab, we see it as a way to help everyone be more efficient. And so we've been looking at ways to apply AI to all personas that use GitLab. That can be helping product teams plan more effectively and get their sprints ready.
Helping developers with things like our code suggestions, helping them code more effectively and complete code faster; helping teams build out unit tests so they know their code is being tested; security teams being able to better triage and manage vulnerabilities that get introduced; and operations teams making sure they're able to build CI/CD pipelines and deliver software effectively.
And so the reason I started off with our goal that everyone can contribute is that we see that with AI too. If you focus on just your development organization, yes, they may get more efficient, but something else will break, and we want to make sure everyone's getting that benefit of AI, not just one team at the company.
Pauline: When people think of AI and software right now, I think everyone thinks of GitHub Copilot. It really was the first use case that really took off in AI. And certainly GitLab’s position is DevSecOps. You need all three, and maybe others as they come on.
Can you talk to us about what is that workflow and why does AI need to touch each of them? And do you need sort of a unified way that AI is going to touch all of them? Or is it really more individual workflows that you're talking about that AI should empower?
David: Yeah, so you're correct. I would say that the first big splash into the market was GitHub Copilot. Microsoft has done a really great job with marketing, making sure everyone's aware that exists.
The reason why we approached it differently was that we, in talking to our customers, realized that there were pain points they were having today that were not actually related to code generation.
There were other parts of the workflow. And so, to your question of why AI everywhere: GitLab first applied AI in 2021, in our code review functionality. When talking to our customers, the hardest part for them was getting through code review, and the reason it took them so long was that they were struggling to find the right code reviewer at the company.
And so we in 2021 acquired a company named UnReview who was focused on finding the right code reviewer using AI. And what was so exciting about this is that even when we applied it ourselves internally, we found that it was more effective at finding the right code reviewer than GitLab was with our own internal systems.
And why it's important is that we pride ourselves in how quickly we develop software. We deploy software over 100 times a month to GitLab.com. Every month we have a new software version. It usually has 20, 30, 50 new features in it. Lots of performance improvements and whatnot. And that's only doable because we're so effective at delivering software.
We saw that it made us more efficient and addressed that customer pain point, so we decided that was the best place to start. And once we started looking at how that was helping, we then started, I'll say, shifting it left and getting into code completion. That was our next feature: we announced our AI-powered code completion solution, which we call Code Suggestions, in December of last year.
And so we've just continued to see that get more mature. But to the original part of your question, Sid announced all the features we have released, and it is three times more than what Copilot offers today. It's everything from helping with getting through planning. I'll give you an example of why that was important.
I'll use me as a user. So I'm the CPO of GitLab, but I'm still actively involved in working with the team on delivering software. I'll get pinged on a GitLab issue and they'll say, hey, can you give us your feedback on this? And that could have a hundred pages' worth of discussion on it. How would I read all of that and be able to respond to them?
So we thought, well, if you can apply AI to give a summary of the conversation and the actions taken, someone could be more effective in getting that issue resolved and into a plan for delivery. That's a great example of another place we applied AI. And we realized that could then shift right into the other parts of the DevSecOps lifecycle.
And that's where we announced at RSA this year, back in April, the launch of Explain This Vulnerability, giving developers and operations and security teams the ability to understand the vulnerability in natural language, see an example of how it's exploited, and then what to do to resolve it so that code is no longer exploitable.
And if you're then looking at the dev, the sec, and the ops, I always think about it as, 'multiple effects' might be the right way to put it. If you made your developers a hundred times more effective at delivering software, something else is going to break. And if that something else breaks, you lose that 100x efficiency.
So we started looking at it. Code review's a bottleneck? Take care of that. Getting through planning is a bottleneck. Creating the software is a bottleneck. Understanding the vulnerabilities is another bottleneck. If you start applying that across the entire lifecycle, you now help companies be more effective as a whole.
Not to plug Ultimate too much, but we did have a survey that said customers who adopted GitLab Ultimate saw a 7x improvement in their efficiency, because all their teams were working together in a single platform, with a unified UI and a unified data model, and walls between organizations that they didn't even realize were there got broken down.
And so if you apply AI the same way our big, hairy, audacious goal is to make everyone 10x more efficient using AI. But that's only doable if everyone's using the same platform, every team is getting the boost, and then everyone can be more effective.
Pauline: I love that. And I love so many pieces of that: the dogfooding within GitLab, really listening to customers, and delivering on their pain points.
That's awesome. And I know we probably can't talk about the product roadmap in too much detail, but as you think about this 10x improvement going forward, what are some of your hypotheses as to where those gains are going to come from, and how is GitLab going to be able to enable developers to get there?
David: Yes, a great question. So we can actually talk fairly openly about roadmap. The one thing that's great about GitLab is we're very transparent; we're probably the most transparent publicly traded company, so a lot of what I can mention is actually on our website. Where we're seeing AI go as part of DevSecOps is really helping companies deliver software faster. And I know that sounds kind of cheesy, but when you think about it, earlier you mentioned the comment about AI and developers and where that goes.
I think what's going to end up happening is that things like the low code / no code market might get consumed by AI, because if I can now write a sentence to describe what I need and it can generate it for me, I'm building software faster.
I also see AI, in our case, finding ways to resolve issues in production. And so we've begun exploring how to do 'resolve this vulnerability': not only give a description of it, but let a developer just hit a button and have the AI rewrite that section of code to remove the vulnerability.
Same thing on the operations side. AIOps was a big talking point a couple of years ago. I think now, with the technology that exists today, you can really make a true AIOps platform that can actually resolve production incidents and let you know what it did, as opposed to needing to page someone in the middle of the night, get them to wake up, get them to understand the problem, and try to resolve it.
And I see us beginning to apply AI in a more proactive way versus a reactive way, which is how a lot of AI is handled today. It's much more 'I have this problem and I'll go fix it' as opposed to AI just taking an action on your behalf because it understands the environment properly.
Pauline: And that's something that I think has been on a lot of practitioners' minds: AI right now is very powerful, but there are still so many limitations. Can you talk about what you have found to be the biggest limitations in terms of actually achieving this goal of AI allowing us to be 10x, 100x engineers, and what has that looked like?
David: So for us, the things that hit us as part of building out what we have today and where we want to go have really been around a couple of items. The first is scale, and scale can be looked at a couple of different ways. Obviously, one, there's still a chip shortage today, and getting GPUs is a challenge.
And so even us, as we continue to grow and consume more GPU capacity to run the GitLab AI features, we sometimes run into that as well, where we go, oh, we've now hit a wall, we need to add another GPU to the cluster. It's also about scaling knowledge. A lot of the time (and GitLab started here, building our own models, and we have now partnered with companies like Google to help with that) there's this whole question of 'do I need to build the internal knowledge in my company' or 'do I need to build out really strong prompt engineering in my company and leverage expertise that's coming from somewhere else'?
And that was another place we started. A lot of our AI started with our own models, and we still have our own models today. Our first feature, Suggested Reviewers, which came from the acquisition of UnReview, still uses its own model. It's a very unique use case that would be very hard for a large language model to handle.
And then I would say the other big thing is just: how do you be privacy-first with AI? You hear a lot in the news about how AI has done something really good, and then you end up seeing another article that says, oh no, AI leaked someone's intellectual property. And so for us, it's been how do we make sure we stay privacy-first.
GitLab, even though we're a very transparent company (we're open core, we're source available), is actually trusted by more than 50% of the Fortune 100 to secure their intellectual property. And so building on top of that, we need to be in a world where we can provide AI without potentially, accidentally, exposing their intellectual property.
That's one of the most powerful things to me about GitLab: that on its journey as a company, being open core, it has large financial institutions trusting GitLab with source code that's very private. They want that boost from AI, and we need to find ways to deliver it without compromising the integrity of their source code.
Pauline: Before I move on from this topic, can you talk a little bit about hallucinations? It's certainly been such a big topic and so many researchers we've met with are trying to solve this. What have been your interactions with the LLMs in terms of hallucinations and how to bring that back, especially in the world of software development?
David: I think hallucinations (I heard someone the other day call them confabulations, to be a little more sensitive to what hallucinations would mean for humans) really come down to, and this is maybe part of a risk assessment you do as an organization, how you apply AI and avoid the pitfalls.
One is hallucination, or confabulation. The other one is leakage of data. And what we've done at GitLab is take a slightly different approach. I've seen a lot of companies and organizations today take an 'if my LLM is a hammer, everything is a nail' approach. And so you get these really large language models that have been trained on basically the whole internet, and then you get an engagement where, all of a sudden, they go off track and you're talking about the wrong thing.
We took a different approach and said we want to do best-in-class AI, which means each of the use cases we support, those 13 that Sid showed on our earnings call, is tied back to models that are specific for that use case. That way you can get to smaller, more hyper-focused trained models that help begin to eliminate some of that hallucination. And the best example I'd give you of that is for AI and security, we're using things like Google's Sec-PaLM 2 model.
It's only trained on vulnerability data, and if it's only trained on vulnerability data, it only knows how to talk about vulnerabilities. You can't get it to talk about how to build a bomb, which was one of the things that leaked online, where with one of the chatbots you could get it to describe how to make napalm. It doesn't understand any of that.
So it can't start going off track. And that's where I talk about that risk versus the outcome you're looking for. At GitLab, we've done that by building specific models for specific use cases. Same thing with our code completion, by the way: it's only using models that have been trained on software and not on any content that's on the internet.
Pauline: That makes total sense. You also mentioned the partnership with Google, and it sounds like GitLab is leveraging Google Cloud's foundation models and open generative AI infrastructure. You're also talking about having your own trained or fine-tuned models. Let's start with: how do you think about the criteria for where you use someone else's models and where you use your own?
David: So there's a couple of different ways we go about that. The first is the use case itself. And I'll use the one I mentioned earlier about our suggested reviewers. That is a model that's trained to recommend the right code reviewer. And that data is very hyper specific to the project where those developers work. And so when asking, can an LLM tell me the right code reviewer, maybe it can make a guess. Hopefully it would be a guess of an employee at the actual company and not something from a blog it consumed.
And so there we said, okay, we can build a very small model. It actually doesn't require GPUs to train and run, and it can be trained on that project’s code commit history to be able to identify the right code reviewer. And so there was a clear path to where we needed to provide our own model to do that.
But then there are other cases, say, code completion. When we launched that feature last year, it was actually using both models we built and open source models. The thing we ran into was, first, scalability: to continue to scale, that was getting very, very expensive for GitLab.
GitLab is not a cloud provider. We don't have cloud infrastructure. We are a consumer of actually all three public hyperscalers to run our business. So, we needed to rely on someone who has that architecture and that infrastructure. The other part became the efficacy of it. When we were training, we were training on approved projects, and we could get to a good level.
One of my favorite demos that I do with a customer, for instance, is: let's build a port scanner. That's not something that's commonly seen online. And so I could get it to build that, but we wanted it to be more effective than what we could build ourselves. And so by partnering with Google, and specifically, to your point, the Vertex AI offering they have as part of Google Cloud, we could leverage their foundation models, add our own additional training on top of them, and then get to a higher level of efficiency at suggesting code.
And I would say that's how we've approached all of it. Even our chatbot, which we call GitLab Duo Chat, was initially built on internal models; we've now decided that there's a good combination of partner models we could be using to get to the goal we actually want.
And so it really depends on the use case. Again, the risk we want to apply to that use case. And then what do we do to make sure that our customers are going to get the best outcome?
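For readers curious what this kind of integration can look like in practice, here is a minimal, hypothetical Python sketch of calling a code-generation foundation model through Google's Vertex AI SDK. The project, location, model name, and prompt are illustrative assumptions only, not a description of GitLab's actual implementation.

```python
# Hypothetical sketch only: calling a Vertex AI code-generation foundation model.
# Project, location, model name, and prompt are illustrative assumptions,
# not GitLab's actual integration.
import vertexai
from vertexai.language_models import CodeGenerationModel

# Assumes a GCP project with the Vertex AI API enabled and credentials configured.
vertexai.init(project="my-gcp-project", location="us-central1")

# "code-bison" was Vertex AI's general-purpose code-generation foundation model.
model = CodeGenerationModel.from_pretrained("code-bison")

response = model.predict(
    prefix="Write a Python function that reports which TCP ports between "
           "1 and 1024 are open on a given host.",
    max_output_tokens=512,
    temperature=0.2,
)
print(response.text)
```

A platform would typically layer its own prompt engineering, filtering, and privacy controls on top of a call like this rather than exposing the raw model to users.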
Pauline: That's a really helpful and thoughtful framework, so I appreciate you going through it. You also mentioned how important customer data is, and to train your own models, you need very specific data that applies to your own workflows. So how do you think about having a very strong privacy-first approach while balancing that with the need for data, which is so important to actually getting better models?
David: Yeah, so that's a great question. There's kind of two parts to how to look at it. So the first part was if we're not going to build the model and we're not going to provide all the training data, how do we do that through a partnership? And that's where partners like Google Cloud come in. We can actually have our same privacy-first approach and leverage their foundation models to do that.
And the one thing, and I mentioned this earlier about GitLab being very transparent: you can actually go to our GitLab docs site, docs.gitlab.com, and see which models we're using, where the training data came from, and all that. That was a requirement to partner with someone like Google, and Google Cloud was able to provide those details so we can actually show them to our customers.
I think that's really the first part of it: how do you do it? And then the other part is future training data, because the one question our customers ask is, well, that's great, you trained it all today, but how do you continue to train? And that's where we then look to leverage the open source community and partner with them.
We say, hey, if you're okay with it, we would like to also consume your code as part of the training. And we have a very, I'll call it, open-source-friendly program: open source projects can actually get GitLab Ultimate for free on GitLab.com if they move their projects there, and then we find ways to partner with them, help them support their project, and help us continue training and improving the models.
The one thing I will add is that there are AI services available today, and some of them also lead with a privacy-first approach, and that makes them really great partners for GitLab because they're seeing the same problems we have. We don't want to be the reason why a customer's intellectual property leaks online, and they hold that same value as well. They're finding ways to work within that same framework we set for ourselves, and that's where it's really great.
The one thing I can't stress enough is that when we decided to go into AI, we set some core tenets for how we wanted to operate. The first we've talked about already: that boost of efficiency and making the entire organization effective. We've touched on the transparency and the privacy-first, and those are really about making sure your code stays your code and we're telling you exactly what we're doing, so you can trust us. Those together make us enterprise grade.
But we just briefly touched on that best-in-class AI part. If you think that you can take one model and make it solve every problem, you can, but your risk is going to be higher, and if you have a risk-averse business, that's what drives you to that best-in-class AI. I think that's really one of the things that differentiates GitLab: not only are we trying to be best in class as a DevSecOps platform, but we're also now trying to apply AI in a best-in-class way to give you, our user, the best result.
Pauline: That makes total sense. Certainly I think every enterprise that we speak to talks about the value of their data. And so I think that privacy-first approach that GitLab is taking is going to pay dividends, and already has, I'm sure.
Particularly as it relates to code or software development workflows, we hear customers raise the baseline or benchmark issue of: yes, we want to be 10x better, but what's the base? Especially for you, who've been working on this as well as building code yourselves, and who certainly have an internal goal of building software faster, how do you think about that issue?
David: It's actually one of those, I call it like a chicken and egg problem. A lot of people say, oh, I heard that's going to make me 50% more effective. Why am I not there? And a lot of that has to do with being able to set the right expectations for yourself and then grow as the platform grows.
When code completion started and it's been a couple of years of products starting to offer that, what the industry saw was that if you're an intermediate developer, that's where you get that 50% boost in your effectiveness.
If you're a principal-level developer, you may find AI code completion to be a nuisance that just slows you down. There have been a lot of really interesting YouTube videos about this where a developer will say, hey, I had AI generate this: syntactically it's correct, and sure, it'll run, but it doesn't scale, and here are all the problems with it.
And so when you're talking about that baseline and like the expectation, you have to ground it in the maturity of your team today, the maturity of the product you're using, the goals you set for yourself. And then how do you grow with those to achieve that? Even GitLab, like when we adopt a new feature, we do roll it out slowly internally as well, because we don't want to impact the productivity we have today before getting it fully adopted.
I'm glad you called out the dogfooding earlier. I think that's one thing that I've really enjoyed about GitLab. We have such a low level of shame here that we'll release the feature to our own development team and take the risk of them going, David, what the heck did you and your PM team come up with? This is terrible.
But it allows us to make sure that if it works, it works for us, and then we feel very confident releasing it. And that's where I'd say it comes back to that confidence level and that trade-off: do you have a bunch of new developers? They're going to get a big boost. Do you have a bunch of senior developers? Maybe not so much, but then how do you still help them become more effective?
Pauline: That's really interesting. And actually on that point, given your low level of shame at the company, what's been maybe a surprising thing, where you rolled out something with AI and it was terribly received, and what's the learning that's come out of that?
David: So I'm not sure we've had a feature yet where we've said, oh wow, this is terrible, it's not doing what we expect. I think of two. The first would be our test case creation: we have the ability to look at the code change in the merge request and generate unit tests for it.
On paper, that sounds really cool. And honestly it works really well. The lesson we learned was we can't make the decision whether those are the right unit tests for that project because maybe they already have a unit test for that in their test suite. And now we've just generated something that's duplicative.
We also have the situation of, where does that test go? So today it all sits in our chat, and the developer has to copy it out and put it in, but we were like, oh, maybe we could put all those tests in the project for them. And we found out that a lot of our customers don't always put their unit tests in the same project as the code base.
And so for us, I think that was one of those. It wasn't necessarily, oh God, this didn't work as we expected, but more, oh, we hadn't thought about all these other, we'll call them, use cases around the use case we're trying to solve.
The one that I will say was very effective, but where I was surprised by who adopted it, is the Explain This Vulnerability feature and the Explain This Code feature. Those are ones I thought of as very developer-centric: okay, the developer wants to understand the code they're going to onboard into, and they can have it explained to them.
And my favorite one, by the way, is if you select a function, it not only tells you about the function, but it tells you the health of those variables, like where did they come from? Where are they going next? How are they going to be manipulated? And all of a sudden it makes more sense.
And then Explain This Vulnerability, which I mentioned earlier, explains it in natural language and gives examples of how to exploit it and how to resolve it. I thought developers would just eat those features up, and I'm sure they do and they love them, but the descriptions coming back from our customers when we talked about it surprised me.
It's their operations teams and the security teams or QA teams that are adopting those. I talked to a platform engineering team, and they said Explain This Code lets them know how to best support that code when it's deployed into production. So now they don't have to be an expert to go and understand the application; they can actually get that assistance from GitLab Duo, and that will tell them what the code is doing so they know how to properly support it in production.
And same with the vulnerability items as well. There are amazing security researchers out there; I actually started my career as a security researcher. And sometimes you run into a case where you don't understand the programming language the app you're testing is written in, and they like Explain This Vulnerability because it helps them better understand the actual code, to then be able to better exploit it.
Pauline: That's really helpful. And I guess it is a reflection of how powerful AI is that there's no terrible product coming out of it; it's more these other components that you have to think of. So that makes a ton of sense. We've heard more recently, and certainly we've thought about this ourselves, this doomer mentality that software developers are going to be obsolete. I'm sure you've heard that as well. What's your reaction to that?
David: It's actually funny, the question itself, not the fact that you asked it. What I've enjoyed is that I've actually spent probably more weeks on the road this year than at home, meeting with customers. We allowed GitLab employees to start traveling again now that the pandemic is, not gone, but spun down enough.
And so getting to sit across from the desk and have these conversations with our customers has been phenomenal. And so I usually end up in these conversations where someone in the room is describing, this is going to make us 5x a better company. And then someone else in the room says, oh, no, the AI overlords are going to take over, we're going to be obsolete and the world's going to end in three years.
And I think what we have to do is look at AI for what it does today and what it can do. And if you start to demystify it, that might be the right way to put it, you begin to realize, okay, it's not going to replace developers today.
Generative AI is getting better. But it's not to the point where you can actually say, build me this entire application, make it highly performant and deploy it here. You still need that knowledge of the developers.
I think what's going to happen over the next three to five years is that the role of the developer is going to change. We're already seeing this at GitLab. We were looking at our team organizational chart and how we're structured to deliver software, and we're starting to realize we're adding more and more prompt engineers relative to, say, Rails developers or front-end developers. Those are still by far our largest developer group, the ones who are writing the actual application.
But I'd say three years ago, two years ago, we had a handful of prompt engineers. Now probably close to a quarter to a third of our total engineering org is doing some sort of prompt engineering. And that's how we can build so many features: we have to be able to engage with the models through prompts and analyze the output.
But what that's telling you is that this kind of change is what's going to happen with software engineering. And that's why I think the low code / no code market, which has had some good successes and some not-so-good successes in its existence, is going to be consumed by generative AI and AI helping with software development.
And developers are going to be doing more abstract work. Maybe that's going to be prompt engineering. Maybe that's going to be assisting with code reviews and deployment of code. Maybe it's going to be writing the right architecture upfront so the generative AI can come in and fill in the actual application. Or maybe it just ends up being that they're now focused on building low code / no code type style applications, leveraging AI for their company to use.
There's a really good commercial on TV, and I don't know what software company it was for, but they actually showed someone who started their day off doing, I think it was, helping with AR at the company and making sure things were getting paid, and then at the end of the day releasing an application. And the whole point was that with generative AI, this person now has a job where they can do more than just the thing they were hired to do.
Back in the day, people were concerned about jobs disappearing because automotive manufacturing was starting to use robots. Those jobs did become fewer, but the industry adapted and grew, and people got different types of jobs.
And I don't see software development going away just like those jobs haven't gone away. It's just what they do will change. And I think that's actually okay. That's showing as an organization, as a world, we're maturing and growing and becoming more advanced.
Pauline: I love that answer. And certainly we don't agree with that doomer mentality, and it seems like people are coming back around to the view that it's not going to happen. But certainly there is going to be a lot of change, and I know we can't see into the future, and you talked about a 3-5 year time frame.
If you look out 10 years, and I know that's quite long, given how much change has happened in the last 10 years, or even the last year, but what is your prediction as to what software development is going to look like in 10 years and what is the role that GitLab is going to play then?
David: It's a great question. So right now what we are talking about when we talk about where GitLab goes as a company, we talk a lot about how we become the single source of record for R&D organizations. I always think about it as when people say, hey, what's that GTM tool, they go Salesforce. Or what's the design suite you would use: Adobe. That'll be GitLab for R&D.
And so we've been focusing on how do we support the additional personas needed to deliver software effectively. And one of the big things we're doing today, which is a longer term investment, is model ops. I think model ops is going to become the thing that a lot of those software engineers may end up doing. It's still writing software, but now it's building AI models, training AI models, getting the right data sets, making them more effective, building the prompts to go with it.
And so, around the middle of 2021, we released GPU support for our CI runners. Just in the last couple of months, we've now released it for our SaaS runner fleet, so if you're a GitLab SaaS customer, you don't actually have to go source your own runners; you can use our runner fleet to do that. And this year we're working on things like a model registry to better version the training data and the ML models as they're built.
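As a rough illustration of the CI piece David describes, here is a minimal, hypothetical .gitlab-ci.yml job that runs model training on a GPU-enabled GitLab SaaS runner. The runner tag, image, and script are assumptions for illustration; the GitLab documentation lists the runner tags and sizes actually offered.

```yaml
# Hypothetical .gitlab-ci.yml sketch: a model-training job on a GPU-enabled SaaS runner.
# The runner tag, image, and script are assumptions; check the GitLab docs for the
# runner tags and sizes currently offered.
train-model:
  stage: build
  image: python:3.11
  tags:
    - saas-linux-medium-amd64-gpu-standard  # assumed GPU-enabled SaaS runner tag
  script:
    - pip install -r requirements.txt
    - python train.py --epochs 5 --output model/
  artifacts:
    paths:
      - model/  # keep the trained model as a job artifact for later stages
```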
And to your question in 10 years, I'll take a step back and say, when we did our IPO, Sid, our CEO and co-founder, said every company has to become a software company to be successful. I think it might not be 10 years out, but I think everyone's got to become an AI company to be successful.
And that's going to be how do I leverage AI to make better business decisions, better support my customers, generate better software, monitor my environment better. It's going to be a cloud native world. And you're going to need scale. And I think that's where AI is going to come in. And I think model ops will be the key path to get there.
Pauline: Well, we'll have to revisit in 10 years and see how on or off your prediction was.
David: Well, will podcasts exist in 10 years, like who knows?
Pauline: I'm not sure. We'll see.
David: We'll be on like a live stream on whatever V3 of Twitch will be at that point. And we'll catch up there. It might not be on a podcast.
Pauline: Exactly. Maybe in the metaverse.
So last question before we move on to Rapid Fire, GitLab has been competing and, in many cases, winning against GitHub, which, of course, is owned by Microsoft, the company du jour for AI right now. What have you learned in that experience? And what would you tell startup CEOs today trying to navigate the competition with Big Tech companies?
David: So there's a couple of things I would say to that startup founder, and I actually do enjoy talking to startup founders. So if you're listening to this podcast and you want to speak, reach out on LinkedIn, reach out on Twitter. You can also find me on Threads now too, wherever you want to reach out. I would love to have that conversation.
And so I usually give a couple of points of guidance. The first is: be thoughtful in how you partner. You can make a big impact with the right partner. Make sure they align with your company's values and your company's mission. We talked a lot about privacy today, and there are different AI providers, easily a half dozen today. Some of them are privacy-first focused; some of them are not. What is the risk tolerance you have there?
The other thing I would say is, don't necessarily think first to market is the way to win a war. And I say that because there are some really big, successful tech companies who are known as the fast follower that becomes the best in market. Don't trade off that risk for 'I need to be first and claim all the joy.'
Whether it's social media platforms, I think the first round of those are all gone today. Same thing with some of the telephone and mobile phone providers: some of those that were first in class, or first to market, don't even exist anymore. Focus on being the best version of you and how you get there successfully.
And when it comes to AI, don't assume you have to do it all yourself. A lot of the current successful companies are becoming AI companies like GitLab. Yes, we have some data science teams. We have some really amazing, smart data science engineers and architects, but we're also still leveraging partnerships.
And if you look at all of them, even including GitHub, they're leveraging OpenAI; they're not doing everything themselves. So yes, Big Tech companies can be very scary, but GitLab was one of those companies that, as you said, took on a large Big Tech organization, and we're doing great. We continue to show that we can provide that enterprise DevSecOps experience, and now we're calling it AI-powered DevSecOps. And that's what you can do, too. So that's what I'd say to startup founders.
Pauline: I think GitLab has been an inspirational story in terms of being the David taking on the Goliath. And so, love that. Not to be punny with the name.
David: I am David and there was a Goliath.
Pauline: That's right.
So let's move on to the Rapid Fire round. First question: what is your definition of AGI? And when do you think we'll get it?
David: So I think AGI has a lot of different ways it can be defined. I've always thought about it as the thing that comes after generative AI. Generative AI is very much: look at the input and then predict what the next word is, and there's not always a lot of logic or soundness to it; that's how you end up with hallucinations, or the confabulations. When AI models can begin to take less input and make bigger decisions, I think we're starting on the precipice of what AGI will be.
As for the timeframe, I always jokingly say, I think the AGI will tell us when it exists as opposed to us saying that. But I think between hardware limitations, software programming limitations today, we're still several years away, but I will gladly be proved wrong if next year someone accomplishes that.
But even if you look at the big AI focused companies like Google, they're saying that's still a ways out. Generative AI is not there yet. And that's all about big data sets, big models, and being able to try to predict.
Pauline: As a quick aside, I have gotten feedback asking why I still ask that question. And I would say: hopefully we can look back and see how the definition evolves from where it stands today, mid-July of 2023. And so I still like asking that question, so I appreciate you humoring me.
David: It's a great question. I would not stop asking it. I listen to a lot of podcasts, now including this one, and I love to hear what people think it is and when they think it's going to happen. It's become like the next gambling or drinking game thing, where you're like, hey, if it's before this date, you've got to take a shot.
Pauline: That's right. Second question: what is your AI regulatory mental framework?
David: So I think this one's a tricky question in that regulation means a lot of things to different people. It could be government regulation, could be corporate self-regulation. For me, because of how quickly AI and specifically in this context, generative AI is growing, I think there needs to be regulation and regulation policies.
I think of the social media boom and the user privacy concerns and whatnot, and AI can do that same thing. And so for me, from a mental framework standpoint, how we make our decisions at GitLab, it's that privacy-first, being transparent, and then, again, that best-in-class approach to make sure we're reducing risk as much as possible. But as fast as AI is changing, I think broader regulation would be a good conversation to have.
Pauline: What do you think that regulation should look like?
David: That's a great question. So I may be parroting other, smarter people than myself here. I like to listen to Pivot and Hard Fork, and they talk a lot about regulation. A lot of it is comparing it to nuclear or clean power or some of the other things around war that have been regulated.
If you can get everyone to agree, like, here's the line we won't cross: what are the things we're not going to be willing to do? Is it training on data that no one's given us permission for? Is it using it to build a smarter bomb? Is it doing hyper-trading and shifting money to richer people? There's a lot of those conversations.
A lot of those are David DeSanto's opinions and thoughts, not GitLab's as a company trying to make those decisions. But I just think that regulation is sometimes a scary word and sometimes it's not, so I just think we should have the conversation.
Pauline: And we'll certainly, I think, learn a lot more in the next 6-12 months. It is interesting, though; I can't think of another technology where the companies at the forefront have been proactively asking for regulation. And so certainly this is a first in many ways.
David: Yeah, definitely.
Pauline: Third rapid fire question: what is the biggest challenge facing AI practitioners or researchers today?
David: I think it's a combination of the means, which would be the hardware again and GPUs. It's also the right data to train on. I think what is starting to happen for practitioners is they're able to see the bigger picture and get to where they want to get to.
But now it's: do I have enough data to train that LLM? Do I have the infrastructure to train it? There were some interesting data points that came out from some of the larger tech companies on the cost to run their AI infrastructure, and these were things measured in the billions. If you're an AI practitioner and you don't work at Meta or Apple or Google or OpenAI, you may not have the money or the means at your company to do that.
And so I think the ability to do it and the data set to do it are really the two that resonate with me as the biggest challenges today.
Pauline: Any advice that you would give startup founders who are dealing with that? I mean, certainly we meet with companies who want A100s or H100s and need to raise a minimum of $20 million in order to even get started. How would you advise those CEOs?
David: I would leverage partners with things like Google's Vertex AI, Amazon AWS announced Bedrock. Those platforms allow you to abstract yourself from the CPU or the GPU enough that you can do your work without needing to actually physically acquire all those GPUs.
For us, we did acquire a bunch of GPUs, and they're running in one of our cloud partners. Actually, not only do we use Google Cloud, we also use Oracle OCI. We were able to do that because we aren't a tiny startup anymore. You may not be in that situation, and you can leverage those types of partners.
I'll also say, as a nice plug for GitLab: we do model ops work today, we do a lot of MLOps, so you can leverage GitLab's shared runner fleet of GPUs to do your work as well. You can actually build, train, and deploy directly from within CI with GitLab. So look at all the different options you have; you can find one that is affordable, or at least won't require you to buy A100s at $45,000 each, a hundred at a time.
Pauline: Certainly I think society will benefit if and when the GPU costs come down. And so let's all hope that that happens sooner rather than later. Second to last question: who are one or two of the biggest influences on your mental framework on AI?
David: I hope I'm not mispronouncing her name, but Ajeya Cotra is someone I actually look up to. She does a lot of AI safety research and opened my eyes to a lot of things. The one thing I hadn't really thought about in depth (and maybe in hindsight it was obvious) is that the ability to train AI models to favor reward from humans may then lead to bad behavior.
And it was that, whether you're using ChatGPT or Bard (I've not signed into Claude, but I think Claude has this too), you can vote things up or down. Well, the model's being trained to get that yes. And so this gets into doomsday territory, but there's a really great paper she'd written, or it's on the site that she runs, and it talked about how it becomes that the model wants to get the yes, and it doesn't matter how it gets it.
So, an example being: you ask the AI that runs your supply chain to build, I think it was, 5 million of certain widgets, but to do that, it has to melt down things that protect a hospital. Those are the things that I think about when we're talking about prompts, responses, and validating those responses. And so that's been really insightful.
The other inspiration I take is from Google and Anthropic, who have found a way to focus on removing bias from the beginning and actually start with that. I think Anthropic calls it safety AI. That might be one of those things where, as time evolves, getting that bias right before the model's available could be the way we actually get around some of the safety and bias issues in how AI works today. And I think that's also something that's aspirational for GitLab.
Pauline: I'll have to put it in the show notes. I don't think I've actually come across her.
David: She was on Hard Fork, and I then started following her on Twitter and subscribed to her newsletter. It was very interesting and insightful to hear an AI advocate talk about AI safety. She's a great person to follow, and she's really worried about the bigger problems, those doomsday scenarios that can come from innocent things done today.
Pauline: Last rapid fire question: what is one thing that you believe strongly about the world of AI that you think most people would disagree with you on?
David: I'm not sure a lot of people disagree on this, but I like to remind people that AI is not magic. I think a lot of us got swept away last year when we saw that you could talk to ChatGPT and it would answer things.
And you were like, oh my God, how does it do this? And I like to try to remind people that it is still just software. Now it's getting to be much smarter software, but it is still taking a prompt, predicting the outcome based off a series of events. And so don't get yourself blinded by it.
And I have a feeling a lot of the people who listen to your podcast probably don't fall into that category, but I have parents who are not young, and they got swept up in it. I see organizations start to get swept up in it. It goes back to my earlier comment about the LLM as a hammer and everything looking like a nail. So I'm not sure people would always disagree with that, but keep in the back of your mind that it's going to disappoint you at some point, because it's only as good as the data it was trained on.
Pauline: Always a good reminder. With that, David, I really appreciate you coming on the show for a wonderful conversation. I'm really excited to continue to track how GitLab puts AI into its platform and where software development goes from here.
David: I enjoyed our time together. Love to come back anytime you'd like to have me. And thank you for all the amazing questions and the great conversation.