AI in coding and DevOps – research reveals a security black hole

(© arturmarciniecphotos – Canva.com)

As the AI Spring heats up, the US Federal Trade Commission has slammed the data protection records of social media platforms and video streaming services, as many deploy artificial intelligence on users’ data without their knowledge or consent.

But these are merely the most obvious challenges of an age in which the use of generative AI and Large Language Models is becoming not just commonplace, but the norm in some industries. One case in point is software development, in which faster and faster release cycles are persuading programmers to use AI to help them code – or to generate code for them.

So, what’s the problem? First some context.

AI is already affecting our use of the languages we use in human-to-human communications, as I explored in a report last month. In it, I revealed how a popular transcription tool has taken to amending text and flying in data from external sources, but without permission or disclosure, and without citing what those sources might be.

Put another way, AI is altering our record of the human world and beginning to editorialize our narratives – not necessarily for the worse, of course. But in that process, it is asking us to take its pronouncements at face value, and to trust both the tool and the system that made it. That’s a big ask in the context of multibillion-dollar lawsuits for copyright infringement against AI companies.

Now apply that principle to code. If you use AI to help you program, to provide code, or to suggest alternatives, then where has that code come from? Who was its original author? Or did the machine spin new code from a range of different sources? If so, which sources – and did the AI favour one solution over another in a way that may have antitrust implications?

Either way, is the code good, fit for purpose, and trustworthy? Or was it originally placed in a popular repository by a bad actor or hostile state? Was the training data open source, or was it proprietary? If the latter, then was it scraped from an external location without the author’s knowledge and permission – or pasted into a cloud-based tool by the employee of another company?

That final ‘if’ is no laughing matter: as previously reported on diginomica, proprietary code is one of the most common forms of data pasted into generative AIs’ cloud interfaces, usually by employees who don’t realise they are divulging it to an all-too-interested third party.

But does any of this matter if the AI-generated or assisted code does the job for you?

Questions like this trouble nearly all organizations – or they should do. That’s according to machine identity management provider Venafi, whose new report finds that 92% of the 800 security decision-makers surveyed in the US, UK, France, and Germany have deep concerns about the use of AI-generated code within their organizations.

The title says it all: Organizations Struggle to Secure AI-Generated and Open-Source Code. The report asks whether 1,000 times the DevOps productivity from AI is worth 1,000 times the risk of accepting code from unknown or undisclosed sources.

The report suggests that the survey’s respondents, at least, are taking that risk regardless. Indeed, it reveals the classic paradigm – the orthodoxy – of such reports: an overwhelming majority of people using a technology because of commercial pressures while being extremely worried about it. In a survey of security professionals, that ought to raise some eyebrows.

Eighty-three percent of security leaders say their developers already use AI to generate code, says Venafi – it has become standard practice this early in the AI Spring.

As noted above, 92% of all respondents are concerned about the risks; however, 72% feel they have no choice but to allow developers to use AI in order to remain competitive. At the same time, 63% of organizations have considered banning the use of AI in coding because of the security risks – not just the risk of the code itself being dangerous, but also of its source being undisclosed.

Step back and consider those findings for a moment: nearly two-thirds of organizations consider the risk to be so great that they have contemplated banning the use of AI in coding. Despite that, nearly everyone is using it anyway. Why? Because that’s what everyone else is doing. Yet most believe that the output of these decisions will be… disaster.

That herd mentality is the real risk of any hype-driven age, especially one in which billion-dollar lawsuits are ongoing against some providers and, in one case, a US District Judge has allowed claims against a vendor to proceed to discovery. Meanwhile, AI companies want the law changed to permit their behaviour rather than be held to account for it.

For Venafi, another factor is what the survey calls the ‘governance gap’, which reveals yet more cognitive dissonance in the enterprise.

Nearly two-thirds (63%) of security leaders believe it is impossible to govern the safe use of AI in their organizations, because they have no visibility into where AI is being used. This creates what I would describe as an infinite regress of problems: a technology into which users have no insight – in terms of its source training data, at least – is being deployed in a way that leaders have no insight into either. That does not sound sensible.

Among security professionals – whose job, perhaps, is to worry – two-thirds report that it is already impossible to keep up with their own AI-enabled DevOps teams. Overall, 78% believe that the adoption of all this AI-developed code will lead to “a reckoning”, with security teams losing control of safety within the enterprise.

Yet here is the real crunch point: despite all of this, fewer than half of the organizations surveyed (47%) have policies in place to ensure the safe use of AI in DevOps. It is yet another dissonant finding of a kind all too common in such surveys, and it emphasizes the obvious: companies are being driven by hype and tactical concerns rather than proceeding in a mature, strategic, sensible way.

That leaders are groping around in the dark of some lights-out AI party hardly inspires confidence that this age will be safe and secure for the rest of us.

Kevin Bocek is Chief Innovation Officer at Venafi. I put to him the point that businesses are rushing to deploy a technology that few understand, and which often lacks transparency and disclosure. Was this a factor driving his own research?

He says:

Yes, that is highly correlated. As cybersecurity professionals, we’ve been observing the rise of AI in coding. But where is that code coming from?

And there are other issues. Even when you’re using, say, GitHub Copilot or another system, you’re bringing certain biases and ways of coding to that. And you might also be bringing opinions and inaccuracies.

Plus, even the most advanced models, they’re just machines, they’re functions that give an outcome based on certain probabilities. They are not self-aware.

But today, developers have essentially got superpowers, because now they can bring the power of LLMs to generate code, which they believe to be better than their original, or which they didn’t have the expertise to deliver in the first place. But we are fast moving to a future where code is building code, and machines are building machines.

So, we wanted to know, are security professionals seeing this? Because it is not a genie you can put back in the bottle.

The survey’s respondents don’t sound confident…

They’re reasonably freaked out and concerned about what is developing, yes. People are already worried about how LLMs might bring bias to written and spoken English and other languages, so imagine what happens when it’s code and it’s driving the decisions that computers are making!

One could argue that code is more powerful today than the English language.

Indeed. That said, every new technology brings with it a host of naysayers and doom merchants, so looking at the past can be instructive. Take cloud, he says:

Hopefully we can learn from those times. And the crowd that say, ‘We should ban AI in coding’, that’s not going to be the winning approach. Either it’s going to set you behind personally, or set your business behind. So, it’s about how can we make coding with AI safe, when it’s already becoming the default, and will be the default.

True, but there is an obvious tension there: AI has become the norm in coding almost overnight, despite risks that are strong enough for nearly two-thirds of enterprises to have considered banning it. And, from what Bocek is saying, that practice needs to be made safe after the fact.

Even AI’s most ardent supporter must acknowledge the risk in that scenario, especially given that even some AI companies profess not to understand their own systems.

He adds:

The generated code having biases and inaccuracies, that’s where I would be more concerned. And if one AI is giving me proprietary code, it might be good code, but I’m now running it myself, so that’s also a concern. And I think the audience of cybersecurity professionals would be concerned too.

There is another angle in Venafi’s research – what it calls the “open-source trust dilemma”, where an AI might use dated open-source libraries that have not been well maintained.  

Venafi itself says:

On average, security leaders estimate 61% of their applications use open source – although GitHub puts this as high as 97%. This over-reliance on open source could present potential risks, given that 86% of respondents believe open-source code encourages speed rather than security best practice amongst developers.

Is that a fair assessment of a collaborative global movement? Some would argue that open-source tools are, at the cutting edge, extremely well maintained – almost by definition. 

Bocek says:

I would agree that the transparency that open source brings is an opportunity to improve security. I think the research, though, can be heavily influenced by recent events where we’ve had adversaries look to embed vulnerabilities, and what those outcomes have been. 

We’re definitely strong believers in open source, where it is used correctly.

My take

On the open-source question specifically, it is certainly a factor that – as my reports last month from KubeCon in Hong Kong explained – China is the second largest contributor worldwide to open-source and cloud-native projects, with AI very much in the vanguard.

While there is no suggestion that code from China’s young, committed, enthusiastic developers is automatically a security risk, it would be foolish, from a geopolitical standpoint, to assume that it isn’t.

Which brings us back to the heart of the matter in terms of Venafi’s research. Do you know where that AI-assisted code came from? What is its source, or what was the AI trained on? Are you certain it is good code, and that it is not biased, flawed, insecure, or rooted in proprietary work that has been scraped without permission, or shared without knowledge?

And above all, perhaps, if more and more basic coding functions are handed to AI, on the assumption it can do it faster and better than a human professional, then – in time – who will have the expertise to check AI’s workings? Who will be the human in the loop?

Such questions are with us now for the long term, and even the most ardent AI proponent must accept that evangelism is not the way to find sensible answers. As ever, critical thinking is essential; and that is not the same as pessimism, naysaying, or being a doom merchant.

It’s about being a grown-up and a professional rather than some e/acc cultist. Here endeth the lesson. Good luck!
