
Cultivating responsible AI practices in software development
The use of artificial intelligence (AI) in software development has been expanding in recent years. As with any technological advancement, this also brings along security implications. Balázs Kiss, product development lead at Hungarian training provider Cydrill Software Security, had already been scrutinizing the security of machine learning before the widespread attention on generative AI. "While nowadays everyone is discussing large language models, back in 2020 the focus was predominantly on machine learning, with most users being scientists in R&D departments."
Upon examining the state of the art, Kiss found that many fundamental concepts from the software security world were ignored. "Aspects such as input validation, access control, supply chain security and preventing excessive resource use are important for any software project, including machine learning. So when I realized people weren't adhering to these practices in their AI systems, I looked into potential attacks on these systems. As a result, I'm not convinced that machine learning is safe enough to use without human oversight. AI researcher Nicholas Carlini from Google DeepMind even compared the current state of ML security to the early days of cryptography before Claude Shannon, without strong algorithms backed by a rigorous mathematical foundation."
With the surge in popularity of large language models, Kiss noticed the same fundamental security problems resurfacing. "Even the same names were showing up in research papers. For example, Carlini was involved in designing an attack to automatically generate jailbreaks for any LLM, mirroring adversarial attacks that have been used against computer vision models for a decade."
Fabricated dependencies
When developers currently use an LLM to generate code, they must remember they're essentially using an advanced autocomplete function. "The output will resemble code it was trained on, appearing quite convincing. However, that doesn't guarantee its correctness. For instance, when an LLM generates code that includes a library, it often fabricates a fake name because it's a word that makes sense in that context. Cybercriminals are now creating libraries with these fictitious names, embedding malware and uploading them to popular code repositories. So if you use this generated code without verifying it, your software may inadvertently execute malware."
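A simple first line of defense against such fabricated dependencies is to check every package an LLM suggests against the registry before it enters a build. The sketch below is a minimal illustration assuming Python and the public PyPI JSON API; the package names are hypothetical, and since attackers also register previously fictitious names, existence alone is never proof that a package is safe.

# Minimal sketch: sanity-check dependencies that appear in LLM-generated code.
# A package that doesn't exist, or that has almost no release history, deserves
# manual review before it goes anywhere near a build.
import json
import urllib.error
import urllib.request

def pypi_release_count(name: str):
    """Return the number of published releases, or None if the package is unknown."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            data = json.load(response)
        return len(data.get("releases", {}))
    except urllib.error.HTTPError:
        return None

# Hypothetical names taken from a generated import block.
for name in ["requests", "fastjsonparse"]:
    count = pypi_release_count(name)
    if count is None:
        print(f"{name}: not on PyPI, likely hallucinated")
    elif count < 3:
        print(f"{name}: only {count} release(s), review maintainer and history")
    else:
        print(f"{name}: {count} releases, still verify but an established project")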
In the US, the National Institute of Standards and Technology (NIST) has outlined seven essential building blocks of responsible AI: validity and reliability, safety, security and resiliency, accountability and transparency, explainability and interpretability, privacy, and fairness with mitigation of harmful bias. "The attack involving fabricated libraries is an example where security and resiliency are compromised, but the other building blocks are equally important for trustworthy and responsible AI. For instance, 'validity and reliability' means that results should be consistently correct: getting a correct result one time and a wrong one the next time you ask the LLM to do the same task isn't reliable."
As for bias, this is often understood in other domains, such as large language models expressing stereotypical assumptions about the occupations of men and women. However, a dataset with code can also exhibit bias, Kiss explains. "If an LLM is trained solely on open-source code from GitHub, it could be biased toward code using the same libraries as the code it was trained on, or code with English documentation. This affects the type of code the LLM generates and how well it performs on code that differs from what it has seen in its training set; it may do worse, for example, when interfacing with a custom closed-source API."

Effective prompting
According to Kiss, many best practices for the responsible use of AI in software development aren't novel. "Validate user input in your code, verify third-party libraries you use, check for vulnerabilities: this is all common knowledge in the security domain. Many tools are available to assist with these tasks." You can even use AI to verify AI-generated code, Kiss suggests. "Feed the generated code back into the system and ask it for criticism. Are there any issues with this code? How might they be resolved?" Results of this approach can be quite good, Kiss states, and the more precise your questions are, the better the LLM's performance. "Don't merely ask whether the generated code is secure. If you're aware of the type of vulnerabilities you can expect, such as cross-site scripting vulnerabilities in web applications, specify them in your questions."
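What such a targeted review prompt can look like is sketched below. This is a minimal illustration only, assuming the OpenAI Python client with an API key in the environment and "gpt-4o" as a stand-in model name; the vulnerable Flask handler is a made-up example.

# Minimal sketch: feed generated code back to an LLM and ask for targeted
# criticism on a specific vulnerability class instead of a vague "is this secure?".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

generated_code = '''
@app.route("/greet")
def greet():
    name = request.args.get("name", "")
    return f"<h1>Hello {name}</h1>"
'''

review_prompt = (
    "Review the following Flask handler. Focus specifically on cross-site "
    "scripting (XSS): point out every place where user input reaches the HTML "
    "response unescaped, and show how to fix it.\n\n" + generated_code
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": review_prompt}],
)
print(response.choices[0].message.content)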
A lot of emerging best practices exist for creating effective prompts, i.e. the questions you present to the LLM. One-shot or few-shot prompting, where you provide one or a few examples of the expected output to the LLM, is a powerful technique for obtaining more reliable results, according to Kiss. "For example, if your code currently processes XML files and you want to switch to JSON, you might simply ask to transform the code to handle JSON. However, the generated code will be much better if you add an example of your data in XML format alongside the same data in JSON format and ask for code to process data in JSON instead."
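Such a one-shot prompt could be assembled as in the sketch below; the order record and its fields are invented for illustration, and the resulting string would then be sent to the model, for instance with the client shown earlier.

# Minimal sketch of a one-shot prompt for the XML-to-JSON migration described
# above: a sample of the old input is paired with the same data in the new
# format, so the model sees the actual structures involved.
few_shot_prompt = """
Our service currently parses order records in XML:

<order id="1042">
  <customer>ACME Corp</customer>
  <item sku="X-17" quantity="3"/>
</order>

The same record in the new JSON format looks like this:

{"id": 1042, "customer": "ACME Corp", "items": [{"sku": "X-17", "quantity": 3}]}

Rewrite the parser below so it reads the JSON format instead of XML, keeping
the function signature and return type unchanged:

def parse_order(raw: str) -> Order:
    ...
"""
# few_shot_prompt would now be sent to the model, e.g. with the chat client above.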
Another useful prompting technique is chain-of-thought prompting: instructing an LLM to show its reasoning process for obtaining an answer, thereby enhancing the result. Kiss has assembled these and other prompting techniques, alongside important pitfalls, in a one-day training on responsible AI in software development at High Tech Institute. "For example, unit tests generated by an LLM are often quite repetitive and hence not that useful. But the right prompts can improve them, and you can also do test-driven development by writing the unit tests yourself and asking the LLM to generate the corresponding code. This method can be quite effective."
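A minimal sketch of that test-driven workflow, with a chain-of-thought instruction added, could look like the following; the slugify function, its module name and the test cases are hypothetical.

# Minimal sketch: hand-written unit tests are passed to the LLM with a request
# to reason step by step and then produce an implementation that makes them pass.
handwritten_tests = '''
import unittest
from slugger import slugify  # module the LLM is asked to generate

class TestSlugify(unittest.TestCase):
    def test_lowercases_and_replaces_spaces(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("C++ for beginners!"), "c-for-beginners")
'''

prompt = (
    "Here are unit tests I wrote by hand. Reason step by step about the "
    "required behaviour, then write the slugger module so that all of these "
    "tests pass:\n\n" + handwritten_tests
)
print(prompt)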
Here to stay
With all these precautionary measures, one might wonder whether the big promise of AI code generation, increased developer productivity, still holds. "A recent study based on randomized controlled trials confirms that the use of generative AI increases developer productivity by 26 percent," Kiss notes, with even greater benefits for less experienced developers. Yet, he cautions that this could be a pitfall for junior developers. "With the present state of generative AI, it's possible to write code without understanding programming. Prominent AI researcher Andrej Karpathy even remarked: 'The hottest new programming language is English.' However, if you don't understand the generated code, how will you maintain it? This leads to technical debt. We don't know yet what effect the prolonged use of these tools will have on maintainability and robustness."
Although the use of AI in software development comes with its issues, it's undoubtedly here to stay, according to Kiss. "Even if it looks like a bubble or a hype today, there are demonstrable benefits, and the technology will become more widely accepted. Many tools that we're witnessing today will be improved and even built into integrated development environments. Microsoft is already tightly integrating their Copilot in their Visual Studio products, and they're not alone. However, human oversight will always be necessary; ultimately, AI is merely a tool, like any other tool developers use. And LLMs have inherent limitations, such as their tendency to 'hallucinate', that is, to make things up. That's just how they work because of their probabilistic nature, and users must always be aware of this when using them."
This article was written in close collaboration with High Tech Institute. Top picture credit: Egressy Orsi Foto
