
Cultivating responsible AI practices in software development
The use of artificial intelligence (AI) in software development has been expanding in recent years. As with any technological advancement, this also brings along security implications. Balázs Kiss, product development lead at Hungarian training provider Cydrill Software Security, had already been scrutinizing the security of machine learning before the widespread attention on generative AI. "While nowadays everyone is discussing large language models, back in 2020 the focus was predominantly on machine learning, with most users being scientists in R&D departments."
Upon examining the state of the art, Kiss found that many fundamental concepts from the software security world were ignored. "Aspects such as input validation, access control, supply chain security and preventing excessive resource use are important for any software project, including machine learning. So when I realized people weren't adhering to these practices in their AI systems, I looked into potential attacks on these systems. As a result, I'm not convinced that machine learning is safe enough to use without human oversight. AI researcher Nicholas Carlini from Google DeepMind even compared the current state of ML security to the early days of cryptography before Claude Shannon, without strong algorithms backed by a rigorous mathematical foundation."
With the surge in popularity of large language models, Kiss noticed the same fundamental security problems resurfacing. "Even the same names were showing up in research papers. For example, Carlini was involved in designing an attack to automatically generate jailbreaks for any LLM, mirroring adversarial attacks that have been used against computer vision models for a decade."
Fabricated dependencies
When developers currently use an LLM to generate code, they must remember they're essentially using an advanced autocomplete function. "The output will resemble code it was trained on, appearing quite convincing. However, that doesn't guarantee its correctness. For instance, when an LLM generates code that includes a library, it often fabricates a fake name because it's a word that makes sense in that context. Cybercriminals are now creating libraries with these fictitious names, embedding malware and uploading them to popular code repositories. So if you use this generated code without verifying it, your software may inadvertently execute malware."
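A simple first line of defense against such fabricated dependencies is to check every package an LLM suggests against the registry before it enters a build. The sketch below is a minimal illustration assuming Python and the public PyPI JSON API; the package names are hypothetical, and since attackers also register previously fictitious names, existence alone is never proof that a package is safe.

# Minimal sketch: sanity-check dependencies that appear in LLM-generated code.
# A package that doesn't exist, or that has almost no release history, deserves
# manual review before it goes anywhere near a build.
import json
import urllib.error
import urllib.request

def pypi_release_count(name: str):
    """Return the number of published releases, or None if the package is unknown."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            data = json.load(response)
        return len(data.get("releases", {}))
    except urllib.error.HTTPError:
        return None

# Hypothetical names taken from a generated import block.
for name in ["requests", "fastjsonparse"]:
    count = pypi_release_count(name)
    if count is None:
        print(f"{name}: not on PyPI, likely hallucinated")
    elif count < 3:
        print(f"{name}: only {count} release(s), review maintainer and history")
    else:
        print(f"{name}: {count} releases, still verify but an established project")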
In the US, the National Institute of Standards and Technology (NIST) has outlined seven essential building blocks of responsible AI: validity and reliability, safety, security and resiliency, accountability and transparency, explainability and interpretability, privacy, and fairness with mitigation of harmful bias. "The attack involving fabricated libraries is an example where security and resiliency are compromised, but the other building blocks are equally important for trustworthy and responsible AI. For instance, 'validity and reliability' means that results should be consistently correct: getting a correct result one time and a wrong one the next time you ask the LLM to do the same task isn't reliable."
As for bias, this is often understood in other domains, such as large language models expressing stereotypical assumptions about the occupations of men and women. However, a dataset with code can also exhibit bias, Kiss explains. "If an LLM is trained solely on open-source code from GitHub, it could be biased toward code using the same libraries as the code it was trained on, or code with English documentation. This affects the type of code the LLM generates and how well it performs on code that differs from what it has seen in its training set; it may do worse, for example, when interfacing with a custom closed-source API."

Effective prompting
According to Kiss, many best practices for the responsible use of AI in software development aren't novel. "Validate user input in your code, verify third-party libraries you use, check for vulnerabilities: this is all common knowledge in the security domain. Many tools are available to assist with these tasks." You can even use AI to verify AI-generated code, Kiss suggests. "Feed the generated code back into the system and ask it for criticism. Are there any issues with this code? How might they be resolved?" Results of this approach can be quite good, Kiss states, and the more precise your questions are, the better the LLM's performance. "Don't merely ask whether the generated code is secure. If you're aware of the type of vulnerabilities you can expect, such as cross-site scripting vulnerabilities in web applications, specify them in your questions."
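What such a targeted review prompt can look like is sketched below. This is a minimal illustration only, assuming the OpenAI Python client with an API key in the environment and "gpt-4o" as a stand-in model name; the vulnerable Flask handler is a made-up example.

# Minimal sketch: feed generated code back to an LLM and ask for targeted
# criticism on a specific vulnerability class instead of a vague "is this secure?".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

generated_code = '''
@app.route("/greet")
def greet():
    name = request.args.get("name", "")
    return f"<h1>Hello {name}</h1>"
'''

review_prompt = (
    "Review the following Flask handler. Focus specifically on cross-site "
    "scripting (XSS): point out every place where user input reaches the HTML "
    "response unescaped, and show how to fix it.\n\n" + generated_code
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": review_prompt}],
)
print(response.choices[0].message.content)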
A lot of emerging best practices exist for creating effective prompts, i.e. the questions you present to the LLM. One-shot or few-shot prompting, where you provide one or a few examples of the expected output to the LLM, is a powerful technique for obtaining more reliable results, according to Kiss. "For example, if your code currently processes XML files and you want to switch to JSON, you might simply ask to transform the code to handle JSON. However, the generated code will be much better if you add an example of your data in XML format alongside the same data in JSON format and ask for code to process data in JSON instead."
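Such a one-shot prompt could be assembled as in the sketch below; the order record and its fields are invented for illustration, and the resulting string would then be sent to the model, for instance with the client shown earlier.

# Minimal sketch of a one-shot prompt for the XML-to-JSON migration described
# above: a sample of the old input is paired with the same data in the new
# format, so the model sees the actual structures involved.
few_shot_prompt = """
Our service currently parses order records in XML:

<order id="1042">
  <customer>ACME Corp</customer>
  <item sku="X-17" quantity="3"/>
</order>

The same record in the new JSON format looks like this:

{"id": 1042, "customer": "ACME Corp", "items": [{"sku": "X-17", "quantity": 3}]}

Rewrite the parser below so it reads the JSON format instead of XML, keeping
the function signature and return type unchanged:

def parse_order(raw: str) -> Order:
    ...
"""
# few_shot_prompt would now be sent to the model, e.g. with the chat client above.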
Another useful prompting technique is chain-of-thought prompting: instructing an LLM to show its reasoning process for obtaining an answer, thereby enhancing the result. Kiss has assembled these and other prompting techniques, alongside important pitfalls, in a one-day training on responsible AI in software development at High Tech Institute. "For example, unit tests generated by an LLM are often quite repetitive and hence not that useful. But the right prompts can improve them, and you can also do test-driven development by writing the unit tests yourself and asking the LLM to generate the corresponding code. This method can be quite effective."
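A minimal sketch of that test-driven workflow, with a chain-of-thought instruction added, could look like the following; the slugify function, its module name and the test cases are hypothetical.

# Minimal sketch: hand-written unit tests are passed to the LLM with a request
# to reason step by step and then produce an implementation that makes them pass.
handwritten_tests = '''
import unittest
from slugger import slugify  # module the LLM is asked to generate

class TestSlugify(unittest.TestCase):
    def test_lowercases_and_replaces_spaces(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("C++ for beginners!"), "c-for-beginners")
'''

prompt = (
    "Here are unit tests I wrote by hand. Reason step by step about the "
    "required behaviour, then write the slugger module so that all of these "
    "tests pass:\n\n" + handwritten_tests
)
print(prompt)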
Here to stay
With all these precautionary measures, one might wonder whether the big promise of AI code generation, increased developer productivity, still holds. "A recent study based on randomized controlled trials confirms that the use of generative AI increases developer productivity by 26 percent," Kiss notes, with even greater benefits for less experienced developers. Yet, he cautions that this could be a pitfall for junior developers. "With the present state of generative AI, it's possible to write code without understanding programming. Prominent AI researcher Andrej Karpathy even remarked: 'The hottest new programming language is English.' However, if you don't understand the generated code, how will you maintain it? This leads to technical debt. We don't know yet what effect the prolonged use of these tools will have on maintainability and robustness."
Although the use of AI in software development comes with its issues, it's undoubtedly here to stay, according to Kiss. "Even if it looks like a bubble or a hype today, there are demonstrable benefits, and the technology will become more widely accepted. Many tools that we're witnessing today will be improved and even built into integrated development environments. Microsoft is already tightly integrating their Copilot in their Visual Studio products, and they're not alone. However, human oversight will always be necessary; ultimately, AI is merely a tool, like any other tool developers use. And LLMs have inherent limitations, such as their tendency to 'hallucinate', that is, to make things up. That's just how they work because of their probabilistic nature, and users must always be aware of this when using them."
This article was written in close collaboration with High Tech Institute. Top picture credit: Egressy Orsi Foto
