On Thursday, Google disclosed a security concern involving its Gemini AI chatbot: according to the company, "commercially motivated" actors have attempted to clone knowledge from Gemini through large-scale prompting. One adversarial session reportedly involved more than 100,000 prompts in various non-English languages, aimed at gathering responses to train a cheaper imitation of the model.
The disclosure came as part of Google's quarterly self-assessment of threats to its products, and it positions the company as both a victim and a protector within the tech landscape. Google frames the activity, dubbed "model extraction," as a form of intellectual property theft, which raises intriguing questions about the industry's reliance on data, particularly how large language models (LLMs) are trained.
The Irony of Copycatting
Ironically, Google itself has faced allegations of misconduct in this arena. In 2023, The Information reported that Google's Bard team had been accused of using ChatGPT outputs, shared on a public platform called ShareGPT, to improve its own chatbot. The report sparked internal turmoil when Jacob Devlin, a senior AI researcher and creator of the groundbreaking BERT model, warned that the practice violated OpenAI's terms of service; Devlin subsequently resigned from Google and joined OpenAI. Although Google denied any wrongdoing, it eventually paused its use of the contested data.
Ethics Behind AI Extraction
Whatever its own history, Google's terms of service explicitly prohibit extracting data from its AI models in the manner described. The rise of model-extraction techniques points to a growing trend in which companies and researchers, chasing a competitive advantage, resort to questionable practices. Reports suggest the attacks originate from various locations around the globe, though Google has declined to name specific suspects.
The Deal with Distillation
In AI terminology, training a new model on the outputs of an existing one is often called "distillation." The practice lets developers who lack the money or time to train their own LLMs use existing models as a shortcut; a simplified sketch of the idea appears below. While distillation could democratize AI development, it raises ethical questions about the boundaries of intellectual property and competitive fairness in a fast-moving industry.
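To make the concept concrete, here is a minimal, purely illustrative sketch of classic knowledge distillation (Hinton-style soft labels) in PyTorch. The tiny teacher and student networks, the temperature value, and the random input batch are placeholder assumptions chosen for demonstration; this is not Google's pipeline or the attackers' actual method, where the "teacher" would be a deployed LLM queried through its public interface.

```python
# Illustrative distillation sketch: a small "student" network learns to match the
# softened output distribution of a larger "teacher". All models and data here are
# toy placeholders, not any real production system.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))  # larger teacher
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))    # smaller student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's probabilities so they carry more signal

x = torch.randn(64, 128)  # stand-in for a batch of real inputs (e.g., encoded prompts)
with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / T, dim=-1)  # teacher outputs act as soft labels

student_log_probs = F.log_softmax(student(x) / T, dim=-1)
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (T * T)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In an extraction scenario like the one Google describes, an attacker would typically have access only to the generated text, not the model's probabilities, so the harvested prompt-response pairs would serve as ordinary supervised fine-tuning data for the smaller model rather than soft targets like those above.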
For a deeper dive into Google's findings and the broader implications for AI ethics, you can read more here.
Image Credit: arstechnica.com