Like it or not, this open source AI definition takes a giant step forward
HONG KONG — To paraphrase the late John F. Kennedy, we choose to define open-source AI not because it is easy, but because it is hard; because that goal will serve to organize and measure the best of our energies and skills.
Stefano Maffulli, executive director of the Open Source Initiative (OSI), told me that the software and data that make up artificial intelligence (AI) are a bad fit for existing open-source licenses. “Therefore,” said Maffulli, “we need to make a new definition for open-source AI.”
Also: How open source is steering AI down the high road
Firefox’s parent organization, the Mozilla Foundation, agrees.
The big tech giants, a Mozilla representative explained, “have not necessarily adhered to the full principles of open source regarding their AI models.” Also, a new definition “will help lawmakers working to develop rules and regulations to protect consumers from AI risks.”
The OSI has been working diligently on creating a comprehensive definition for open-source AI, similar to the Open Source Definition for software. This critical effort addresses the growing need for clarity about what makes up an open-source AI system at a time when many companies claim their AI models, such as Meta’s Llama 3.1, are open source without really being open at all.
The latest OSI Open-Source AI Definition draft, 0.0.9, has several significant changes. These are:
- Clarified definitions: The definition now clearly identifies models and weights/parameters as part of the AI “system,” emphasizing that all components must meet the open-source standard. This clarity ensures that the entire AI system, not just parts, adheres to open-source principles.
- Role of training data: Training data is beneficial but not required for modifying AI systems. This decision reflects the complexities of sharing data, including legal and privacy concerns. The draft categorizes training data into open, public, and unshareable non-public data, each with specific guidelines to enhance transparency and understanding of AI system biases.
- Separation of checklist: The license evaluation checklist has been separated from the main definition document, aligning with the Model Openness Framework (MOF). This separation allows for a focused discussion on identifying open-source AI while maintaining general principles in the definition.
As Linux Foundation executive director Jim Zemlin detailed at the Open Source Summit China, the MOF “is a way to help evaluate if a model is open or not open. It allows people to grade models.”
Within the MOF, Zemlin added, there are three tiers of openness. “The highest level, level one, is an open science definition where the data, every component used, and all of the instructions needed to actually go and create your own model the exact same way are open. Level two is a subset of that where not everything is actually open, but most of the components are. Then, on level three, you have areas where the data may not be available, but the data that describe the data sets would be available. And you can kind of understand that, even though the model is open, not all the data is available.”
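For readers who want a more concrete picture of that grading idea, here is a minimal, purely illustrative Python sketch. The component names (weights, code, instructions, training data, dataset descriptions) and the tier rules are a simplification inferred from Zemlin's description, not the official MOF checklist.

```python
# Rough, illustrative sketch of the MOF-style tiering Zemlin describes.
# The components and rules below are assumptions for illustration only,
# not the Model Openness Framework's actual evaluation criteria.
from dataclasses import dataclass


@dataclass
class ModelRelease:
    weights_open: bool
    code_open: bool
    instructions_open: bool
    training_data_open: bool
    dataset_description_open: bool


def mof_tier(release: ModelRelease) -> str:
    """Assign a rough openness tier in the spirit of the MOF's three levels."""
    if (release.weights_open and release.code_open
            and release.instructions_open and release.training_data_open):
        # Level one: everything needed to recreate the model the same way is open.
        return "Level 1: open science"
    if release.weights_open and release.code_open and release.instructions_open:
        # Level two: most components are open, but the raw training data is not.
        return "Level 2: mostly open, data withheld"
    if release.weights_open and release.dataset_description_open:
        # Level three: the model is open and the data is described, not shared.
        return "Level 3: open model, data described only"
    return "Does not fit any tier in this rough rubric"


# Example: open weights plus a dataset description, but no code or data.
print(mof_tier(ModelRelease(weights_open=True, code_open=False,
                            instructions_open=False, training_data_open=False,
                            dataset_description_open=True)))
# -> Level 3: open model, data described only
```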
Also: This AI model lets you generate videos using only your photos
These three levels, a tiered approach that also applies to training data, will be troublesome for some open-source purists to accept. Arguments over both the models and the training data will emerge as the debate continues over which AI and machine learning (ML) systems are truly open and which are not.
The Open Source AI Definition has been built collaboratively with diverse stakeholders worldwide. These include, among many others, Code for America, the Wikimedia Foundation, Creative Commons, the Linux Foundation, Microsoft, Google, Amazon, Meta, Hugging Face, the Apache Software Foundation, and the United Nations International Telecommunication Union.
The OSI has held numerous town halls and workshops to gather input, ensuring that the definition is inclusive and representative of various perspectives. The process is still ongoing.
Also: Sonos is failing and millions of devices could go with it – why open-source audio is our only hope
The definition will continue to be refined via worldwide roadshows and the collection of feedback and endorsements from diverse communities.
OSI’s Maffulli knows not everyone will be happy with this draft of the definition. Indeed, before this version’s appearance, AWS Principal Open Source Technical Strategist Tom Callaway posted on LinkedIn, “It is my strong belief (and the belief of many, many others in open source) that the current Open Source AI Definition does not accurately ensure that AI systems preserve the unrestricted rights of users to run, copy, distribute, study, change, and improve them.”
Now that the draft has seen the light of day, I’m sure others will get their say. The OSI hopes to present a stable version of the definition at the All Things Open conference in October 2024. If all goes well, the result will be a definition that most — if not everyone — can agree promotes transparency, collaboration, and innovation in open-source AI systems.