CC Signals Implementation
Dive into our early thinking below, then help shape what comes next! We’re looking for your ideas, feedback, and questions on the legal, technical, and social layers of this work.
❓We’d especially like to gather input on the following questions:
- CC signals are aimed at demanding reciprocal action by AI developers. What does reciprocity in the AI ecosystem look like to you? How can we improve the proposed signals to better achieve reciprocity?
- Community governance will be key to determining when and how CC signals are applied. How do you think decisions by content stewards should be made? Whose preferences matter in these decisions?
- CC signals prioritize standardization and machine readability to increase their utility, but this comes with costs. How do you think we should consider the tradeoff between context-specific considerations and the goal of collective action that spurs behavioral change by reusers operating at a massive scale?
Get involved by sharing your feedback.
The CC Signals
🏁Start here: If you haven’t already, read through the context and considerations that are informing the development of CC signals. You can also download our report.
Now that you have the background, let’s dig into the details.
The idea behind CC signals is simple. Using CC signals, a steward of a large collection of content can express a set of criteria that AI developers must meet. The criteria are organized around different dimensions of reciprocity, and are intended to drive meaningful, practical action.
CC signals are designed to be interpretable by machines, as well as humans.
The Suite of CC Signals
This project draws inspiration from fundamental concepts often referenced in the AI debate (consent, compensation, and credit), but with a particular angle. Our approach is driven by the goal of increasing and sustaining public access to knowledge.
Each signal includes the conditions under which content can be reused by machines. These are criteria that AI developers must meet in order to use the content for AI development. All of the criteria are designed to promote reciprocity in ways that are both meaningful and practical given the scale of machine reuse. Our initial proposal includes the following signal elements:
Credit: You must give appropriate credit based on the method, means, and context of your use.
Direct Contribution: You must provide monetary or in-kind support to the Declaring Party for their development and maintenance of the assets, based on a good faith valuation taking into account your use of the assets and your financial means.
Ecosystem Contribution: You must provide monetary or in-kind support back to the ecosystem from which you are benefiting, based on a good faith valuation taking into account your use of the assets and your financial means.
Open: The AI system used must be open. For example, AI systems must satisfy the Model Openness Framework (MOF) Class II, MOF Class I, or the Open Source AI Definition (OSAID).
🗒️Note: Credit is included in each signal because we believe it is a fundamental form of reciprocity, one that benefits the broader knowledge cycle. In this proposal, the other signal elements are mutually exclusive. The list of signals is intentionally limited so that data stewards and their communities can align in calling on AI developers to adopt them. This will ultimately build networks for collective action that require reciprocity within the AI ecosystem.
How the CC Signals Work
Who is applying the signal:
A Declaring Party is someone who specifies how a content collection should be used by machines. Sometimes, the Declaring Party will hold copyright or have authority to represent rightsholders in the content. In these cases, a CC signal may have legal effect depending on the particular jurisdiction. In cases where a collection of content includes content from multiple authors, it will be the responsibility of the Declaring Party to coordinate among its community to determine the appropriate signal(s).
The scope of machine uses addressed by the signal:
The Declaring Party applies CC signals to a set of standard categories that encompass machine use, from general categories to more specific categories, such as Text and Data Mining, AI Training, Generative AI Training, and AI Inference. In order to maximize global interoperability, these categories will not be defined by Creative Commons. Instead, they will be based upon global standards being developed by the Internet Engineering Task Force (IETF). The CC signals framework is designed to evolve as the standard categories are finalized. The selected category defines the scope of activity that the tool is intended to address.
What signal is applied:
The Declaring Party selects among the available CC signals. Once selected, the signal reflects the Declaring Party’s preferences regarding machine reuse. In other words, the Declaring Party states that the selected category of machine reuse is allowed under the terms of the particular signal elements. The four proposed signal combinations are:
Credit
Credit + Direct Contribution
Credit + Ecosystem Contribution
Credit + Open
Similar to the CC licenses, CC signals will be both machine and human readable. The human-readable explanation of what happens when a signal is applied will be called a declaration. There will be a declaration for each signal, with variations based on whether the Declaring Party has copyright authority and the particular scope of machine reuse selected. The string of code used to apply a CC signal to a dataset will be called a content usage expression.
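The combination rule described above (Credit in every signal, plus at most one other element) can be sketched as a small data model. This is only an illustration under stated assumptions: the names SignalElement, PROPOSED_SIGNALS, and Declaration are hypothetical, the category is kept as free text because the standard categories are still being developed, and nothing here represents the actual syntax of a content usage expression.

```python
from dataclasses import dataclass
from enum import Enum


class SignalElement(Enum):
    """The four signal elements named in the proposal."""
    CREDIT = "Credit"
    DIRECT_CONTRIBUTION = "Direct Contribution"
    ECOSYSTEM_CONTRIBUTION = "Ecosystem Contribution"
    OPEN = "Open"


# Credit alone, or Credit plus exactly one other element.
# Pairing CREDIT with itself collapses to {CREDIT}, so this yields four sets.
PROPOSED_SIGNALS = frozenset(
    frozenset({SignalElement.CREDIT, extra}) for extra in SignalElement
)


@dataclass(frozen=True)
class Declaration:
    """Hypothetical record of one signal applied to one category of machine use."""
    declaring_party: str
    category: str          # e.g. "AI Training"; assumed free text for now
    elements: frozenset    # a frozenset of SignalElement values

    def __post_init__(self):
        # Reject anything that is not one of the four proposed combinations.
        if self.elements not in PROPOSED_SIGNALS:
            raise ValueError("not one of the four proposed signal combinations")

    def label(self) -> str:
        """Human-readable name, such as 'Credit + Open'."""
        extras = [
            e.value
            for e in SignalElement
            if e in self.elements and e is not SignalElement.CREDIT
        ]
        return " + ".join(["Credit", *extras])


if __name__ == "__main__":
    d = Declaration(
        "Example Archive",
        "AI Training",
        frozenset({SignalElement.CREDIT, SignalElement.OPEN}),
    )
    print(d.label())
```

Modeling the allowed signals as a fixed set of combinations, rather than as independent on/off flags, mirrors the note that the non-credit elements are mutually exclusive in this proposal.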
Legal Considerations
CC signals are designed as global tools, which means they operate across legal systems that work differently. In the context of machine reuse, copyright law is limited, uncertain, and inconsistent across jurisdictions. As a result, applying a CC signal is likely to have a different legal effect depending on who applies it and in what context.
Where copyright exists and is applicable, CC signals are intended to leverage the power of copyright without increasing its power.
This is not about creating new property rights; it is more like defining manners for machines.
For more detail, please see the report. Further research and analysis about the legal implications of CC signals will be a major focus of our efforts in the coming months.
Technical Considerations
CC signals are designed to build upon technical standards being developed by the Internet Engineering Task Force (IETF). We have included technical considerations and components of the CC signals on GitHub.
Adhering to CC Signals
Credit Signal
Attribution and provenance in the context of large AI models is complex, difficult, and rapidly evolving as technologies develop. However, this does not mean that the concept of credit should be seen as irrelevant or impossible in the context of AI. We seek to establish norms around what is possible, not letting the perfect be the enemy of the good. Like the attribution condition in the CC licenses, we imagine the credit signal element being enacted in any reasonable manner. We plan to develop guidance and best practices around credit in future stages of this work, drawing on the progress being made in this area by others in the field. For now, at a minimum, we expect this signal to require citation of the training dataset by the reuser. For techniques that enable models to retrieve information in response to queries, such as retrieval augmented generation (RAG), and other use cases where it is technically feasible to connect content with particular outputs, outputs must cite the collection as a source with a link.
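For the retrieval case, one way a reuser might meet the minimum expectation above is to append a source line, with a link, for each collection that contributed retrieved passages. The sketch below is an assumption about what such credit could look like, not guidance from Creative Commons; RetrievedPassage, answer_with_credit, and the example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedPassage:
    """Hypothetical record of a passage returned by a retrieval step."""
    text: str
    collection_name: str
    collection_url: str


def answer_with_credit(answer: str, passages: list[RetrievedPassage]) -> str:
    """Append one 'Sources' line per distinct collection that was retrieved from."""
    # Deduplicate by URL while keeping the order in which collections first appear.
    seen: dict[str, str] = {}
    for passage in passages:
        seen.setdefault(passage.collection_url, passage.collection_name)
    if not seen:
        return answer
    lines = [f"- {name}: {url}" for url, name in seen.items()]
    return answer + "\n\nSources:\n" + "\n".join(lines)


if __name__ == "__main__":
    passages = [
        RetrievedPassage("first passage", "Example Archive", "https://example.org/archive"),
        RetrievedPassage("second passage", "Example Archive", "https://example.org/archive"),
    ]
    print(answer_with_credit("A generated answer.", passages))
```

Citing at the level of the collection rather than the individual passage follows the wording above, which asks that outputs cite the collection as a source with a link.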
Direct Contribution Signal
This is not intended as a commercial transaction. It is designed to create a structure for financial or in-kind contribution to support the sustainability of the Declaring Party. The application of CC signals should not be seen as a business model, or even a way to reliably recoup costs. The contributions are intended to be proportionate, both to the particular type and scale of machine reuse, and to the financial means of the party undertaking it. As with credit, we plan to produce guidance and best practices for direct contribution as CC signals develop.
Ecosystem Contribution Signal
This is designed to spur contributions that support the commons as a whole. While the initial phrasing is very open-ended, we hope and expect that norms, best practices, and even new, collective-minded structures could grow around this notion in different sectors and for different types of reuses. The aim is to encourage a practice of giving back, infusing a norm of reciprocity in ways that will help sustain the ecosystem for all.
Open Signal
This signal element reflects the fact that making AI models open—by releasing model weights, code, or datasets for others to use and build on—is a form of reciprocity. Given the progress made by others in the field to provide meaningful definitions of openness, our proposal for this signal is more specific about what is required to adhere to it.
Incentivizing Adherence by AI Developers
We recognize that CC signals will rely on AI developers being willing to adhere to them. There are many reasons to be cynical about adherence, particularly when it is not legally required, and there are and will always be bad actors. However, we see many reasons to believe that uptake is likely.
For one thing, there is precedent. Although adherence hasn’t always been perfect, robots.txt functioned for many years as a way to encode normative expectations about machine reuse of content on the web, and to help maintain the social contract around it. We also see the success of CC licensing as evidence that voluntary buy-in is possible. While CC licenses are built atop copyright law and therefore carry the weight of copyright infringement risk, in reality they work because people have chosen to adhere to them. Litigation involving enforcement of CC licenses is rare, and much of it involves litigants who are not operating in good faith. Instead, there are now tens of billions of CC-licensed works available in the commons because the licenses are grounded in intuitive notions about what is fair and prosocial when it comes to sharing and reuse of knowledge.
There are also clear reasons why rational actors should respect and adhere to preference signals. As we’ve written earlier in this report, data from across the public web is a key component in developing large AI models. If those developing AI do not respect the wishes of creators, they risk eliminating incentives for people to share and widely distribute their works. Over time, this will compromise the accuracy, safety and currency of the models and services they build. This will be particularly acute for small firms, startups, nonprofits, and academic researchers, who would not have the resources to instead rely on costly licensing deals.
Share your feedback now on GitHub.