Little Octopus @OpenledgerHQ has posted an important explainer on its Proof of Attribution mechanism. As usual, it is fairly technical. I am not a technical person, so all I can do is write up my own understanding of the mechanism and its practical significance in plain language. Anyone who understands the technology is welcome to weigh in.

🙋 The traditional problem: Traditional n-gram language models predict the next word from a fixed-length window of preceding words. That is fast, but the window carries little context, and the approach cannot trace a prediction back to its source data or to the people who contributed that data.

📖 How Infini-gram works: OpenLedger's Proof of Attribution (PoA) mechanism replaces traditional n-grams with Infini-gram, which removes the fixed-length limit. Infini-gram is a token-span matching engine built on a suffix array, forming an ∞-gram (unbounded-length) attribution framework. It indexes sequences of every length in the training data and can trace each token of the model's output back to its source in real time. This is what powers OpenLedger's Proof of Attribution system: it tracks from model responses back to training data without accessing the model's internals, while staying fast, transparent, and reproducible.

👍 Practical significance of the mechanism (personal interpretation, corrections welcome):

1. Precise, transparent tracing: Infini-gram traces the AI model's output token by token and tells you which training data an answer came from. It is like giving the AI a "source label" that makes the whole process visible. Users can see both what the AI said and why it said it, which maximizes transparency and rounds out the traceability system.

2. Fair recognition of data contributors: For people who provide data, Infini-gram can identify which data helped the AI generate an answer, so contributors can finally be seen and recognized. This makes the AI ecosystem fairer and can encourage more people to share quality data.

3. Scalability and efficiency: Because Infini-gram is built on a suffix array, it can trace sources in real time even over very large datasets. It also needs no complex gradient calculations, which lets it scale simply and efficiently to massive datasets.

4. More reliable and compliant AI: Infini-gram gives AI models a kind of "verifiable ID card," so output can be traced back to its source. That increases user trust in AI and can also help meet regulatory requirements and ethical standards.

🤔 My own thoughts: Some people really are trying to steer AI into making mistakes by feeding it incorrect data. Little Octopus's feature looks extremely practical for tracing that bad data and correcting the AI's mistakes. I look forward to seeing how it performs once it is implemented.
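For readers who want to see the longest-match idea from the Infini-gram explanation in concrete form, here is a toy Python sketch. It is only an illustration under my own assumptions, not OpenLedger's or Infini-gram's actual code: the function names (`build_suffix_array`, `find_span`, `attribute`), the contributors "alice" and "bob", and the two sample sentences are all made up, and the suffix array is built in the slowest, simplest way.

```python
from typing import List, Optional, Tuple

SEP = "\x00"  # hypothetical separator so a match can never span two documents


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Naive suffix array: start positions sorted by the suffix beginning there.
    Fine for a toy corpus; real engines use far faster construction."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_span(tokens: List[str], sa: List[int], query: List[str]) -> int:
    """Binary-search the suffix array for a suffix that starts with `query`.
    Returns one matching corpus position, or -1 if the span never occurs."""
    lo, hi, m = 0, len(sa), len(query)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and tokens[sa[lo]:sa[lo] + m] == query:
        return sa[lo]
    return -1


def attribute(output: List[str], tokens: List[str], sa: List[int],
              doc_of: List[Optional[str]]) -> List[Tuple[str, int, Optional[str]]]:
    """For each output token, find the longest span ending at that token that
    also appears in the corpus, and report (token, span length, contributor).
    Simplification: only one matching document is reported per span."""
    results = []
    for end in range(1, len(output) + 1):
        best_len, best_doc = 0, None
        for length in range(1, end + 1):
            pos = find_span(tokens, sa, output[end - length:end])
            if pos == -1:
                break  # a longer span cannot match if this one does not
            best_len, best_doc = length, doc_of[pos]
        results.append((output[end - 1], best_len, best_doc))
    return results


# Made-up training data from two made-up contributors.
docs = {
    "alice": "the octopus has three hearts",
    "bob": "an octopus has eight arms",
}

corpus_tokens: List[str] = []
doc_of: List[Optional[str]] = []
for contributor, text in docs.items():
    words = text.split()
    corpus_tokens += words + [SEP]
    doc_of += [contributor] * len(words) + [None]

sa = build_suffix_array(corpus_tokens)
result = attribute("the octopus has eight arms".split(), corpus_tokens, sa, doc_of)
for token, span_len, contributor in result:
    print(f"{token!r}: longest matching span = {span_len} tokens, source = {contributor}")
```

In this toy run, the first three output tokens trace back to alice's sentence and the last two to bob's, using only the text index and never touching a model. A real system would also have to handle spans that appear in many documents, which this sketch ignores by reporting a single match.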
Openledger · 10 Jul, 20:45
OpenChat is powered by you, and built to give you credit. Every message you send, dataset you share or model you fine-tune is logged on-chain. With Proof of Attribution, your contributions aren't just remembered. They're rewarded 🐙