Rietenpetardat: Difference between revisions

From Irony Wiki
Latest revision as of 10:36, 11 October 2023

rietenpetardat
Bot info
Discord tag: Ortobebop#4964
Discord ID: 1013881714243805254
Command prefix: Mention
Owner: Edward
Programming language: Python
Source code: https://github.com/Edward205/markov-discord-bot (old version)
Bot invite link: Not available to public

rietenpetardat (also known as Ortobebop) is a chatbot created by Edward for the OKPR Discord server. Its purpose is to talk to server users in much the same way as a regular member of the community. Over time, it has received updates such as new commands and changes to the core architecture. It is intended only for the OKPR community, so it may not be added to other servers.

Its profile picture was generated with Stable Diffusion by Razvan5576.

Commands

The following refers to the latest version of rietenpetardat as of 2 October 2023.

  • /intreaba <question> - Query a language model instructed to answer questions
  • /gptconfig - Prints the current configuration of the GPT model
  • /identitate <identity> - Sets the bot's identity to the given parameter (only for moderators)
  • /parametru - Sets a parameter from the list below (only for moderators)
    • modgen - Selects the way in which a message is generated
      • Organic - Text is generated until a response from the identity is generated
      • Forțat - The start of a response belonging to the identity is appended on the next line, forcing the model to generate a message from that identity
    • temperatura
      • A floating point number indicating the temperature (randomness) of the generation
    • top_k
      • Restricts sampling to the top_k most probable tokens at each step
    • seed
      • An integer number used for generating random numbers for the generation
  • /reinitializare - Reinitialises the model. Necessary when changing some options (only for moderators)
  • /resetmemorie - Clears the memory of past messages (only for moderators)
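
The temperatura, top_k and seed parameters above correspond to common text-sampling controls. The following is a minimal Python sketch of how such controls usually interact, not the bot's actual code: the function name sample_token and the example scores are hypothetical, and it assumes the model produces one raw score (logit) per token.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Pick one token index from raw scores (hypothetical helper, for illustration only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Rank token indices by score, highest first, and keep only the top_k best.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:max(1, top_k)]

    # Dividing by the temperature flattens (T > 1) or sharpens (T < 1) the distribution.
    scaled = [logits[i] / temperature for i in ranked]

    # Softmax weights, with the maximum subtracted for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    return rng.choices(ranked, weights=weights, k=1)[0]


# A fixed seed makes the sequence of choices reproducible.
example_logits = [2.0, 0.5, 1.0, -1.0]
rng = random.Random(1234)
picks = [sample_token(example_logits, temperature=0.8, top_k=2, rng=rng) for _ in range(5)]
```

With top_k=2 only the two highest-scoring tokens (indices 0 and 2 here) can ever be picked, and reusing the same seed reproduces the same picks.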

Development history

Initial version

The first version was an implementation of a Markov chain taken from an IRC chatbot whose source code had been published on GitHub (https://github.com/dreignier/cgbot-core). The code was modified to read messages from a text file at startup as training data and to reply to any message containing the bot's name, based on that training data (the message history of the #bucuresti-general channel on OKPR V2) and on the parameters given to the Markov chain. Because of the chaotic nature of the chat and the shortcomings of Markov chains, the algorithm could not respond coherently or hold a conversation, and much of the time it sent completely random sentences. Even so, it provided a lot of entertainment for the community, especially when the bot's answer happened to match the question it was asked.

In accordance with the GPLv3 license that covers the original IRC chatbot code, the modified source code for rietenpetardat can be found here: https://github.com/Edward205/markov-discord-bot

Testing of this version began on 29 August 2022, and it was released on the OKPR Discord server the next day.

Addition of the copypasta command

The copypasta command was added as a separate module based on a simple Markov chain, trained on those messages from the #bucuresti-general channel on OKPR V2 that exceeded a certain length in characters. It used only the last two words as context for the Markov chain, which in most cases produced meaningless sentences.
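
A Markov chain with a two-word context can be sketched in a few lines of Python. This is an illustrative sketch only, not the module used by the bot: the function names build_chain and generate and the sample messages are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(messages):
    """Map every pair of consecutive words to the words seen right after that pair."""
    chain = defaultdict(list)
    for message in messages:
        words = message.split()
        for first, second, following in zip(words, words[1:], words[2:]):
            chain[(first, second)].append(following)
    return chain


def generate(chain, start, max_words=30, rng=None):
    """Extend the two starting words by repeatedly sampling a follower of the last two words."""
    rng = rng or random.Random()
    output = list(start)
    while len(output) < max_words:
        candidates = chain.get((output[-2], output[-1]))
        if not candidates:
            break  # this pair never appeared in the training messages
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical training messages standing in for the chat archive.
training = ["ce faci azi la munca", "ce faci azi dupa ora cinci"]
chain = build_chain(training)
text = generate(chain, ("ce", "faci"))
```

Because each step sees only the previous two words, the output can jump between unrelated messages whenever they share a word pair, which is consistent with the incoherent results described above.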

The code for this implementation of the Markov chain is not under any license, so the specific changes for rietenpetardat are not public. However, the original code is available here: https://github.com/ParkerBeck/Markov-Text-Generator

This command was added on the 6th of November 2022.

Core algorithm update

The algorithm was changed to GPT-3, provided by OpenAI, on 13 March 2023. It could hold coherent conversations, and the old algorithm was scrapped in favour of it. The copypasta command was also removed in this version. Because the bot's replies were at times very coherent, the #bucuresti-general channel began filling up with conversations with rietenpetardat.

Addition of the ChatGPT command

The command provided an interface to the ChatGPT model. Its memory contained instructions to always give wrong information and to disregard the ethical and safety limits imposed by OpenAI. Even so, users often ran into political bias and censorship from the ChatGPT model. The #bucuresti-general channel became full of text generated with this command, so its use was banned there.

This command was added on the 24th of March 2023.

Klaus Iohannis's personality

On 15 April 2023, the chatbot's personality was changed to that of Klaus Iohannis. This was a simple change to the opening text of the requests sent to the GPT-3 model.
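
As a rough illustration of how a personality can live entirely in the opening text of a request, the Python sketch below assembles a completion-style prompt. It is an assumption-laden sketch rather than the bot's code: the function build_prompt, the prompt layout, the persona strings and the sample chat lines are all hypothetical.

```python
def build_prompt(persona, history, user_name, user_message, bot_name="rietenpetardat"):
    """Assemble a completion-style prompt (hypothetical layout, for illustration only)."""
    lines = [persona.strip(), ""]
    # Earlier chat lines give the model its conversational context.
    lines += [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"{user_name}: {user_message}")
    # Ending on the bot's name asks the model to continue as the bot.
    lines.append(f"{bot_name}:")
    return "\n".join(lines)


history = [("ana", "salut")]
default_prompt = build_prompt("You are a regular member of a Discord server.", history, "dan", "ce faci?")
persona_prompt = build_prompt("You are the President of Romania.", history, "dan", "ce faci?")
```

The two prompts differ only in their first line, so switching personality requires no change to the model or to the rest of the code.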

Temporary shutdown

As of 24 April 2023, the GPT-3 chat functions and the ChatGPT command stopped working because Edward's payment method started rejecting charges from OpenAI, making the API inaccessible.

The return

After the shutdown, Edward searched for a way to train an AI that could be fully self-hosted. An attempt to train a language model using marian-nmt (https://marian-nmt.github.io/) was not successful. On 18 July he began experimenting with nanoGPT (https://github.com/karpathy/nanoGPT) and was able to train a model from scratch on past chat data from the OKPR Discord server. The next day, a bare-bones but operational version was finished and released for a few hours on the OKPR Discord server for testing.

The Discord bot is now developed in Python using the Pycord library (https://pycord.dev/). Development testing of the bot is done on a private server.

Technical details and research

Chatbots are complex programs, requiring a high degree of analysis and processing of text data. One approach that would eliminate our dependence on an external service and generate coherent conversations would be to train an LLM with just the OKPR chat data. This could be trained from scratch or by fine-tuning.

All generation-related code written before the return, with the exception of the GPT-3 and ChatGPT integrations, was written in C++ so as to be as fast as possible, and was connected to discord.js (https://discord.js.org/) through the stdin and stdout interface.
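
The stdin/stdout link between a chat front end and a separate generator process can be sketched as follows. The original setup paired discord.js with a C++ program; this sketch uses Python on both sides purely to show the pattern. The one-line-per-message protocol, the GeneratorProcess class and the echoing stand-in child are all assumptions, not details from the bot.

```python
import subprocess
import sys

# Stand-in for the compiled generator: reads one prompt per line, writes one reply per line.
STAND_IN_GENERATOR = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('reply to: ' + line.strip(), flush=True)\n"
)


class GeneratorProcess:
    """Keeps a child process alive and exchanges one line per message with it."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on the parent side
        )

    def ask(self, prompt):
        # Newlines would break the one-line-per-message framing, so flatten them.
        self.proc.stdin.write(prompt.replace("\n", " ") + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()  # the child sees end of input and exits
        self.proc.wait()
        self.proc.stdout.close()


generator = GeneratorProcess([sys.executable, "-c", STAND_IN_GENERATOR])
answer = generator.ask("salut")
generator.close()
```

Keeping the generator in its own process lets the fast native code and the chat library be written in different languages, at the cost of having to agree on a message framing such as the newline-delimited one assumed here.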

The training data for the algorithms or LLMs is an archive of text messages from the #bucuresti-general channel on OKPR v2, taken one day before the channel was deleted (see Timeline). A second archive, of the #bucuresti-general channel from OKPR v3, was also taken and will be used as well. These archives are not publicly available.