Existing Solutions

After a lot of searching online, I couldn’t find many guides for writing decent voice-enabled Discord bots. Unfortunately, at the time of writing, the guide I primarily followed seems to be down.

After going through several codebases (1, 2), I’ve managed to put together a bot of my own that I find to be pretty clean. Here’s how you can do it.

Outline

Programming the Bot
- Pre-Requisites
  - Node
  - File Hierarchy
  - Installing Node Modules
  - Final Configurations
- Programming the Bot
  - Hooking up a Command / Event Handler
  - Handling Actual Events
  - Capturing Guild Member Audio
  - Capturing Text Commands
Conclusion

Programming the Bot

Pre-Requisites

Node

First, make sure you have a suitable version of Node installed. For my bot, I used v10.16.10. You can check your version with the following command:

$ node -v
v10.16.10
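If you’d rather have the bot refuse to start on an older runtime, a small guard at the top of index.js can compare the running version against a minimum. This is a sketch: isSupportedNode is a hypothetical helper, and the minimum major version of 10 is my assumption based on the version used in this guide.

```javascript
// Hypothetical startup guard: compare the running Node version against a minimum major version.
// MIN_NODE_MAJOR = 10 is an assumption based on the Node version used in this guide.
const MIN_NODE_MAJOR = 10;

function isSupportedNode(versionString, minMajor) {
  // process.version looks like "v10.16.0": strip the "v" and read the major component
  const major = Number(versionString.replace(/^v/, '').split('.')[0]);
  return Number.isInteger(major) && major >= minMajor;
}

if (!isSupportedNode(process.version, MIN_NODE_MAJOR)) {
  console.error(`Node ${process.version} detected; this bot expects Node ${MIN_NODE_MAJOR}.x or newer.`);
  process.exit(1);
}
```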

File Hierarchy

Start a new project and create the following files and folders:

rin/
    /audio/
    /commands/
        - ping.js
    /events/
        - guildMemberSpeaking.js
        - message.js
        - ready.js
    /promised/
        - Dispatcher.js
    config.json
    index.js

Note: if you want voice recognition for your bot (which is honestly the whole point of this guide), you need one more file from Google: a service account credentials file. I won’t outline the steps, but they are easy to follow and can be found here under the “Before you begin” section. Follow those steps until you have downloaded a JSON credentials file. I don’t remember the default file name, but we’ll use google-credentials.json for the purposes of this guide.

You can place that file anywhere. I put it in my project root, so my file hierarchy now looks like this:

rin/
    /audio/
    /commands/
        - ping.js
    /events/
        - guildMemberSpeaking.js
        - message.js
        - ready.js
    /promised/
        - Dispatcher.js
    config.json
    index.js
    google-credentials.json

Installing Node modules

Copy and paste the following into your package.json. Don’t forget to change things like the name or description. Feel free to do this at the end of the guide as well.

{
  "name": "rin",
  "version": "1.0.1",
  "description": "Discord bot",
  "main": "index.js",
  "scripts": {
    "start": "node index.js",
    "lint": "./node_modules/.bin/eslint .",
    "lint-fix": "./node_modules/.bin/eslint . --fix",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "pre-commit": [
    "lint"
  ],
  "author": "bryngo",
  "license": "ISC",
  "bugs": {
    "url": "https://github.com/bryngo/rin/issues"
  },
  "homepage": "https://github.com/bryngo/rin/blob/master/README.md",
  "dependencies": {
    "@google-cloud/speech": "2.1.1",
    "discord.js": "github:discordjs/discord.js",
    "dotenv": "^6.1.0",
    "enmap": "^5.1.0",
    "ffmpeg": "^0.0.4",
    "i18n": "^0.8.3",
    "mongodb": "^3.1.8",
    "node-opus": "^0.3.2",
    "opusscript": "0.0.6",
    "pino": "^5.13.0",
    "pino-pretty": "^3.2.0"
  },
  "devDependencies": {
    "eslint": "^5.8.0",
    "eslint-config-standard": "^12.0.0",
    "eslint-plugin-import": "^2.14.0",
    "eslint-plugin-node": "^8.0.0",
    "eslint-plugin-promise": "^4.0.1",
    "eslint-plugin-standard": "^4.0.0",
    "pre-commit": "^1.2.2"
  }
}

Now, make sure you’re in the project root and run npm install from the command line. This might take a while.

Note that, at the time of writing, this installs discord.js@12.0.0-dev, the newest development build, which hasn’t been officially released yet. I used it for my bot because I found a lot of voice and audio improvements in it. Keep in mind that "github:discordjs/discord.js" points at the repository’s default branch, so a fresh install pulls whatever is there at that moment; if the code below stops working, pin the dependency to a specific version or commit.

There shouldn’t be any problems installing the node modules, and there should now be a folder called node_modules in your project root. Just make sure you have discord.js@12.0.0-dev (or a discord.js@12.0.0 release) installed. You can check this by running the following:

$ npm list discord.js
...discord.js@12.0.0-dev ...

Final configurations

Copy and paste the following into your config.json:

{
  "discordApiToken": "{DISCORD API TOKEN HERE}",
  "guildId": "{GUILD ID HERE}",
  "voiceChannelName": "{VOICE CHANNEL HERE}",
  "textChannelName": "{TEXT CHANNEL NAME HERE}",
  "languageCode": "en-US",
  "twice-clip": "audio/twice-bryan-guinn-ezra-brandon.mp3",
  "prefix": "?"
}

You can get your discordApiToken by:

- going to the Discord developer portal,
- selecting your bot,
- selecting “Bot”, and
- copying the token on the right-hand side, which is hidden by default.

You can get your guildId by right-clicking a Discord server of your choice and clicking “Copy ID”. This option only appears once Developer Mode is enabled, which you can do in the Discord client’s settings.

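voiceChannelName and textChannelName are just the channel names as plain strings: the voice channel our bot listens in and the text channel it outputs text to. languageCode should be a language tag in Google’s format, with a hyphen, such as en-US; note that the speech handler later in this guide hardcodes 'en-US' rather than reading this value.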

twice-clip is just a little something I added to my bot early on. Since we’ll be using Google’s Speech-to-Text API, the bot listens for a user to say a trigger word (“twice”) and plays this audio clip in response. The path is relative to the directory you start the bot from. Feel free to replace the clip with anything you’d like, of course :). Some other clips can be found in my git repo.

prefix is just whatever character you want to prepend to your text commands. Choosing ? means one of my commands will look like this: ?ping.
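Since a forgotten placeholder in config.json only shows up later as a confusing runtime error, it can help to check the file at startup. A minimal sketch: findConfigProblems is a hypothetical helper, the key list mirrors the config.json above, and treating any value wrapped in curly braces as an unfilled placeholder is my assumption based on the template.

```javascript
// Hypothetical startup check: report config.json keys that are missing or still hold placeholders.
// The key list mirrors the config.json above; "{...}" values are assumed to be unfilled placeholders.
const REQUIRED_KEYS = [
  'discordApiToken',
  'guildId',
  'voiceChannelName',
  'textChannelName',
  'languageCode',
  'twice-clip',
  'prefix',
];

function findConfigProblems(config) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    const value = config[key];
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`${key} is missing or empty`);
    } else if (/^\{.*\}$/.test(value)) {
      problems.push(`${key} still contains the placeholder ${value}`);
    }
  }
  return problems;
}

module.exports = { findConfigProblems };
```

In index.js you could call it right after loading config, print the problems, and exit if the list isn’t empty.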

We’re almost done with configuration. Lastly, we need to set an environment variable so the Google Speech-to-Text client can authenticate. The dotenv node module, which we installed above, will load it for us, so all we have to do is create a file called .env in the project root and put the following line in it:

GOOGLE_APPLICATION_CREDENTIALS="google-credentials.json"

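The value of GOOGLE_APPLICATION_CREDENTIALS is the path to the credentials file you downloaded earlier. A relative path is resolved against the directory you run the bot from, so start the bot from the project root. We never read this variable ourselves: require('dotenv').config() in index.js copies it into process.env, and the Google Cloud client library looks it up there when it authenticates.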
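Authentication failures from the Google client can be hard to read, so a quick check that the variable is set and points at a readable JSON file may save some time. This is a sketch with a hypothetical checkGoogleCredentials helper; it only verifies the file exists and parses, not that the key is valid for your Google Cloud project. Call it after require('dotenv').config() so the .env value is already in process.env.

```javascript
// Hypothetical startup check: confirm GOOGLE_APPLICATION_CREDENTIALS points at a readable JSON file.
// It does not validate the key with Google; it only catches missing or malformed files early.
const fs = require('fs');
const path = require('path');

function checkGoogleCredentials(env) {
  const value = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!value) {
    return { ok: false, reason: 'GOOGLE_APPLICATION_CREDENTIALS is not set' };
  }
  // A relative path is resolved against the current working directory, not this file's folder
  const resolved = path.resolve(process.cwd(), value);
  if (!fs.existsSync(resolved)) {
    return { ok: false, reason: `credentials file not found at ${resolved}` };
  }
  try {
    JSON.parse(fs.readFileSync(resolved, 'utf8'));
  } catch (err) {
    return { ok: false, reason: `credentials file is not valid JSON: ${err.message}` };
  }
  return { ok: true, path: resolved };
}

module.exports = { checkGoogleCredentials };
```

Usage: checkGoogleCredentials(process.env), logging the reason and exiting when ok is false.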

Programming the Bot

Hooking up a Command / Event Handler

Hopefully everything installed with zero problems (in my experience, that’s unlikely). Copy and paste the following into index.js:

require('dotenv').config(); // load .env into process.env before anything else reads it
const Discord = require('discord.js');
const Enmap = require('enmap');
const fs = require('fs');
const path = require('path');
const config = require('./config');

const discordClient = new Discord.Client();

// read in all of our configurations
discordClient.config = config;

// voice transcription is on by default; events/guildMemberSpeaking.js checks this flag
discordClient.speechEnabled = true;

// link all the events
// explanation for how this works can be found here:
// https://anidiots.guide/first-bot/a-basic-command-handler
// readdirSync (rather than readdir) guarantees every handler is registered before login()
for (const file of fs.readdirSync(path.join(__dirname, 'events'))) {
    if (!file.endsWith('.js')) continue;
    const event = require(`./events/${file}`);
    if (typeof event !== 'function') {
        console.warn(`Skipping events/${file}: it does not export a function`);
        continue;
    }
    const eventName = file.split('.')[0];
    // bind pre-fills the client, so each handler is called as (client, ...eventArgs)
    discordClient.on(eventName, event.bind(null, discordClient));
}

discordClient.commands = new Enmap();

// read in all the custom commands
for (const file of fs.readdirSync(path.join(__dirname, 'commands'))) {
    if (!file.endsWith('.js')) continue;
    const props = require(`./commands/${file}`);
    const commandName = file.split('.')[0];
    discordClient.commands.set(commandName, props);
}

discordClient.login(config.discordApiToken);

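There are quite a few things going on here, and I won’t spend too much time explaining them, but essentially we’re hooking up a clean and simple event / command handler for our bot: every file in events/ becomes a listener for the Discord event it’s named after, and every file in commands/ becomes a command of the same name. This keeps our files a lot more organized.

Finally, we log the bot in to Discord with our API token.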
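The key trick in the event loader is event.bind(null, discordClient): it returns a copy of the handler with the client fixed as its first argument, so when the client emits an event with its own arguments, the handler receives (client, ...eventArgs). The sketch below reproduces that with Node’s built-in EventEmitter standing in for the Discord client; fakeClient and onMessage are made up for illustration.

```javascript
// Illustration of the loader's bind pattern, with Node's EventEmitter standing in for Discord.Client.
const EventEmitter = require('events');

const fakeClient = new EventEmitter();
fakeClient.config = { prefix: '?' };

const received = [];

// Shaped like an events/*.js handler: the client comes first, then the event's own arguments
function onMessage(client, message) {
  received.push(`${client.config.prefix} received: ${message}`);
}

// bind(null, fakeClient) returns a new function with fakeClient pre-filled as the first argument
fakeClient.on('message', onMessage.bind(null, fakeClient));
fakeClient.emit('message', 'hello');

console.log(received[0]); // ? received: hello
```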

Handling Actual Events

A Discord client emits many, many events. We’ll be focusing on guildMemberSpeaking, message, and ready. In events/ready.js, paste the following:

module.exports = async (client) => {
    console.log(`Ready to serve in ${client.channels.size} channels on ${client.guilds.size} servers, for a total of ${client.users.size} users.`);

    console.log(`Logged in as ${client.user.tag}!`);

    const guild = client.guilds.get(client.config.guildId);
    if (!guild) {
        throw new Error('Cannot find guild.')
    }
    const voiceChannel = guild.channels.find(ch => {
        return ch.name === client.config.voiceChannelName && ch.type === 'voice'
    });
    if (!voiceChannel) {
        throw new Error('Cannot find voice channel.')
    }
    console.log(`Voice channel: ${voiceChannel.id} ${voiceChannel.name}`);

    const textChannel = guild.channels.find(ch => {
        return ch.name === client.config.textChannelName && ch.type === 'text'
    });
    if (!textChannel) {
        throw new Error('Cannot find text channel.')
    }
    console.log(`Text channel: ${textChannel.id} ${textChannel.name}`);

    client.voiceConnection = await voiceChannel.join();

};

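Here we look up the configured guild, voice channel, and text channel, and join the voice channel as soon as the bot is ready. You should now be able to run the bot and see console output: simply run npm start on the command line. If startup complains about one of the other event files, it is probably still empty; index.js expects each one to export a function.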
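One thing to be aware of in ready.js: the throw statements don’t stop the bot. Because the handler is an async function, throwing rejects the promise it returns, and the emitter that calls it ignores that promise, so on Node 10 you only get an UnhandledPromiseRejectionWarning while the bot keeps running without a voice connection. A sketch of a wrapper that catches those rejections; guardAsync is a hypothetical name, EventEmitter stands in for the Discord client, and what onError does (log, exit, retry) is up to you.

```javascript
// Hypothetical wrapper: route rejections from async event handlers to an error callback.
const EventEmitter = require('events');

function guardAsync(handler, onError) {
  return (...args) =>
    // Promise.resolve().then(...) also catches synchronous throws from non-async handlers
    Promise.resolve()
      .then(() => handler(...args))
      .catch(onError);
}

// Demo with EventEmitter standing in for the Discord client
const emitter = new EventEmitter();
const errors = [];
const failingReady = async () => {
  throw new Error('Cannot find guild.');
};

emitter.on('ready', guardAsync(failingReady, (err) => errors.push(err.message)));
emitter.emit('ready');

// The rejection is handled asynchronously, so inspect it on a later tick
setImmediate(() => console.log(errors));
```

In index.js this would wrap the bound handler, for example guardAsync(event.bind(null, discordClient), err => { console.error(err); process.exit(1); }), if you’d rather fail loudly when the guild or channels can’t be found.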

Note: file naming for events and commands is very important. To capture a particular event, the file must be named after that event. For instance, to capture the guildMemberSpeaking event, the file must be named guildMemberSpeaking.js.
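Because the handler name comes straight from the file name, a typo such as guildmemberspeaking.js silently registers a listener for an event that never fires. Below is a sketch of a loader helper that derives names with path.parse, skips non-.js files, and warns about unknown names. collectHandlerFiles is hypothetical, and KNOWN_EVENTS is an assumption: it lists only the three events this guide handles, so extend it if you add more.

```javascript
// Hypothetical loader helper: map each .js file in a folder to a handler name taken from its file name.
const fs = require('fs');
const path = require('path');

// Assumed list: only the three events handled in this guide
const KNOWN_EVENTS = new Set(['guildMemberSpeaking', 'message', 'ready']);

function collectHandlerFiles(dir, knownNames) {
  const handlers = new Map();
  for (const file of fs.readdirSync(dir)) {
    const { name, ext } = path.parse(file);
    if (ext !== '.js') continue; // ignore editor backups, notes, .DS_Store, etc.
    if (knownNames && !knownNames.has(name)) {
      console.warn(`"${file}" does not match a known event name; its handler will never fire`);
    }
    // absolute path, so require() resolves it as a file rather than a package name
    handlers.set(name, path.resolve(dir, file));
  }
  return handlers;
}

module.exports = { collectHandlerFiles, KNOWN_EVENTS };
```

In index.js, each entry would then be registered with discordClient.on(name, require(file).bind(null, discordClient)).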

Capturing Guild Member Audio

In events/guildMemberSpeaking.js, add the following lines:

const { Transform } = require('stream');

const Dispatcher = require('../promised/Dispatcher');
const googleSpeech = require('@google-cloud/speech');

// authenticates with the file named in GOOGLE_APPLICATION_CREDENTIALS (loaded from .env)
const googleSpeechClient = new googleSpeech.SpeechClient();

module.exports = async (client, member, speaking) => {

    // skip updates where the member stopped speaking; transcription is on unless
    // speechEnabled has been explicitly set to false
    if (!speaking || client.speechEnabled === false) return;

    // ready.js sets this once the bot has joined the voice channel
    const voiceConnection = client.voiceConnection;
    if (!voiceConnection) return;

    console.log(`I'm listening to ${member.displayName}`);

    const receiver = voiceConnection.receiver;

    // this creates a 16-bit signed PCM, stereo 48KHz stream
    const audioStream = receiver.createStream(member, { mode: 'pcm' });
    const requestConfig = {
        encoding: 'LINEAR16',
        sampleRateHertz: 48000,
        languageCode: 'en-US'
    };
    const request = {
        config: requestConfig
    };
    const recognizeStream = googleSpeechClient
        .streamingRecognize(request)
        .on('error', console.error)
        .on('data', async response => {
            const transcription = response.results
                .map(result => result.alternatives[0].transcript)
                .join('\n')
                .toLowerCase();

            console.log(`Transcription: ${transcription}`);

            // play an audio file if keyword is detected
            if (transcription.includes('twice')) {
                try {
                    await Dispatcher.playFile(voiceConnection, client.config['twice-clip']);
                } catch (err) {
                    console.error(err);
                }
            }
        });

    // the request doesn't set audioChannelCount, so Google expects mono: downmix first
    const convertTo1ChannelStream = new ConvertTo1ChannelStream();

    audioStream.pipe(convertTo1ChannelStream).pipe(recognizeStream);

    audioStream.on('end', () => {
        console.log(`I'm done listening to ${member.displayName}`);
    });
};

// Keeps only the left channel of interleaved 16-bit stereo PCM.
// Each input frame is 4 bytes (left int16 + right int16); each output sample is 2 bytes.
function convertBufferTo1Channel(buffer) {
    const frameCount = Math.floor(buffer.length / 4);
    const convertedBuffer = Buffer.alloc(frameCount * 2);

    for (let i = 0; i < frameCount; i++) {
        const sample = buffer.readInt16LE(i * 4);
        convertedBuffer.writeInt16LE(sample, i * 2);
    }

    return convertedBuffer;
}

class ConvertTo1ChannelStream extends Transform {
    _transform(data, encoding, next) {
        next(null, convertBufferTo1Channel(data));
    }
}
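The conversion above keeps only the left channel: for every 4-byte stereo frame it copies the first 16-bit sample. An alternative, which is my suggestion and not what this bot does, is to average both channels so audio that happens to sit mostly in the right channel isn’t lost. A sketch, assuming interleaved 16-bit little-endian PCM with whole frames per buffer; downmixStereoToMono is a hypothetical name.

```javascript
// Alternative stereo-to-mono conversion: average left and right instead of keeping only the left.
// Assumes interleaved 16-bit little-endian PCM (L, R, L, R, ...); trailing partial frames are dropped.
function downmixStereoToMono(stereo) {
  const frameCount = Math.floor(stereo.length / 4);
  const mono = Buffer.alloc(frameCount * 2);
  for (let i = 0; i < frameCount; i++) {
    const left = stereo.readInt16LE(i * 4);
    const right = stereo.readInt16LE(i * 4 + 2);
    // the average of two int16 values always stays within the int16 range
    mono.writeInt16LE(Math.round((left + right) / 2), i * 2);
  }
  return mono;
}

module.exports = { downmixStereoToMono };
```

To try it, call downmixStereoToMono(data) inside _transform in place of convertBufferTo1Channel(data); the output format (mono, 16-bit, 48 kHz) is the same, so the Google request config doesn’t change.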

In promised/Dispatcher.js, paste the following lines:

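// Plays a file on a voice connection and resolves once playback is over.
function playFile(connection, filePath) {
  return new Promise((resolve, reject) => {
    // connection.play() returns a StreamDispatcher synchronously
    const dispatcher = connection.play(filePath);
    dispatcher.setVolume(1);
    // discord.js v12 emits 'finish' when playback ends; older versions (v11) emitted 'end'
    dispatcher.on('finish', resolve);
    dispatcher.on('end', resolve);
    dispatcher.on('error', reject);
  });
}

module.exports = { playFile };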
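playFile is an instance of a general pattern: wrapping an object that emits a completion event or an error in a promise. Here is that pattern on its own, with Node’s EventEmitter standing in for a dispatcher so it runs without Discord. waitForCompletion is a hypothetical helper, and the default event names 'finish' and 'end' are assumptions about what your discord.js build emits. Unlike the version above, it also removes its listeners once settled.

```javascript
// Generic sketch: turn "emits a completion event or an error" into a promise.
const EventEmitter = require('events');

function waitForCompletion(emitter, doneEvents = ['finish', 'end']) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      doneEvents.forEach((name) => emitter.removeListener(name, onDone));
      emitter.removeListener('error', onError);
    };
    const onDone = () => {
      cleanup();
      resolve();
    };
    const onError = (error) => {
      cleanup();
      reject(error);
    };
    doneEvents.forEach((name) => emitter.on(name, onDone));
    emitter.on('error', onError);
  });
}

// Demo: a fake dispatcher that "finishes" on a later tick
const fakeDispatcher = new EventEmitter();
const playback = waitForCompletion(fakeDispatcher).then(() => 'done');
setImmediate(() => fakeDispatcher.emit('finish'));

module.exports = { waitForCompletion };
```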

At this point, your bot should be able to transcribe speech! Start your bot with npm start and watch the transcriptions appear in the console. Make sure you’re in the same voice channel as the bot, of course.
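The trigger check in guildMemberSpeaking.js uses transcription.includes("twice"), which also fires when the keyword appears inside a longer word; a short trigger like "hi" would match "this". A sketch of a whole-word alternative, plus a transcript builder that tolerates results without alternatives. Both helpers are hypothetical, and buildTranscript assumes the response shape the handler already relies on (results[].alternatives[0].transcript).

```javascript
// Hypothetical helpers for the transcription handler.
// buildTranscript assumes the response shape used above: results[].alternatives[0].transcript.
function buildTranscript(response) {
  return response.results
    .filter((result) => result.alternatives && result.alternatives.length > 0)
    .map((result) => result.alternatives[0].transcript)
    .join('\n')
    .toLowerCase();
}

// Escape regex metacharacters so the keyword is matched literally
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Match the keyword as a whole word, so "hi" does not fire on "this"
function containsKeyword(transcript, keyword) {
  return new RegExp(`\\b${escapeRegExp(keyword.toLowerCase())}\\b`).test(transcript);
}

module.exports = { buildTranscript, containsKeyword };
```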

Capturing Text Commands

In events/message.js paste in the following lines.

module.exports = (client, message) => {

    // Ignore all bots
    if (message.author.bot) return;

    // Ignore messages not starting with the prefix (in config.json)
    if (message.content.indexOf(client.config.prefix) !== 0) return;

    // Our standard argument/command name definition.
    const args = message.content.slice(client.config.prefix.length).trim().split(/ +/g);
    const command = args.shift().toLowerCase();

    // Grab the command data from the client.commands Enmap
    const cmd = client.commands.get(command);

    // If that command doesn't exist, silently exit and do nothing
    if (!cmd) return;

    // Run the command
    cmd.run(client, message, args);
};

Recall that we hooked up an event / command handler in index.js. This event checks whether a user entered a command (by looking for the prefix) and, if so, runs the matching file in the commands folder. For simplicity, we’ll implement a ping command.
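The parsing in message.js is easy to test on its own if you pull it into a pure function. A sketch: parseCommand is a hypothetical name, and it follows the same steps as the handler (check the prefix, strip it, split on runs of spaces, lowercase the command name), returning null where the handler would silently exit.

```javascript
// Hypothetical pure version of the parsing in events/message.js, testable without Discord.
// Returns null when the message doesn't start with the prefix or contains no command name.
function parseCommand(content, prefix) {
  if (!content.startsWith(prefix)) return null;
  const args = content.slice(prefix.length).trim().split(/ +/g);
  const command = args.shift().toLowerCase();
  if (!command) return null;
  return { command, args };
}

module.exports = { parseCommand };
```

For example, parseCommand('?Ping  one   two', '?') gives the command "ping" with the arguments "one" and "two".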

In commands/ping.js paste in the following lines.

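exports.run = (client, message, args) => {
    // join the arguments with spaces; interpolating the array directly would join them with commas
    message.channel.send(`pong! ${args.join(' ')}`).catch(console.error);
};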

Pretty simple! All commands take the same form: the file name is the name of the command, and the exported run function has the same signature as above. Typing ?ping hello in a text channel the bot can read should get a pong! hello reply.
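As a second example of the same form, here is a hypothetical commands/listen.js that turns voice transcription on or off by setting client.speechEnabled, the flag guildMemberSpeaking.js checks. The command name and reply text are my own; with the ? prefix you would type ?listen on or ?listen off.

```javascript
// Hypothetical commands/listen.js: turns voice transcription on or off via client.speechEnabled,
// the flag events/guildMemberSpeaking.js checks before transcribing.
function run(client, message, args) {
  const choice = (args[0] || '').toLowerCase();
  if (choice !== 'on' && choice !== 'off') {
    return message.channel.send('Usage: listen on | listen off').catch(console.error);
  }
  client.speechEnabled = choice === 'on';
  return message.channel.send(`Voice transcription is now ${choice}.`).catch(console.error);
}

module.exports = { run };
```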

Conclusion

Now you should have a fully functional bot that can capture both audio and text input! Please feel free to contact me if you have any questions.