Accepting User Media Input with the Bot Framework

by Michael Szul on


Chatbots don't just send information; they receive it as well. Bots are usually seen accepting input in a choose-your-own-adventure style, or through LUIS, an implementation of natural language processing. Both of these are textual input, which is great for conversations, but chatbots are not just conversations; they are conversational applications. This means that they can do more than receive text, parse it, and respond. The Microsoft Bot Framework allows your chatbot to receive various forms of user input, including files (e.g., images, documents) and speech.

Some of the toy bots that you will find in the Skype bot directory deal with things like mashed-up images. To do this, several of them ask for a photo (if the mashup is to include you, or an image of something that the service does not have). Bots can accept files, such as images, for processing. This is especially helpful if you are integrating with other services, such as the computer vision services in Azure or Amazon Web Services (AWS).

With the Bot Framework, you can prompt users to upload an attachment. Here is an example of how that works:

dialogs.add(new AttachmentPrompt("attachmentPrompt"));
      
      dialogs.add(new WaterfallDialog("attachment", [
          // Step 1: ask the user to upload a file
          async (step: WaterfallStepContext) => {
              return await step.prompt("attachmentPrompt", "Send me a picture of your favorite train!");
          },
          // Step 2: the prompt result is the array of uploaded attachments
          async (step: WaterfallStepContext) => {
              const attachments: Attachment[] = step.result;
              ...
          }
      ]));
      

In the second function in the waterfall, you see an attachments array. This is returned to the chatbot when the user uploads media. The interface definition for the Attachment object looks like this (from the Bot Framework source code):

export interface Attachment {
          contentType: string;
          contentUrl?: string;
          content?: any;
          name?: string;
          thumbnailUrl?: string;
      }
      

The contentUrl property is used when the content resides at a URL and has to be downloaded, as is the case with Skype attachments. The content property holds content that is embedded directly in the attachment.
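To make the difference between the two properties concrete, here is a minimal sketch of a helper that reports where an attachment's payload lives. The Attachment interface is copied locally from the definition above so the snippet stands alone. The describeAttachment function and the AttachmentSource type are hypothetical names for illustration and are not part of the Bot Framework.

```typescript
// Local copy of the Attachment shape so this sketch has no dependencies.
interface Attachment {
    contentType: string;
    contentUrl?: string;
    content?: any;
    name?: string;
    thumbnailUrl?: string;
}

// Hypothetical result type: where the payload can be found.
type AttachmentSource =
    | { kind: "url"; url: string }
    | { kind: "embedded"; content: any }
    | { kind: "empty" };

// Hypothetical helper: decide whether the payload must be downloaded
// from contentUrl or is already embedded in content.
function describeAttachment(attachment: Attachment): AttachmentSource {
    if (attachment.contentUrl) {
        return { kind: "url", url: attachment.contentUrl };
    }
    if (attachment.content !== undefined && attachment.content !== null) {
        return { kind: "embedded", content: attachment.content };
    }
    return { kind: "empty" };
}
```

Branching on the returned kind lets the rest of the bot handle downloaded and embedded content separately. This sketch checks contentUrl first, which is an assumption about priority rather than documented behavior.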

Processing uploaded media may depend on the channel and on the security around that channel. If additional security is needed to download and interact with the media, you will have to account for that. Basic attachments can be read from the incoming activity as you process it:

if (context.activity.attachments && context.activity.attachments.length > 0) {
          ...
      }
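Once you know attachments are present, you will often want only certain kinds of files. The following is a sketch, not Bot Framework code: imageAttachments is a hypothetical helper, and the Attachment interface is a local copy so the example runs on its own. It keeps only attachments whose contentType looks like an image MIME type.

```typescript
// Local copy of the Attachment shape so this sketch has no dependencies.
interface Attachment {
    contentType: string;
    contentUrl?: string;
    content?: any;
    name?: string;
    thumbnailUrl?: string;
}

// Hypothetical helper: keep only attachments whose MIME type starts with "image/".
// Accepts undefined because an activity may carry no attachments at all.
function imageAttachments(attachments?: Attachment[]): Attachment[] {
    return (attachments || []).filter(
        (attachment) =>
            typeof attachment.contentType === "string" &&
            /^image\//i.test(attachment.contentType)
    );
}
```

In a bot you might call imageAttachments(context.activity.attachments) and reply with a hint when the result is empty.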
      

You can then create a method for downloading and processing the attachment or attachments:

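if (context.activity.attachments && context.activity.attachments.length > 0) {
          // processAttachment is a method on the same class, so call it through this
          await this.processAttachment(context.activity.attachments[0]);
      }
      
      ...
      
      async processAttachment(attachment: Attachment): Promise<any> {
          // attachment.name is optional, so fall back to a placeholder file name
          const localFileName = path.join(__dirname, attachment.name || "attachment");
          // Download the content; localFileName is where you could save it afterwards
          return await request({
              url: attachment.contentUrl
          });
      }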
      

The example above assumes the "request-promise" library, so you will need import * as request from "request-promise" (along with import * as path from "path") at the beginning of your file. You can also use any other request or fetch library to asynchronously get external resources.
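As one such alternative, here is a minimal sketch that downloads an attachment with a fetch-style function instead of "request-promise". It assumes a runtime that provides a global fetch (recent Node.js versions and browsers do). The names downloadAttachment, HttpGet, and HttpResponse are hypothetical, and the fetch function is passed as a parameter so it can be swapped for a stub in tests. Any channel-specific authentication is left out.

```typescript
// Local copy of the Attachment shape so this sketch has no dependencies.
interface Attachment {
    contentType: string;
    contentUrl?: string;
    content?: any;
    name?: string;
    thumbnailUrl?: string;
}

// Minimal shape of the parts of a fetch response this sketch relies on.
interface HttpResponse {
    ok: boolean;
    status: number;
    arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical type for any fetch-compatible GET function.
type HttpGet = (url: string) => Promise<HttpResponse>;

// Hypothetical helper: download the bytes behind attachment.contentUrl.
// Defaults to the global fetch when no function is supplied (assumed to exist).
async function downloadAttachment(
    attachment: Attachment,
    httpGet: HttpGet = (globalThis as any).fetch
): Promise<Uint8Array> {
    if (!attachment.contentUrl) {
        throw new Error("Attachment has no contentUrl to download");
    }
    const response = await httpGet(attachment.contentUrl);
    if (!response.ok) {
        throw new Error(`Download failed with HTTP status ${response.status}`);
    }
    return new Uint8Array(await response.arrayBuffer());
}
```

Returning raw bytes keeps the helper independent of what happens next, whether that is writing a file or forwarding the image to a vision service.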