Building an audio file and feed JSON with AWS Step Functions and Lambda Layers.
When a user says “Alexa, what’s my Flash Briefing?”, all briefings in their playlist play in the order they specify. The order can be configured in the Alexa app.
For Number Spies, the Number Station: Alpha Flash Briefing speaks a series of digits that represent an encoded message sent from a fictitious numbers station. On the days that there is no transmission, an audio file with static will play.
An audio Flash Briefing for Alexa needs three things:
- Feed Configuration
- Feed JSON file
- Audio file
In this article, we will detail how the Number Station: Alpha briefing is generated.
To create a Flash Briefing, you log in to the Alexa Developer Console and create a new Skill, selecting the Flash Briefing model.
A Flash Briefing is tied to a specific locale, so if you want to deploy to multiple countries (even if you are using the same feed and audio), you need to create multiple instances.
There is little to configure:
- Preamble — phrase to say before the briefing starts
- Name — feed public name
- Frequency — how often: hourly, daily, weekly
- Content type — text only or an audio file
- Content genre — Skill Store category
- Feed — URL to the feed
- Feed icon — icon for the feed
The main value is the Feed URL pointing to the feed.json file served by a public S3 bucket via HTTPS.
When the user asks to play their briefings, the feed file is checked and if the content has already played, the briefing is skipped.
For the content type, we selected Audio, which means the feed file must include an audio file to play. Two feed file formats are supported: JSON and RSS. JSON is the recommended format.
It is simple to host the feed.json in an S3 bucket and access it from a URL. Here is a sample file:
This file must be generated.
Notice the streamUrl property in the feed.json file. On days that there is an encoded message, the audio file will include the generated message (final-ad.mp3). For the other days, a file containing radio static will be randomly selected from a few options (static1-ad.mp3).
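As a rough illustration of what generating the feed involves, here is a minimal Python sketch that builds a single feed item. The field names (uid, updateDate, titleText, mainText, streamUrl, redirectionUrl) follow the Flash Briefing JSON feed format; the function name, the date-based uid scheme, and the example URLs are assumptions made for this sketch, not the code used by Number Spies.

```python
import json
import uuid
from datetime import datetime, timezone


def build_feed_item(stream_url, title, redirection_url, now=None):
    """Build one Flash Briefing feed item (a sketch, not the production code)."""
    now = now or datetime.now(timezone.utc)
    # Assumption: derive the uid from the date and audio URL so that
    # re-running the process on the same day yields the same item.
    uid = uuid.uuid5(uuid.NAMESPACE_URL, f"{stream_url}#{now.date().isoformat()}")
    return {
        "uid": f"urn:uuid:{uid}",
        "updateDate": now.strftime("%Y-%m-%dT%H:%M:%S.0Z"),
        "titleText": title,
        "mainText": "",  # left empty here because the briefing is audio
        "streamUrl": stream_url,
        "redirectionUrl": redirection_url,
    }


def to_feed_json(item):
    """Serialize the item as the text that would be written to feed.json."""
    return json.dumps(item, indent=2)
```

Calling `to_feed_json(build_feed_item("https://example.com/final-ad.mp3", "Number Station: Alpha", "https://example.com"))` returns the text that would be uploaded as feed.json. On a day with no message, the same function could be called with the URL of a static file such as static1-ad.mp3.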
Flash Briefings can contain ads, but Custom Alexa Skills cannot. The process that generates the feed and audio file therefore creates two files: one that includes an ad and one that doesn’t.
Part 3 of this series, the system components overview, has a diagram of the entire architecture. In this article, we focus on the part of it that generates the feed and audio file.
What drives the feed and audio file generation is the Daily Process.
In the most recent post, we talked about the Content Management System that contains content to represent a message, the day to transmit it, and the One-time Pad (OTP) to use for encryption.
The Daily Process is triggered by a CRON job each day at 00:00 UTC. It takes content from the CMS and generates the files.
The advantage of a “headless” CMS such as Sanity.io is the ability to query the content using an API. Sanity uses a query language called GROQ (Graph-Relational Object Queries).
This is the query used to get content for any Transmissions for today:
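For readers who have not seen GROQ, a query of this kind might look like the sketch below, which builds the URL for Sanity's HTTP query endpoint. The document type `transmission` and the fields `transmitDate`, `message`, and `otp` are hypothetical schema names chosen for illustration, as are the project id and dataset in the usage note; the real schema may differ.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical schema: a "transmission" document with a transmitDate,
# a plain-text message, and the one-time pad digits.
QUERY = (
    '*[_type == "transmission" && transmitDate == $today][0]'
    "{message, otp, transmitDate}"
)


def build_query_url(project_id, dataset, today=None, api_version="v2021-10-21"):
    """Return the GET URL that runs QUERY for the given (UTC) date."""
    today = today or datetime.now(timezone.utc).date().isoformat()
    # GROQ parameters are passed as "$name" with a JSON-encoded value.
    params = urlencode({"query": QUERY, "$today": json.dumps(today)})
    return f"https://{project_id}.api.sanity.io/{api_version}/data/query/{dataset}?{params}"
```

A call such as `build_query_url("abc123", "production")` gives a URL that can be fetched with any HTTP client. With a query shaped like this one, the `result` field of the response would be either the matching document or null when nothing is scheduled for the day.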
AWS Step Functions
Behind the scenes, the CRON job sends a POST to an API Gateway endpoint, which starts an AWS Step Functions state machine. The state machine then executes the first Lambda Function defined in it.
The advantages of AWS Step Functions are the ability to see the logic flow in a diagram and the ability to break the work into tasks. Each task has its own Lambda Function. Properties can be added to the data as it flows through the steps, and those properties control the logic flow.
The flow shows that after an attempt to retrieve content for the current day, if there is no message then multiple steps are skipped.
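To make the branching concrete, here is a sketch of a state machine definition in the shape of the Amazon States Language, written as a Python dictionary, plus a tiny walker that lists which states run for a given input. The state names, the `hasMessage` property, the placeholder ARNs, and the second branch around the webhook step are all assumptions made for illustration; the walker is a simplification for reading the flow, not how Step Functions itself evaluates a definition.

```python
def task(function_name, next_state=None):
    """A Task state backed by a Lambda Function (the ARN is a placeholder)."""
    state = {
        "Type": "Task",
        "Resource": f"arn:aws:lambda:REGION:ACCOUNT_ID:function:{function_name}",
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def choice(variable, if_true, otherwise):
    """A Choice state that branches on a boolean property of the input."""
    return {
        "Type": "Choice",
        "Choices": [{"Variable": variable, "BooleanEquals": True, "Next": if_true}],
        "Default": otherwise,
    }


DEFINITION = {
    "StartAt": "GetContent",
    "States": {
        "GetContent": task("getContent", "HasMessage"),
        "HasMessage": choice("$.hasMessage", "EncodeMessage", "WriteFeed"),
        "EncodeMessage": task("encodeMessage", "SynthesizeSpeech"),
        "SynthesizeSpeech": task("synthesizeSpeech", "ProcessAudio"),
        "ProcessAudio": task("processAudio", "WriteFeedAudio"),
        "WriteFeedAudio": task("writeFeedAudio", "WriteFeed"),
        "WriteFeed": task("writeFeed", "ShouldPost"),
        "ShouldPost": choice("$.hasMessage", "PostToWebhook", "InvalidateCdn"),
        "PostToWebhook": task("postToWebhook", "InvalidateCdn"),
        "InvalidateCdn": task("invalidateCdn"),
    },
}


def walk(definition, data):
    """List the states visited for `data` (simplified: boolean choices only)."""
    path, name = [], definition["StartAt"]
    while name:
        state = definition["States"][name]
        path.append(name)
        if state["Type"] == "Choice":
            name = state["Default"]
            for rule in state["Choices"]:
                key = rule["Variable"][2:]  # strip the leading "$."
                if data.get(key) == rule["BooleanEquals"]:
                    name = rule["Next"]
                    break
        else:
            name = state.get("Next")  # None when the state has "End": True
    return path
```

Running `walk(DEFINITION, {"hasMessage": False})` skips the encoding and audio states entirely, which mirrors the shortcut taken on days without a transmission.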
The Lambda Function for this step is always called. It queries the CMS for any transmissions for today. If there are no messages, then default (static audio) feed data is created and passed to the Write Feed step.
If there is a transmission for the specific day, that data is passed to the next step.
Part of the content returned from the CMS is the secret message in plain text and a series of numbers in the form of an OTP. This step encodes the message and passes it to the next step.
This step takes the encoded digits, wraps them in SSML tags, and uses Amazon Polly to generate a polly.mp3 file that contains only the reading of the digits. This file is written to the S3 bucket to be used in the next step.
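The exact encoding scheme is not spelled out here, so the following is only a sketch of one classic one-time pad approach: letters become two-digit numbers (A=01 through Z=26), and each digit is then added to the matching pad digit modulo 10, without carrying. The function names and the letter-to-digit mapping are assumptions, not necessarily the scheme Number Spies uses.

```python
def letters_to_digits(message):
    """Map A..Z to 01..26, ignoring anything that is not a letter A-Z."""
    return "".join(
        f"{ord(ch) - ord('A') + 1:02d}"
        for ch in message.upper()
        if "A" <= ch <= "Z"
    )


def apply_pad(digits, pad, sign=1):
    """Add (sign=1) or subtract (sign=-1) the pad digit by digit, modulo 10."""
    if len(pad) < len(digits):
        raise ValueError("the one-time pad must be at least as long as the message")
    return "".join(str((int(d) + sign * int(p)) % 10) for d, p in zip(digits, pad))


def digits_to_letters(digits):
    """Reverse letters_to_digits: 01..26 back to A..Z."""
    return "".join(
        chr(int(digits[i:i + 2]) - 1 + ord("A")) for i in range(0, len(digits), 2)
    )
```

With the pad `1234`, the message `HI` becomes `0809` and then `1033`; subtracting the same pad with `sign=-1` recovers `0809`, which maps back to `HI`.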
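A sketch of how the digits might be wrapped in SSML and sent to Polly is shown below. Grouping the digits in fives, the pause length, the slow speaking rate, and the `Joanna` voice are all assumptions chosen to sound like a numbers station; the `synthesize_speech` and `put_object` calls are standard boto3 calls, but the bucket name is whatever your setup uses.

```python
from xml.sax.saxutils import escape


def digits_to_ssml(digits, group_size=5, pause="700ms"):
    """Wrap the digits in SSML, read digit by digit with a pause between groups."""
    groups = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    spoken = f'<break time="{pause}"/>'.join(
        f'<say-as interpret-as="digits">{escape(group)}</say-as>' for group in groups
    )
    return f'<speak><prosody rate="slow">{spoken}</prosody></speak>'


def synthesize_to_s3(ssml, bucket, key="polly.mp3", voice_id="Joanna"):
    """Ask Polly for an MP3 of the SSML and store it in S3."""
    import boto3  # imported here so the SSML helper works without AWS installed

    audio = boto3.client("polly").synthesize_speech(
        Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId=voice_id
    )
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=audio["AudioStream"].read(),
        ContentType="audio/mpeg",
    )
```

Keeping the SSML builder separate from the AWS calls makes it easy to unit test the markup without touching Polly.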
Check out the AWS Serverless Application Repository to find Lambda Layers or other code samples. You must add any layers to the AWS console before you can use them.
In this case, each layer contains the executable for the audio processing utilities and any files they need to run. These libraries are deployed and associated with the Lambda Function and only need to be deployed once. When the Lambda Function starts, the files for the layers are already there and ready to use.
An audio file for a numbers station transmission contains the following parts:
- silence (random length to add suspense)
- musical intro (ABC song)
- polly.mp3 (encoded message digits)
- musical outro (ABC song)
Concatenating multiple audio files together into one file is simple with the SoX library, so that is what I used.
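As a sketch of that concatenation, the commands below generate a random stretch of silence and then join the parts. SoX concatenates its input files when given several of them, and `-n` with `trim` is a common way to produce silence. The file names other than polly.mp3, the 2 to 8 second range, and the 24000 Hz mono format are assumptions; the inputs need to share a sample rate and channel count for the join to work, and the SoX build must include MP3 support.

```python
import random
import subprocess


def silence_command(out_path, seconds, rate=24000):
    """SoX command that writes `seconds` of mono silence to out_path."""
    return ["sox", "-n", "-r", str(rate), "-c", "1", out_path,
            "trim", "0.0", f"{seconds:.1f}"]


def concat_command(parts, out_path):
    """SoX command that joins the input files, in order, into out_path."""
    return ["sox", *parts, out_path]


def build_transmission(workdir="/tmp", rng=random):
    """Create silence + intro + digits + outro as one file and return its path."""
    silence = f"{workdir}/silence.mp3"
    combined = f"{workdir}/combined.mp3"
    # A random pause before the intro adds suspense (the range is an assumption).
    subprocess.run(silence_command(silence, rng.uniform(2, 8)), check=True)
    parts = [silence, f"{workdir}/intro.mp3", f"{workdir}/polly.mp3",
             f"{workdir}/outro.mp3"]
    subprocess.run(concat_command(parts, combined), check=True)
    return combined
```

Building the argument lists in separate functions keeps the shell details in one place and avoids quoting problems, since `subprocess.run` receives a list rather than a command string.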
To make the transmission sound more authentic, I have a few files that contain radio static. Using FFmpeg, the output from SoX is mixed with the static overlay and then trimmed to the length of the shorter file (which always trims the static).
The new output is processed again by FFmpeg to convert it into a format that is compatible with Alexa Skill short-form audio and Flash Briefing audio.
This output is saved to the S3 bucket as final.mp3.
Any processing using libraries in Lambda Layers is done in the /tmp folder, which has size constraints. This step also caps the length of the processed audio at 240 seconds.
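The two FFmpeg passes might look roughly like the following. The `amix` filter with `duration=shortest` ends the mix when the shorter input ends, and `-t` limits the output duration. The encoding settings (48 kbps, 24000 Hz, libmp3lame) are based on the conversion command that the Alexa documentation suggests for skill audio; treat them, and the file names, as assumptions rather than the exact settings used here.

```python
def mix_command(voice_path, static_path, out_path):
    """FFmpeg command that overlays static and stops at the shorter input."""
    return ["ffmpeg", "-y", "-i", voice_path, "-i", static_path,
            "-filter_complex", "amix=inputs=2:duration=shortest", out_path]


def alexa_command(src_path, out_path, max_seconds=240):
    """FFmpeg command that re-encodes for Alexa and caps the length."""
    return ["ffmpeg", "-y", "-i", src_path,
            "-t", str(max_seconds),      # cap the output duration
            "-ac", "2",                  # two audio channels
            "-codec:a", "libmp3lame",    # MP3 encoder
            "-b:a", "48k",               # 48 kbps bitrate
            "-ar", "24000",              # 24000 Hz sample rate
            "-write_xing", "0",          # omit the Xing header
            out_path]
```

Each list can be passed to `subprocess.run(..., check=True)`, with every path pointing into /tmp when this runs inside Lambda.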
Write Feed Audio
The final.mp3 file generated in the previous step is perfect for playing in the Number Spies Alexa Skill when the player asks to play the transmission. But Flash Briefings have the advantage of allowing advertisements. To take advantage of that, this step runs the file through SoX again to add a promo to the end. The resulting file is named final-ad.mp3.
Both branches of the flow converge at this step. The player will either hear final-ad.mp3 or static1-ad.mp3. In either case, the feed.json file needs to be created and saved to the S3 bucket.
If there is a daily message, this step also writes a data.json file that includes message details used in the Alexa Skill and on the website.
Post to Webhook
A webhook on Zapier is used to post encoded messages to a Facebook Page. We will talk about this integration in a future part of the series. All you need to know is that the trigger of that part of the system is the daily CRON job but only if there is a message.
The final step is to invalidate the CDN which is used to access and cache files located in the S3 bucket. The main files that would change are: final.mp3, final-ad.mp3, static1.mp3, static1-ad.mp3, feed.json, and data.json.
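A sketch of that invalidation with boto3 is shown below. `create_invalidation` is the standard CloudFront client call; the distribution id is a placeholder you would supply, and using a timestamp as the caller reference is just one simple way to make each request unique.

```python
import time

# The files named above that may change on each daily run.
CHANGED_FILES = ["final.mp3", "final-ad.mp3", "static1.mp3", "static1-ad.mp3",
                 "feed.json", "data.json"]


def build_invalidation_batch(keys, caller_reference=None):
    """Build the InvalidationBatch structure; paths must start with '/'."""
    paths = ["/" + key.lstrip("/") for key in keys]
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        "CallerReference": caller_reference or str(time.time()),
    }


def invalidate(distribution_id, keys=CHANGED_FILES):
    """Ask CloudFront to drop its cached copies of the given files."""
    import boto3  # imported here so the batch builder works without AWS installed

    return boto3.client("cloudfront").create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(keys),
    )
```

Listing individual paths keeps the invalidation narrow; a wildcard path such as `/*` would also work if the bucket only holds these generated files.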
Much of the definition for the AWS services is done using the Serverless Framework and is configured in the serverless.yml file:
- CRON job
- API Gateway
- Lambda Functions
- Step Function
Services that could be configured in serverless.yml, but that I configured manually instead (mostly due to expediency or lack of knowledge), are:
- S3 Bucket
- CloudFront CDN
The Number Station: Alpha Flash Briefing used in Number Spies is simple in concept: replicate the transmission from a numbers station. The configuration, feed.json, and audio file needed are also standard.
What makes this architecture more involved is that the files are generated daily from content in a CMS and need processing by audio libraries. Using Step Functions makes the flow easier to follow.
What processing have you done to support your Flash Briefings?
What audio processing and libraries have you used for your projects?
Creating an Alexa Game — Table of Contents
- Intro — From Idea to Code and Beyond
- The Spark of Inspiration for Number Spies
- Number Spies — System Components Overview
- Content Management with Sanity.io
- Number Spies Alexa Flash Briefing (this post)
- Number Spies Alexa Skill — Language Model
- Number Spies Alexa Skill — Why I Chose the Jovo Framework
- Number Spies Alexa Skill — Text-to-Speech and Speech Markdown
- Website Domain Name and Skill Invocation Name
- Number Spies Alexa Skill — Code (multiple parts)
- Number Spies Alexa Skill — Unit Testing with Bespoken
- Number Spies Alexa Skill — Skill Store Info
- Number Spies Alexa Skill — Analytics with Dashbot
- Number Spies Alexa Skill — Exception Monitoring with Sentry
- Number Spies Alexa Skill — User Acquisition with Voxalyze Then Not
- Number Spies Website
- Game Promotion & Social Media
- Is the Game a Success?