Creating “The Functionality” Part 1: Introduction and “Existing Functionality”

In the post with the overall description of the magic formula for Skills, we broke it down into two parts: the Interaction Model and "the Functionality". My use of quotation marks is not just for fun. In the documentation provided by Amazon, the Interaction Model is mentioned by name all the time; the other part never gets one, so I decided to coin the term myself. Any kind readers with a better suggestion, please leave it in the comments!

So, it’s now time to discuss “the Functionality”. This is what I’ve already said about the matter:

“The functionality” can be an application that already exists (Fitbit, Uber, etc. were happy systems with millions of users before Alexa was invented), or one you make now to be used specifically with Alexa. In the first case, the developers for that existing system will have to develop an interface that uses the AVS API. Well, actually, a product manager will have to identify the functionality that will be used via Alexa, then the developers will encapsulate that functionality in a way that can be exposed to the AVS API so that Alexa can use it. In some cases the developer and the product manager are the same person!

If you’re creating a Skill from scratch, then Amazon recommends that you build and host “the functionality” with Amazon Web Services and they suggest you do it as a Lambda Function. We’ll speak a lot about this soon, stay tuned!

 

I don’t know if it was clear enough, so here it goes. “The Functionality” is the stuff that the Skill actually does (telling the time or the horoscope, giving you the status of a flight, explaining how to prepare a mojito, suggesting which wines go well with pasta, etc.). And of course this functionality can already exist and currently be used through a different channel (smartphone app, Web application, wearable, plain old desktop application, etc.), or you can create something totally new.

Existing functionality: Focus is Key

This will be the most frequent case. Your bank decides to offer its services through Alexa. Your fitness tracker adds voice as a new way of interacting with you. And so on. Every month, new Skills built on existing functionality are published in the Alexa Skills list.

So, how does it work? You already have a working system with zillions of users. How do you add it to the list of stuff that Alexa is capable of doing? Well, first of all you need to define what functionality you want to expose to Alexa (“expose” here means “make available to”). Imagine you’re a bank. What do you want your customers to be able to do with voice? You have to take a lot of things into consideration: things that work differently with voice than with other interfaces. This list is not exhaustive:

  • Security & Privacy considerations: anyone in the house can give instructions to Alexa, so it’s probably not a good idea to allow bank transfers via voice. And everyone around will hear what Alexa says: is it okay for them to hear your account balance? Don’t even think of protecting transactions with passwords: the whole point of Alexa for many user personas is that voice is the only channel they can use, and the possibility of eavesdropping makes saying passwords aloud a no-no.
  • Ergonomics: okay, this is the realm of the Voice Interaction Designer, but you really need her input to decide what will fly and what will never work. Imagine you want to interact with your fitness tracker via Alexa. Is there any value in hearing the list of your heart rates minute by minute? Will you remember it? Will you take it in? The amount of information a human can process depends on the sense being used: sight is fine for browsing and for finding a needle of information in a visual haystack; hearing is not.
  • Value and coherence: you want to implement things that are useful to the user and that bring value to your organization. And you want to paint a coherent picture for your user: he or she should not get frustrated because the things you’ve implemented suggest that similar or related things, ones that seem equally important to the user, are also implemented when they are not.

Sounds daunting? No, not really. It’s just a lot of work. This is why, when designing any kind of system, you need a Product Owner, or you need to be able to act like one and devote enough time to it. You need someone who knows the needs of the organization and the needs of the user well, and who is capable of understanding the possibilities and limitations of the technologies being used.

Okay, imagine you’ve done all of that and you have a list of “services” you want to offer through Alexa Voice Services. What do you have to do now? Easy. Get your Developer and your Interaction Designer together, make them read and understand the post about the Interaction Model (they probably know much more than me, so maybe they can skip that part!), and make them agree on the “contract” between the Functionality and the Interaction Model: the Intent Schema. Don’t let them part ways until they do!!

Technical Implementation

Now your Developer can start work. It’s all about creating a Web service that exposes the functionality that you wish to serve via Alexa. Remember this diagram? It’s the bit at the bottom right.

AVS Overview – “The Functionality” depicted on the bottom right part of the diagram.

Your Web service must comply with the following (extracted from here; my comments between [square brackets]):

  1. The service must be Internet-accessible. [Pretty obvious, eh! But not easy to achieve in some big organizations.]
  2. The service must adhere to the Alexa Skills Kit interface. [More on it later]
  3. The service must support HTTP over SSL/TLS, leveraging an Amazon-trusted certificate.
    • For testing, Amazon accepts different methods for providing a certificate. [i.e. you don’t have to shell out money buying a certificate when you’re just testing]. For details, see the “About the SSL Options” section of Registering and Managing Custom Skills in the Developer Portal.
    • For publishing to end users, Amazon only trusts certificates that have been signed by an Amazon-approved certificate authority. [You work with Amazon, you leverage their services, you accept their rules. Certificates are a matter of trust anyways and you should use the ones they trust!]
  4. The service must accept requests on port 443.
  5. The service must present a certificate with a subject alternate name that matches the domain name of the endpoint.
  6. The service must validate that incoming requests are coming from Alexa. [This last point is actually trying to protect you from DoS attacks]
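Requirement 6 deserves a closer look. Amazon’s documentation spells out rules for the SignatureCertChainUrl header that accompanies every request. As a minimal sketch (in Python, my choice, not mandated by Amazon), here is the URL part of that validation; full validation also involves downloading the certificate chain and verifying the request signature, which I omit here:

```python
from urllib.parse import urlparse

def is_valid_cert_chain_url(url: str) -> bool:
    """Check Amazon's documented rules for the SignatureCertChainUrl:
    https scheme, host s3.amazonaws.com, path starting with /echo.api/,
    and port 443 if a port is given at all."""
    parsed = urlparse(url)
    return (
        parsed.scheme.lower() == "https"
        and parsed.hostname == "s3.amazonaws.com"
        and (parsed.port is None or parsed.port == 443)
        and parsed.path.startswith("/echo.api/")
    )
```

For example, `https://s3.amazonaws.com/echo.api/echo-api-cert.pem` passes, while the same URL over plain `http` or on a different host is rejected.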

So, the secret sauce is in complying with the Alexa Skills Kit interface. And believe me, this will be tricky unless you understand how a custom Skill works, what you need to do to react to Intents, how to handle Slots, and so on. For that, you need to know the interface specification very well but, perhaps most important, have a broad picture of how everything fits together.

This will be time well spent; it will pay off with a high return later.

Good luck!

Creating the Interaction Model

As we said in the previous post, a Skill has two distinct parts: the Interaction Model and what I call “the functionality”. In this post we will try to describe the elements of the Interaction Model, the rationale behind them, behind the split, and the shortcomings or limitations of the model adopted by Amazon.

So, quoting what we said already:

The Interaction model is everything related with speech. It’s where you specify the Invocation name, the slots that your Skill can understand, and very important, examples of whole sentences that your Skill can process. These sentences are called “Sample Utterances” and you will spend many hours perfecting those. There’s also something called the “Intent schema” and it’s very, very important, because it defines the different tasks that Alexa will be asking to “the functionality”, based on what the user has asked Alexa to do. It’s where you define the hooks between the two parts of the Skill.

We mentioned four elements:

  • Invocation name
  • Intent Schema
  • Slots
  • Sample Utterances

Let’s start from the beginning!

Invocation name

We saw the other day that this is not the name of the Skill, but you will probably decide that they are identical. The Invocation name is made of the words that you pronounce so that Alexa can figure out which Skill you want to use. You will always use it in conjunction with the wake word (Alexa! Echo! or Amazon! at the time of writing) and some verb: start, ask, etc. So, when you’re deciding on an invocation name, it’s worth trying it out. Just imagine how the Skill will be used:

  • “Alexa, start <<name of my skill>>”
  • “Alexa, ask <<name of my skill>> to…”

Make sure that the sentences above are easy to remember, easy to say, and easy for Alexa to recognize. My two golden rules would be:

  1. Make sure that the entire sentence is semantically and syntactically correct (i.e. makes sense). E.g. if you’re going to invoke your Skill the first way (“start”), it’s best that your Skill name represents a thing (saying “start the car” sounds okay; saying “start the driver” sounds really weird). If it’s going to be the second way (“ask… to…”), then you probably want the Skill to represent a profession or a person who carries out a task (e.g. Wine Helper, Dream Catcher, things like that). Also avoid falling into language ambiguity. More on this later when we talk about Utterances and the limitations of the model adopted by Amazon.
  2. Make sure that it’s easy to pronounce: you don’t want to end up with a tongue twister, or drive those with a particular accent crazy. Using the word “think” is probably a very bad idea.

Intent Schema

This is the most technical part of the Interaction model because it’s the boundary between “the functionality” and speech. You can say it’s the “contract” between these two parts of the Skill. Once defined, voice interaction designer and developer can part ways and do their thing. Once they finish, if both complied with the Intent Schema, then everything will integrate nicely.

Not getting into code, but staying at the conceptual level, this is what happens.

The developer comes up with a list of different situations where the functionality will receive instructions from the user. Let’s call these “Intents”. Examples are: start, help, play, ask, quit, etc.

Some of those “Intents” will require a bit more info from the user. “Start” doesn’t require more information, but what about “Ask”? “Ask” what? So this must also be specified. This “what” is known as a “Slot”, and Slots are defined in their own section of the Interaction Model. They are used here in the Intent Schema, though, hence this little explanation.

So, the content of an Intent Schema will be something like this:

  • Start
  • Stop
  • Play
  • Quit
  • Ask “the time”
  • Ask “the weather” “location”
  • Ask “the weather” “location” “date”

The Intent Schema is written in JSON, a notation for writing down structured information (to learn more about it, w3schools has a good tutorial). But the whole point of using JSON is that it’s “user friendly”, so having a good example of an Intent Schema is typically enough: it isn’t hard to modify it with the actual Intents and Slots for your Skill. This is the one that would represent the “Ask” questions above:

{
  "intents": [
    {
      "intent": "GetWeatherIntent",
      "slots": [
        {
          "name": "Location",
          "type": "LIST_OF_LOCATIONS"
        },
        {
          "name": "Date",
          "type": "AMAZON.DATE"
        }
      ]
    },
    {
      "intent": "GetTimeIntent"
    }
  ]
}

Two things to notice here:

  1. The Intent names (GetTimeIntent and GetWeatherIntent) are not written in human or natural language. They are code. It’s the task of the interaction designer to define the “human language” that must be mapped to those Intent names; she will do that in the Utterances section. A piece of advice on Intent names: it’s good practice to add the word Intent as a suffix (i.e. GetWeatherIntent instead of GetWeather). Don’t get lazy, so things don’t get confusing!!!
  2. Slots have a name and a type. The type describes the kind of data that will be provided when the Intent is invoked (e.g. 3rd April for the “Date” slot, Barcelona for the “Location” slot). And the name is… well… self-explanatory. We’ll explain both in the Slots section.
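Since the Intent Schema is plain JSON, it’s also easy to inspect programmatically. A small illustration (in Python; my own sketch, not any official tooling) that loads the schema shown above and lists each Intent with its Slot names, e.g. as a sanity check before pasting the schema into the developer console:

```python
import json

# The Intent Schema from the example above, embedded as a string.
schema = json.loads("""
{
  "intents": [
    {"intent": "GetWeatherIntent",
     "slots": [{"name": "Location", "type": "LIST_OF_LOCATIONS"},
               {"name": "Date", "type": "AMAZON.DATE"}]},
    {"intent": "GetTimeIntent"}
  ]
}
""")

# List every intent and the slot names it declares (intents without
# a "slots" key simply have none).
for intent in schema["intents"]:
    slot_names = [slot["name"] for slot in intent.get("slots", [])]
    print(intent["intent"], slot_names)
```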

Slots

[If you’re a developer, to say that Slots are just a “variable” will suffice for you to understand]

They are placeholders for the specific information that the user will provide to Alexa when using a Skill. The Slots section is where you define them. You have to define them BEFORE you use them in the Intent Schema, or you won’t be able to save your Skill in the developer console.

There are two kinds of Slots: built-in and custom. Built-in Slot types are those you would expect as basic data types in any programming language: numbers, dates, etc. Amazon is adding new ones all the time. This is the list as I type (obtained from here). Note that they all have the prefix “AMAZON.” so we can easily see that they are built-in:

  • AMAZON.DATE – converts words that indicate dates (“today”, “tomorrow”, or “july”) into a date format (such as “2015-07-05”).
  • AMAZON.DURATION – converts words that indicate durations (“five minutes”) into a numeric duration (“PT5M”).
  • AMAZON.FOUR_DIGIT_NUMBER – Provides recognition for four-digit numbers, such as years.
  • AMAZON.NUMBER – converts numeric words (“five”) into digits (such as “5”).
  • AMAZON.TIME – converts words that indicate time (“four in the morning”, “two p m”) into a time value (“04:00”, “14:00”).
  • AMAZON.US_CITY – provides recognition for major cities in the United States. All cities with a population over 100,000 are included. You can extend the type to include more cities if necessary.
  • AMAZON.US_FIRST_NAME – provides recognition for thousands of popular first names, based on census and social security data. You can extend the type to include more names if necessary.
  • AMAZON.US_STATE – provides recognition for US states, territories, and the District of Columbia. You can extend this type to include more states if necessary.
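Your functionality receives these built-in slot values as strings in standard formats; AMAZON.DURATION, for instance, arrives as an ISO-8601 duration such as “PT5M”. Here is a minimal helper (my own sketch in Python, not part of any Amazon SDK) that converts the common time-only cases into seconds:

```python
import re

def duration_to_seconds(value: str) -> int:
    """Convert an ISO-8601 time-only duration (e.g. "PT5M", "PT1H30M")
    into a number of seconds. Day/week/month durations are not handled."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", value)
    if not match:
        raise ValueError(f"unsupported duration: {value}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

So “five minutes” reaches your code as “PT5M”, which this helper turns into 300 seconds.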

Custom slots are just lists of possible values (like the values in a drop-down list). That’s why they are typically called LIST_OF_WHATEVER. An example would be LIST_OF_WEEKDAYS, and its content would be:

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday

Sample Utterances

This is the heart of the Interaction design. I bet you will spend many hours polishing this!!!

So, here’s what you do. Remember the Intents in the Intent Schema? Well, for each one of them you have to come up with all the real-life examples you can think of. And you type them here, one by one. You will easily end up with hundreds of lines (I’ll discuss this in the shortcomings part of this post). A Sample Utterance looks like this:

NameOfIntent followed by a phrase in natural language that may contain none, one {SlotName}, or many {SlotNames}

In the example we’ve been following:

  • GetTimeIntent tell me the time
  • GetTimeIntent time please
  • GetTimeIntent what time is it
  • GetWeatherIntent tell me the weather for {Location} on the {Date}
  • GetWeatherIntent tell me the weather for {Location} {Date}

In the 4th example, the interaction designer is thinking of “tell me the weather for Paris on the 3rd April”. In the 5th example, the interaction designer is thinking of “tell me the weather for Paris tomorrow”. This is like Pokémon: you gotta catch them all (all the possible things your users might say!)
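Since every Sample Utterance line follows the same “IntentName + phrase” shape, a tiny helper can split a line into its intent and the slot names it uses. This is my own illustration (in Python), not part of any Amazon tooling:

```python
import re

def parse_utterance(line: str):
    """Split a sample-utterance line into (intent name, list of slot names).

    The intent name is everything before the first space; slot names are
    the {Braced} placeholders in the rest of the line."""
    intent, _, phrase = line.strip().partition(" ")
    slots = re.findall(r"\{(\w+)\}", phrase)
    return intent, slots
```

For instance, parsing “GetWeatherIntent tell me the weather for {Location} on the {Date}” yields the intent GetWeatherIntent and the slots Location and Date.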

That’s it!!! Now you know how to create your Interaction Model!!!

[The Amazon folks explain this interaction business  here.]

Shortcomings or Limitations

You have to understand the big effort Amazon is making here. The computing power that speech recognition needs is vast, and you have to avoid the convoluted, complicated cases like the plague. From the days of Noam Chomsky and all his good work on grammars, we know that natural language is inherently ambiguous, and that when you’re defining a synthetic grammar it’s quite easy to introduce ambiguity and very hard (in general, impossible) to make sure you don’t.

If you don’t know what I am talking about, let’s analyze this sentence.

“In the show, I liked everything but everything but the girl girl.” If you don’t know that there is a band called “everything but the girl” with a female lead singer, you would think that the sentence above is gibberish, and discard it. Alexa would go crazy!

To avoid that, before AVS accepts the Interaction Model of your Skill, it runs some checks to make sure you’re not introducing any ambiguity or loops that would drive Alexa crazy. In Computer Science terms, what you’re creating with the Interaction Model is a Context-Free Grammar, and the checks I mention are heuristics that try to detect whether the grammar is free of ambiguity. If you’re interested, some heavy reading here.

So, Amazon set very strict rules for the definition of your Interaction Model, and these create what are, in my opinion, its main limitations: both Custom Slots and Sample Utterances are static. You have to define them beforehand, and you cannot change them on the fly while the Skill is live. If you want to include an extra Utterance, no matter how innocent it looks, or if you need a new value in one of your Custom Slots, you have to change the Interaction Model AND submit the Skill for re-certification. Best case: it will take two full working days to introduce the change.

Imagine that your Skill deals with names of people (names of players, names of friends… whatever) as Slots. You have to provide the list of all the possible names BEFOREHAND. You cannot add Anaïs, or any other name you hadn’t thought of, on the fly through usage of the Skill. You have to add it to the Interaction Model and re-submit for certification.

Managing the Sample Utterances as plain text is also very, very tricky. You will just lose track of what’s in there, and troubleshooting is hard. My workaround is a little Access database with a simple relational data model and some wonderful macros that “dump” its content as a long string of text matching the Sample Utterances syntax expected by AVS in the developer console; then I copy & paste this super long string.
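If Access is not your thing, the same workaround can be sketched in a few lines of Python: keep the phrasings in a structured form and generate the flat Sample Utterances text from it. The intents and phrasings below are the hypothetical weather/time examples, not a real skill:

```python
# Phrasings kept in a structured form: one list of phrases per intent.
phrasings = {
    "GetTimeIntent": ["tell me the time", "time please", "what time is it"],
    "GetWeatherIntent": ["tell me the weather for {Location} on the {Date}",
                         "tell me the weather for {Location} {Date}"],
}

def generate_utterances(phrasings: dict) -> str:
    """Flatten the structured phrasings into the one-line-per-utterance
    text format expected in the developer console."""
    return "\n".join(f"{intent} {phrase}"
                     for intent, phrases in phrasings.items()
                     for phrase in phrases)

print(generate_utterances(phrasings))
```

The generated text can then be pasted into the Sample Utterances box in one go, and the structured source stays easy to maintain and troubleshoot.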

Everything else, I think, is super, and I am really grateful to Amazon for opening up the platform for all of us to explore and develop Skills.

 

 

How does a Skill work? How can I build one?

In a previous post we explained that Alexa can integrate both with hardware and with software, that is, you can create a voice user interface for any application (existing or new!). We barely scratched the surface of how hardware integration works, and we also mentioned a hackster.io initiative to get makers excited about it. We will now cover the integration with applications.

The first step to leveraging Alexa Voice Services for your application is to understand what a Skill is, how it works, how Skills are created, and (very important) their lifecycle.

What is a Skill?

Imagine that you write down all the things you’re capable of doing. It will be a pretty long list! You can sleep, eat, recite Shakespeare sonnets by heart (maybe!), calculate square roots (but if you’re like me, you’ve forgotten how to do it, even though you know it’s somewhere in the back of your mind…). What would you call each item on that list? Maybe “stuff I can do”? What about calling them “my skills”?

Well, the good people of Amazon have done precisely that for Alexa. They have crafted a list of all the things that Alexa can do. A Skill in this context is “something that Alexa can do”. The list of Alexa skills can be found if you log on to alexa.amazon.com and select the Skills section, or if you have already installed the Alexa app on your Smartphone, you can also check it out. There are Skills that were created by Amazon as “basic functionality” (tell the time, set alarms, manage a to-do list, tell jokes!), and there are Skills created by everyone else. Some of them are from prominent companies (Fitbit, Uber, …), some of them, the most fun actually (!), are the ones created by independent devs like yours truly.

How does a Skill work? (as a user)

One of my favorite skills is called Big Sky. It is a weather forecast tool that uses my location (based on my Echo’s IP address), as opposed to Alexa’s “basic functionality” weather Skill, which assumes that, like Alexa, I too live in Seattle.

This diagram represents how I interact with Big Sky via my Echo device:

How to use a skill

Wake word: Your Echo device is always listening but doesn’t really care about the noise it picks up unless it recognizes the Wake word. I like to think of the Wake word as the magic spell that brings the Alexa spirit to the otherwise inert black cylinder (yes, she has a distinct personality, and I think she’s as alive as my 14-year-old cat Sofía: their personalities are actually similar!!!). Originally there was only one Wake word (yep, that’s right: Alexa), but given that Alexa is a not-so-uncommon name and it could be very, very confusing to use an Echo in a household with someone called that, Amazon has expanded the list of possible Wake words to: Alexa, Amazon, and Echo. To change it, go to the Alexa Web site or app, select your Echo device, then Settings, Wake word, and pick your choice from the list.

Invocation name: That’s how Alexa knows what she has to do, besides understanding your English! You might think that the invocation name is just the skill name, but that’s not true 100% of the time. When you create the Interaction Model of your Skill (more on that a bit later!) you specify two things: the skill name (that’s what appears in the Skill list on the app or on the Web site) and the invocation name (if your skill name is too long or difficult to pronounce, you can choose something simpler, but if you’re happy with your skill name, you can just pick the same one).

Slot: If you’re a programmer: that’s a variable, and your skill may use none, one, or as many as needed. If you’re not a programmer: a slot is the “placeholder” for the extra information that you give to the Skill so that it does exactly what you want. For everybody: if your skill helps people check flight schedules, you may need three slots: arrival, departure, or both; flight number; flight date. In the example above I chose to specify the location for the weather forecast, but this was not compulsory, because the Skill is clever enough to pick up the location from the IP address of my Amazon Echo.

It takes a little while to get the hang of interacting with Alexa (she even appears to get a bit frustrated when she doesn’t understand you, but so do you!), but the golden rule is to always say the Wake word first, then use the Invocation name very clearly. Don’t worry about grammar, and you don’t have to be polite with her (pleases and thank-yous are ignored).

How does a Skill work? (for real)

A Skill has two distinct parts: the Interaction Model and what I call “the functionality”.

The Interaction model is everything related with speech. It’s where you specify the Invocation name, the slots that your Skill can understand, and very important, examples of whole sentences that your Skill can process. These sentences are called “Sample Utterances” and you will spend many hours perfecting those. There’s also something called the “Intent schema” and it’s very, very important, because it defines the different tasks that Alexa will be asking to “the functionality”, based on what the user has asked Alexa to do. It’s where you define the hooks between the two parts of the Skill.

Interaction Model Sections for my Wine Expert Skill.

You work on the Interaction model through the Amazon Development console. When you create a Skill, you get 5 sections to fill out plus some testing tools (you can see those in the picture above). One of these sections is for the Interaction model. In another post we’ll describe it in depth.

A very important section has a cryptic name: “Configuration”. Besides other important stuff, that’s where you define where “the functionality” actually is. So, remember: with the Intent Schema in the Interaction model you describe “the hooks” between Interaction model and “the functionality”, but it is here where you say where to go for that functionality.

“The functionality” can be an application that already exists (Fitbit, Uber, etc. were happy systems with millions of users before Alexa was invented), or one you make now to be used specifically with Alexa. In the first case, the developers for that existing system will have to develop an interface that uses the AVS API. Well, actually, a product manager will have to identify the functionality that will be used via Alexa, then the developers will encapsulate that functionality in a way that can be exposed to the AVS API so that Alexa can use it. In some cases the developer and the product manager are the same person!

If you’re creating a Skill from scratch, then Amazon recommends that you build and host “the functionality” with Amazon Web Services and they suggest you do it as a Lambda Function. We’ll speak a lot about this soon, stay tuned!
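To give a flavor of what such a Lambda Function looks like, here is a minimal sketch in Python. The response shape follows the Alexa Skills Kit JSON format; GetTimeIntent is a hypothetical intent, and a real handler would also deal with launch and session-ended requests:

```python
import datetime

def build_response(text: str) -> dict:
    """Wrap plain text in the Alexa Skills Kit response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

def lambda_handler(event, context):
    """Entry point AWS Lambda calls for each request.

    Assumes the incoming event is an IntentRequest; a production
    handler would also check event["request"]["type"]."""
    intent = event["request"]["intent"]["name"]
    if intent == "GetTimeIntent":
        now = datetime.datetime.now(datetime.timezone.utc).strftime("%H:%M")
        return build_response(f"It is {now} UTC.")
    return build_response("Sorry, I can't help with that yet.")
```

Alexa sends the request JSON to this handler, and whatever text comes back in outputSpeech is what she speaks aloud.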

So, once you build the Interaction model and “the functionality” and you hook them together, you’re ready to roll. And now you’re ready to learn about the lifecycle of a Skill. You’ll need it.

Lifecycle of a Skill

A picture is worth a thousand words. So:

Skill Lifecycle Diagram

First, you create the Skill. In the Amazon Developer Console you will see that you have one Skill in your list of Skills, with a status of Development.

You work on it as we’ve described before, and when you think it’s ready to be published, you submit it for certification. From this moment on, you can chill and take a break, because the Skill is frozen: you can’t work on it, you can’t even test it! It will take the Amazon folks a couple of days to get back to you with good news (Skill accepted!) or good news (there’s opportunity for improvement! Okay, not such good news: it means your Skill has failed the certification process). When the Skill fails certification, it goes back to Development status and you can work on it again.

If you think you made a mistake while you’re waiting for certification feedback, you can withdraw the certification request. You can work on it again and re-submit for certification when you’re ready.

Once you get it right and your Skill gets certified, interesting stuff happens. First, your Skill becomes two Skills: one with Production status, frozen, impossible to modify, and another one with Development status. If you wish to add more functionality to your Skill, then you would work on the Skill under Development, and follow this process again. When you submit this new version of the Development Skill for certification, once it’s approved, the “original” Production Skill will disappear and be replaced with the new one, and you’ll get again a Skill under Development, just in case you wish to add more functionality again.

Understanding the Skill lifecycle is very important. Typically you learn it by practice (I haven’t seen a diagram like mine anywhere!), not without a good deal of uncertainty (where is my Development Skill? It has disappeared! How do I create new versions? Is there life on Mars? Et cetera). So I hope you find this explanation useful.

Internet of Voice Challenge @ Hackster.io

Internet of Voice Challenge

As we said last week, Amazon is investing heavily (and working hard, too) towards a world of things connected to the Internet where we interact with objects (and robots…) with natural language. With voice.

A new initiative on this front is the “Internet of Voice Challenge”, run jointly by Hackster.io, the Raspberry Pi Foundation, and Amazon.

There are two categories:

Alexa Skills Kit and Pi

Finding innovative and clever ways of integrating Alexa Skills into maker projects.

The criteria will be:

  • Use of a Raspberry Pi (20 points)
  • Creativity (15 points)
  • Use of Voice User Interface (VUI) best practices (15 points)
  • Story/Instruction – Show how you created your project, including images, screenshots, and/or video (10 Points)
  • Project Documentation including VUI diagram (10 Points)
  • Code – Include working code with helpful comments (15 Points)
  • Published Skill (20 Points)

Alexa Voice Services and Pi

Making devices come alive by adding voice interactions.

And the criteria:

  • Use of a Raspberry Pi (20 points)
  • Creativity (15 points)
  • Use of Voice User Interface (VUI) best practices (15 points)
  • Story/Instruction – Show how you created your project, including images, screenshots, and/or video (10 Points)
  • Project Documentation including VUI diagram (10 Points)
  • Code – Include working code with helpful comments (15 Points)
  • BONUS – Published skill (10 points)

There are 3 prizes for each category. Amazon covers this part, so you won’t be surprised by the kind of bounty: Trophy, Amazon Echo, Echo Dot, Amazon Tap, and a $1,500 Gift Card for the first prize.

You can enter here. Submissions close on 1st August 2016. So far, 119 contestants, no projects submitted yet.

Would you like to speak to Alexa, but you don’t have an Amazon Echo? Answers here

So, Amazon Echo was launched in December 2015 and since then, 3 million devices have been sold… in the United States. It’s usual for Amazon not to announce their product roadmaps or launch plans, so we simply have no idea when (rather than if) it will be launched in other English-speaking countries, or when Alexa will speak Spanish or Tagalog with us… Therefore, if you’re curious about Alexa Voice Services but you’re not based in the US, you’ll have to use some workarounds to play with the technology.

Smartphone

There are two ways of speaking with Alexa via your Smartphone:

  • Lexi. A $4.99 app for iOS.

    Lexi Logo
  • Roger Voice Messenger. By the people that previously built Spotify. You can add Alexa to its options. Also you can exchange opinions with Chewbacca. Free for Android and iOS.
    Roger Logo

Web-based

  • Echosim.io: a JavaScript Echo simulator that works really well. It was launched a couple of weeks ago.
  • The test tool on Amazon’s developer console: I’ll devote a full post to that, but it’s quite masochistic, unless you prefer JSON to speech.

DIY Hardware

Build yourself an Echo device with a Raspberry Pi. Here’s how they explain it from Amazon, here’s how they explain it from the Raspberry Pi Foundation. A colleague of mine tells me it’s totally doable. What you’ll need:

  • A Raspberry Pi 2 (model B) & typical complements (power cord, SD card, keyboard, mouse, USB Wi-Fi adaptor).
  • A USB 2.0 mini microphone.
  • A loudspeaker that works with a 3.5 mm jack (the usual one).
Homebrew Amazon Echo with Raspberry Pi

Black Market

  • eBay. Not exactly the black market, but there are lots of Amazon Echos for sale out there. Good luck with Customs if you go down this route.

Alexa Voice Services: Open at both ends

Alexa Voice Services Context Diagram

Alexa Voice Services represents top-notch weak AI for natural language processing. It is second to none in terms of quality, which in this area is measured by accuracy (okay, maybe Andrew Ng’s product at Baidu is better, but so far it’s only available for Mandarin). Accuracy is a non-linear measure of usefulness. What I mean is this: Siri, Cortana, etc. may be at 95% accuracy, but that 5% of inaccuracy weighs heavily in terms of usability: they are a cute thing but not terribly reliable (and when they misunderstand you, they are very annoying). Alexa is closer to 98%, which could be more or less the threshold for genuine usability. But worry not: Apple, Microsoft, and the rest of the pack will eventually catch up in this area.

What makes Alexa Voice Services unique? Openness. Not in the sense that it’s open source (it certainly isn’t), but in that Amazon is making big efforts to get folks to embed AVS in their applications, at both ends. This is represented by the Context Diagram that accompanies this post.

Alexa Voice Services can hook up with any application

Amazon has published the Alexa Skills Kit, a software development kit to help people create Skills that can be used via an Alexa-powered voice device. I will devote another post to ASK on its own, so I won’t delve into it now. They have also published a specific API to control household appliances (lightbulbs, blinds, etc.) called the Smart Home Skill API (which I haven’t tried yet).

Makers’ paradise: you can use Alexa Voice Services with your own hardware

This also deserves a full-blown post, but for a hint of what’s possible, here’s the Alexa Lambda Linux (ALL) Reference Design by a kind soul who is as good at managing projects as at documenting them. Or you could check out Amazon’s how-to for building an Echo-like device with a Raspberry Pi. Now think again about the Internet of Things (IoT): not only will stuff be connected, you will also be able to talk to it!

Other competitors (Google, for instance) have recently announced their take on intelligent household speakers (Google Home), but there’s no sign of a beta developer program or SDK on the horizon.

Are you in the US? Do you own an Amazon Echo device? What do you use it for?

Meet Alexa

Amazon has shaken the world of tech, again. In December last year, it launched Amazon Echo. Is it just a loudspeaker? No. It’s the avatar (as in embodiment) of Alexa, an AI-powered assistant that can crack the best jokes and control your lights. But wait, there’s also Siri! There’s Cortana! There’s also the anonymous Google voice assistant! Why is Alexa different? Well, for starters, because at the time of writing (June 2016), over 3 million Amazon Echo devices have been sold in the US. But the true beauty is that Amazon has released the Alexa Skills Kit (ASK) as well as a wealth of training resources (on Udemy; on Big Nerd Ranch), which makes Alexa Voice Services, the AI part of all this, available to all kinds of use cases. In other words: Amazon wants developers around the world to teach Alexa stuff. To teach her skills. I, a classic T-shaped professional who has done it all and become a rusty developer in the process, have managed to teach Alexa to quiz people on their knowledge of my hometown, and I am also helping Alexa become a Wine Expert. Here, you will read how I’m doing, what puzzles me, how Amazon is building up the ecosystem as we, the first 1000 developers, crack on with the platform, and what I think about it all.