The World Speaks Many Languages

One thing the release of the new Apple HomePod has put front and centre: Siri’s limitations. The common complaint from the predominantly North American and British tech press is that Siri is not easily extensible in the way that something like Alexa is. On Alexa you can add skills and link them to hard-coded phrases for interpretation relatively easily.

What everybody seems to have missed about Apple’s approach to SiriKit and the idea of Intents is that Apple is doing all of the hard work of abstracting the source language away into something a developer can actually use without any knowledge of the language being spoken. Anyone who has localized an application knows the complexity involved in simply getting all of the text strings lined up to work properly in multiple languages, and this is an order of magnitude more difficult when you’re trying to interpret speech. French speakers from France, Switzerland, Belgium or Québec will all have the same basic understanding of a newspaper article written in French in any one of those places, but pulling apart the regional accents and modern colloquialisms used in speech adds another layer that is considerably more dynamic and much less well documented.
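To make that abstraction concrete, here is a minimal sketch of a SiriKit extension handler, assuming the Messages domain (INSendMessageIntent). The class name and the commented-out send call are placeholders of mine, but the shape is the point: the developer receives a structured intent and never touches the spoken language.

```swift
import Intents

// A minimal sketch, assuming the Messages domain (INSendMessageIntent).
// Siri parses the spoken request, in whichever supported locale the user
// speaks, and hands the extension a structured intent; the code below
// never sees the raw speech or needs to know its language.
class SendMessageIntentHandler: NSObject, INSendMessageIntentHandling {

    func handle(intent: INSendMessageIntent,
                completion: @escaping (INSendMessageIntentResponse) -> Void) {
        // Recipients and message content arrive already resolved,
        // whether the user said "Send a message to Anna saying hi"
        // or "Envoie un message à Anna pour dire salut".
        guard let recipients = intent.recipients, !recipients.isEmpty else {
            completion(INSendMessageIntentResponse(code: .failure, userActivity: nil))
            return
        }

        // Hand off to the app's own messaging layer here; sendMessage(_:to:)
        // is a hypothetical placeholder, not part of SiriKit.
        // sendMessage(intent.content ?? "", to: recipients)
        _ = recipients

        completion(INSendMessageIntentResponse(code: .success, userActivity: nil))
    }
}
```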

For some context about how much each of these players has decided to bite off with their voice interfaces, let’s look at the supported language list from each.

Google Home:

  • English (Australia, Canada, United Kingdom, United States)

  • French

  • German

  • Japanese

Amazon Alexa:

  • English (Australia)

  • English (Canada)

  • English (India)

  • English (UK)

  • English (US)

  • German

  • Japanese

Apple Siri:

  • Australia

  • Austria

  • Belgium (Dutch, French)

  • Brazil

  • Canada (English, French)

  • Chile

  • China (Cantonese, Mandarin)

  • Denmark

  • Finland (Finnish)

  • France

  • Germany

  • Hong Kong (Cantonese)

  • India (English)

  • Ireland (English)

  • Israel (Hebrew)

  • Italy

  • Japan

  • Malaysia (Malay)

  • Mexico

  • Netherlands

  • New Zealand

  • Norway

  • Republic of Korea

  • Russia

  • Saudi Arabia (Arabic)

  • Singapore (English)

  • South Africa (English)

  • Spain

  • Sweden

  • Switzerland (French, German, Italian)

  • Taiwan (Mandarin)

  • Thailand

  • Turkey

  • United Arab Emirates (Arabic)

  • United Kingdom

  • United States (English, Spanish)

Here, Apple is clearly the one making the largest effort to ensure that its products get the same level of ability across the world. Take a look at this developer doc from Amazon about building multi-language skills and imagine doing that level of localization for an app in the iOS App Store across all of the languages supported by Apple. And even then you are essentially speaking command-line-style text commands aloud, without the fluidity of SiriKit’s adaptive natural-language semantic analysis.

“Call a cab,” “Call me a cab,” “Order a cab,” “Get me a cab,” and “I need a taxi” are all semantically equivalent, and each would need to be hard-coded into an app on a language-by-language basis if you wanted to approximate what SiriKit does. The difference is that SiriKit hands the developer the resulting Intent, while with Alexa the developer has to design and localize each Intent themselves.
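To illustrate the Alexa-side burden, here is a purely hypothetical sketch of the kind of per-locale utterance table that has to be designed and maintained for a single intent. It is not Amazon’s actual interaction-model format (which is JSON authored separately for each locale); it is only meant to show how the work multiplies with languages.

```swift
// Hypothetical illustration: one "request a ride" intent, hand-localized.
// Every phrasing in every supported locale has to be enumerated by the
// developer; with SiriKit none of this lives in the app at all.
let requestRideUtterances: [String: [String]] = [
    "en-US": ["call a cab", "call me a cab", "order a cab",
              "get me a cab", "I need a taxi"],
    "en-GB": ["call a taxi", "order me a taxi", "I need a cab"],
    "fr-FR": ["appelle-moi un taxi", "commande-moi un taxi",
              "j'ai besoin d'un taxi"],
    "de-DE": ["ruf mir ein Taxi", "bestell mir ein Taxi"]
    // ...and so on, for every locale the skill claims to support,
    // multiplied by every intent the skill exposes.
]
```

Multiply that table by every intent in the app and every locale in Apple’s list above, and the scale of what SiriKit absorbs on the developer’s behalf becomes clear.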

There are two key things to remember here. First, if the platform doesn’t support the language or dialect, you can’t even build your multi-language skill (and dialect matters: French TV news frequently subtitles French-speaking Quebecers, which gives you an idea of the potential drift within a single language). Second, even if you could, you as the developer would be responsible for getting the verbal localization right for each of these regions, bearing in mind that spoken communication patterns are not necessarily the strings you’ll get out of Google Translate, which is predominantly based on written communication.

By taking on the responsibility of designing and implementing Siri Intents, Apple has shouldered the hard work of abstracting away the complexities of interacting with (currently) 42 distinct locales. This is not to give Apple a free pass for Siri’s failings, its confusing differences between platforms, or its inability to identify individuals on shared devices like the HomePod, but simply to point out the insanely ambitious global scope of the project compared with the relatively modest implementations from Google and Amazon, each of which currently addresses only 7 locales. And these are the folks with the vaunted web-scale back ends for indexing, analysis and AI.

So if you’re thinking that Apple is behind in the speech-interface game, I’d just like to point out that the score depends on how you choose to keep it. The question I have going forward is how Google and Amazon will integrate new languages, since it’s up to each individual developer to add the necessary localization, assuming they have the time, resources, bandwidth, access to native speakers or competent translation services, and so on. An Echo arriving in France will start at ground zero, with only the integrations Amazon builds in, until developers update their skills. The developer impact of Apple adding Hindi to the list of supported languages is, for practical purposes, nil; it may even let users interact with an app via SiriKit before its on-screen UI has been localized, since that abstraction is handled by Apple.

The obvious counterpoint is that this is a huge undertaking, and Apple will be significantly slower at adding new Intents to SiriKit than a competitor working only in English.