November 24, 2014

Exploring the boundaries of digital assistants

Future vision storytelling

“What do internet searches look like in 2017?”

—Our design team, 2014

Every once in a while, studio managers will gather up a handful of designers in the hope of peering into the future. The goals vary from year to year as a function of the competitive landscape, available technology, and studio leadership. In general, though, a few goals stay fairly constant:

We want to stay relevant, blending the latest technologies with experience patterns that are easy to use and not outdated.
We aim to find a balance between fantasy and reality. Too much science fiction is unattainable, while too much reality is uninspiring and boring.
We want to inspire and keep designers, developers, product managers, and organizational budget controllers motivated. It’s always a challenge to effectively communicate these somewhat abstract ideas to such a variety of perspectives.

Squares, circles, and arrows

Goals aside, the actual process is always messy—but it’s a wonderful mess to be a part of! It’s refreshing for designers to step out of their typical roles, use a different typeface, throw out their grids, let their hair down, and tap into their raw, creative problem-solving talents.

Whiteboard session for the Gnome on the Range vignette

Within our group, we all brainstormed and storyboarded key concepts. Then we went back to our desks, wrote the scripts, and comped up some visuals. Once we iterated, edited, and refined to a point, we handed everything off to a local production company to film and edit.

Photo composition of phone and tablet with future concepts of image-based search engine

Gnome-selector app (left) and Unicorn-powered visual search engine—landscaping mode (right)

At the time, the promise of A.I.-fueled digital assistants easing the tension in complex human-computer interactions was only beginning to take shape. We highlighted a few alternate input scenarios. Voice and camera input seemed to teeter on the sci-fi side of things back then, but both have since become ubiquitous in our lives.

Gnome on the Range product vignette

Aside from the purely technical constraints of voice and image input modalities, I was most fascinated by these devices’ potential for contextual awareness. For example, in the Found in Translation script I wrote, the actor uses her phone for real-time language translation. That’s definitely cool, but she’s out of her element and about to make a significant cultural faux pas by gifting a clock to an older colleague. The device she’s using to translate is also connected to an array of semantic databases, so it can give her contextually relevant information even though she hasn’t explicitly asked for it.

A few years later, I worked on Cortana and got into the nitty-gritty of implicit and explicit information delivery. It’s a lot more nuanced and complicated than it seems, but it’s still a worthy pursuit, and it shows great promise in easing that tension between humans and computers.

Found in Translation product vignette

We see a female business traveller in another country who will meet her client face to face for the first time. They’ve Skyped many times, so they already have a well-established relationship. She’s using Bing Maps (in the traditional sense) to navigate from her hotel to the office building.

On the way, she raises her phone to get her bearings and notices that all the Chinese characters, including prices in yuan, are translated into English, with the currency automatically converted to USD. Basically, she’s a stranger in a strange land, but she’s comfortable because she understands her environment.

Having left the hotel way too early, she’s got some spare time to explore. She holds her phone up again to scan the shop signs.

“Jade and Pearl shop”: she saves it with a simple gesture. “Shanghai Authentic Crab Dumplings”: saved! Then she comes across an interesting antique shop. She decides to go in and finds a really beautiful desk clock. This would make a nice gift for her client, or so she thinks. Since there’s no price tag on it, she uses Cortana/Bing Translator to ask the shopkeeper for the price.

Traveller:
input: “how much is this clock?”
output: “Mandarin text and audio”

Shopkeeper:
input: “六百元”
output: “600 RMB or $98.22 USD”

Cortana tells her, “Hey, typically in China you are supposed to bargain for the final price. Sometimes it’s disrespectful to pay the asking price.”

Traveller: “That’s too expensive for a gift for my client. What about 400 RMB?”

Shopkeeper: “You must really hate your client! In China, we don’t give clocks as gifts because it means you are counting the minutes and hours until they die.”

Horrified and thankful, she asks the shopkeeper to recommend an alternate gift.

Cortana gives her a gentle reminder, “Your business meeting is in 30 minutes and you are a 10-minute walk away. Don’t be late.”

Original script for Found in Translation

My mother-in-law is Chinese. The first time she came to spend some time with us in New York, I wanted to make her feel at home in the guest room. I also wanted to buy her a gift. So, I somehow ended up frantically trying to find a gift for someone I didn’t know very well. Long story short, I came across a really nice-looking clock in a shop. I was so proud of my purchase. Who doesn’t need another clock, right? When I got home and showed off my prized purchase to my wife, she covered her smiling mouth and explained that there was no way I was giving this clock to her mother.

Live and learn.

⏰
