
Rplayr

(pronounced "re-player")

A Social Engagement Application
to Crowdsource Voice Data.

PROBLEM

Today's voice technology is riddled with shortcomings. One of the major issues is its poor recognition of speech from people of different ethnicities and cultures; voice technology also lacks the ability to understand human emotion and respond appropriately in conversation. Current efforts to improve voice technology either happen behind the closed doors of big corporations or rely on crowdsourcing initiatives that attract very little attention from a diverse group of contributors.

SOLUTION

Some parts of voice technology, namely speech recognition and speech synthesis, are heavily dependent on speech data and machine learning. Our solution is a social engagement mobile app that brings in people from diverse backgrounds, cultures, and ethnicities to contribute their voices: they recreate or dub their favorite video clips from movies and other media, label speech contributed by others (motivated by the app's gamification), and get entertained while doing so.

OVERVIEW

Rplayr is a social engagement platform for collecting voice data. Through Rplayr, you can recreate or dub your favorite video clips from movies and similar media by providing your voice and likeness, and share the result with your friends or the world. You can also watch recreated videos based on your interests, answer some simple questions, and score points to climb the leaderboard. By doing so, you help us build a robust dataset of voices contributed by you, the user, making the voice technology of tomorrow more inclusive and empathetic.

ROLE

User Research

Problem Definition

Ideation

Prototyping

Usability Evaluation

Pitching

Video Production

MEET THE TEAM


Proshonjit Mitra

Manasi Kulkarni

Ajinkya Sheth

PROCESS & TIMELINE

[Image: process and timeline]

DESIGN SPACE

[Image: design space map]

USER RESEARCH

We conducted preliminary research that helped us understand what our design space looked like and who the stakeholders in it were. Following that, we decided to conduct contextual inquiries with two different types of stakeholders.

CITIZEN SCIENTIST WORKING ON VOICE
  • Works on Mozilla's Common Voice project (a project to create a voice dataset through crowdsourcing).

  • Is a native English speaker, American

  • Thinks working on Common Voice is addictive

  • The project does not get as much attention in languages other than English

  • Believes that they should focus on collecting voices from non-native English speakers, people from other cultures, and indigenous people.

  • "It's important that we understand each other"

  • Video link here

PASSIVE USER OF VOICE ASSISTANTS
  • Is a tech professional but does not like to use voice assistive technology

  • Also is a person of color and comes from India

  • Thinks voice technology can never replace humans

  • Believes voice tech lacks the ability to recognize human emotions and thereby lacks empathy

  • Finds conversations with voice assistive technology very superficial and highly inefficient; more than half the time it fails to recognize her voice

  • "If there was a physical person to talk to, who looked and felt more human, it would be worth talking to"

  • Video link here

DESIGN PROBLEM DEFINITION

Having seen these stakeholder perspectives, we framed our problem as follows:

Reduce biases in voice technology (especially cultural bias) and tackle the lack of empathy in applications of voice technology, by creating engagement among a larger, more diverse set of people to facilitate their participation in crowdsourcing initiatives around voice technology.

IDEATION


We started brainstorming about this design intervention by pinning all our ideas on the whiteboard and evaluating each one against the problems described above. We timeboxed ourselves to 30 minutes, an approach that works well for generating unique ideas, and came up with approximately 25 of them. We then mapped these ideas on a Now-How-Wow matrix to weigh each idea's originality against its feasibility.


For idea selection, we used siloed dot voting. Each of the three of us got three votes to rank, with the caveat that we voted without telling each other, to avoid groupthink. The dot votes surfaced our top three ideas. We then discussed the pros and cons of each and combined their positives, since together the three ideas negated each other's individual challenges.

Based on this analysis, we came up with the idea of ‘Rplayr’.

DESIGN INTERVENTION

Rplayr (its etymology comes from the word role-player, but it is pronounced ‘re-player’) is a social media platform in the form of a mobile app that lets users enact popular scenes from movies and TV/web series.

The app provides users a database of video clips (usually less than 10 seconds long) that they can recreate. Each clip is muted to ensure that users do not try to imitate the original, as the purpose of this application is to capture as many diverse and natural ways of saying the same words and phrases as possible. A transcript is provided in the form of subtitles so that users know when to say what. Users can record their recreation of a clip, or dub an existing clip with their voice, and share it among friends. The idea is to increase engagement by making contribution fun, so that data is donated voluntarily in a social network setting.
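
To make the contribution flow concrete, here is a minimal Python sketch of what a single contribution record might look like. Every field name is an illustrative assumption, not an actual Rplayr schema.

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record for one Rplayr contribution; all field names are
# illustrative assumptions, not the app's real data model.
@dataclass
class VoiceContribution:
    clip_id: str                      # the muted source clip (under 10 seconds)
    transcript: str                   # subtitle text the contributor reads aloud
    audio_uri: str                    # location of the recorded voice track
    contributor_id: str               # pseudonymous user id
    video_uri: Optional[str] = None   # present for recreations, absent for pure dubs
    language: str = "en"              # spoken language, useful for diversity analysis
    accent_labels: list[str] = field(default_factory=list)  # labels from other users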

[Animated prototype demo]

USER PERSONAS

Based on existing research about the population we wanted to target, we took inspiration from real people we know within our target audience and built two user personas from their case studies: Shirley Chan and Varun Panicker.

[Persona sheets: Shirley Chan and Varun Panicker]

USER JOURNEY MAPPING

[Image: user journey map]

We mapped out all the features we wanted in the app based on our research, problem definition, and ideation, and made connections between them. We then picked a scenario that one of our personas, Varun, might follow and focused on that flow for this initial round of prototyping and testing.

JOURNEY MAPPING THROUGH WIREFRAMES

[Wireframes: Login Screen, View Feed, Create Feed]

Varun begins by logging into the app using either email or a social login option.

On his feed, he sees video clips uploaded by his friends. He provides social commentary and anonymous feedback on speech quality.

He taps the Video icon to find a video clip that he himself can recreate.

To view the full report with all wireframes and screens, click here.

EVALUATION ROUND #1

We ran two rounds of usability evaluation. This section covers the first round of usability tests.

Cognitive Walkthrough using Paper Prototypes:

We printed out the current mockups and, following conventional paper-prototyping methods, tested the ways a user would move through our system. We asked each user to think out loud, after explaining what that meant. One of us acted as the moderator and the other as the system manager. We tested two aspects during this round: flow and engagement. Each participant signed a consent form at the beginning of the test.


Tasks to perform:

1. Give feedback on a video shared by a friend

2. Select and recreate a video using your own style

3. (If time permits) Provide consent to sign up for the first time

Number of User Testers: 5
Prototype Tested: Lo-Fi Wireframes

User Recruitment Demographic Information

Divergent on:

  • Nationalities: Taiwanese, Japanese, Indian, Middle Eastern

  • Age range: 19-33 years old

  • Gender: 5 Female, 5 Male

 

Convergent on:

  • Experience using multiple social media platforms in the form of mobile apps

  • Experience using entertainment applications

  • No background in user experience/HCI

EVALUATION ROUND #2

User Testing using Think-Aloud Technique:

Prior to beginning the test, each participant provided consent for video release; if they did not consent, we did not record video. We began with a pre-test interview to gauge their familiarity and previous experience with social media systems like ours. We then conducted a conventional user test on an iPhone (with the InVision app). This clickable prototype looked and felt much closer to the final version of the app. Aspects we focused on during this test: engagement and the ability to distinguish between the different types of feedback. We primarily tested our chosen user flow and, if time permitted, a secondary user flow (the consent form).

Tasks to perform:

1. Give feedback on a video shared by a friend

2. Select and recreate a video using your own style

3. (If time permits) Provide consent to sign up for the first time

Number of User Testers: 5
Prototype Tested: High fidelity mobile prototype built using Sketch + InVision

User Recruitment Demographic Information

Divergent on:

  • Nationalities: Taiwanese, Japanese, Indian, Middle Eastern

  • Age range: 19-33 years old

  • Gender: 5 Female, 5 Male

 

Convergent on:

  • Experience using multiple social media platforms in the form of mobile apps

  • Experience using entertainment applications

  • No background in user experience/HCI


Major Findings:

A summary of all the major findings of our usability evaluations can be found below:

  1. Most users had a very hard time understanding the anonymous feedback system because of the way it was presented.

  2. Most users wanted a more visually appealing way to view the privacy policy.

  3. Users barely understood the gamification aspect of the app and hence did not care about it.

  4. The Search & Find Video pages were not well received because of limited functionality and inconsistent visuals.

  5. The "Aha" moment of the research was that almost all users found the entire process of recreating a video simple, intuitive, and unique.

HIGH FIDELITY PROTOTYPE

Based on the findings from the first round of testing, we went into the next phase of prototyping by creating high-fidelity mockups in Sketch and then stitching them together into a clickable mobile prototype using InVision's Craft Manager. I chose these tools over several others because Sketch has a really intuitive learning curve, and the rest of my team members had no prior experience with prototyping tools.

[Hi-fi screens: Feed, Find Video, Video Clip]

On his feed, Varun sees video clips uploaded by a friend. He provides social commentary and anonymous feedback on speech quality using the slider and emoji scale.

He taps the Video icon to find a video clip that he himself can recreate. He uses categories to browse and find a video clip.

He eventually finds a video clip and can compare it with other people's recreations of the same clip. He can now choose to either dub the video or recreate it.

To view the full report with all wireframes and screens, click here.

ITERATIVE DESIGN

Based on the results of our usability evaluation, we iterated on the pain points and the areas of confusion to redesign certain aspects of the application completely over the next two iterations.

Video Icon and the Search Button: The video icon in the navigation bar was confusing to some users. When presented with a task that said ‘Search for a video and recreate it’, the common behavior was to look for a search bar or button. To match this expectation, we placed a search button in the navigation bar as well; a global search lets users search for absolutely anything on the Rplayr platform.

[Iteration screenshots]

Iteration 1: Video button and search

Iteration 2: Video and global search

Iteration 3: Friendlier video page

[Screen: Reaxn page]

Feedback system: 

The anonymous feedback system was one of the key features in earlier versions of Rplayr, but it was integrated with the feed. During paper prototyping, we found that users could not visually tell the difference between the social feedback section and the speech feedback section. Assuming the problem was in how it was displayed, we tried to draw a clear distinction between the two sections in the high-fidelity iteration.

Here are the key things we tackled in the second iteration:

  • Allowing the user to change and submit their feedback

  • Removing the option of providing self-feedback

  • Removing the information button and stating the instructions explicitly

However, usability testing still surfaced confusion about the feedback system. By making it more distinct from the social feedback section, we led users to think it was not related to the video at all. Those who did understand the connection rated videos generously, reasoning that the posters were friends and one shouldn't say bad things about friends in public. To tackle this two-fold issue, we decided to disconnect the feedback system from the social circle and display the videos to be rated as anonymous stories at the top, in a completely new section called "Reaxn" (pronounced "reaction").
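
As a rough sketch of how anonymous Reaxn ratings could become a usable speech-quality label, the Python snippet below averages ratings once enough people have weighed in. The 1-5 emoji scale and the minimum-rater threshold are assumptions for illustration, not shipped logic.

from statistics import mean
from typing import Optional

MIN_RATINGS = 3  # assumed threshold before a score is shown

def speech_quality(ratings: list[int]) -> Optional[float]:
    # Withhold the score until several anonymous raters have weighed in,
    # so no single (possibly friendly) opinion dominates.
    if len(ratings) < MIN_RATINGS:
        return None
    return round(mean(ratings), 2)

print(speech_quality([4, 5, 3]))  # 4.0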


[Iteration screenshots]

Iteration 1: Feedback

Iteration 2: Feedback with confirmation

Iteration 3: Separate feedback feature as stories

Iteration 3: Simple questions

Gamification: 

This application assumes that every user reads the transcripts faithfully and rates the accuracy of others' speech honestly. Influenced by other popular applications, some users said they would rather change the words in the transcript or use puns and mispronunciations to add an element of fun. While this could elevate the entertainment factor, it defeats our purpose and breaks the assumption that users genuinely read out the transcripts. We tackled this problem by including an aspect of gamification in our design.
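
As a hypothetical sketch of such a check, the snippet below awards points for a recreation only when the spoken words match the provided transcript. The point value and the idea of comparing against an ASR transcript of the recording are assumptions for illustration, not the designed scoring system.

import string

POINTS_RECREATE = 10  # assumed reward for a faithful recreation

def normalize(text: str) -> list[str]:
    # Lowercase and strip punctuation so "Hello, world!" matches "hello world".
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def score_recreation(asr_text: str, transcript: str) -> int:
    # Points only when the recording's (assumed) ASR transcript matches the
    # subtitles, discouraging puns and rewording that would pollute the dataset.
    return POINTS_RECREATE if normalize(asr_text) == normalize(transcript) else 0

print(score_recreation("Hello, world!", "hello world"))  # 10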

[Iteration screenshots]

Iteration 1: Profile

Iteration 2: Profile with scores

Iteration 3: Scoring system explained

Dub Video vs Recreate Video: 

Our users were confused by the ‘Recreate the video’ button; most of them thought it was the ‘Dub video’ feature. We addressed this in our second iteration by adding a dedicated dubbing feature. While recreating a video gives us the tone, accent, and emotions, dubbing captures exactly the same voice data, so there was no reason to stop users from doing what they wanted, especially since some users could be camera shy (as were a few of our user testers).

[Iteration screenshots]

Iteration 1: Recreate a clip

Iteration 2: Dub & recreate clip

Iteration 3: Updated icons and scroll interaction added

LINK TO CLICKABLE PROTOTYPE

[Animated walkthrough of the clickable prototype]

FEASIBILITY & FUTURE SCOPE

Feasibility Evaluation from a Voice AI Developer: 

As part of another round of evaluation, we approached an Alexa machine learning engineer and asked him about the feasibility of our project. While he really loved the idea, he questioned its technical feasibility: at the scale we are imagining, it becomes exponentially difficult to implement something like the "next, better Alexa" or a "JARVIS", even once you have the kind of data Rplayr promises to collect. He recommended that we first decide what we want to do after collecting this data, and then plan backwards, researching everything we would need to make it happen. "Without a solid, through and through plan, this fantastic idea, too, will fail."

That discussion gave us another dimension along which to think about Rplayr holistically, as a product/ecosystem and not just a UX design project. Based on it, here is a plan for what we could do in the future:

  1. Interview data ethics and legal experts to tackle the problems of data ethics and privacy, creating a better terms-and-conditions page and an overall ethical privacy experience, given the nature of the app.

  2. Create a study plan for gamification and design the best gamification approach based on the results of that study.

  3. Redesign the search flow based on user feedback from our usability tests.

  4. Keep each user flow limited to a maximum of 6 user steps (as is the case for recreating a video).

  5. Create and test first-time-user flows such as sign up flow, preferences flow, and tutorials to ensure a smooth onboarding experience for new users.

  6. Create a technical study plan and talk to as many machine learning and voice AI experts as possible to understand the technical difficulties in creating a new voice assistant; if that doesn't seem feasible, pivot to make better use of the data collected through Rplayr.

© 2022 by Proshonjit Mitra
