This year, GIPHY has been hard at work building Clips (GIFs with Sound). Clips are short-form videos that people can use to communicate and express themselves. We see a lot of potential for Clips to become huge and want to make sure that we build a foundation that allows all of our users to communicate with Clips.
As a company, we are committed to fostering an inclusive app environment, and that means building products and tools that support our diverse community. For Clips, closed captioning is a critical feature to ensure that our offering is inclusive and user-friendly for the hearing impaired and non-native speakers. It was also one of the most-requested features from our integration partners.
Our goal for the project was to create and provide caption data for our Clips product, which has several benefits:
- Make our Clips offering more inclusive and user-friendly.
- Help our integration partners provide an inclusive experience to their users without any additional work on their part.
- Improve our search to help users find the Clips they want.
The closed caption project involved many product teams at different stages of the product development life cycle, and each team contributed an important piece. It wouldn’t have been possible without tight collaboration. Here’s what some of the product owners have to say about the project.
Extracting valuable metadata from GIPHY’s media library that can be used for search and analysis is core to our team’s mission. We primarily focus on visual (GIFs) and textual (queries) modalities, so figuring out the most efficient way to transcribe the speech within a Clip to a text-based representation was one of our first forays into audio. Luckily, speech-to-text is a fairly mature field in machine learning so we knew we had a sturdy foundation upon which to solve this problem.
Our primary goal was to benchmark commercial speech-to-text services vs open-source options to see which approach would best satisfy the larger product requirements, such as accuracy, inference speed, multiple language support, and ease-of-maintenance. We found that the combination of Google’s Speech-to-Text and Video Intelligence services provided the highest quality results across multiple languages while only requiring a minimal amount of work to integrate into GIPHY’s media pipeline.
— Nick Hasty, VP Product, Machine Learning
Content Tools Team
After initial research, Content Tools decided to convert Clips’ existing Google transcriptions into subtitle formats. We support the following formats:
- WebVTT captions, for HTML5 video players.
- SRT files, probably the most common and widely supported caption/subtitle file format for digital video distributed on the internet.
- Captions embedded in the MP4 files, since that is the easiest way to provide captions to our mobile clients, especially those using the native iOS video player.
We provide caption file URLs in the metadata response and render them when clients request them. This enables us to iterate and improve our algorithm that converts the transcriptions to captions so that we can ensure the product’s accessibility.
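To make the conversion step concrete, here is a minimal sketch of turning word-level transcription timings into SRT and WebVTT text. It is an illustration, not GIPHY's actual algorithm: the `Word` type, the function names, and the grouping thresholds (`max_chars`, `max_gap`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One transcribed word with timings in seconds (hypothetical input shape)."""
    text: str
    start: float
    end: float


def group_into_cues(words: List[Word], max_chars: int = 32,
                    max_gap: float = 0.8) -> List[List[Word]]:
    """Greedily pack words into cues, breaking on long lines or long pauses."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            line_len = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            pause = word.start - current[-1].end
            if line_len > max_chars or pause > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(cues: List[List[Word]]) -> str:
    """Render cues as SRT: numbered blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0].start, ",")
        end = format_timestamp(cue[-1].end, ",")
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def to_webvtt(cues: List[List[Word]]) -> str:
    """Render cues as WebVTT: a WEBVTT header followed by unnumbered cues."""
    blocks = ["WEBVTT\n"]
    for cue in cues:
        start = format_timestamp(cue[0].start, ".")
        end = format_timestamp(cue[-1].end, ".")
        text = " ".join(w.text for w in cue)
        blocks.append(f"{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Keeping the cue grouping separate from the rendering means the grouping rules can change without touching either output format, which fits the goal of iterating on the conversion over time.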
After captions are created for Clips, they are transcribed (and sometimes translated) and presented to our Trust & Safety team for review prior to finalizing moderation ratings.
— Brooke Goldfarb, Product Manager, Content Tools
Our search algorithm employs numerous metadata signals to serve content for a given search query. The speech-to-text model allows us to search within the audio of a Clip, making it easier for users to find their favorite quote. To return results, we partially match phrases within the transcription/caption against a user’s query.
- Clip result for “not all of us are michael freaking scott”
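As a rough illustration of partial phrase matching against a transcript, the sketch below normalizes both the transcript and the query into lowercase tokens and checks whether the query appears as a contiguous run of words. This is a simplified stand-in, not GIPHY's search implementation; the function names and the sample data are assumptions, and a production system would combine such a match with the other metadata signals mentioned above.

```python
import re
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_phrase(transcript: str, query: str) -> bool:
    """Return True if the query's tokens occur contiguously in the transcript."""
    haystack = normalize(transcript)
    needle = normalize(query)
    if not needle:
        return False
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


def search(clips: Dict[str, str], query: str) -> List[str]:
    """Return the IDs of clips whose transcript contains the query phrase."""
    return [clip_id for clip_id, transcript in clips.items()
            if contains_phrase(transcript, query)]


# Hypothetical clip IDs and transcripts, for illustration only.
sample_clips = {
    "clip-a": "Not all of us are Michael freaking Scott!",
    "clip-b": "That's what she said.",
}
matches = search(sample_clips, "michael freaking scott")
```

Matching on tokens rather than raw substrings avoids false hits inside longer words and makes the comparison insensitive to case and punctuation.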
— Alex Anderson, Product Manager, Search
For the beta phase of the captions project, the API’s job was simple: get the captions and send them to the client. But we knew that at scale, with hundreds of partners in dozens of countries, we would need to design captions into the API in a way that supports future complexity:
- We support video players on different operating systems and devices, so we need to support multiple caption file types.
- Though we don’t support multiple languages now, we have set up the API to allow for future support of multiple languages.
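One way to picture an API shape that satisfies both points is a caption index keyed first by language and then by file format. The sketch below is purely hypothetical: the field layout, the format keys (`"vtt"`, `"srt"`), the `example.com` URLs, and the `pick_caption_url` helper are assumptions for illustration and do not describe GIPHY's actual response format.

```python
from typing import Dict, List, Optional

# Hypothetical shape: captions[language][format] -> caption file URL.
CaptionIndex = Dict[str, Dict[str, str]]


def pick_caption_url(captions: CaptionIndex,
                     preferred_formats: List[str],
                     language: str = "en",
                     fallback_language: str = "en") -> Optional[str]:
    """Pick the first available format, trying the requested language first."""
    for lang in (language, fallback_language):
        by_format = captions.get(lang, {})
        for fmt in preferred_formats:
            if fmt in by_format:
                return by_format[fmt]
    return None


# Only one language is present today; more could be added later
# without changing how clients read the structure.
example_captions: CaptionIndex = {
    "en": {
        "vtt": "https://example.com/clip/en.vtt",
        "srt": "https://example.com/clip/en.srt",
    },
}

# A web player might prefer WebVTT and fall back to SRT.
web_url = pick_caption_url(example_captions, ["vtt", "srt"], language="es")
```

Because the language level already exists in the structure, a client written against it today would keep working if additional languages were added later.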
— Nick Greene, Product Manager, API and Developer Portal
SDK & Embed Player Team
GIPHY’s Clips embed player empowers users to share our content all over the web, so it was important that we increase accessibility for viewers of our Clips content. Adding closed captions enables those with difficulty hearing and non-native speakers to gain additional context for the content. Our goal was to make captions easy to use without cluttering the experience. To keep the design slick, we introduced a kebab menu that appears on hover; with just two clicks, a user can turn on closed captions within our Clips embed player. We hope this new feature will help more people enjoy GIPHY Clips!
— Dan Burke, Director of Product, Developer Products
In the coming months, we’ll be rolling out closed captions support on all of our owned and operated Clips players. They’re already live in the app and we expect them to be available more widely soon. We’ll be closely following search requests from our users in order to support the most requested languages going forward.
The closed caption project aligns with GIPHY’s value of Inclusivity. To learn more about GIPHY’s values, check out our recently relaunched GIPHY About page, which details all of our values and philosophies.