Visual Description jobs are new, so you may expect them to be difficult. Actually, they can be fun: it is like doing a puzzle while transcribing.

Let’s back up and look at why we do Visual Description jobs in the first place. Have you ever been in a room where everyone else is laughing, but you don’t get the joke? People with blindness or visual impairments face this daily when they watch video. Traditional TV and movie viewing does not accommodate those who can’t see what is happening on screen. This is where Visual Description comes in.

Visual Description is a vital accessibility resource for those with blindness or visual impairments. By describing the visual elements playing out on screen, it lets these viewers fully experience movies, TV shows, YouTube videos, and other visual content.

How to Transcribe Visual Description Jobs

So let’s get into the nitty-gritty of how to do Visual Description jobs. Remember, the purpose of VD jobs is to provide video descriptions to blind and low vision viewers. Imagine what you would like to know about the video itself and start there. Now, here is the fun part: we have to do all that while fitting each description into the blank portions of the video, the stretches of natural sound and the pauses between speakers.

Timing tip: Insert logical descriptions within the blank audio space provided.

Example Video for Visual Description

1. Using Natural Pauses for Visual Descriptions

For example, the video above, which we are using for demonstration purposes, opens with a pause in the dialogue while music plays. This allows us to set the stage for what is happening using Visual Description. In our example, the description can be lengthy because the pause is long (indicated in red on the image to the right).

Visual Description example: “A school-age boy sits down to lunch at a full cafeteria table, and another boy walks up.”

2. Adding a [CROSS_MEDIA] Tag After Visual Descriptions

Continuing in our example, as the dialogue begins again after the Visual Description is added, we insert a [CROSS_MEDIA] tag. This is an important step. Adding the [CROSS_MEDIA] tag aids timing for the whole video (indicated in blue on the image below) and ensures that the visual description pieces fit into the blank audio. We do our best; that is all that is required. This will help low vision viewers, so when in doubt, imagine what would best support the viewer within the limited time you have to provide descriptions.
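The tagging rule above can be sketched in a few lines of Python. This is only an illustration: the one-item-per-line layout, the function name add_cross_media_tags, and the placeholder dialogue text are assumptions made for the example, not the actual job file format.

```python
CROSS_MEDIA = "[CROSS_MEDIA]"


def add_cross_media_tags(segments):
    """Return the text lines with a tag wherever dialogue resumes.

    `segments` is a list of (kind, text) pairs, where kind is either
    "description" or "dialogue". This layout is hypothetical.
    """
    lines = []
    previous_kind = None
    for kind, text in segments:
        # The tag goes right after a description, just as dialogue begins again.
        if kind == "dialogue" and previous_kind == "description":
            lines.append(CROSS_MEDIA)
        lines.append(text)
        previous_kind = kind
    return lines


segments = [
    ("description", "A school-age boy sits down to lunch at a full "
                    "cafeteria table, and another boy walks up."),
    ("dialogue", "(dialogue resumes here)"),
    ("description", "First boy stands up angrily."),
]

for line in add_cross_media_tags(segments):
    print(line)
```

Note that no tag follows the last description in this sample, because no dialogue comes after it.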

3. How to Determine the Visual Description Length

A little further into the video, the audio stops at a natural pause after some dialogue. Again, we have an opportunity to add more description of the action within the video. At this point, the first boy stands up angrily, so we add exactly that to the description.

Visual Description example: “First boy stands up angrily.”

There is more information on the screen at this point that we can add for additional detail, and the natural pause is longer. So our Visual Description can be more robust here, because we are able to insert a longer description. This is not always, or even usually, the case. It really comes down to your judgment in using the space provided to the best of your ability to provide quality Visual Descriptions (indicated in red on the image to the right).

Revised Visual Description example: “The first boy stands up angrily. The screen splits with black and white images of both students, the first one angry, the second one scared. The screen fades to black and displays the ‘Second Step’ logo.”
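If you want a rough sanity check on length, the judgment call above can be approximated with a little arithmetic. The Python sketch below estimates how long a description takes to read aloud and compares that with the pause. The 160 words-per-minute reading rate and the pause lengths are assumptions chosen for illustration, not values taken from the video or from any official guideline, and the function names are made up for this example.

```python
def estimated_seconds(description, words_per_minute=160):
    """Estimate how long a description takes to read aloud.

    The default rate of 160 words per minute is an assumed value.
    """
    word_count = len(description.split())
    return word_count * 60.0 / words_per_minute


def fits_in_pause(description, pause_seconds, words_per_minute=160):
    """Return True if the description should fit in the given pause."""
    return estimated_seconds(description, words_per_minute) <= pause_seconds


short_version = "First boy stands up angrily."
long_version = (
    "The first boy stands up angrily. The screen splits with black and "
    "white images of both students, the first one angry, the second one "
    'scared. The screen fades to black and displays the "Second Step" logo.'
)

# Hypothetical pause lengths, in seconds.
print(estimated_seconds(short_version))    # 1.875
print(fits_in_pause(short_version, 2.0))   # True
print(fits_in_pause(long_version, 2.0))    # False
print(fits_in_pause(long_version, 14.0))   # True
```

An estimate like this is only a starting point. Your own judgment of what best supports the viewer still decides what goes in.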

The Final Visual Description Job

The audio sample below illustrates the experience of a visually impaired viewer.  Playing the audio-only track really drives the message home.

Listen to the final Visual Description product from the video above

Listening to an audio-only version of the video soundtrack, you can really hear how the Visual Descriptions bring the viewing experience to life for those who have visual impairments.


That, in a nutshell, is how to do Visual Description. How detailed you get in your description text depends on the amount of blank audio space available for inserting details. The most important thing is to do your best. The 2018 National Health Interview Survey found that an estimated 32.2 million Americans, about 13% of the population 18 and older, reported experiencing vision loss, which makes Visual Description increasingly important.

Thank you from all of us to all of you for doing your part in making video accessible to everyone.