Stage 2. Working with the sound

How to work with the generated sound and edit the tracks

Updated over 6 months ago

Please note that if this is your first task, you start by working on the first 100 segments only. Our editor then checks them and sends you feedback: corrections to make and comments to keep in mind for the rest of the sound. After receiving the feedback, you make the necessary changes and proceed with the rest of the project. Once you submit the whole project, the editor checks it again, gives you feedback, and asks for any further corrections.

The editor will keep giving you feedback and asking for corrections while you learn to work with the sound, until they feel that you comply with all our requirements.

! Please note that the platform works best in Google Chrome; some errors might occur in other browsers.

Overview of the interface

This is an overview of the current interface we work with on the platform. TTS stands for text-to-speech, the technology we are working with. There has been an update since Galina recorded this video: the sound wave now changes after you regenerate the track, so it is easier to work with.

You can filter the segments in your projects using these buttons:

All blocks - show all segments

Selected only - show only the segments that were selected by ticking the box in the upper right corner of the segment

High def only - show only the red segments

Edited only - show the segments where you edited the text but have not regenerated the audio

Resynthesized only - show only regenerated segments

"Scissors" Tool (Audio Trimming)

An important update: we introduced the "Scissors" Tool (Audio Trimming) so that you can cut audio clips right on the platform, without relying on Audacity or other external editors.

To access it, click the scissors icon below the synthesis wave in "Edit" mode.

You'll see a smaller window with cutting options. Click and drag to select an area on the wave. In the screenshot below, it's highlighted in blue.

Key Tool Functions:

- Play (Space) – Listen to the selected segment. Using shortcut "Space" you can listen to the audio on top of the window - either the whole audio, or the selected bit.

- Reset Selected – Clear the current selection. Pressing "Esc" will reset the selection so that you can start over.

Next, you'll see three editing options:

1. Trim (shortcut "T") - allows you to delete everything EXCEPT the area you selected. So, here you keep just the selected part.

Example

Original "I love... you" → Select "I love" → Click Trim → Result: "I love".

2. Cut (shortcut "C") - allows you to delete the selected area. So, here you keep everything except the selected part.

Example

Original "I love... you" → Select "I love" → Click Cut → Result: "you".

3. Mute (shortcut "M") - allows you to mute the selected audio fragment. This is useful for getting rid of artificial noises or clicks.
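The difference between Trim, Cut, and Mute can be sketched in a few lines of Python. This is only an illustration, not the platform's code: the audio is modeled as a plain list of sample values, and the function names and the `start`/`end` selection indices are assumptions made for the example.

```python
from typing import List


def trim(samples: List[float], start: int, end: int) -> List[float]:
    """Trim: keep ONLY the selected range [start, end)."""
    return samples[start:end]


def cut(samples: List[float], start: int, end: int) -> List[float]:
    """Cut: delete the selected range and keep everything else."""
    return samples[:start] + samples[end:]


def mute(samples: List[float], start: int, end: int) -> List[float]:
    """Mute: silence the selected range; the track length stays the same."""
    return samples[:start] + [0.0] * (end - start) + samples[end:]


# Hypothetical five-sample track with the first two samples selected.
audio = [0.1, 0.2, 0.3, 0.4, 0.5]
trimmed = trim(audio, 0, 2)  # only the selection remains
cut_out = cut(audio, 0, 2)   # the selection is gone
muted = mute(audio, 0, 2)    # the selection is replaced with silence
```

Note that Trim and Cut both change the length of the track, while Mute does not.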

Don't forget to click on the green "Save" button in order to save the result.

NB:

DO NOT cut out any gaps (silence) that may be generated at the beginning or end of tracks. This will be done by the editing team at the next stage of the project.

DO NOT cut the track in places where the character is speaking (right in the middle of the word).

Change the text in the segment according to the final version of the audio track to avoid a situation where there is more text written in the block than is in the audio.

Shortcut T in the regeneration window

You can now regenerate the track from the regeneration window by pressing T on your keyboard.

How to work with the bugs of the neural network

The "Bugs" video was recorded long ago, before we had the platform, which is why Galina uses a bot in it.

How to work with red segments

We try to keep the phrase length difference within 30% even in phrases that are not highlighted (excluding subtitles, cut phrases, and similar cases where it is not possible), and we pay special attention to red blocks.

Blocks highlighted in red are those that are longer than the original by 30% or shorter than the original by any percentage. There is no need to worry; it is not necessary to correct every red block. It is important to monitor the audio wave and try to match the waveform of the synthesized track to that of the original. If the waveform of a red block is similar to the original one, you can leave it as is; just leave a comment about it when submitting the task.
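As a rough illustration of the highlighting rule described above, here is a minimal Python sketch. It is not the platform's actual logic: the function names are hypothetical, and whether a block at exactly 30% counts as red is an assumption.

```python
def length_diff_percent(original_s: float, generated_s: float) -> float:
    """Signed length difference of the generated track, in percent of the original."""
    return (generated_s - original_s) / original_s * 100.0


def is_red(original_s: float, generated_s: float, longer_limit: float = 30.0) -> bool:
    """Red if longer than the original by the limit, or shorter by any amount.

    Assumption: the limit is treated as inclusive (>= 30%).
    """
    diff = length_diff_percent(original_s, generated_s)
    return diff >= longer_limit or diff < 0


# Hypothetical durations in seconds.
much_longer = is_red(2.0, 3.0)      # 50% longer than the original
slightly_longer = is_red(2.0, 2.2)  # about 10% longer
slightly_shorter = is_red(2.0, 1.9)  # shorter by any amount
```

The flag only tells you where to look; the decision to correct a block still depends on comparing the waveforms.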

The main reasons why a block may remain red and it is acceptable are:

1. The original audio is cut (not complete)

2. Silence at the beginning/end of the block, causing the block to be longer by percentage

3. The block is entirely laughter/sound/shout/moan, which the neural network cannot generate well; these blocks should be deleted

4. The character is off-screen and their lips are not visible, so the current length will fit

5. The block contains a sound/laughter/shout/moan that cannot be well generated by the neural network, but the length without it corresponds to the video

6. The block contains a pause that does not match the original in length. It needs to be extended/shortened during editing by the editing team. The length of the parts corresponds to the original.

Basically, what you look at are the sound wave and the pauses. These are the factors that help you decide whether to change a block or leave it. For example, here there's a pause in the original that is absent in the generated audio, so we should correct it.

In this example, the tracks seem to have the same length, but that is not the case: the second track is actually shorter, and the percentage looks small only because of the blank space (silence) at the end of the generated wave.

We don't need to eliminate the blank space per se (it's not doable), but we have to keep an eye on the speech length and on the role the blank space plays in each track. If the length of speech in the original and in the translation is the same, and the blank space just adds some extra length, don't do anything; it's totally acceptable. If, as in this example, the speech in the resulting audio is shorter than in the original, and the blank space creates the illusion that the two tracks are equal in length, please rephrase the translation to make it longer and revoice the text.
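The idea of comparing speech length rather than total track length can be sketched as follows. On the platform this check is done by eye on the waveform; the code below is only an illustration, and the `speech_span` helper, the sample lists, and the `0.02` loudness threshold are all assumptions.

```python
from typing import Sequence


def speech_span(samples: Sequence[float], threshold: float = 0.02) -> int:
    """Count samples from the first loud sample to the last loud one.

    Leading and trailing silence is ignored; pauses inside the speech
    are still counted as part of the span.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return 0
    return loud[-1] - loud[0] + 1


# Two hypothetical tracks with the same total length.
original = [0.0, 0.5, 0.4, 0.6, 0.5, 0.3, 0.0]
generated = [0.0, 0.5, 0.4, 0.3, 0.0, 0.0, 0.0]

same_total_length = len(original) == len(generated)
original_speech = speech_span(original)
generated_speech = speech_span(generated)
```

Here the two tracks are equally long, but the generated speech is shorter and the rest is trailing silence, which is the case where the translation should be rephrased and revoiced.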

This one, for example, is basically fine, as there is silence at the end that makes it longer.

But this one is not: we can see on the audio wave that the parts are too long here, so they should be shortened.

This one is fine, because we can see that if we take out the silence, it will be similar in length to the audio wave of the original.

This one is also fine: what makes it shorter is the length of the pause, but the editing team will take out the silence at the beginning and the end and make the pause longer, so it will fit.

But this one should be shortened because, as you can see, it is too long: twice as long as the original audio wave.

Short FAQ:

What happens when I start editing the text box? - Nothing. If you just change the text, the sound will not change; to regenerate the track, click the icon with the arrows in a circle.

When you regenerate a track, does the system update the length ratio of the tracks? - Yes

The original audio stops before it really finishes in the film, what to do? - Original audios can be a bit shorter than they should be when phrases overlap in the video (when one character is interrupted by another, or another one starts speaking before the first has finished the phrase). We make the subtitles, and they can't overlap, because the system that cuts audios from the film will crash in that case. So if a person starts speaking while the other hasn't finished, we can't cut the original tracks that way. In that case, listen to the video to get an idea of the length we are aiming for, and please write about those segments to your coordinator.

With super short phrases, the ratio can skyrocket to 80-120%, and there's very little that can be done. - Don't look at the ratio when the phrase is short, just look at the audio wave. It should be similar to the one in the original.

How does the ratio compare the length of the original to the length of the translated? - It has + and -. So +95% means that the generated one is 195% of the original. If it says -45%, then it is 45% shorter than the original.
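The arithmetic behind the signed ratio can be written out as a small Python sketch. The function names are hypothetical and the exact rounding used by the platform is an assumption; only the +/- convention comes from the answer above.

```python
def signed_ratio(original_s: float, generated_s: float) -> str:
    """Format the length difference the way the ratio is read: +N% or -N%."""
    diff = (generated_s - original_s) / original_s * 100.0
    return f"{diff:+.0f}%"


def percent_of_original(ratio_percent: float) -> float:
    """Convert a signed ratio into the generated length as a percent of the original."""
    return 100.0 + ratio_percent


# Hypothetical durations in seconds.
longer = signed_ratio(2.0, 3.9)   # generated track is almost twice as long
shorter = signed_ratio(2.0, 1.1)  # generated track is shorter

plus_95 = percent_of_original(95)    # +95% means 195% of the original
minus_45 = percent_of_original(-45)  # -45% means 55% of the original
```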

I'm starting to find different voice IDs for the characters. - We have different ones for one character depending on the emotions he or she expresses. So there's a voice ID for laughter, anger, crying, etc.

The voice of one character differs in some segments a lot from the others. - Yeah, we also discovered that. If it differs A LOT and regenerating doesn't help, please contact your coordinator, and they will help you with this. PLEASE DO NOT CHANGE EMOTIONS WITHOUT TALKING TO YOUR COORDINATOR FIRST.

Instructions for your specific language

Please see the section for your language to learn about the most common issues that arise when working with it.