ELI5: Explain Like I'm 5

Speaker diarisation

Speaker diarisation is how a computer works out who is talking, and when, in an audio recording. Just like a teacher knows which student is speaking in a classroom, speaker diarisation uses special computer programs to tell the different speakers in a recording apart.

Imagine a bunch of toys in your toy box. Each toy has a different color, shape, or sound. Just like the toys, people's voices have different patterns or characteristics. These characteristics help the computer program tell one speaker from another.

For example, let's say you are listening to a recording of a storytime. You will hear different voices reading the story. The computer program will listen carefully for different voice patterns, like the way one person's voice goes up in pitch when they ask a question, or the way another person's voice sounds when they speak quickly.

Once the program recognizes these patterns, it can "label" each voice with a different color, shape or name. You can then see how many people spoke during the recording, and for how long they talked.
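For grown-ups who like to see the idea as code, here is a tiny sketch of the "listen, label, and add up the talking time" steps. It is a toy under stated assumptions, not how real diarisation programs work: the recording is assumed to be already cut into chunks, each chunk is described by a single made-up pitch number, and the names `segments`, `label_speakers`, `talk_time`, and `max_gap_hz` are all invented for this example. Real programs use much richer voice patterns than one pitch value.

```python
# Toy data (made up): (start_sec, end_sec, average_pitch_hz) per chunk of speech.
segments = [
    (0.0, 4.0, 110.0),
    (4.0, 7.0, 220.0),
    (7.0, 9.0, 115.0),
    (9.0, 12.0, 215.0),
    (12.0, 13.0, 300.0),
]


def label_speakers(segments, max_gap_hz=30.0):
    """Label each chunk by grouping chunks whose pitch is close together.

    max_gap_hz is a hypothetical threshold: a chunk joins an existing
    speaker only if its pitch is within this distance of that speaker's
    running-average pitch; otherwise a new speaker is created.
    """
    centres = []  # running-average pitch for each speaker found so far
    counts = []   # number of chunks assigned to each speaker
    labels = []
    for _start, _end, pitch in segments:
        best = None
        for i, centre in enumerate(centres):
            distance = abs(pitch - centre)
            if distance <= max_gap_hz and (
                best is None or distance < abs(pitch - centres[best])
            ):
                best = i
        if best is None:
            # No known voice is close enough, so this is a new speaker.
            centres.append(pitch)
            counts.append(1)
            best = len(centres) - 1
        else:
            # Update that speaker's running-average pitch.
            counts[best] += 1
            centres[best] += (pitch - centres[best]) / counts[best]
        labels.append(f"Speaker {best + 1}")
    return labels


def talk_time(segments, labels):
    """Add up how many seconds each labelled speaker talked."""
    totals = {}
    for (start, end, _pitch), label in zip(segments, labels):
        totals[label] = totals.get(label, 0.0) + (end - start)
    return totals


labels = label_speakers(segments)
totals = talk_time(segments, labels)
print(labels)
print(totals)
```

The labels are just "Speaker 1", "Speaker 2", and so on: the program tells voices apart, but it does not know anyone's real name unless a person tells it.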

Overall, speaker diarisation helps grown-ups to understand who is talking in a recording, just like how an adult in charge of a school bus knows which student is sitting in which seat.