VOICE Scheduled Testing Session #2

Sunday 21 July, 8:00 UTC

Candidate: Jami (previously known as GNU Ring)

Previous Session: Test Session #1

Comments by: @Naughtylus@fosstodon.org

VOICE (VOICE Organized Investigation of Chat Engines) is an informal app testing group, trialing free code apps to see how well they handle voice chat, especially with groups. We aim to have a group chat testing session at least once a month, on a Sunday, starting at 8:00 UTC, with the first Sunday of the month as the default. We are currently using a Matrix room to confirm the timing of testing sessions, as well as for discussion about available apps and related topics: #voicechat:matrix.org

For this second instance of our scheduled test sessions, we tried the distributed text, voice, and video chat app Jami. Jami is part of the GNU project, was previously known as Ring, and used SIP technology for voice and video calls. As of late, though, Savoir-faire Linux has begun shifting the technology stack of the app from centralised (using servers) to distributed (using peer-to-peer technologies), and rebranding it as Jami. The app currently features one-to-one text, video, and audio chat, as well as audio and video conferencing (but no group text chat as of yet). Everything is encrypted by default (there's not even an option to turn it off), and the only servers used are the bootstrap nodes for the DHT and those used to look up users by their username, but both are configurable.

So we set out to test the audio conference feature. We would have liked to try the video as well, but one of us was staying at a hotel and didn't have the bandwidth for it. While an audio call is simple enough to place in Jami (there are big buttons where you'd expect them), an audio conference (with more than two participants) is another beast entirely. If @AmarOk@mastodon.social (one of the Jami devs) hadn't asserted the feature was implemented, I'm not sure we would have found it.

To set up a conference in Jami, first call one of the intended participants, then, once the call is established, call a second one. That will put the first call on hold. At this point you have two ongoing calls. If you resume the first one, you'll be able to hear and speak to the two other participants, but they won't hear each other. That's not what we want, so what you actually have to do is drag and drop the first call onto the second one (or the other way around, we still haven't figured that one out yet). Now your UI should show two ongoing calls, but everyone is able to hear each other. On their end, though, their UI should display only an ongoing call with you, which leads me to think your node is actually acting as a relay and there is no direct connection between the other two participants.
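To make the difference between the two states easier to picture, here is a toy model of who can hear whom before and after the drag and drop. It is only a sketch of the behaviour we observed, not Jami code: the function `audible_pairs` and the participant names are made up for illustration, and the relay step encodes my guess that the host forwards audio between the other participants.

```python
from itertools import permutations


def audible_pairs(host, peers, merged):
    """Return the set of (speaker, listener) pairs in a host-placed call.

    host   -- hypothetical name of the participant who placed the calls
    peers  -- names of the participants the host called
    merged -- True once the calls have been dragged onto each other
    """
    pairs = set()
    for peer in peers:
        # Each call is a two-way link between the host and one peer.
        pairs.add((host, peer))
        pairs.add((peer, host))
    if merged:
        # Assumption: the host relays each peer's audio to the other peers.
        for speaker, listener in permutations(peers, 2):
            pairs.add((speaker, listener))
    return pairs


if __name__ == "__main__":
    before = audible_pairs("host", ["alice", "bob"], merged=False)
    after = audible_pairs("host", ["alice", "bob"], merged=True)
    print(("alice", "bob") in before)  # False: both calls resumed, not merged
    print(("alice", "bob") in after)   # True: calls merged into a conference
```

In this model the other participants only ever have a link to the host, which matches what their UI showed during our session.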

In this experiment, the longest conversation we sustained was 22 minutes long, including a 5-minute monologue, with no noticeable delay and very few skips in audio, so the overall quality was on par with what we previously tested with Jitsi in Riot. It should also be noted that some of us were 22 000 km apart from each other and that the major source of instability in the audio was the poor hotel wifi.

We also tried to mess around with the UI to see if and how we could break it, and it really wasn't that difficult. First, if you're hosting the call (that is, if you're the one who did the drag and drop voodoo), you can't mute your audio: clicking mute on either of the two ongoing calls updates the UI but doesn't do anything. (I'm not sure if that's intentional, but I suspect it has to do with the host acting as a relay and piping the audio of the other participants, so muting would in effect mute them as well.) Second, putting any of the calls on hold when you're hosting flat out breaks the whole conference, and you can't resume anything after that. Just don't do it.

There were only three of us, so this is about the extent of what we were able to test, and I'm curious to see how it plays out with more participants.

So overall, impressive quality for a peer-to-peer solution, but the UI/UX could use some improvement.