Streaming Rusty Days

So it somehow happened that I ended up being responsible for streaming the Rusty Days 2020 online conference. It turns out, it's an... interesting topic. Let's talk about it a bit, so that you can do it too, if you want.

On the surface, it sounds simple—there are already videoconferencing solutions, there are also streaming platforms, so it can't be that hard. But first, let's review our requirements:

  • We're streaming live.
  • The speaker isn't with us in person.
  • We want to show both the speaker's face, and their presentation at once.
  • We don't want to force speakers to install or configure any special software.
  • The speaker needs to take questions from the audience, possibly during the talk.
  • All of this has to be recorded.
  • We want to stream where people want to watch us, so we want to output to YouTube, Twitch, and Facebook.

A typical setup for streaming is to use OBS to take a camera stream, combine it with an overlay, maybe a second camera, and push it to the internet. With requirements like ours, though, we immediately hit two issues:

  • First, we don't have a physical camera to capture the stream from, because the speaker isn't physically nearby. So we need to get the video stream in another way.
  • Second, OBS can't output to more than one destination at once.

Capturing video

My first thought was to somehow get hold of a raw video stream and receive it with OBS. Except that OBS doesn't know how to receive video streams. There's a plugin for that, but I haven't been able to get it to work. Besides, getting an RTMP stream from the speaker wasn't going to be easy. We could probably prepare a software package to run and give us the output we wanted, but our speakers used different operating systems, and supporting this would be a nightmare. So the only remaining solution was to do something in a web browser.

So I tried to find a meeting app that would just export a stream, but didn't find any. After much searching I found Jibri, which... spawns a browser in a virtual framebuffer in a headless VM, and captures its contents. Ugly. It became obvious that the state-of-the-art way to go is capturing browser windows. Also ugly, but it works. So I tried Zoom. It somewhat worked, but didn't let me display the camera/presentation on the whole screen, so I had to find something else. Finally, I settled on Google Meet.

While trying Zoom, another challenge became apparent: our overlay had a slot for the presentation with a fixed aspect ratio. This meant we had to make sure of two things:

  • The speaker outputs an image with 16:9 proportions,
  • and we capture it like that.

The first thing was mostly a problem with humans—we had to ask each speaker to verify it, then to make their window fullscreen, or recompile their presentation, or whatever. In extreme cases, we just trimmed the edges, and it was mostly fine. The second thing, though, was more problematic. Most meeting software resizes the shared image based on the number of participants and the aspect ratio of whatever the speaker has shared. We wanted the most stable setup possible, and this was way too many moving parts. Fortunately, Meet allows pinning a screen, which also conveniently makes it span the full window. All that remained was setting the correct window size, so that OBS would always get the same image dimensions.

Being on Linux, I used xdotool:

xdotool search --onlyvisible --name "Mozilla Firefox" windowsize 1280 $((720 + 152))

And it resized a random Firefox window on the current desktop. Yay. In hindsight, I should have used xdotool selectwindow instead of search, but well. The size itself was dictated by the fact that Meet outputs at most 1280x720, so it didn't make sense to capture larger images. The additional 152 vertical pixels are the space between the top of the browser window and the top of the video.
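For reference, the selectwindow variant would look something like this: a sketch, assuming the same dimensions (a 1280x720 video plus 152 pixels of browser chrome above it):

```shell
# Target height: the video plus the browser chrome above it (assumed values).
height=$((720 + 152))
if command -v xdotool >/dev/null; then
    # Turns the cursor into a crosshair; click the window to resize it.
    xdotool selectwindow windowsize 1280 "$height"
fi
```

This avoids matching windows by title, so it can't grab the wrong Firefox window.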

So now I had to capture two windows with OBS, during which I learned that capturing windows from other desktops doesn't seem to be possible. And that two Meet windows use almost 100% of my CPU. And so did streaming with OBS, but that was quickly solved by switching to GPU encoding.

Multiple outputs
So now for the other problem: OBS can have only one output. Again, there are some plugins to change that, but I didn't manage to get them to work. So, being a sysadmin, I thought: I'll just proxy the traffic, let's find a proxy. And nginx has a module for that. It's kinda weird, but eh, I didn't have time to be picky.

The setup was fairly simple: install nginx-mod-rtmp, put load_module /usr/lib/nginx/modules/ngx_rtmp_module.so; at the top of nginx.conf, then add the stream config somewhere in the file:

rtmp_auto_push on;

rtmp {
    max_message 100M;
    server {
        listen 1935;
        chunk_size 4096;

        application live {
            live on;
            record off;
            push rtmp://x.rtmp.youtube.com/live2/STREAM_KEY;
            push rtmp://live.twitch.tv/app/STREAM_KEY;
            push rtmp://127.0.0.1:2207/STREAM_PATH;
        }
    }
}

And, yeah, it couldn't be that simple. Facebook doesn't support unencrypted RTMP, and the module doesn't support RTMP over TLS. What to do? Another proxy, of course! In this case, stunnel:

[fb-live]
client = yes
accept = 127.0.0.1:2207
connect = live-api-s.facebook.com:443

Now we can stream.
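Before wiring in real stream keys, the whole relay chain can be sanity-checked by pushing a synthetic stream at the local nginx endpoint with ffmpeg. This is a sketch, assuming ffmpeg is installed; "live" matches the application name in the nginx config above, and "test" is an arbitrary stream key:

```shell
# Push 10 seconds of a test pattern with a 440 Hz tone at the local relay.
# If nginx isn't listening on port 1935, ffmpeg exits with a connection error.
ffmpeg -re -t 10 \
    -f lavfi -i testsrc=size=1280x720:rate=30 \
    -f lavfi -i sine=frequency=440 \
    -c:v libx264 -preset veryfast -b:v 2500k \
    -c:a aac -b:a 128k \
    -f flv rtmp://127.0.0.1:1935/live/test
```

Each push destination should then show the pattern in its stream preview, which is much less nerve-wracking than debugging with a live speaker.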

Capturing video... again

Except that in one trial run, the CPU usage of Meet was so high that the stream became unstable. Windows it is, then. As much as I dislike using non-free software, Meet on Windows simply doesn't consume any significant resources, and was way more stable because of that. The only problem: I wasn't going to install nginx and stunnel on Windows. Well, it's a good thing I have two computers. It would have been possible to do it all on one, but why bother?

Anyway, this pushed me back to the window resizing problem, but it turned out AutoHotkey on Windows can do that with a simple script:

; Ctrl+J: move the active window to the top-left corner and resize it
^j::
WinMove, A, , 0, 0, 1280, 720+154
return

Removing UI clutter

One thing left was to make sure no Meet UI elements would be visible on the screen. Of course, all the class names in Meet are random, so it wasn't obvious how, but there's nothing that a bunch of JavaScript in the console wouldn't fix:

// Find the first <video> element that's actually displayed (the pinned stream).
findActiveVideo = () => {
    for(let node of document.querySelectorAll("video"))
        if(node.style.display == "")
            return node
}
// Walk up from the node, hiding all of its siblings at each level,
// so only the node and its ancestors remain visible.
hideAllElse = (node) => {
    if(node == document.body) return
    for(let n of node.parentNode.children)
        if(n != node)
            n.style.display = 'none'
    hideAllElse(node.parentNode)
}
hideAllElse(findActiveVideo())

This code hides everything except the first visible video stream, so it's important to pin it first.

A few hours before the first talk, we realized something terrible: Meet has sounds, and we really didn't want them on our stream. Fortunately, they all share the same URL prefix, which can be easily blocked. That was way easier than I expected.

Finally, we were ready for the conference. While the technical part was mostly done, the organizational challenges remained.

Presentation workflow

Our speaker obviously couldn't watch three streams while presenting, so we needed to forward questions to them. And none of us could monitor three streams at once either, so we had a team of helpers. They copied the questions from Twitch/YouTube/Facebook and forwarded them to a shared channel. From there, a person designated to communicate with the speaker pasted them into the Meet chat. This worked well, but required the speaker to either share a window (not the entire screen) or have two screens, so that the chat window wouldn't be visible on the stream. One might think that we could just trim it out, but Meet outputs at most 720p, so any trimming would drastically lower the image resolution.

About half an hour before the presentation, I would join the meeting from two windows: one for the presentation, one for the camera. Remember—pinning the streams is the only way to ensure a stable picture. Then the speaker would join, along with another person from the org team who would communicate with them. There were some things we needed to check:

  • That the speaker had the correct aspect ratio in the shared window.
  • That they knew that they must not stop sharing the window (or camera) until given a signal that they are out of view.
  • That they know about the delay between what they're saying, and what appears on the stream.
  • That they are either in a timezone where it's the middle of the day, or they are not using natural lighting.
  • That they can switch slides without the window capture freezing.
  • That they can read the chat while also sharing the screen/window.

Once everything was set up, I removed the UI from my Meet windows, and all that remained was frantically looking for imperfections for the next 90 minutes. If you watch the streams carefully, you can see me moving the presentation windows by a few pixels so that there's no black border. And a lot of mistakes I made, if you know what to look for.

Next time, I'll just write software to deal with all that. Hopefully, there won't be a next time, and we can just return to meeting in real life.