The GStreamer Recorder Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main pipeline is a 'GstRec' element, which is a GstThread. It contains
entry points for the video and audio sources (which are not necessarily part
of the GstRec element itself), as well as the encoders, the muxer and the
output element. Outside the GstRec element there *may* be a video or audio
source (they can also live inside the GstRec element) and, if the sources do
not implement the required interfaces themselves, interface emulators.

Sources
-------
GStreamer Recorder (from now on: gst-rec) contains at most two sources: one
video source (videotestsrc, v4lsrc, v4l2src, ...) and one audio source
(osssrc, alsasrc, sinesrc, ...). The video source can implement the XOverlay,
Tuner and ColorBalance interfaces. The audio source can implement the Mixer
interface. Both can implement the PropertyProbe interface. If absent, some of
these interfaces are emulated. For details on each interface, see its own
documentation.
The interface emulators are videobalance (ColorBalance), ximagesink or
xvimagesink (XOverlay) and volume + alsasink/osssink (Mixer). In the long
term, we aim to support DirectFB and sound-server outputs as well; this will
require autodetection of the output element, which does not work yet.
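The fallback rule above amounts to a lookup from a missing interface to its
emulator. The C sketch below is purely illustrative: the table and
pick_emulator() are hypothetical helpers, not gst-rec code, and the element
names are simply the ones listed in this section. Tuner has no entry because
no emulator is mentioned for it.

```c
#include <string.h>

/* Hypothetical table: the element(s) used when a source does not implement
 * an interface itself. Names are taken from the text above. */
struct emulator {
    const char *iface;     /* interface the source lacks */
    const char *elements;  /* element(s) plugged in to emulate it */
};

static const struct emulator emulators[] = {
    { "ColorBalance", "videobalance" },
    { "XOverlay",     "xvimagesink or ximagesink" },
    { "Mixer",        "volume + alsasink/osssink" },
};

/* Returns NULL if the source implements the interface natively (nothing to
 * emulate) or if no emulator is known; otherwise the emulator to use. */
static const char *
pick_emulator (const char *iface, int source_implements_it)
{
    size_t i;

    if (source_implements_it)
        return NULL;
    for (i = 0; i < sizeof (emulators) / sizeof (emulators[0]); i++)
        if (strcmp (emulators[i].iface, iface) == 0)
            return emulators[i].elements;
    return NULL;
}
```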

Video source
------------
A video source has two purposes: video capture and video display. Video
display happens through the XOverlay interface. If the source supports this
interface itself (most TV cards in v4lsrc/v4l2src), it is used directly.
Otherwise (videotestsrc, webcams in v4lsrc/v4l2src), a small separate
pipeline branch is created for display, with ximagesink or xvimagesink as
output. This leads to several caveats, each of which gst-rec works around.
The first is that, since display starts before the actual recording, the
timestamps of the audio and video sources in the capture process might not
match. That's why we apply 'timestamp shifting' in this case: a separate
element makes sure that the timestamp *in the capture pipeline* starts at 0.
The second is that after EOS (when recording stops), capture should stop,
but display shouldn't. For this purpose, we put the source and its display
branch in a thread that is not part of our main pipeline, but operates
independently. This way, EOS in the capture pipeline doesn't shut down
display. Note that both of these workarounds are only needed in the case of
interface emulation.

Since most (if not all) source elements emit data continuously and have no
natural end of stream (the way a file source reaches end-of-file), we also
implement an EOS handler, which sets the element to EOS at a given moment.
The element doing both the timestamp shifting and the EOS handling is called
the assistant. The source and the assistant are added together into a bin
called 'manager'. This is required because the assistant will emit EOS, but
the source itself won't. The manager responds to EOS in the assistant and
sets the whole bin to EOS, even though the source didn't emit EOS itself.
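As a rough illustration of the assistant's two jobs, here is a C sketch that
models only the timestamp arithmetic and the EOS decision. It assumes
nanosecond timestamps (as GstClockTime uses), monotonically increasing input,
and a stop time given relative to the start of capture. Assistant,
assistant_init() and assistant_chain() are hypothetical names, not the real
element's API.

```c
#include <stdint.h>

typedef uint64_t ClockTime;   /* nanoseconds, like GstClockTime */

/* Hypothetical model of the 'assistant': shifts timestamps so the capture
 * stream starts at 0 and decides when to emit EOS. */
typedef struct {
    int       have_offset;  /* set once the first buffer has been seen */
    ClockTime offset;       /* original timestamp of the first buffer */
    ClockTime stop_time;    /* shifted time at which recording ends */
    int       eos;          /* set once EOS has been emitted */
} Assistant;

static void
assistant_init (Assistant *a, ClockTime stop_time)
{
    a->have_offset = 0;
    a->offset = 0;
    a->stop_time = stop_time;
    a->eos = 0;
}

/* Handles one incoming buffer timestamp (assumed >= the first one).
 * Stores the shifted timestamp in *out and returns 1 if the buffer should
 * be forwarded, or 0 once the assistant has gone EOS. */
static int
assistant_chain (Assistant *a, ClockTime ts, ClockTime *out)
{
    if (a->eos)
        return 0;
    if (!a->have_offset) {
        a->offset = ts;      /* the first buffer defines time 0 */
        a->have_offset = 1;
    }
    *out = ts - a->offset;
    if (*out >= a->stop_time) {
        a->eos = 1;          /* the manager would now set the bin to EOS */
        return 0;
    }
    return 1;
}
```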

In the simplest case, the source part of the pipeline looks like this:
.-------------------------------GstRec-.
|.-----------------------Thread-.      |
||.------------Manager-.        |      |
|||[Source]![Assistant]|![Queue]|![...]|
||'--------------------'        |      |
|'------------------------------'      |
'--------------------------------------'
The source implements the XOverlay interface and handles display and capture.
The assistant+manager handle EOS. The thread makes sure that this all runs at
a high priority so that we don't lose capture data (drop frames, skip audio,
etc.).
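The manager's role can be modelled in the same spirit. In this hypothetical
sketch (Manager, Child and manager_child_eos() are made-up names, not GstBin
API), an EOS reported by the assistant marks every child of the bin as
finished, including the source, which never emits EOS on its own.

```c
#include <stddef.h>

/* Hypothetical model of the 'manager' bin and its children. */
typedef struct {
    const char *name;
    int         eos;
} Child;

typedef struct {
    Child  *children;
    size_t  n_children;
    int     eos;
} Manager;

/* Called when a child (in practice, the assistant) emits EOS: the manager
 * forces all children, and itself, into EOS. */
static void
manager_child_eos (Manager *m)
{
    size_t i;

    for (i = 0; i < m->n_children; i++)
        m->children[i].eos = 1;
    m->eos = 1;
}
```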

In the hardest case, the source part of the pipeline looks like this:
.--------------------------------------------Thread-.
|                             .-------------Thread-.|
|                             |![Queue]![Videosink]||
|                             /--------------------'|
|[Source]![VideoBalance]![Tee]                      |
'-----------------------------\---------------------'
                               \-------------------------------GstRec-.
                               |\-----------------------Thread-.      |
                               ||\------------Manager-.        |      |
                               |||![Queue]![Assistant]|![Queue]|![...]|
                               ||'--------------------'        |      |
                               |'------------------------------'      |
                               '--------------------------------------'
The 'video source' as given to GstRec is actually a dummy element. The outer
thread always runs and takes care of display. The Queue inside the GstRec
element fetches buffers and handles timestamps. Note that we first take care
of display and then of capture. The reason is that, once display is done, we
can change the timestamp without affecting the display timing. The other way
around is, unfortunately, not possible. The outermost thread and GstRec are
not contained within each other; they operate independently.
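To see why display has to come before timestamp shifting, consider this
simplified C model of the tee in the hardest case. It is only a sketch:
Buffer, Sink and tee_push() are hypothetical stand-ins, and the explicit copy
represents whatever keeps the display branch's timestamps untouched in the
real pipeline.

```c
#include <stdint.h>

typedef uint64_t ClockTime;

typedef struct {
    ClockTime timestamp;
} Buffer;

/* Hypothetical sink that just records what it received. */
typedef struct {
    ClockTime last_ts;
    int       count;
} Sink;

static void
sink_push (Sink *s, const Buffer *buf)
{
    s->last_ts = buf->timestamp;
    s->count++;
}

/* The display branch gets the buffer first, with its original timestamp;
 * only then does the capture branch get a copy shifted to start at 0.
 * Shifting before the tee would alter the timing the video sink sees. */
static void
tee_push (Sink *display, Sink *capture, const Buffer *buf, ClockTime offset)
{
    Buffer copy = *buf;            /* capture branch works on its own copy */

    sink_push (display, buf);      /* display: original timestamp */
    copy.timestamp -= offset;      /* capture: shifted timestamp */
    sink_push (capture, &copy);
}
```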

Note that in both cases, there is a filter caps between the assistant and the
queue element directly downstream of it. This filter defines the video size.

Audio source
------------
Like the video source, the audio source has two purposes: audio capture and
letting the user hear the audio while capture is going on, either through the
Mixer interface or through an audio output. If the Mixer interface is not
available on the source (e.g. sinesrc, silencesrc), we add a volume
emulation element and an audio output (osssink/alsasink). If it is, we use
the capabilities of the element itself (e.g. osssrc, alsasrc). The idea is
very similar to the video part explained above.

In the simplest case, the source part of the pipeline looks like this:
.-------------------------------GstRec-.
|.-----------------------Thread-.      |
||.------------Manager-.        |      |
|||[Source]![Assistant]|![Queue]|![...]|
||'--------------------'        |      |
|'------------------------------'      |
'--------------------------------------'
"Hey, that looks just like the video one!" Yes, it does. In fact, it is 100%
identical, except for the type of source element. Here, the source element
implements the Mixer interface.

In the hardest case, the source part of the pipeline looks like this:
.--------------------------------------Thread-.
|                       .-------------Thread-.|
|                       |![Queue]![Audiosink]||
|                       /--------------------'|
|[Source]![Volume]![Tee]                      |
'-----------------------\---------------------'
                         \-------------------------------GstRec-.
                         |\-----------------------Thread-.      |
                         ||\------------Manager-.        |      |
                         |||![Queue]![Assistant]|![Queue]|![...]|
                         ||'--------------------'        |      |
                         |'------------------------------'      |
                         '--------------------------------------'
Again, this is very similar to the video case, except that we now use the
volume element to emulate volume control and an audio sink for the output.
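Volume emulation essentially boils down to scaling every sample. This is a
minimal sketch assuming signed 16-bit PCM and a linear volume factor;
apply_volume_s16() is a hypothetical helper, and the actual volume element
may support other sample formats and round differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Scales n signed 16-bit samples in place by 'volume' (1.0 = unchanged),
 * clipping to the 16-bit range instead of letting values wrap around. */
static void
apply_volume_s16 (int16_t *samples, size_t n, double volume)
{
    size_t i;

    for (i = 0; i < n; i++) {
        double v = samples[i] * volume;

        if (v > INT16_MAX)
            v = INT16_MAX;
        else if (v < INT16_MIN)
            v = INT16_MIN;
        samples[i] = (int16_t) v;
    }
}
```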

As in the video case, there is a filter caps between the assistant and the
queue element directly downstream of it. It determines the audio sample rate
and the number of channels.
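The two filter caps (video size, and audio sample rate plus channel count)
could be written as caps strings. This sketch only builds the strings; the
media types video/x-raw-yuv and audio/x-raw-int and the field syntax follow
GStreamer 0.8-era conventions as an assumption, and video_filter_caps() and
audio_filter_caps() are hypothetical. Fields that are not listed stay
unconstrained when the filter is intersected with what the source offers. A
string like this could then be turned into caps with gst_caps_from_string().

```c
#include <stdio.h>

/* Caps limiting the video size (assumed media type and field names). */
static int
video_filter_caps (char *buf, size_t len, int width, int height)
{
    return snprintf (buf, len,
        "video/x-raw-yuv, width=(int)%d, height=(int)%d", width, height);
}

/* Caps limiting sample rate and channel count (assumed names). */
static int
audio_filter_caps (char *buf, size_t len, int rate, int channels)
{
    return snprintf (buf, len,
        "audio/x-raw-int, rate=(int)%d, channels=(int)%d", rate, channels);
}
```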

In the remainder of this document, the GstRec parts of these pipelines
(everything inside the Manager) are referred to as VideoSource and
AudioSource. The rest is omitted to keep things simple.

Encoding, Muxing, Output
------------------------
An encoder is optional for both video and audio. If no encoder is specified,
the data provided by the source is passed to the output without modification.
This can be raw YUV or MJPEG for video, or PCM for audio. If an encoder is
used, the pipeline looks like this:
.-------------------------------------GstRec-.
|.-----------------------------Thread-.      |
||.---------Thread-.                  |      |
|||[Source]![Queue]|![Encoder]![Queue]|![...]|
||'----------------'                  |      |
|'------------------------------------'      |
'--------------------------------------------'
Without an encoder, the pipeline looks like this:
.-----------------GstRec-.
|.---------Thread-.      |
||[Source]![Queue]|![...]|
|'----------------'      |
'------------------------'
In both cases, [...] is the output. Assuming that we capture both video and
audio, that part of the pipeline looks like this (for simplicity, each
[Source] here also contains its encoder, if any):
.-----------------------------------------GstRec-.
|.---------Thread-.                              |
||[Source]![Queue]|!video_00 \                   |
|'----------------'                              |
|                              [Muxer]![Filesink]|
|.---------Thread-.                              |
||[Source]![Queue]|!audio_00 /                   |
|'----------------'                              |
'------------------------------------------------'
Example encoders are lame and vorbisenc (audio), and xvidenc and jpegenc
(video). Example muxers include avimux, asfmux and matroskamux. Output is
currently limited to filesink, but streaming recording (to the network) is
planned for the long-term future.
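How a muxer combines video_00 and audio_00 is up to the muxer, but a common
approach is to interleave by timestamp. The C sketch below shows that generic
technique, not the actual code of avimux or any other muxer. PadQueue and
pick_next_pad() are hypothetical, and the sketch assumes each pad's pending
buffers are already queued, whereas a real muxer has to wait for data on
every pad that is not yet at EOS.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ClockTime;

/* Hypothetical queue of pending buffer timestamps on one muxer sink pad. */
typedef struct {
    const ClockTime *ts;   /* timestamps in arrival order */
    size_t           n;    /* number of buffers */
    size_t           pos;  /* next buffer to take */
} PadQueue;

/* Returns the index of the pad whose next buffer has the lowest timestamp,
 * or -1 once every pad is drained (EOS). Ties go to the lower pad index. */
static int
pick_next_pad (const PadQueue *pads, size_t npads)
{
    int best = -1;
    size_t i;

    for (i = 0; i < npads; i++) {
        if (pads[i].pos >= pads[i].n)
            continue;                          /* this pad is at EOS */
        if (best < 0 ||
            pads[i].ts[pads[i].pos] < pads[best].ts[pads[best].pos])
            best = (int) i;
    }
    return best;
}
```

Taking the earliest buffer each time keeps audio and video chunks ordered in
time in the output file.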

So, that concludes this little introduction to the gst-rec pipeline. Don't get
confused, but I know it's hard. Gst-rec just isn't for weenies. ;)

--
Ronald Bultje <rbultje@ronald.bitfreak.net> (Dec. 28th, 2003)
