On the Fly HLS Transcoding with Golang - Part II
Welcome back (if you’re here after reading the first part). In Part I, we laid the groundwork with an overview of transcoding, the benefits of dynamic (on-the-fly) transcoding for local media servers, and how FFmpeg can convert original content into HLS segments.
Now, let’s dive straight into building a Go server that can dynamically transcode and stream original content.
An HLS media player initially requests an HLS master manifest file. This file contains information about different quality variants of the original media. The master manifest then points to media manifests, which detail the segments of each HLS stream. For simplicity, we will focus on a single-quality transcoding, meaning there will be no master manifest, only a media manifest.
With this, we could theoretically begin the transcoding and segmentation process when a user starts streaming a video and requests the initial manifest. But there are two problems with this approach:
- We would have to wait for the entire video to be transcoded and segmented before we could serve the initial manifest file.
- It defeats our original intent of a dynamic backend, since we don’t want the entire video to be transcoded on the initial request.
Since we use a fixed segment duration, we know in advance how the video will be split, so we can serve a manifest for the requested file directly, without waiting for transcoding and segmentation to complete. However, to construct the HLS manifest, we still need to know the actual duration of the requested video.
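Because the segment length is fixed, the number of segments follows directly from the duration. Here is a minimal sketch of that arithmetic. Note that segmentCount is a hypothetical helper that is not part of the server code in this article, and the duration used in main is just an example value.

```go
package main

import (
	"fmt"
	"math"
)

// segmentCount returns how many fixed-length segments are needed to cover a
// video of the given duration (both values in seconds). The final segment may
// be shorter than segmentLength. This helper is hypothetical, for illustration.
func segmentCount(duration, segmentLength float64) int {
	if duration <= 0 || segmentLength <= 0 {
		return 0
	}
	return int(math.Ceil(duration / segmentLength))
}

func main() {
	// Example: a video of roughly 2 hours 13 minutes, cut into 2-second segments.
	fmt.Println(segmentCount(7990.931667, 2))
}
```

Everything the manifest needs (the segment count and the length of the last, shorter segment) can be derived this way before a single frame has been transcoded.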
Using ffprobe to get the video duration:
ffprobe is a tool for extracting metadata such as resolution, bit rate, frame rate, and duration from multimedia files. It comes bundled with FFmpeg.
We can use ffprobe to read the actual duration of the requested video without transcoding it, which is exactly what we need to construct an HLS manifest for that video.
The following command prints the container-level metadata of a video file in JSON format:
ffprobe -v quiet \
    -print_format json \
    -show_format \
    input_video.mp4
For our sample video (vids/sample.mp4), the output looks like this:
{
    "format": {
        "filename": "vids/sample.mp4",
        "nb_streams": 2,
        "nb_programs": 0,
        "nb_stream_groups": 0,
        "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
        "format_long_name": "QuickTime / MOV",
        "start_time": "0.000000",
        "duration": "7990.931667",
        "size": "1047145289",
        "bit_rate": "1048333",
        "probe_score": 100,
        "tags": {
            "major_brand": "isom",
            "minor_version": "1",
            "compatible_brands": "isom",
            "creation_time": "2016-02-12T09:55:32.000000Z"
        }
    }
}
HLS Manifest Anatomy:
Since we have to construct the HLS media manifest ourselves instead of relying on FFmpeg, understanding its structure is essential.
Here is a sample media manifest file.
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:2
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:2.000000,
/stream/sample/0.ts
#EXTINF:2.000000,
/stream/sample/1.ts
#EXTINF:2.000000,
/stream/sample/2.ts
#EXTINF:2.000000,
/stream/sample/3.ts
#EXT-X-ENDLIST
Let’s go through it tag by tag.
- #EXTM3U - Header. It marks the file as an extended M3U manifest/playlist and must be the first line.
- #EXT-X-VERSION - Specifies the HLS protocol version. Version 3 is widely supported and is the lowest version that allows floating-point #EXTINF durations.
- #EXT-X-MEDIA-SEQUENCE - Specifies the sequence number of the first segment listed in the playlist.
- #EXT-X-TARGETDURATION - Specifies the maximum segment duration in seconds, as an integer. No segment’s #EXTINF duration, rounded to the nearest integer, may exceed it. Since our segments have a fixed length, this is simply that length.
- #EXT-X-PLAYLIST-TYPE - The two playlist types are VOD and EVENT. VOD (Video On Demand) denotes pre-recorded video: the playlist contains all segments from start to end and does not change. EVENT is commonly used for live streaming, where segments are appended to the playlist as the content is produced.
- #EXTINF - Provides the duration of a segment in seconds. The URI of the segment (here, a path on the same server) follows on the next line.
- #EXT-X-ENDLIST - Marks the end of the playlist. No more segments will be added after it.
Having covered the theoretical groundwork, the code below should now be fairly intuitive. First, here is the function that returns the video duration using ffprobe.
// imports: encoding/json, os, os/exec, strconv
func ffprobeDuration(videoFilePath string) (float64, error) {
	// Only the fields we need from ffprobe's JSON output.
	type FFProbeFormat struct {
		Duration string `json:"duration"`
	}
	type FFProbe struct {
		Format FFProbeFormat `json:"format"`
	}

	// Fail early with a clear error if the file does not exist.
	if _, err := os.Stat(videoFilePath); err != nil {
		return 0, err
	}

	cmd := exec.Command(
		"ffprobe",
		"-v", "quiet",
		"-print_format", "json",
		"-show_format",
		videoFilePath,
	)
	probeOutput, err := cmd.Output()
	if err != nil {
		return 0, err
	}

	var probe FFProbe
	if err := json.Unmarshal(probeOutput, &probe); err != nil {
		return 0, err
	}

	// ffprobe reports the duration as a string of seconds, e.g. "7990.931667".
	return strconv.ParseFloat(probe.Format.Duration, 64)
}
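The JSON handling in ffprobeDuration can also be pulled out into its own function, which makes it easy to test without running ffprobe at all. The sketch below is one way to do that, not the only one. parseProbeDuration is a name made up for this illustration, and the sample input is a trimmed-down version of the ffprobe output shown earlier.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// parseProbeDuration extracts format.duration (in seconds) from the JSON that
// ffprobe prints with -print_format json -show_format.
// Hypothetical helper: it is not part of the article's server code.
func parseProbeDuration(probeOutput []byte) (float64, error) {
	var probe struct {
		Format struct {
			Duration string `json:"duration"`
		} `json:"format"`
	}
	if err := json.Unmarshal(probeOutput, &probe); err != nil {
		return 0, fmt.Errorf("decode ffprobe output: %w", err)
	}
	if probe.Format.Duration == "" {
		return 0, errors.New("ffprobe output has no format.duration")
	}
	return strconv.ParseFloat(probe.Format.Duration, 64)
}

func main() {
	sample := []byte(`{"format": {"filename": "vids/sample.mp4", "duration": "7990.931667"}}`)
	duration, err := parseProbeDuration(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("duration: %.3f seconds\n", duration)
}
```

With this split, ffprobeDuration would only need to run the command and hand its output to the parser. The explicit check for an empty duration also gives a clearer error than a failed float parse when ffprobe returns JSON without a format section.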
Next is the HTTP handler for the initial media manifest request. It reads videoFileName from the request URL path and passes the relative path of the video file to ffprobeDuration to get the video duration. It then builds the media manifest in a byte buffer and writes it to the response.
// imports: bytes, errors, fmt, net/http, os, path/filepath, plus the chi router
func HLSHandler(w http.ResponseWriter, r *http.Request) {
	videoFileName := chi.URLParam(r, "videoFileName")
	if videoFileName == "" {
		w.WriteHeader(http.StatusBadRequest)
		return
	}

	videoFilePath := filepath.Join("vids", fmt.Sprintf("%s.mp4", videoFileName))
	duration, err := ffprobeDuration(videoFilePath)
	if err != nil {
		// A missing video is a 404; any other probe failure is a server error.
		status := http.StatusInternalServerError
		if errors.Is(err, os.ErrNotExist) {
			status = http.StatusNotFound
		}
		w.WriteHeader(status)
		return
	}

	segmentLength := 2 // seconds per segment

	var buf bytes.Buffer
	fmt.Fprint(&buf, "#EXTM3U\n")
	fmt.Fprint(&buf, "#EXT-X-VERSION:3\n")
	fmt.Fprint(&buf, "#EXT-X-MEDIA-SEQUENCE:0\n")
	fmt.Fprintf(&buf, "#EXT-X-TARGETDURATION:%d\n", segmentLength)
	fmt.Fprint(&buf, "#EXT-X-PLAYLIST-TYPE:VOD\n")

	// Emit one entry per segment; the last one may be shorter than segmentLength.
	segmentCounter := 0
	for duration > 0 {
		currSegmentLength := float64(segmentLength)
		if duration < currSegmentLength {
			currSegmentLength = duration
		}
		fmt.Fprintf(&buf, "#EXTINF:%f,\n", currSegmentLength)
		fmt.Fprintf(&buf, "/stream/%s/%d.ts\n", videoFileName, segmentCounter)
		duration -= currSegmentLength
		segmentCounter++
	}
	fmt.Fprint(&buf, "#EXT-X-ENDLIST\n")

	w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
	w.WriteHeader(http.StatusOK)
	w.Write(buf.Bytes())
}
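If we want to check the manifest output without an HTTP request or a real video file, the manifest construction can be moved into a pure function. The following is a sketch under that assumption. buildMediaManifest is a hypothetical name, and it derives each segment length from the segment index rather than by repeated subtraction, which is a small variation on the loop in the handler.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// buildMediaManifest renders a VOD media playlist for a video of the given
// duration (seconds), split into segments of segmentLength seconds.
// Hypothetical helper: the URI layout mirrors the handler in this article.
func buildMediaManifest(videoName string, duration float64, segmentLength int) string {
	var b strings.Builder
	b.WriteString("#EXTM3U\n")
	b.WriteString("#EXT-X-VERSION:3\n")
	b.WriteString("#EXT-X-MEDIA-SEQUENCE:0\n")
	fmt.Fprintf(&b, "#EXT-X-TARGETDURATION:%d\n", segmentLength)
	b.WriteString("#EXT-X-PLAYLIST-TYPE:VOD\n")

	segLen := float64(segmentLength)
	count := 0
	if segmentLength > 0 && duration > 0 {
		count = int(math.Ceil(duration / segLen))
	}
	for i := 0; i < count; i++ {
		// Compute the length from the index so that floating-point error
		// does not accumulate across thousands of segments.
		length := math.Min(segLen, duration-float64(i)*segLen)
		fmt.Fprintf(&b, "#EXTINF:%f,\n", length)
		fmt.Fprintf(&b, "/stream/%s/%d.ts\n", videoName, i)
	}
	b.WriteString("#EXT-X-ENDLIST\n")
	return b.String()
}

func main() {
	// A 5-second video yields two full segments and a 1-second tail.
	fmt.Print(buildMediaManifest("sample", 5, 2))
}
```

The handler would then only need to call this function and write the result with the application/vnd.apple.mpegurl content type.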
Once the client player downloads the initial media manifest, it proceeds to fetch each individual segment using the URI information provided within that manifest. Let’s now proceed to write the handler functions necessary for serving these segments.
The hlsTranscode function accepts two inputs: videoFilePath and the segment integer. It returns the file path of the transcoded segment. It uses ffmpeg to transcode only the specific segment requested.
To transcode only a specific segment, we need to precisely control three key FFmpeg arguments:
- -ss: the position to seek to before transcoding starts. This is the segment integer multiplied by the segment duration.
- -t: the duration to transcode. Since we only need one specific segment, we pass the fixed segment duration.
- -start_number: the number given to the first segment FFmpeg writes. Because we are not transcoding the video from the beginning, we cannot use the usual 0; in our case it has to be the segment integer.
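// imports: fmt, os, os/exec, path/filepath, strings
func hlsTranscode(videoFilePath string, segment int) (string, error) {
	const segmentLength = 2 // seconds; must match the manifest's segment length

	// Keep each video's segments in its own directory (e.g. vids/sample/)
	// so that segments of different videos cannot overwrite each other.
	segmentDir := strings.TrimSuffix(videoFilePath, filepath.Ext(videoFilePath))
	if err := os.MkdirAll(segmentDir, 0o755); err != nil {
		return "", err
	}

	cmd := exec.Command(
		"ffmpeg",
		"-ss", fmt.Sprint(segment*segmentLength), // seek to the segment start
		"-i", videoFilePath,
		"-c:v", "libx264", "-crf", "23", "-preset", "veryfast",
		"-c:a", "aac", "-b:a", "128k",
		"-hls_time", fmt.Sprint(segmentLength),
		"-hls_segment_filename", filepath.Join(segmentDir, "segment_%d.ts"),
		"-hls_playlist_type", "vod",
		"-start_number", fmt.Sprint(segment), // output is named segment_<segment>.ts
		"-t", fmt.Sprint(segmentLength), // transcode a single segment only
		filepath.Join(segmentDir, "manifest.m3u8"),
	)
	if err := cmd.Run(); err != nil {
		return "", err
	}
	return filepath.Join(segmentDir, fmt.Sprintf("segment_%d.ts", segment)), nil
}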
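As written, hlsTranscode runs ffmpeg on every request, even for a segment that has already been produced, and two simultaneous requests for the same segment would start two ffmpeg processes. One possible improvement is to check the file system first and to serialize work per segment. The sketch below illustrates the idea under some assumptions: segmentCache, get, and produce are hypothetical names, and produce stands in for the ffmpeg call so the example runs without FFmpeg installed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// segmentCache ensures each segment file is produced at most once, even when
// several requests for the same segment arrive concurrently.
// Note: the lock map is never pruned; a real server would want to bound it.
type segmentCache struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex // one lock per segment path
}

func newSegmentCache() *segmentCache {
	return &segmentCache{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex dedicated to the given segment path.
func (c *segmentCache) lockFor(path string) *sync.Mutex {
	c.mu.Lock()
	defer c.mu.Unlock()
	l, ok := c.locks[path]
	if !ok {
		l = &sync.Mutex{}
		c.locks[path] = l
	}
	return l
}

// get returns segmentPath, calling produce only if the file does not exist.
// It assumes produce either writes a complete file or returns an error.
func (c *segmentCache) get(segmentPath string, produce func(path string) error) (string, error) {
	l := c.lockFor(segmentPath)
	l.Lock()
	defer l.Unlock()

	if _, err := os.Stat(segmentPath); err == nil {
		return segmentPath, nil // already transcoded
	}
	if err := os.MkdirAll(filepath.Dir(segmentPath), 0o755); err != nil {
		return "", err
	}
	if err := produce(segmentPath); err != nil {
		return "", err
	}
	return segmentPath, nil
}

// demo requests the same segment twice and reports how often produce ran.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "segments")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	calls := 0
	fakeTranscode := func(path string) error { // stand-in for ffmpeg
		calls++
		return os.WriteFile(path, []byte("fake segment"), 0o644)
	}

	cache := newSegmentCache()
	path := filepath.Join(dir, "sample", "segment_3.ts")
	for i := 0; i < 2; i++ {
		if _, err := cache.get(path, fakeTranscode); err != nil {
			return calls, err
		}
	}
	return calls, nil
}

func main() {
	calls, err := demo()
	fmt.Println("produce calls:", calls, "error:", err)
}
```

In the server, the segment handler could keep one such cache and pass a closure that runs ffmpeg for the requested segment. A caveat of this simple existence check is that a file left half-written by a crashed ffmpeg process would be served as if it were complete.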
HLSSegmentHandler is the HTTP handler that serves segment requests. It extracts both the videoName and the segment number from the URL path of the incoming request. These values are then passed to hlsTranscode, which transcodes the requested segment and returns its file path. Finally, the handler serves the transcoded segment file back to the client.
// imports: fmt, net/http, path/filepath, strconv, plus the chi router
func HLSSegmentHandler(w http.ResponseWriter, r *http.Request) {
	videoFileName := chi.URLParam(r, "videoName")
	segment, err := strconv.Atoi(chi.URLParam(r, "segment"))
	if err != nil {
		w.WriteHeader(http.StatusBadRequest)
		return
	}

	videoFilePath := filepath.Join("vids", fmt.Sprintf("%s.mp4", videoFileName))
	segmentPath, err := hlsTranscode(videoFilePath, segment)
	if err != nil {
		// The request was well-formed, so a transcoding failure is a server error.
		w.WriteHeader(http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "video/MP2T")
	http.ServeFile(w, r, segmentPath)
}
Finally, we register the endpoints that handle the manifest and segment requests.
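// imports: log, net/http, plus the chi router and its middleware package
func main() {
	r := chi.NewRouter()
	r.Use(middleware.Logger)

	// Both routes restrict the video name to a safe character set.
	r.Get("/{videoFileName:[a-zA-Z0-9_-]+}/stream.m3u8", HLSHandler)
	r.Get("/stream/{videoName:[a-zA-Z0-9_-]+}/{segment:[0-9]+}.ts", HLSSegmentHandler)

	log.Fatal(http.ListenAndServe(":8080", r))
}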
We can now launch the program and open http://localhost:8080/sample/stream.m3u8 in Safari. The sample video should start playing. Watching the file system, we can see video segments being generated dynamically as playback continues.
Other browsers may not be able to play an HLS URL directly. For those, we will need a client-side player, typically built on a JavaScript library such as hls.js, to handle playback.
This is a naive implementation of on-the-fly transcoding. A lot can be improved from here, but it can serve as a foundation for further learning and improvement.
In particular, I’m interested in using Go language features such as goroutines and channels to improve on this solution.
Thanks for reading along!