Author: Gregor Lang

whisper-ctranslate2 – Installation on Linux Mint

Installation on Linux:
Check whether Python is installed:

Enter in the terminal:
python3 -V
Possible output: Python 3.10.12

faster-whisper works with version 3.10.12.
As of 11 May 2024, the minimum requirement is "Python 3.8 or greater".
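To check this requirement in one step, the version printed by `python3 -V` can be compared with the minimum in a small Bash sketch. The function names `version_ge` and `check_python` are hypothetical, and the comparison assumes GNU `sort -V`, which Linux Mint ships by default.

```shell
#!/bin/bash
# Sketch: compare the installed Python version with a minimum version
# (3.8 as of 11 May 2024). version_ge and check_python are hypothetical names.

# version_ge A B: succeeds if version A >= version B (GNU sort -V ordering)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_python [MIN]: parses e.g. "Python 3.10.12" from python3 -V
check_python() {
  local min="${1:-3.8}" current
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found" >&2
    return 1
  fi
  current="$(python3 -V 2>&1 | awk '{print $2}')"
  if version_ge "$current" "$min"; then
    echo "Python $current is OK (>= $min)"
  else
    echo "Python $current is too old (>= $min required)" >&2
    return 1
  fi
}
```

For example: `check_python 3.8 && echo "ready for pip"`.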

Install pip:
Enter in the terminal:
sudo apt install python3-pip

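Install faster-whisper:
Enter in the terminal:
pip install -U faster-whisper
I cannot say for certain whether this step is really necessary; in any case, everything works in the end. (whisper-ctranslate2 lists faster-whisper as a dependency, so pip should install it automatically in the next step anyway.)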

Install whisper-ctranslate2:
Enter in the terminal:
pip install -U whisper-ctranslate2

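Now restart the computer (logging out and back in should also do). Presumably this is needed because a pip user installation places the command in ~/.local/bin, which Linux Mint only adds to the PATH at login if the directory already exists.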
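Whether the command is reachable after the installation can be checked with the following sketch. It assumes a pip user installation, which places the `whisper-ctranslate2` command in `~/.local/bin`; the function name `check_whisper_cli` is hypothetical. If the command is installed but not yet on the PATH, the function extends the PATH for the current terminal session only, which may be enough for a quick test before restarting.

```shell
#!/bin/bash
# Sketch: check whether whisper-ctranslate2 is callable. Assumes a pip
# user installation into ~/.local/bin; check_whisper_cli is a hypothetical name.
check_whisper_cli() {
  local userbin="$HOME/.local/bin"
  if command -v whisper-ctranslate2 >/dev/null 2>&1; then
    echo "found: $(command -v whisper-ctranslate2)"
    return 0
  fi
  if [ -x "$userbin/whisper-ctranslate2" ]; then
    # Installed, but ~/.local/bin is not on the PATH yet:
    # extend the PATH for the current shell session only
    export PATH="$userbin:$PATH"
    echo "added $userbin to PATH for this session"
    return 0
  fi
  echo "whisper-ctranslate2 not found; did the pip installation succeed?" >&2
  return 1
}
```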

8 August 2025
whisper-ctranslate2 – Usage

For English:

whisper-ctranslate2 --model large-v3 --model_dir models --language English --device auto --output_format all --pretty_json True "input.mp4"

For German:

whisper-ctranslate2 --model large-v3 --model_dir models --language German --device cpu --output_format all --pretty_json True "input.mp4"

For Portuguese:

whisper-ctranslate2 --model large-v3 --model_dir models --language Portuguese --device cpu --output_format all --pretty_json True "input.mp4"
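Since the three calls differ only in the language, they could be wrapped in a small Bash function, for example in `~/.bashrc`. This is a sketch: the function `transcribe` and the variable `DEVICE` are hypothetical names, and all file arguments are passed to a single whisper-ctranslate2 call, which the help output allows ("audio file(s) to transcribe").

```shell
#!/bin/bash
# Sketch of a hypothetical helper "transcribe" for the calls shown above.
# Usage: transcribe LANGUAGE FILE...
# DEVICE (hypothetical variable) selects auto, cpu or cuda; default: cpu
transcribe() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: transcribe LANGUAGE FILE..." >&2
    return 2
  fi
  local lang="$1"
  shift
  # All remaining arguments are passed on as input files
  whisper-ctranslate2 --model large-v3 --model_dir models \
    --language "$lang" --device "${DEVICE:-cpu}" \
    --output_format all --pretty_json True "$@"
}
```

For example `transcribe German "input.mp4"` or `DEVICE=auto transcribe English *.mp4`.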

Further languages (as of 8 August 2025):

[--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}]

Calling the help function:

whisper-ctranslate2 --help

usage: whisper-ctranslate2 [-h]
[--model {tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large-v1,large-v2,large-v3,distil-large-v2,distil-large-v3,distil-medium.en,distil-small.en}]
[--model_directory MODEL_DIRECTORY]
[--model_dir MODEL_DIR]
[--local_files_only LOCAL_FILES_ONLY]
[--output_dir OUTPUT_DIR]
[--output_format {txt,vtt,srt,tsv,json,all}]
[--pretty_json PRETTY_JSON]
[--print_colors PRINT_COLORS] [--verbose VERBOSE]
[--highlight_words HIGHLIGHT_WORDS]
[--max_line_width MAX_LINE_WIDTH]
[--max_line_count MAX_LINE_COUNT]
[--max_words_per_line MAX_WORDS_PER_LINE]
[--device {auto,cpu,cuda}] [--threads THREADS]
[--device_index DEVICE_INDEX]
[--compute_type {default,auto,int8,int8_float16,int8_bfloat16,int8_float32,int16,float16,float32,bfloat16}]
[--task {transcribe,translate}]
[--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}]
[--temperature TEMPERATURE]
[--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK]
[--prompt_reset_on_temperature PROMPT_RESET_ON_TEMPERATURE]
[--best_of BEST_OF] [--beam_size BEAM_SIZE]
[--patience PATIENCE]
[--length_penalty LENGTH_PENALTY]
[--suppress_blank SUPPRESS_BLANK]
[--suppress_tokens SUPPRESS_TOKENS]
[--initial_prompt INITIAL_PROMPT]
[--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT]
[--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD]
[--logprob_threshold LOGPROB_THRESHOLD]
[--no_speech_threshold NO_SPEECH_THRESHOLD]
[--word_timestamps WORD_TIMESTAMPS]
[--prepend_punctuations PREPEND_PUNCTUATIONS]
[--append_punctuations APPEND_PUNCTUATIONS]
[--repetition_penalty REPETITION_PENALTY]
[--no_repeat_ngram_size NO_REPEAT_NGRAM_SIZE]
[--hallucination_silence_threshold HALLUCINATION_SILENCE_THRESHOLD]
[--vad_filter VAD_FILTER]
[--vad_threshold VAD_THRESHOLD]
[--vad_min_speech_duration_ms VAD_MIN_SPEECH_DURATION_MS]
[--vad_max_speech_duration_s VAD_MAX_SPEECH_DURATION_S]
[--vad_min_silence_duration_ms VAD_MIN_SILENCE_DURATION_MS]
[--version] [--hf_token HF_TOKEN]
[--speaker_name SPEAKER_NAME]
[--live_transcribe LIVE_TRANSCRIBE]
[--live_volume_threshold LIVE_VOLUME_THRESHOLD]
[--live_input_device LIVE_INPUT_DEVICE]

positional arguments:
audio audio file(s) to transcribe (default: None)

options:
-h, --help show this help message and exit
--version show program's version number and exit

Model selection options:
--model {tiny,tiny.en,base,base.en,small,small.en,medium,medium.en,large-v1,large-v2,large-v3,distil-large-v2,distil-large-v3,distil-medium.en,distil-small.en}
name of the Whisper model to use (default: small)
--model_directory MODEL_DIRECTORY
directory where to find a CTranslate2 Whisper model
(e.g. fine-tuned model) (default: None)

Model caching control options:
--model_dir MODEL_DIR
the path to save model files; uses
~/.cache/huggingface/ by default (default: None)
--local_files_only LOCAL_FILES_ONLY
use models in cache without connecting to Internet to
check if there are newer versions (default: False)

Configuration options to control generated outputs:
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
directory to save the outputs (default: .)
--output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
format of the output file; if not specified, all
available formats will be produced (default: all)
--pretty_json PRETTY_JSON, -p PRETTY_JSON
produce json in a human readable format (default:
False)
--print_colors PRINT_COLORS
print the transcribed text using an experimental color
coding strategy to highlight words with high or low
confidence (default: False)
--verbose VERBOSE whether to print out the progress and debug messages
(default: True)
--highlight_words HIGHLIGHT_WORDS
underline each word as it is spoken in srt and vtt
output formats (requires --word_timestamps True)
(default: False)
--max_line_width MAX_LINE_WIDTH
the maximum number of characters in a line before
breaking the line in srt and vtt output formats
(requires --word_timestamps True) (default: None)
--max_line_count MAX_LINE_COUNT
the maximum number of lines in a segment in srt and
vtt output formats (requires --word_timestamps True)
(default: None)
--max_words_per_line MAX_WORDS_PER_LINE
(requires --word_timestamps True, no effect with
--max_line_width) the maximum number of words in a
segment (default: None)

Computing configuration options:
--device {auto,cpu,cuda}
device to use for CTranslate2 inference (default:
auto)
--threads THREADS number of threads used for CPU inference (default: 0)
--device_index DEVICE_INDEX
device ID where to place this model on (default: 0)
--compute_type {default,auto,int8,int8_float16,int8_bfloat16,int8_float32,int16,float16,float32,bfloat16}
Type of quantization to use (see
https://opennmt.net/CTranslate2/quantization.html)
(default: auto)

Algorithm execution options:
--task {transcribe,translate}
whether to perform X->X speech recognition
('transcribe') or X->English translation ('translate')
(default: transcribe)
--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}
language spoken in the audio, specify None to perform
language detection (default: None)
--temperature TEMPERATURE
temperature to use for sampling (default: 0)
--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK
temperature to increase when falling back when the
decoding fails to meet either of the thresholds below
(default: 0.2)
--prompt_reset_on_temperature PROMPT_RESET_ON_TEMPERATURE
resets prompt if temperature is above this value. Arg
has effect only if condition_on_previous_text is True
(default: 0.5)
--best_of BEST_OF number of candidates when sampling with non-zero
temperature (default: 5)
--beam_size BEAM_SIZE
number of beams in beam search, only applicable when
temperature is zero (default: 5)
--patience PATIENCE optional patience value to use in beam decoding, as in
https://arxiv.org/abs/2204.05424, the default (1.0) is
equivalent to conventional beam search (default: 1.0)
--length_penalty LENGTH_PENALTY
optional token length penalty coefficient (alpha) as
in https://arxiv.org/abs/1609.08144, uses simple
length normalization by default (default: 1.0)
--suppress_blank SUPPRESS_BLANK
suppress blank outputs at the beginning of the
sampling (default: True)
--suppress_tokens SUPPRESS_TOKENS
comma-separated list of token ids to suppress during
sampling; '-1' will suppress most special characters
except common punctuations (default: -1)
--initial_prompt INITIAL_PROMPT
optional text to provide as a prompt for the first
window. (default: None)
--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT
if True, provide the previous output of the model as a
prompt for the next window; disabling may make the
text inconsistent across windows, but the model
becomes less prone to getting stuck in a failure loop
(default: True)
--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD
if the gzip compression ratio is higher than this
value, treat the decoding as failed (default: 2.4)
--logprob_threshold LOGPROB_THRESHOLD
if the average log probability is lower than this
value, treat the decoding as failed (default: -1.0)
--no_speech_threshold NO_SPEECH_THRESHOLD
if the probability of the <|nospeech|> token is higher
than this value AND the decoding has failed due to
logprob_threshold, consider the segment as silence
(default: 0.6)
--word_timestamps WORD_TIMESTAMPS
(experimental) extract word-level timestamps and
refine the results based on them (default: False)
--prepend_punctuations PREPEND_PUNCTUATIONS
if word_timestamps is True, merge these punctuation
symbols with the next word (default: "'“¿([{-)
--append_punctuations APPEND_PUNCTUATIONS
if word_timestamps is True, merge these punctuation
symbols with the previous word (default:
"'.。,，!！?？:：”)]}、)
--repetition_penalty REPETITION_PENALTY
penalty applied to the score of previously generated
tokens (set > 1 to penalize) (default: 1.0)
--no_repeat_ngram_size NO_REPEAT_NGRAM_SIZE
prevent repetitions of ngrams with this size (set 0 to
disable) (default: 0)
--hallucination_silence_threshold HALLUCINATION_SILENCE_THRESHOLD
When word_timestamps is True, skip silent periods
longer than this threshold (in seconds) when a
possible hallucination is detected (default: None)

VAD filter arguments:
--vad_filter VAD_FILTER
enable the voice activity detection (VAD) to filter
out parts of the audio without speech. This step is
using the Silero VAD model
https://github.com/snakers4/silero-vad. (default:
False)
--vad_threshold VAD_THRESHOLD
when vad_filter is enabled, probabilities above this
value are considered as speech. (default: None)
--vad_min_speech_duration_ms VAD_MIN_SPEECH_DURATION_MS
when vad_filter is enabled, final speech chunks
shorter min_speech_duration_ms are thrown out.
(default: None)
--vad_max_speech_duration_s VAD_MAX_SPEECH_DURATION_S
when vad_filter is enabled, Maximum duration of
speech chunks in seconds. Longer will be split at the
timestamp of the last silence. (default: None)
--vad_min_silence_duration_ms VAD_MIN_SILENCE_DURATION_MS
when vad_filter is enabled, in the end of each
speech chunk time to wait before separating it.
(default: None)

Diarization options:
--hf_token HF_TOKEN HuggingFace token which enables to download the
diarization models. (default: )
--speaker_name SPEAKER_NAME
Name to use to identify the speaker (e.g. SPEAKER_00).
(default: SPEAKER)

Live transcribe options:
--live_transcribe LIVE_TRANSCRIBE
live transcribe mode (default: False)
--live_volume_threshold LIVE_VOLUME_THRESHOLD
minimum volume threshold to activate listening in live
transcribe mode (default: 0.2)
--live_input_device LIVE_INPUT_DEVICE
Set live stream input device ID (see python -m
sounddevice for a list) (default: None)
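Several of these options only take effect together; for example, `--max_line_width` and `--max_line_count` require `--word_timestamps True`. A sketch of a hypothetical helper `subtitles` that writes SRT subtitles with at most 42 characters per line and at most 2 lines per segment, and filters out silence with the Silero VAD (`--vad_filter True`); the values 42 and 2 are assumptions, not defaults of the tool:

```shell
#!/bin/bash
# Sketch: SRT subtitles via whisper-ctranslate2 (hypothetical helper name).
# Usage: subtitles LANGUAGE FILE...
# --max_line_width/--max_line_count need --word_timestamps True (see help).
subtitles() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: subtitles LANGUAGE FILE..." >&2
    return 2
  fi
  local lang="$1"
  shift
  whisper-ctranslate2 --model large-v3 --model_dir models \
    --language "$lang" --output_format srt \
    --word_timestamps True --max_line_width 42 --max_line_count 2 \
    --vad_filter True "$@"
}
```

For example `subtitles German "input.mp4"`. According to the help text, adding `--task translate` would produce English subtitles instead of a transcription in the source language.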

8 August 2025

Video files – standard formats

There are a number of standardized video and image formats that differ in pixel dimensions (resolution) and aspect ratio. Depending on the application (web, PC, mobile, streaming), some formats are more common than others.

📺 Common formats by aspect ratio and pixel size

▶️ 1. 16:9 – the standard format for video and PC

Very widespread on monitors, laptops, YouTube, streaming etc.

Name	Resolution (px)	Use
HD (720p)	1280 × 720	YouTube, older monitors
Full HD (1080p)	1920 × 1080	Standard for video and monitors
QHD (1440p)	2560 × 1440	High-end monitors
4K UHD	3840 × 2160	TVs, high-end video
8K UHD	7680 × 4320	Technology of the future, professional video

📷 2. 4:3 – former PC standard / classic cameras

Still used by older devices and for certain image formats.

Name	Resolution (px)	Use
VGA	640 × 480	Old monitors
SVGA	800 × 600	Presentations, old projectors
XGA	1024 × 768	Older laptops

📱 3. 1:1 – square

Very popular on Instagram and social media.

Resolution (px)	Aspect ratio	Use
1080 × 1080	1:1	Instagram posts, audio videos
1500 × 1500	1:1	Print-ready square content

📱 4. 9:16 – portrait format for smartphones

The standard for Reels, TikToks and YouTube Shorts.

Resolution (px)	Aspect ratio	Use
720 × 1280	9:16 (vertical)	Stories, Reels, TikTok
1080 × 1920	9:16 (Full HD vertical)	Mobile videos

🖼️ 5. 3:2 – camera and photo format

Classic format from photography.

Resolution (px)	Description	Use
6000 × 4000	DSLR resolution	Photography, not for video

📊 6. 21:9 – ultrawide

For cinema films or special PC monitors.

Resolution (px)	Name	Use
2560 × 1080	Ultrawide Full HD	Wide monitors, film production
3440 × 1440	UWQHD	High-end monitors

🧭 Orientation: what should you use?

Application	Recommended resolution	Aspect ratio
YouTube video (standard)	1920 × 1080	16:9
Instagram post	1080 × 1080	1:1
Instagram Story / TikTok	1080 × 1920	9:16
Web image (blog etc.)	1200 × 675 or 800 × 600	16:9 or 4:3
Fullscreen on PC	1920 × 1080 or 2560 × 1440	16:9
Cinema film	3840 × 1600	approx. 2.4:1

Here is an overview of common 16:9 video formats with their pixel dimensions (width × height), ordered by resolution:

📺 Common 16:9 video formats

Name	Pixel size (width × height)	Notes
nHD (1/9 of Full HD)	640 × 360	Mobile devices, small previews
SD (480p widescreen)	854 × 480	Low online resolution (approx. 16:9)
qHD (quarter HD)	960 × 540	Small displays, older devices
HD Ready (720p)	1280 × 720	YouTube HD, mobile devices
HD+	1600 × 900	Intermediate step, some monitors
Full HD (1080p)	1920 × 1080	Standard for video and TV
QWXGA (2K width)	2048 × 1152	16:9 at the 2K width of DCI 2K (2048 × 1080)
QHD / WQHD (1440p)	2560 × 1440	High-end monitors, YouTube
Ultra HD / 4K (2160p)	3840 × 2160	4K TV, streaming
5K UHD (16:9)	5120 × 2880	Very high resolution, rare
8K UHD (4320p)	7680 × 4320	Future standard, professional video

📌 Note:

DCI formats (such as DCI 2K at 2048 × 1080 and true 4K at 4096 × 2160) have a slightly wider aspect ratio (approx. 17:9), so they must be cropped or scaled to obtain true 16:9.
Some 16:9 formats (such as HD+ or 5K) are unofficial but technically widespread (e.g. through monitors or cameras).
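Whether a resolution really has the stated aspect ratio can be checked by dividing width and height by their greatest common divisor. A minimal Bash sketch (the function names `gcd` and `aspect_ratio` are hypothetical) shows, for example, that 2560 × 1080, marketed as 21:9, is exactly 64:27:

```shell
#!/bin/bash
# Sketch: reduce WIDTH x HEIGHT to an exact ratio plus a decimal value.
# gcd and aspect_ratio are hypothetical helper names.

# gcd A B: greatest common divisor (Euclidean algorithm)
gcd() {
  local a=$1 b=$2 t
  while [ "$b" -ne 0 ]; do
    t=$((a % b))
    a=$b
    b=$t
  done
  echo "$a"
}

# aspect_ratio WIDTH HEIGHT: prints e.g. "16:9 (1.78:1)"
aspect_ratio() {
  local g
  g=$(gcd "$1" "$2")
  printf '%d:%d (%s:1)\n' $(($1 / g)) $(($2 / g)) \
    "$(awk -v w="$1" -v h="$2" 'BEGIN { printf "%.2f", w / h }')"
}
```

The resolution of an existing file can be passed in with ffprobe, which is installed together with ffmpeg: `aspect_ratio $(ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of default=noprint_wrappers=1:nokey=1 input.mp4)`.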

8 August 2025

ffmpeg – Scaling a video file with black bars

If you want to preserve the aspect ratio when converting to 1920×1080, you have two main options:

✅ Option 1: Scaling with black bars (letterboxing or pillarboxing)

This method preserves the aspect ratio but fills the remaining area with black borders.

ffmpeg -i input.mp4 -vf "scale=w=1920:h=1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" -c:a copy output.mp4

Explanation:

scale=w=1920:h=1080:force_original_aspect_ratio=decrease: scales the video without distorting the aspect ratio, so that it does not exceed 1920×1080.
pad=1920:1080:(ow-iw)/2:(oh-ih)/2: fills the missing pixels with black bars and centers the image (bars top and bottom for wider sources, left and right for narrower ones).
-c:a copy: copies the audio unchanged.
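To see in advance how wide the bars will be, the arithmetic behind `force_original_aspect_ratio=decrease` and `pad` can be reproduced in a few lines of Bash. This is a sketch with integer math, so ffmpeg's own rounding may differ by a pixel; the function name `pad_geometry` is hypothetical.

```shell
#!/bin/bash
# Sketch: size of the scaled image and bar offsets for
# scale=...:force_original_aspect_ratio=decrease,pad=... (hypothetical name).
# Usage: pad_geometry IN_W IN_H [TARGET_W] [TARGET_H]
pad_geometry() {
  local iw=$1 ih=$2 tw=${3:-1920} th=${4:-1080} w h
  if [ $((iw * th)) -gt $((ih * tw)) ]; then
    # Source is wider than the target: width limits, bars top and bottom
    w=$tw
    h=$((ih * tw / iw))
  else
    # Source is narrower or equal: height limits, bars left and right
    h=$th
    w=$((iw * th / ih))
  fi
  # Same offsets as the pad expressions (ow-iw)/2 and (oh-ih)/2
  echo "scaled ${w}x${h}, x=$(((tw - w) / 2)) y=$(((th - h) / 2))"
}
```

For a 4:3 source of 1440 × 1080, `pad_geometry 1440 1080` reports 240-pixel bars on the left and right; for a 2.4:1 source of 3840 × 1600, it reports 140-pixel bars at the top and bottom.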

✅ Option 2: Cropping

This method fills the 1920×1080 frame exactly, but cuts off parts of the image in return.

ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=increase,crop=1920:1080" -c:a copy output.mp4

Explanation:

scale=1920:1080:force_original_aspect_ratio=increase: scales the video so that it completely fills the area, possibly exceeding it in one dimension.
crop=1920:1080: cuts off whatever overhangs (centered by default), so the video is exactly 1920×1080.
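How much of the image is lost can be estimated the same way. The sketch below (hypothetical function name `crop_loss`, integer math, so ffmpeg may round slightly differently) computes the intermediate size after `force_original_aspect_ratio=increase` and the number of pixels `crop` removes on each side:

```shell
#!/bin/bash
# Sketch: intermediate size and pixels removed per side for
# scale=...:force_original_aspect_ratio=increase,crop=... (hypothetical name).
# Usage: crop_loss IN_W IN_H [TARGET_W] [TARGET_H]
crop_loss() {
  local iw=$1 ih=$2 tw=${3:-1920} th=${4:-1080} w h
  if [ $((iw * th)) -gt $((ih * tw)) ]; then
    # Source is wider than the target: height fills, the sides are cut
    h=$th
    w=$((iw * th / ih))
  else
    # Source is narrower or equal: width fills, top and bottom are cut
    w=$tw
    h=$((ih * tw / iw))
  fi
  # crop without x/y keeps the center, so the overhang is split evenly
  echo "scaled ${w}x${h}, cut x=$(((w - tw) / 2)) y=$(((h - th) / 2)) per side"
}
```

For a 4:3 source of 1440 × 1080, `crop_loss 1440 1080` reports that 180 pixels are cut off at the top and at the bottom.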

What should you choose?

Goal	Method	Recommendation
Cut nothing off, bars are OK	`scale + pad`	✅ Safest option
No bars, image may be cropped	`scale + crop`	⚠️ Good for centered subjects
Simple scaling without regard to aspect ratio	`scale=1920:1080`	⚠️ Can distort the aspect ratio

Here is a small Bash script for Linux (e.g. Linux Mint) that scales all .mp4 files in the current folder to 1920×1080, preserving the original aspect ratio and adding black bars where necessary:

📜 `scale_with_bars.sh`

#!/bin/bash

# Target resolution
WIDTH=1920
HEIGHT=1080

# Without matching files, the loop must not run with the literal "*.mp4"
shopt -s nullglob

# Create the output directory (optional)
mkdir -p converted

# Loop over all MP4 files
for file in *.mp4; do
  # Output file name
  outfile="converted/${file%.*}_scaled.mp4"

  echo "Processing: $file → $outfile"

  ffmpeg -i "$file" \
    -vf "scale=w=${WIDTH}:h=${HEIGHT}:force_original_aspect_ratio=decrease,pad=${WIDTH}:${HEIGHT}:(ow-iw)/2:(oh-ih)/2" \
    -c:a copy "$outfile"
done

echo "Done!"

🛠️ How to use the script:

Open a terminal in the folder containing your .mp4 files.
Create the script: nano scale_with_bars.sh
Paste the code above, save with Ctrl + O, exit with Ctrl + X.
Make it executable: chmod +x scale_with_bars.sh
Run it: ./scale_with_bars.sh

📁 The scaled videos are stored in the subfolder converted/ and follow the naming pattern filename_scaled.mp4.

8 August 2025

ffmpeg – Downscaling a high-resolution video file
If you want to downscale a very high-resolution video file to 1920×1080, the basic ffmpeg command is:
```
ffmpeg -i input.mp4 -vf scale=1920:1080 -c:a copy output.mp4
```
Explanation of the options:
- -i input.mp4: specifies the input file.
- -vf scale=1920:1080: applies a video filter (-vf) that scales the video to the desired resolution.
- -c:a copy: copies the audio track unchanged (saves time and preserves the original quality).
- output.mp4: the name of the output file.
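scale=1920:1080 forces exactly this size, so a source that is not 16:9 is stretched or squeezed. As an alternative sketch, only the width can be fixed: in ffmpeg's scale filter, a height of -2 keeps the aspect ratio and rounds to an even number, which H.264 with the common yuv420p pixel format requires. The function name `downscale` is hypothetical:

```shell
#!/bin/bash
# Sketch: downscale to a given width; the height follows the aspect ratio.
# downscale is a hypothetical helper name.
# Usage: downscale INPUT OUTPUT [WIDTH]   (default width: 1920)
downscale() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: downscale INPUT OUTPUT [WIDTH]" >&2
    return 2
  fi
  local in="$1" out="$2" width="${3:-1920}"
  # -2: height is computed from the aspect ratio and kept divisible by 2
  ffmpeg -i "$in" -vf "scale=${width}:-2" -c:a copy "$out"
}
```

`downscale input.mp4 output.mp4` produces a video 1920 pixels wide, `downscale input.mp4 output_720p.mp4 1280` one that is 1280 pixels wide.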
Optional: preserving quality

If you want better quality when scaling, you can additionally set the video codec and its quality settings, as well as the audio codec and bitrate, manually:
```
ffmpeg -i input.mp4 -vf scale=1920:1080 -c:v libx264 -crf 20 -preset slow -c:a aac -b:a 192k output.mp4
```
Additional options:
- -c:v libx264: uses the H.264 encoder (standard, highly compatible).
- -crf 20: quality factor from 0 to 51 (0 = lossless, 23 = default, lower = better quality and larger files).
- -preset slow: better compression (slower, but more efficient).
- -c:a aac -b:a 192k: converts the audio to AAC at 192 kbit/s (if you do not just want to copy it).
If you want to convert all .mp4 files in the folder to 1920×1080 automatically:
```
for f in *.mp4; do
  ffmpeg -i "$f" -vf scale=1920:1080 -c:a copy "scaled_$f"
done
```
This scales all .mp4 files in the folder and saves them as scaled_*.mp4.
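If this loop is run a second time, the scaled_*.mp4 files also match *.mp4 and are scaled again (producing scaled_scaled_*.mp4). A more robust sketch skips them, leaves existing output files alone (ffmpeg option -n), and does nothing if the folder contains no .mp4 files (bash option nullglob). The function name `scale_all` is hypothetical; like the loop, it scales to a fixed size without preserving the aspect ratio:

```shell
#!/bin/bash
# Sketch: scale every .mp4 in the current folder exactly once.
# scale_all is a hypothetical name. The body runs in a subshell "( ... )",
# so shopt and variables do not leak into the calling shell.
# Usage: scale_all [WIDTH] [HEIGHT]   (default: 1920 1080)
scale_all() (
  width="${1:-1920}"
  height="${2:-1080}"
  # nullglob: without matching files, *.mp4 expands to nothing
  shopt -s nullglob
  for f in *.mp4; do
    case "$f" in
      scaled_*) continue ;;  # result of an earlier run, skip it
    esac
    # -n: never overwrite an existing output file
    ffmpeg -n -i "$f" -vf "scale=${width}:${height}" -c:a copy "scaled_$f"
  done
)
```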
8 August 2025
ffmpeg – Basic commands
🔁 1. Convert a video (change the format)
```
ffmpeg -i input.avi output.mp4
```
Converts a video from AVI to MP4 (default codecs: H.264 and AAC, provided ffmpeg was built with libx264).

🔊 2. Extract only the audio
```
ffmpeg -i input.mp4 -q:a 0 -map a output.mp3
```
Extracts the audio track as an MP3 in the best quality (-q:a 0 is the highest variable-bitrate level).

🎞️ 3. Scale a video (change the resolution)
```
ffmpeg -i input.mp4 -vf scale=1920:1080 output.mp4
```
Scales the video to Full HD (1920×1080).

🕒 4. Cut a video (e.g. from 00:01:00 to 00:02:00)
```
ffmpeg -ss 00:01:00 -i input.mp4 -t 00:01:00 -c copy output.mp4
```
Cuts out a one-minute section without re-encoding. Because of -c copy, the cut can only start at a keyframe, so it may not be frame-accurate.
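-t expects a duration, not an end time. If start and end are known, the duration can be computed first. A minimal sketch with the hypothetical helpers `to_seconds` and `cut_clip`, which only handle whole seconds in HH:MM:SS format:

```shell
#!/bin/bash
# Sketch: cut from START to END (HH:MM:SS, whole seconds only) without
# re-encoding. to_seconds and cut_clip are hypothetical helper names.

# to_seconds HH:MM:SS: prints the total number of seconds
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # 10# forces base 10, so "08" and "09" are not parsed as octal
  echo $((10#$h * 3600 + 10#$m * 60 + 10#$s))
}

# cut_clip INPUT START END OUTPUT
cut_clip() {
  local start end
  start=$(to_seconds "$2")
  end=$(to_seconds "$3")
  if [ "$end" -le "$start" ]; then
    echo "END must be after START" >&2
    return 1
  fi
  ffmpeg -ss "$2" -i "$1" -t "$((end - start))" -c copy "$4"
}
```

`cut_clip input.mp4 00:01:00 00:02:00 output.mp4` runs the same command as above, with -t 60.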

🖼️ 5. Extract still frames from a video
```
ffmpeg -i input.mp4 -vf fps=1 frame_%04d.png
```
Saves one frame per second of video as PNG files (frame_0001.png, frame_0002.png and so on).

📸 6. Combine an image sequence into a video
```
ffmpeg -framerate 25 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4
```
Creates a video at 25 FPS from the image files (-pix_fmt yuv420p keeps it playable in most players).

🎛️ 7. Fit a video to a fixed aspect ratio with bars
```
ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" output.mp4
```
Preserves the aspect ratio and adds black bars where necessary.

✂️ 8. Crop a video
```
ffmpeg -i input.mp4 -filter:v "crop=1280:720:100:50" output.mp4
```
Cuts out a 1280×720 area whose top-left corner is at (x=100, y=50).

🔇 9. Remove the audio
```
ffmpeg -i input.mp4 -an output.mp4
```
Removes the audio track.

🔁 10. Replace the audio track
```
ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -map 0:v:0 -map 1:a:0 output.mp4
```
Replaces the original audio track with a new audio file; the video stream is copied unchanged (-c:v copy).

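📝 Note:

If you do something frequently, a shell script or a shell function in ~/.bashrc is worthwhile. An alias does not work for this example, because aliases cannot take arguments such as "$1". Example for ~/.bashrc:
```
mp3extract() { ffmpeg -i "$1" -q:a 0 -map a "${1%.*}.mp3"; }
```
After opening a new terminal (or running source ~/.bashrc), you can simply type:
```
mp3extract video.mp4
```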
7 August 2025
ffmpeg – Description
ffmpeg is a free, cross-platform command-line tool for processing audio and video files. It is one of the most powerful and flexible tools in this field.

🔧 What can ffmpeg do?
- Convert videos (e.g. from AVI to MP4)
- Extract or convert audio (e.g. MP3 from MP4)
- Scale, cut and concatenate videos
- Turn image sequences into videos (or vice versa)
- Record or transcode live streams
- Embed or extract subtitles
- Apply filters (e.g. sharpening, inserting a logo)
🧱 Technical background

ffmpeg is based on the libavcodec library (part of the FFmpeg project), which supports many well-known codecs, e.g.:
- H.264 / H.265 (video)
- AAC / MP3 / FLAC (audio)
- VP9, AV1, ProRes and many more
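Which of these codecs a given ffmpeg build can actually encode depends on how it was compiled, and `ffmpeg -encoders` lists them. A sketch of a check, assuming the usual layout of that list (flags in the first column, encoder name in the second); the function name `has_encoder` is hypothetical:

```shell
#!/bin/bash
# Sketch: succeeds if the local ffmpeg build offers the given encoder.
# has_encoder is a hypothetical name; relies on the layout of
# "ffmpeg -encoders" (flags in column 1, encoder name in column 2).
has_encoder() {
  ffmpeg -hide_banner -encoders 2>/dev/null |
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

For example: `has_encoder libx264 && echo "H.264 encoding via libx264 available"`.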
🖥️ Usage

ffmpeg is operated from the command line (terminal), e.g.:
```
ffmpeg -i eingabe.avi -c:v libx264 -crf 23 ausgabe.mp4
```
→ Converts an AVI video to MP4 with H.264 encoding.

🟢 Advantages
- Very flexible and powerful
- Supports an extremely wide range of formats
- Ideal for automation and scripts
- Free and open source
7 August 2025
