如何提取mp4字幕文件视频文件的字幕

[转载]从TS文件中提取DVB&Subtitle字幕的有效方法
Subtitling标准的多语种可消隐字幕已大量运用于欧洲数字电视广播系统,而我国大陆地区在此方面的运用还未见报道。DVB
Subtitling系统操作较为复杂,昂贵,笔者已找到一种将最简单的Srt字幕文件封装到TS流中传输、解码、显示的方法,所有功能与欧洲DVB
Subtitling标准基本一致,可惜暂不兼容欧洲标准;而我国DVB字幕标准未见正式颁布,在此,先推出一篇短文,介绍DVB
Subtitling字幕的提取办法,供有兴趣者参考。在适当的时候,笔者将介绍另外一种很简单的适合国情的DVB字幕系统的解决方案。
关键词:DVB
Subtitling& Close
Caption& 可消隐字幕
Subtitling3S4000MHzDVB
SubtitlingClose
CaptionOpen
Subtitles属可消隐字幕,而可消隐字幕在DVD光盘时代就已经很常见了,它可以保持图像字幕的相互独立性,便于节目交流及视频内容的再利用,比传统的开放型内嵌字幕要先进。
DVBSubtitles可同时封装多条字幕轨道,由用户自由选择打开或关闭,可满足观众不同语言字幕的需要。
DVB SubtitlingDVB
SubtitlesTSDVB
图3、用码流分析软件可看出TS流中有两条DVB
Subtitles字幕。
VLC media player Potplayer
DVB SubtitlesDVB
IDX/SUB DVBProjectXDVBSub2TextBDSup2subDVBSUBDVBSub2Tex100%SrtBDSup2subProjectXDVB
SubtitlesDVB
使用DVBSub2Text提取英文或拉丁文字幕
DVBSub2Text,是一款免费的DVB转文本字幕的软件
DVBSub2TextDVBSub2TextOCR
DVBSub2Text主界面
MPEG -TSDVB
图6、选择要分离的DVB
TS流,打开
TSDecoded&
Image”()DVB
SubtitlesDVB
图7、 可以看到左边框中出现视频音频轨道及三条字幕轨道,字幕轨道的
PID分别为845、846(实际上第二条与第三条是一样的,都是英文字幕),中间框变为绿色,说明有有效的DVB
Subtitles字幕。
图8、将不需要的字幕轨道的勾选框取消
图9、在要保留的字幕勾选框里打钩,再点击Start,即开始进行OCR识别与转换
图10、OCR识别系统开始工作,中间屏幕出现字幕,右边白色方框内即出现提取出来的时间轴与台词字幕。大家可以发现,这完全是Srt格式的字幕。
图11、字幕识别结束后,按Save将提取出来的字幕存为Srt文件。
图12、保存字幕文件到指定文件夹
图13、用记事本直接打开提取出来的字幕,可以看到它们是按Srt格式排列
用记事本直接打开,可以看到我们已经得到一个完整的Srt文件,英文字母的识别率接近100%(当然也偶有错误,故最好检查一遍),但有一些特殊符号,如分隔行的“-”,识别出来的却变成,我们需要用替换的办法把它替换为“-”。在记事本中点击编辑-替换-将替换为-,最后点击“全部替换”,即可得到一个合格的Srt字幕文件。如果替换效果不佳,还需要人工检查一下,将乱码的符号删除或替换掉。
图14、对乱码进行替换处理
图15、至此,一个合格的DVB英文字幕Srt文件就很快提取出来了。
DVB SubtitlesDVBSrtSrt
SrtSrtEdit2012
使用ProjectX提取中文字幕
DVBSub2TextOCR
DVBSub2TextDVBProjectX_0.91.0JAVAJAVA
ProjectX_0.91.0.zip
ProjectX_0.91.0
图16、进入ProjectX文件夹中,单击ProjectX.jar,用右键选打开方式为JAVA,即可成功启动ProjectX。不要直接双击,否则可能会出现错误。
图17、图上声明告诉我们这是一款免费软件,我们点击“I
agree(同意)”即可进入下一步。
图18、点击左上角File&Add(添加),在电脑中的指定位置选择我们要提取字幕的TS流
图19、选择需要提取字幕的TS流,按“Select”.
图20、打开后在下方出现加载的TS流的文件名,再点击这个文件,上方框内即出现TS流中所包含的视频、音频和字幕轨道。我们可以看出一条字幕轨道为CHI(实际上是繁体中文),另一条为CHS(实际上是英文)
点击FilterControl,进入下一选项
图22、因为我们只需要分离字幕文件,而不需要分离其他视频及音频文件,故在右侧只选“Subpicture”(图形字幕),其他选项的勾选取消。然后点击“more
settings”(更多设置)
图23、按图例点击左侧“subtitil”(字幕),再勾选右下角“additional
export as Vob
Sub(idx+sub)”,将字幕文件转换为idx+sub文件。Idx内含时间轴,sub为对应的字幕图像。读者如果有时间和兴趣,还可以改变其它选项,提取出来的字幕效果可能更好。
图24、关闭设定,点击左上角QuickStart(快速开始)
图25、开始提取字幕
图26、在同一个文件夹中即可看到提取出来的两条轨道的字幕文件和附件。其中,对我们有用的是sub及idx文件。Idx为时间戳文件,sub为对应的图形字幕文件。Idx+sub文件字幕可以被一些播放软件直接识别与解码,但它的字形及颜色往往比较难看,所以我们还是要再花点精力继续下一步转换工作。
OCRIdxSubOcrOCROffice20032007MODI
OfficesMODIOffice27OfficeOCR
图27,在安装Office时,在Office工具选项中,从子选项Microsoft
Office Document
Imaging中,选“从本机运行全部程序”,然后进入下一步安装。安装全部结束后,还需要到微软官方网站免费下载SP1或SP2、SP3补丁,否则仍不能正确OCR中文字幕(很麻烦,是吗?)。另外,IdxSubOcr对英文字幕的支持不是很好(主要是空格往往不能正确识别,要花较长时间校对),故不推荐使用它来识别英文字幕。另外,它对朝鲜语(或者说是韩文)的识别也很差,有待改进。
IdxSubOcr软件
IdxSubOcrsub.idxOCR
图29、选择我们要打开的idx文件。这里有两个文件,一个是繁体中文,一个是英文。我们只需转换中文字幕文件,英文字幕虽然也可以OCR,但识别率较低,速度也较慢,不建议采用此方法来识别DVB
Subtitles的英文字幕。
图30、选择需要转换的字幕,按确定进入下一步。
图31、点击“是”,进入下一步
图32因为我们事先得知是繁体中文字幕,故在OCR语言类选中文繁体。
,OCR开始,将图形字幕识别后转变为文本格式字幕,并分离出时间轴。
图34、根据提取出来的Srt字幕,重新制作新的字幕,可以做得比原版字幕漂亮多了,可以随心所欲制作出各种特效、各种颜色的字幕。
DVBTSTStsMuXerGuiTSDVB
使用BDSup2sub将DVB
Subtlties字幕转换为可再利用的图形字幕
SubtitlingDVB
SubtitlingDVB
Subtitling
BDSup2subProjectX,将其从DVB
Subtitles流中IDX+SubSupPGSTSDVB
DVB Subtitles
图35、该方案提取出来的DVB
Subtitles字幕,直接使用原版字幕的图形,与原版字幕的效果接近(见图1),对发烧友家庭使用还是绰绰有余的。这是一种简单的DVB
Subtitles字幕再利用方法,转换非常快捷,不需校正字幕,特别适用于要求不是很高的场合使用(其实作为备选字幕,在专业广播电视领域使用也未尝不可)。而对难以OCR和编辑的日语韩国语泰语等小语种字幕,这种字幕提取和再利用的方法是特别简单有效的,效率很高。
&下面,我们来介绍一下怎样用BDSup2sub软件,把ProjectXDVB
Subtitles流中IDX+SubSup
图36、打开BDSup2sub软件,选“File(文件)”—“Load(载入)”
图37、选择ProjectX从DVB的TS流中提取出来的后缀为idx字幕文件。
图38、载入文件
图39、系统会提示你设置字幕大小。如果节目源为PAL制式标清,选720&576,高清则选1080P。这个选项可以改变字幕的大小,读者不妨多测试几种参数,看哪个更适合。特别注意的是,FPS
Target一栏,默认电视节目选25,但如果播放的是电影节目往往要选23.976,否则字幕的时间轴越往后越被拖长,容易导致字幕失去同步。
图40、字幕显示出来了,但颜色不好(我也奇怪为何在电视上显示正常的DVB
Subtitling白色字幕,提取出来后颜色变得乱七八糟的),作为字幕播出需要进行调整。请按此图上面板各选项的配置,当然你有时间也可以选其他配置玩玩,会有不同效果,可能字幕会变得更漂亮,也可能更丑陋。
一般对于同一家DVB广播电视运营商来说,字幕参数是一致的,我们选择好各选项参数,使展现出来的字幕为最漂亮的状态后,保存起来,以后一直使用该参数转换即可。
特别需要注意的是,如果我们要把提取出来的字幕用于DVB广播或蓝光字幕制作上,那么输出格式(output
Format)一定要选蓝光专用的Sup(BD),也就是PGS格式字幕。
图42、点击左上角编辑(Edit),选Edit
DVD Frame这个栏目(调色板),可以对图形字幕做简单调整
图43、Color1为字幕背景色,直接在右侧Alp(颜色深度)栏上选00,即为透明背景(难道有人需要不透明的背景?不解);
图44、Color2为字体边色,按需要选(一般选黑色边),Alp(颜色深度)自选;Color3为字体边色厚度,自由选择;Color4最重要,为字体底色,一般选白色,也可以选其他喜欢的颜色。但如果用在DVB广播中,最好只选白色,因为很多机顶盒字幕不支持彩色,会显示为灰色字幕,很难看。有时不同的电视节目字幕颜色层会不一样,那就只好一个一个试了。
图45、选好颜色后再点击“Set
选择全部字幕,将它们都调成白色(或其他颜色),最后点击OK,字幕颜色和大小即改过来了。
目前笔者接触到的,只有香港地面数字电视广播上的DVB
Subtitles,它们的字幕参数基本上是一样的,设置一次就行了。不知其他地方不同运营商的字幕是否也是如此?如果不一样,以上参数还是需要另行调整,有兴趣者自己去摸索吧。
图46、将调整好的字幕进行存储,存为Export文件
图47、文件名后缀为.sup_exp.sup,这个文件是包含了时间轴的图形字幕文件,可应用于蓝光字幕或DVB字幕封装。
图48、转换过程很快。
图49、我们在目标文件夹中看到多出来的后缀为.sup_exp.sup的文件,就是我们转换过来的DVB
Subtitles文件。这个包含了时间轴的图形字幕文件可以直接用于DVB字幕封装或蓝光字幕封装,也可以作为外挂的图形字幕使用。
Potplayer播放软件是韩国人编写的一套免费软件,几乎全部兼容显示目前所知的各种流行文本及图形字幕,是目前所知字幕兼容功能最好的播放软件,比VLC、DVB
Viewer等主流软件效果好多了。我们在测试中,以Potplayer播放软件为演示主体。
图50,用Potplayer等播放软件打开视频后,加载sup_exp.sup字幕文件,可收看到我们封装的图形字幕了,跟原版的DVB
Subtitles效果非常接近了(参看图1,当然,跟图34我们重新制作的字幕相比,还是有些差距。)。其实别说是用在家庭环境,就是用在广播电视领域也是过得去的,它的优点是转换效率很高。
我们提取的英文图形字幕显示效果,跟图2的DVB
Subtitling原版字幕相比,区别已经不大了。
DVB SubtitleDVB
SubtitleDVB
Subtitlesup_exp.supTSDVB
SubtitlingTSDVBSrtAssDVBTSDVB-CDTMBDVB
Subtitling
以上网友发言只代表其个人观点,不代表新浪网的观点或立场。1366人阅读
基础知识(9)
& 上篇文章《VB编写程序实现视频外挂SRT字幕》(),只写了解决SRT字幕文件。这次把三种格式(SRT,SSA,ASS)都柔和在一起,没有分别编写,在上篇文章基础上直接改的。SSA和ASS特效代码有很多相同的地方,其中ASS特效代码80%与SSA相同,只提取VB能够处理的基本信息,其余的特效代码VB不好办,全部扔掉。在试验中发现有的字幕文件编码是Uncode格式,有的是UTF-8编码格式,有的是Ansi编码格式,在这方面作了识别,以便VB能够正确处理。下面是解析这三种字幕文件的全部程序:
Option Explicit
Private Declare Function MultiByteToWideChar Lib "kernel32.dll" (ByVal CodePage As Long, ByVal dwFlags As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Const UTF8FT As Long = 65001 '代码页
'SRT,SSA,ASS字幕提取,SubtitlesFilesName为字幕文件名
Public Function SubtitlesFileAnalysis(ByVal SubtitlesFilesName As String) As Long
Dim GetFileExtendName As String '取得文件的扩展名
Dim FileByte() As Byte '存放字幕文件字节
Dim strBuffer As String, UTF8Size As Long, BufferSize As Long, UTF8Result As Long
Dim LineCount As Long, FileNumber As Long, TimeLenth As Long, TimeNumber As Long
Dim Stitle As String, TimeLabel(1 To 2) As String, St As String, StC As String
Dim Fbyte() As Byte, Temp() As String, sTime() As String, TT() As String, sTT() As String '临时数组
Dim tStyle() As String, sStyle() As String '缺省的字幕信息和自定义字幕信息
Dim V4StyleFormat As String, EventsFormat As String '缺省的字幕格式和对话事件格式
Dim sName As String, sFontname As String, sFontsize As String, sPrimaryColour As String, sBold As String, sItalic As String, sUnderline As String '从V4+Styles中取出的字体信息
Dim DefaultFontIfo As String '缺省的字体信息
Dim StartPosition As Long ', EndPo As Long, StylePo As Long, TextPo As Long '记录开始时间,结束时间,样式,字幕文本起始位置
Dim sHour As Long, sMunite As Long, sSecond As Long, s1 As Long, s2 As Long, s3 As Long, s4 As Long
Dim GotTime As Boolean
On Error Resume Next
'缺省的字体信息:fn字体类型,fs字体大小,fc字体颜色,fp字体位置,fi斜体,fu下划线,fb粗体
DefaultFontIfo = "[fn=黑体][fs=24][fc=&HFFFFFF][fp=2][fi=0][fu=0][fb=0] "
sUnderline = "0"
If Dir(SubtitlesFilesName) = "" Or FileLen(SubtitlesFilesName) = 0 Then SubtitlesFileAnalysis = 0: Exit Function '字幕不存在即退出
GetFileExtendName = Mid$(SubtitlesFilesName, 1 + InStrRev(SubtitlesFilesName, "."), Len(SubtitlesFilesName) - InStrRev(SubtitlesFilesName, "."))
Debug.Print GetFileExtendName
'确定字幕文件编码格式,并将字幕文本导入到临时数组TT中
FileNumber = FreeFile
Open SubtitlesFilesName For Binary As #FileNumber
ReDim FileByte(LOF(FileNumber) - 1), Fbyte(LOF(FileNumber) - 1)
Get #FileNumber, , FileByte
Close #FileNumber
If (Hex$(FileByte(0)) = "FF" And Hex$(FileByte(1)) = "FE") Or (Hex$(FileByte(0)) = "FE" And Hex$(FileByte(1)) = "FF") Then
If Hex$(FileByte(0)) = "FF" Then
'字幕文件为Unicode(Little Endian)编码
Stitle = StrConv(FileByte, vbNarrow)
If Hex$(FileByte(0)) = "FE" Then
'字幕文件为Unicode Big Endian编码
For s1 = 0 To UBound(FileByte)
If s1 Mod 2 = 0 Then
Fbyte(s1) = FileByte(s1 + 1)
Fbyte(s1) = FileByte(s1 - 1)
Stitle = StrConv(Fbyte, vbNarrow)
Mid$(Stitle, 1, 1) = " ": Stitle = Trim(Stitle)
If (Hex$(FileByte(0)) = "EF" And Hex$(FileByte(1)) = "BB" And Hex$(FileByte(2)) = "BF") Then
'字幕文件为UTF-8编码
UTF8Size = UBound(FileByte) + 1
BufferSize = MultiByteToWideChar(UTF8FT, 0&, FileByte(0), UTF8Size, 0&, 0&)
strBuffer = String$(BufferSize, vbNullChar)
UTF8Result = MultiByteToWideChar(UTF8FT, 0&, FileByte(0), UTF8Size, StrPtr(strBuffer), BufferSize)
If UTF8Result & 0 Then
Stitle = Left$(strBuffer, UTF8Result)
Mid$(Stitle, 1, 1) = " ": Stitle = Trim(Stitle)
Exit Function
'字幕文件为Ansi编码,或其它非Unicode、UTF-8编码格式
Stitle = StrConv(FileByte, vbUnicode)
Erase FileByte, Fbyte '释放动态数组所使用的内存
Temp = Split(Stitle, vbCrLf) '导出的全部字幕文本
'扫描对话字幕个数并定义字幕数组
LineCount = 1: s2 = 0
If LCase(GetFileExtendName) = "srt" Then
For s1 = 0 To UBound(Temp)
If InStr(1, Temp(s1), "--&") & 0 Then LineCount = LineCount + 1
SubtitlesFileAnalysis = LineCount - 1 '确定srt字幕个数
ReDim Subtitles(1 To LineCount - 1) '定义srt字幕数组
'ssa,ass字幕
If LCase(GetFileExtendName) = "ssa" Or LCase(GetFileExtendName) = "ass" Then
For s1 = 0 To UBound(Temp)
If InStr(1, Temp(s1), "Dialogue:") & 0 Then LineCount = LineCount + 1
If InStr(1, Temp(s1), "Style:") & 0 Then s2 = s2 + 1
If InStr(1, Temp(s1), "Format:") & 0 Then
If InStr(1, Temp(s1), "Encoding") & 0 Then
V4StyleFormat = Trim(Replace(Temp(s1), "Format:", Space(7)))
If InStr(1, Temp(s1), "Marked") & 0 Or InStr(1, Temp(s1), "Layer") & 0 Then
EventsFormat = Temp(s1)
SubtitlesFileAnalysis = LineCount - 1 '确定ssa,ass字幕个数
ReDim Subtitles(1 To LineCount - 1) '定义ssa,ass字幕数组
ReDim sStyle(1 To s2)
'定义style预设样式的个数
ReDim tStyle(1 To LineCount - 1) '字幕中使用的预设样式名
If SubtitlesFileAnalysis = 0 Then Exit Function
'取得ssa,ass字幕预设样式
If LCase(GetFileExtendName) = "ssa" Or LCase(GetFileExtendName) = "ass" Then
For s1 = 0 To UBound(Temp)
If InStr(1, Temp(s1), "Style:") & 0 Then s2 = s2 + 1: sStyle(s2) = Trim(Replace(Temp(s1), "Style:", Space(6)))
If InStr(1, Temp(s1), "[Events]") & 0 Then Exit For
'提取srt对话字幕和显示时间
If LCase(GetFileExtendName) = "srt" Then
LineCount = 1: GotTime = False
For s1 = 0 To UBound(Temp)
Stitle = Trim(Temp(s1)) '提取srt字幕标签
If GotTime Then '取到一个时间标签后,记录当前时间标签下的所有字幕文本
If Len(Stitle) = 0 Or IsNumeric(Stitle) = True Then '碰到一空行或代表字幕个数的数字时
'进入下一个字幕之前,记录当前字幕
GotTime = False '已经提取完第LineCount个字幕
If Len(St) = 0 Then St = Space(10) ': Debug.Print Stitle
Subtitles(LineCount) = TimeLabel(1) & " [StartTime] " & TimeLabel(2) & " [EndTime] " & St '加上时间标签头
Subtitles(LineCount) = Trim(Left(Subtitles(LineCount), Len(Subtitles(LineCount)) - 6)) '去掉最后一个&-CRLF-&
St = "": LineCount = LineCount + 1 '记录字幕序号
If Len(Stitle) && 0 Then St = St & (Stitle & "-CRLF-") 'CRLF回车换行
'判断并提取时间标签
TimeLenth = InStr(1, Stitle, "--&")
If TimeLenth & 0 Then
TT = Split(Stitle, "--&")
TimeLabel(1) = TT(0) '字幕显示开始时间
sTime = Split(TimeLabel(1), ":")
sHour = CLng(sTime(0)) * 3600000
sMunite = CLng(sTime(1)) * 60000
sSecond = CLng(sTime(2))
TimeLabel(1) = CStr(sHour + sMunite + sSecond)
TimeLabel(2) = TT(1) '字幕显示结束时间
sTime = Split(TimeLabel(2), ":")
sHour = CLng(sTime(0)) * 3600000
sMunite = CLng(sTime(1)) * 60000
sSecond = CLng(sTime(2))
TimeLabel(2) = CStr(sHour + sMunite + sSecond)
GotTime = True '已经取到一个时间标签
'提取最后一个字幕
Subtitles(LineCount) = TimeLabel(1) & " [StartTime] " & TimeLabel(2) & " [EndTime] " & Subtitles(LineCount) & St '加上时间标签头
Subtitles(LineCount) = Trim(Left(Subtitles(LineCount), Len(Subtitles(LineCount)) - 6))
'提取ssa,ass对话字幕和显示时间
If LCase(GetFileExtendName) = "ssa" Or LCase(GetFileExtendName) = "ass" Then
TT = Split(EventsFormat, ",")
LineCount = 0
For s1 = 0 To UBound(Temp)
If InStr(1, Temp(s1), "Dialogue:") & 0 Then
LineCount = LineCount + 1
sTT = Split(Temp(s1), ",")
For s2 = 0 To UBound(TT)
If Trim(TT(s2)) = "Start" Then
TimeLabel(1) = Trim(sTT(s2)) '字幕显示开始时间
sTime = Split(TimeLabel(1), ":")
sHour = CLng(sTime(0)) * 3600000
sMunite = CLng(sTime(1)) * 60000
sSecond = CLng(Val(sTime(2)) * 1000)
TimeLabel(1) = CStr(sHour + sMunite + sSecond)
If Trim(TT(s2)) = "End" Then
TimeLabel(2) = Trim(sTT(s2)) '字幕显示结束时间
sTime = Split(TimeLabel(2), ":")
sHour = CLng(sTime(0)) * 3600000
sMunite = CLng(sTime(1)) * 60000
sSecond = CLng(Val(sTime(2)) * 1000)
TimeLabel(2) = CStr(sHour + sMunite + sSecond)
If Trim(TT(s2)) = "Style" Then
tStyle(LineCount) = Trim(sTT(s2)) '字幕预设样式
If Trim(TT(s2)) = "Text" Then
St = Temp(s1)
Mid$(St, 1, InStr(1, Temp(s1), sTT(s2)) - 1) = Space(InStr(1, Temp(s1), sTT(s2)) - 1)
St = Trim(St)
Subtitles(LineCount) = tStyle(LineCount) & " [Style] " & TimeLabel(1) & " [StartTime] " & TimeLabel(2) & " [EndTime] " & St '加上时间标签头
Erase Temp, sTime, TT, sTT, tStyle '释放动态数组所使用的内存
'检查SRT,SSA,ASS字幕特效代码
For LineCount = 1 To UBound(Subtitles)
'检查换行符:&br&,/N;空格符:/n,/h。这样的符号一行字幕可能有多个。
s1 = InStr(1, Subtitles(LineCount), "/N")
If s1 & 0 Then
Temp = Split(Subtitles(LineCount), "/N")
For s2 = 0 To UBound(Temp)
If Len(Trim(Temp(s2))) && 0 Then St = St & (Temp(s2) & "-CRLF-")
Subtitles(LineCount) = St
Subtitles(LineCount) = Trim(Left(Subtitles(LineCount), Len(Subtitles(LineCount)) - 6)) '去掉最后一个&-CRLF-&
s1 = InStr(1, Subtitles(LineCount), "&br&")
If s1 & 0 Then
Temp = Split(Subtitles(LineCount), "&br&")
For s2 = 0 To UBound(Temp)
If Len(Trim(Temp(s2))) && 0 Then St = St & (Temp(s2) & "-CRLF-")
Subtitles(LineCount) = St
Subtitles(LineCount) = Trim(Left(Subtitles(LineCount), Len(Subtitles(LineCount)) - 6)) '去掉最后一个&-CRLF-&
s1 = InStr(1, Subtitles(LineCount), "/n")
If s1 & 0 Then
Temp = Split(Subtitles(LineCount), "/n")
For s2 = 0 To UBound(Temp)
If Len(Trim(Temp(s2))) && 0 Then St = St & Temp(s2)
Subtitles(LineCount) = Trim(St)
s1 = InStr(1, Subtitles(LineCount), "/h")
If s1 & 0 Then
Temp = Split(Subtitles(LineCount), "/h")
For s2 = 0 To UBound(Temp)
If Len(Trim(Temp(s2))) && 0 Then St = St & Temp(s2)
Subtitles(LineCount) = Trim(St)
Erase Temp '释放动态数组所使用的内存
'初始化缺省的字体信息DefaultFontIfo,其余特效代码一律扔掉。
If InStr(1, Subtitles(LineCount), "-CRLF-") & 0 Then
Temp = Split(Subtitles(LineCount), "-CRLF-")
s2 = InStr(1, Temp(0), "[EndTime]")
St = Left$(Temp(0), s2 + 8) & DefaultFontIfo & Right$(Temp(0), Len(Temp(0)) - s2 - 8)
For s1 = 1 To UBound(Temp)
St = St & (" -CRLF- " & DefaultFontIfo & Temp(s1)) '& " -CRLF- ")
s2 = InStr(1, Subtitles(LineCount), "[EndTime]")
If Len(Subtitles(LineCount)) - s2 - 8 &= 0 Then
St = Left$(Subtitles(LineCount), s2 + 8) & DefaultFontIfo & Right$(Subtitles(LineCount), Len(Subtitles(LineCount)) - s2 - 8)
Subtitles(LineCount) = St: Erase Temp '初始化完毕Subtitles(LineCount),'释放动态数组Temp所使用的内存
If LCase(GetFileExtendName) = "ssa" Or LCase(GetFileExtendName) = "ass" Then
TT = Split(V4StyleFormat, ",")
For s1 = 1 To UBound(sStyle)
sTT = Split(sStyle(s1), ",")
For s2 = 0 To UBound(TT)
If Trim(TT(s2)) = "Name" Then sName = Trim(sTT(s2)) '取出样式名称
If Trim(TT(s2)) = "Fontname" Then sFontname = Trim(sTT(s2)) '取出字体名称
If Trim(TT(s2)) = "Fontsize" Then sFontsize = Trim(sTT(s2)) '取出字体大小
If Trim(TT(s2)) = "PrimaryColour" Then sPrimaryColour = "&H" & Hex$(CLng(sTT(s2))) '取出主体字体颜色
If Trim(TT(s2)) = "Bold" Then sBold = CStr(Abs(sTT(s2))) '取出字体粗体设置
If Trim(TT(s2)) = "Italic" Then sItalic = CStr(Abs(sTT(s2))) '取出字体斜体设置
If Trim(TT(s2)) = "Underline" Then sUnderline = CStr(Abs(sTT(s2)))
'取出字体下划线设置
StartPosition = InStr(1, Subtitles(LineCount), "[Style]")
St = Mid$(Subtitles(LineCount), 1, StartPosition - 1)
If (sName Like St) = True Or (InStr(1, St, sName) & 0) Then
Subtitles(LineCount) = Inifontifo(Subtitles(LineCount), DefaultFontIfo, "[fn=", sFontname)
Subtitles(LineCount) = Inifontifo(Subtitles(LineCount), DefaultFontIfo, "[fs=", sFontsize)
Subtitles(LineCount) = Inifontifo(Subtitles(LineCount), DefaultFontIfo, "[fc=", sPrimaryColour)
Subtitles(LineCount) = Inifontifo(Subtitles(LineCount), DefaultFontIfo, "[fb=", sBold)
Subtitles(LineCount) = Inifontifo(Subtitles(LineCount), DefaultFontIfo, "[fi=", sItalic)
Subtitles(LineCount) = Inifontifo(Subtitles(LineCount), DefaultFontIfo, "[fu=", sUnderline)
Mid$(Subtitles(LineCount), 1, StartPosition + 7) = Space(StartPosition + 7)
'Debug.Print "Subtitles(" & LineCount & ") =" & Trim(Subtitles(LineCount))
'检查所用字体、字号、颜色、位置等信息
For LineCount = 1 To UBound(Subtitles)
If InStr(1, Subtitles(LineCount), "-CRLF-") & 0 Then
'以下是多行字幕情形
Temp = Split(Subtitles(LineCount), "-CRLF-")
For s1 = 0 To UBound(Temp)
St = "/fn": StC = "[fn="
Temp(s1) = Insert_String(Temp(s1), St, StC, 5)
St = "/fs": StC = "[fs="
Temp(s1) = Insert_String(Temp(s1), St, StC, 5)
St = "/c": StC = "[fc="
Temp(s1) = Insert_String(Temp(s1), St, StC, 11)
St = "/a": StC = "[fp="
Temp(s1) = Insert_String(Temp(s1), St, StC, 4)
St = "/i": StC = "[fi="
Temp(s1) = Insert_String(Temp(s1), St, StC, 4)
St = "/u": StC = "[fu="
Temp(s1) = Insert_String(Temp(s1), St, StC, 4)
St = "/b": StC = "[fb="
Temp(s1) = Insert_String(Temp(s1), St, StC, 4)
For s1 = 0 To UBound(Temp)
St = Temp(0)
For s3 = 1 To UBound(Temp) '合并字幕
St = St & (" -CRLF- " & Temp(s3))
Subtitles(LineCount) = St
'以下是单行字幕情形
St = "/fn": StC = "[fn="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 5)
St = "/fs": StC = "[fs="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 5)
St = "/c": StC = "[fc="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 11)
St = "/a": StC = "[fp="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 4)
St = "/i": StC = "[fi="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 4)
St = "/u": StC = "[fu="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 4)
St = "/b": StC = "[fb="
Subtitles(LineCount) = Insert_String(Subtitles(LineCount), St, StC, 4)
Erase Temp '释放动态数组所使用的内存
Subtitles(LineCount) = GetXMLCodeValue(Subtitles(LineCount), ByVal "&i&")
Subtitles(LineCount) = GetXMLCodeValue(Subtitles(LineCount), ByVal "&u&")
Subtitles(LineCount) = GetXMLCodeValue(Subtitles(LineCount), ByVal "&b&")
Subtitles(LineCount) = GetXMLCodeValue(Subtitles(LineCount), ByVal "color=")
Subtitles(LineCount) = DeleEffectCode(Subtitles(LineCount), ByVal "&", ByVal "&")
Subtitles(LineCount) = DeleEffectCode(Subtitles(LineCount), ByVal "{", ByVal "}")
Form1.List1.AddItem "Subtitles(" & LineCount & ")=" & Subtitles(LineCount)
'Form1.Text1.Text = "用时:" & Format$((CDbl(t2 - t1) / 1000), "0.000") & "秒"
End Function
'初始化字体信息
Private Function Inifontifo(ByVal TitleText As String, ByVal DefaultFontIfo As String, ByVal Font1 As String, ByVal Font2 As String) As String
'TitleText:传入的字幕文本;DefaultFontIfo:传入的自定义缺省字体信息
'Font1:传入的自定义符号;Font2:传入从V4+Styles中读取的字体信息
Dim i As Long, p1 As Long, p2 As Long, St As String, Temp() As String
If Len(TitleText) = 0 Then Inifontifo = "": Exit Function
p1 = InStr(1, DefaultFontIfo, Font1): p2 = InStr(p1 + 1, DefaultFontIfo, "]")
St = Mid$(DefaultFontIfo, p1, p2 - p1 + 1)
Temp = Split(TitleText, St): Inifontifo = ""
For i = 0 To UBound(Temp)
If i && UBound(Temp) Then
Inifontifo = Inifontifo & (Temp(i) & Font1 & Font2 & "]")
Inifontifo = Inifontifo & Temp(i)
End Function
'取出字符串中SRT、SSA、ASS特效字符串,插入自定义特效字符串
Private Function Insert_String(ByVal SourceSubtitles As String, ByVal SrtEffectCode As String, ByVal SrtCode As String, ByVal LEffect As Long) As String
Dim s1 As Long, s2 As Long, s3 As Long, St As String, S As String, Temp() As String
Dim p1 As Long, p2 As Long
If Len(Trim(SourceSubtitles)) = 0 Then Insert_String = Space(10): Exit Function
On Error Resume Next
s1 = InStr(s1 + 1, SourceSubtitles, SrtEffectCode)
If SrtEffectCode = "/b" And s1 & 0 Then
'不处理边角模糊、字体加宽
If Mid$(SourceSubtitles, s1, 3) = "/be" Or Mid$(SourceSubtitles, s1, 5) = "/bord" Then GoTo Cjl
If SrtEffectCode = "/fs" And s1 & 0 Then
'不处理字体缩放、字间距
If Mid$(SourceSubtitles, s1, 4) = "/fsc" Or Mid$(SourceSubtitles, s1, 4) = "/fsp" Then GoTo Cjl
If SrtEffectCode = "/a" And s1 & 0 Then
'不处理alpha透明度
If Mid$(SourceSubtitles, s1, 6) = "/alpha" Then GoTo Cjl
s3 = Len(SrtEffectCode)
If s1 & 0 Then
s2 = InStr(s1 + 1, SourceSubtitles, "/") 's2是紧接SrtEffectCode下一个"/"的位置
If s2 & 0 Then
St = Mid$(SourceSubtitles, s1 + s3, s2 - s1 - s3)
If InStr(1, St, "{") & 0 Then Mid$(St, InStr(1, St, "{"), 1) = " "
If InStr(1, St, "}") & 0 Then Mid$(St, InStr(1, St, "}"), 1) = " "
St = Trim(St)
s2 = InStr(s1 + 1, SourceSubtitles, "}")
If s2 & 0 Then
St = Mid$(SourceSubtitles, s1 + s3, s2 - s1 - s3)
p1 = InStr(1, SourceSubtitles, SrtCode)
p2 = InStr(p1 + 1, SourceSubtitles, "]")
S = Mid$(SourceSubtitles, p1, p2 - p1 + 1)
Temp = Split(SourceSubtitles, S)
Insert_String = Temp(0) & SrtCode & St & "]" & Temp(1)
Insert_String = SourceSubtitles
End Function
'取得字幕中类似XML代码的值,srt字幕中常见,ssa,ass字幕中不常见
Private Function GetXMLCodeValue(ByVal SourceSubtitles As String, ByVal SrtXMLcode As String) As String
Dim LineCount As Long, sSrt() As String
Dim p1 As Long, p2 As Long, P3 As Long, CO As String, St As String
Dim Pc1 As Long, Pc2 As Long
Dim i As Long, j As Long, K As Long
On Error Resume Next
If InStr(1, SourceSubtitles, "-CRLF-") & 0 Then
If InStr(1, SourceSubtitles, SrtXMLcode) & 0 Then
sSrt = Split(SourceSubtitles, "-CRLF-")
'ReDim sSrt_1(UBound(sSrt))
For LineCount = 0 To UBound(sSrt)
p1 = InStr(1, sSrt(LineCount), SrtXMLcode)
St = Mid$(LCase(SrtXMLcode), 2, 1): p2 = InStr(1, sSrt(LineCount), "&/" & St & "&")
If LCase(SrtXMLcode) = "color=" Then
p2 = InStr(1, sSrt(LineCount), "&/font&")
If InStr(1, sSrt(LineCount), "color=") & 0 Then
Pc1 = InStr(1, sSrt(LineCount), "color="): Pc2 = InStr(Pc1 + 6, sSrt(LineCount), "&")
CO = Mid$(sSrt(LineCount), Pc1 + 6, Pc2 - Pc1 - 6)
If InStr(1, CO, Chr$(34)) & 0 Then CO = "&H" & Mid$(CO, 2, Len(CO) - 2)
If InStr(1, CO, "#") & 0 Then CO = "&H" & Right$(CO, Len(CO) - InStr(1, CO, "#"))
K = LineCount
If p1 & 0 Then
St = Mid$(LCase(SrtXMLcode), 2, 1): P3 = InStr(1, sSrt(K), "f" & St & "="): Mid$(sSrt(K), P3 + 3, 1) = "1"
If LCase(SrtXMLcode) = "color=" Then
Pc1 = InStr(1, sSrt(K), "fc="): Pc2 = InStr(1, sSrt(K), "][fp")
sSrt(K) = Left$(sSrt(K), Pc1 + 2) & CO & Right$(sSrt(K), Len(sSrt(K)) - Pc2 + 1)
If p2 & 0 Then
For i = LineCount + 1 To UBound(sSrt)
St = Mid$(LCase(SrtXMLcode), 2, 1): j = InStr(1, sSrt(i), "&/" & St & "&")
If LCase(SrtXMLcode) = "color=" Then j = InStr(1, sSrt(i), "&/font&")
St = Mid$(LCase(SrtXMLcode), 2, 1): P3 = InStr(1, sSrt(K), "f" & St & "="): Mid$(sSrt(K), P3 + 3, 1) = "1"
If LCase(SrtXMLcode) = "color=" Then
Pc1 = InStr(1, sSrt(K), "fc="): Pc2 = InStr(1, sSrt(K), "][fp")
sSrt(K) = Left$(sSrt(K), Pc1 + 2) & CO & Right$(sSrt(K), Len(sSrt(K)) - Pc2 + 1)
If j & 0 Then Exit For
'执行下一次循环
St = sSrt(0)
For i = 1 To UBound(sSrt) '合并字幕
St = St & (" -CRLF- " & sSrt(i))
Erase sSrt
GetXMLCodeValue = St
GetXMLCodeValue = SourceSubtitles
If InStr(1, SourceSubtitles, SrtXMLcode) & 0 Then
St = Mid$(LCase(SrtXMLcode), 2, 1): p1 = InStr(1, SourceSubtitles, "f" & St & "="): Mid$(SourceSubtitles, p1 + 3, 1) = "1"
If LCase(SrtXMLcode) = "color=" Then
p1 = InStr(1, SourceSubtitles, "color="): p2 = InStr(p1 + 6, SourceSubtitles, "&")
CO = Mid$(SourceSubtitles, p1 + 6, p2 - p1 - 6)
If InStr(1, CO, Chr$(34)) & 0 Then CO = "&H" & Mid$(CO, 2, Len(CO) - 2)
If InStr(1, CO, "#") & 0 Then CO = "&H" & Right$(CO, Len(CO) - InStr(1, CO, "#"))
p1 = InStr(1, SourceSubtitles, "fc="): p2 = InStr(1, SourceSubtitles, "][fp")
SourceSubtitles = Left$(SourceSubtitles, p1 + 2) & CO & Right$(SourceSubtitles, Len(SourceSubtitles) - p2 + 1)
GetXMLCodeValue = SourceSubtitles
GetXMLCodeValue = SourceSubtitles
End Function
'去掉所有SRT、SSA、ASS字幕特效代码
Private Function DeleEffectCode(ByVal SourceSubtitles As String, ByVal StartCh As String, ByVal EndCh As String) As String
Dim p1 As Long, p2 As Long, PS As Long
Dim S As String, St As String, L As Long, aTT() As String
On Error Resume Next
Do While InStr(PS, SourceSubtitles, StartCh) & 0
p1 = InStr(PS, SourceSubtitles, StartCh)
p2 = InStr(p1 + 1, SourceSubtitles, EndCh)
Mid$(SourceSubtitles, p1, p2 - p1 + 1) = String(p2 - p1 + 1, "^")
PS = p2 + 1
aTT = Split(SourceSubtitles, "^")
For L = 0 To UBound(aTT)
If Len(Trim(aTT(L))) && 0 Then St = St & aTT(L)
DeleEffectCode = Trim(St)
End Function
参考知识库
* 以上用户言论只代表其个人观点,不代表CSDN网站的观点或立场
访问:63355次
积分:1060
积分:1060
排名:千里之外
原创:38篇
评论:49条
(5)(2)(5)(1)(2)(1)(1)(1)(1)(1)(1)(3)(3)(1)(2)(2)(1)(1)(1)(1)(1)(3)(1)(3)(1)}

我要回帖

更多关于 提取字幕文件 的文章

更多推荐

版权声明:文章内容来源于网络,版权归原作者所有,如有侵权请点击这里与我们联系,我们将及时删除。

点击添加站长微信