Adjusting Microsoft Translator WAVE Volume (Translation)
By S.F.
Original article: https://www.codeproject.com/Articles/318881/Adjusting-Microsoft-Translator-WAVE-Volume
Original author: Joel Ivory Johnson
Translated by this site.
Introduction

How to adjust the volume of the WAVE audio returned by Microsoft Translator.

Download code (32 KB)

The code in this article was inspired by some questions about Windows Phone 7, but it is general enough to be used on other .NET-based platforms. In the Windows Phone AppHub forums there was a question about changing the volume of the WAVE file returned by the Microsoft Translator service. On the StackOverflow forums there was a question about mixing two WAVE files together. I started working on a solution to the volume question, and when I stepped back to look at it, I realized I was not far from a solution to the other question as well. So I implemented both solutions with the same code. In this first article I will show what I needed to do to change the volume of a WAVE stream coming from the Microsoft Translator service.

I kept the code general enough that you can apply other algorithms to it if you wish. I have some ideas on how to better handle the storage buffers for the sound data, so that large recordings could be manipulated without keeping the entire recording in memory and the length of a recording could be changed more easily. As provided, though, the code demonstrates three things: loading a WAVE file, altering the sound data, and saving the WAVE data back out. The code for saving the WAVE file is a modified version of code I demonstrated some time ago for writing a proper WAVE file from the contents of microphone buffers.
Prerequisites

I assume you know what WAVE files and samples are. I also assume you know how to use the Microsoft Translator web service.
Loading the WAVE File

The format of a WAVE file is well documented. Several encodings can be used within a WAVE file, but I will concentrate on PCM-encoded WAVE files and ignore all the other possible encodings for now. The documentation I used can be found [here](https://ccrma.stanford.edu/courses/422/projects/WaveFormat/). In working with real WAVE files I have run into some variations from that documentation, and I will comment on those variations as they come up. In general, most of what you will find in the header is 8-, 16-, and 32-bit integers and "strings". I read the entire header into a byte array and then extract the information from that array into the appropriate types. To extract a "string" from a byte array, you need to know the index at which the "string" starts and the number of characters it contains; you can then pull it out with Encoding.UTF8.GetString. Decoding the numbers is easy once you know how they are encoded (little endian). If you want a better understanding, see the Wikipedia article on the encoding.

The header should be at least 44 bytes long, so I start by reading the first 44 bytes of the stream. SubChunk1Size will usually contain the value 16. If it is greater than 16, the header is larger than 44 bytes and I read the rest of it. I allow for a header size of up to 64 bytes (much larger than anything I have encountered). A header larger than 44 bytes usually means there are extra parameters at the end of SubChunk1. For what I am doing, the contents of those extra parameters do not matter, but I still need to account for the space they consume so the header is read correctly.

To my surprise, the fields in the header are not always populated. Some audio editors zero out certain fields. My first attempt at reading a WAVE file was with a file produced by the open source audio editor Audacity; among other fields, its BitsPerSample field was zero. I am not sure the format allows that, and it certainly is not in any spec sheet I have found, but when I run into it I assume a value of 16.

Whether a WAVE file contains 8-, 16-, or 32-bit samples, I store the values in an array of doubles once it is read in. I chose doubles because they work better for some of the math I have in mind.
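Before the full listing, here is the extraction technique in miniature. This is just a sketch; the buffer name and the offsets match the listing that follows.

//Sketch: pulling a "string" and a little-endian integer out of the header bytes.
//Assumes 'header' already holds at least the first 44 bytes of the file.
byte[] header = new byte[44];
//...fill 'header' from the stream...

//A four-character "string" such as "RIFF" starts at a known index.
string chunkID = Encoding.UTF8.GetString(header, 0, 4);

//Multi-byte numbers are little endian: the least significant byte comes first.
int chunkSize = header[4] | (header[5] << 8) | (header[6] << 16) | (header[7] << 24);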
public void ReadWaveData(Stream sourceStream, bool normalizeAmplitude = false)
{
//In general I should only need 44 bytes.
//I'm allocating extra memory because of a variance I've seen in some WAV files.
byte[] header = new byte[64];
int bytesRead = sourceStream.Read(header, 0, 44);
if(bytesRead!=44)
throw new InvalidDataException(String.Format
("This can't be a wave file. It is only {0} bytes long!",bytesRead));
int audioFormat = (header[20]) | (header[21] << 8);
if (audioFormat != 1)
throw new Exception("Only PCM Waves are supported (AudioFormat=1)");
#region mostly useless code
string chunkID = Encoding.UTF8.GetString(header, 0, 4);
if (!chunkID.Equals("RIFF"))
{
throw new InvalidDataException(String.Format
("Expected a ChunkID of 'RIFF'. Received a chunk ID of {0} instead.", chunkID));
}
int chunkSize = (header[4]) | (header[5] << 8) |
(header[6] << 16) | (header[7] << 24);
string format = Encoding.UTF8.GetString(header, 8, 4);
if (!format.Equals("WAVE"))
{
throw new InvalidDataException(String.Format
("Expected a format of 'WAVE'. Received a chunk ID of {0} instead.", format));
}
string subChunkID = Encoding.UTF8.GetString(header, 12, 4);
if (!format.Equals("fmt "))
{
throw new InvalidDataException(String.Format("Expected a subchunkID of
'fmt '. Received a chunk ID of {0} instead.", subChunkID));
}
int subChunkSize = (header[16]) | (header[17] << 8) |
(header[18] << 16) | (header[19] << 24);
#endregion
if (subChunkSize > 16)
{
var bytesNeeded = subChunkSize - 16;
if(bytesNeeded+44 > header.Length)
throw new InvalidDataException("The WAV header is larger than expected. ");
sourceStream.Read(header, 44, bytesNeeded);
}
ChannelCount = (header[22]) | (header[23] << 8);
SampleRate = (header[24]) | (header[25] << 8) |
(header[26] << 16) | (header[27] << 24);
#region Useless Code
int byteRate = (header[28]) | (header[29] << 8) |
(header[30] << 16) | (header[31] << 24);
int blockAlign = (header[32]) | (header[33] << 8);
#endregion
BitsPerSample = (header[34]) | (header[35] << 8);
#region Useless Code
string subchunk2ID = Encoding.UTF8.GetString(header, 20 + subChunkSize, 4);
#endregion
var offset = 24 + subChunkSize;
int dataLength = (header[offset+0]) | (header[offset+1] << 8) |
(header[offset+2] << 16) | (header[offset+3] << 24);
//I can't find any documentation stating that I should make the following inference,
//but I've seen wave files that have
//0 in the bits per sample field. These wave files were 16-bit, so
//if bits per sample isn't specified I will assume 16 bits.
if (BitsPerSample == 0)
{
BitsPerSample = 16;
}
byte[] dataBuffer = new byte[dataLength];
bytesRead = sourceStream.Read(dataBuffer, 0, dataBuffer.Length);
Debug.Assert(bytesRead == dataLength);
if (BitsPerSample == 8)
{
//8-bit PCM samples are unsigned bytes with silence at 128, so shift
//them to signed values on the 16-bit scale used by the other branches.
SoundData = new double[dataBuffer.Length];
for (var i = 0; i < dataBuffer.Length; ++i)
{
SoundData[i] = ((double)dataBuffer[i] - 128d) * 256d;
}
}
else if (BitsPerSample == 16)
{
short[] unadjustedSoundData = new short[dataBuffer.Length / (BitsPerSample / 8)];
Buffer.BlockCopy(dataBuffer, 0, unadjustedSoundData, 0, dataBuffer.Length);
SoundData = new double[unadjustedSoundData.Length];
for (var i = 0; i < (unadjustedSoundData.Length); ++i)
{
SoundData[i] = (double) unadjustedSoundData[i];
}
}
else if(BitsPerSample==32)
{
int[] unadjustedSoundData = new int[dataBuffer.Length / (BitsPerSample / 8)];
Buffer.BlockCopy(dataBuffer, 0, unadjustedSoundData, 0, dataBuffer.Length);
SoundData = new double[unadjustedSoundData.Length];
for (var i = 0; i < (unadjustedSoundData.Length); ++i)
{
SoundData[i] = (double)unadjustedSoundData[i];
}
}
Channels = new PcmChannel[ChannelCount];
for (int i = 0; i < ChannelCount;++i )
{
Channels[i]=new PcmChannel(this,i);
}
if (normalizeAmplitude )
NormalizeAmplitude();
}
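As a minimal usage sketch (assuming PcmData has a parameterless constructor; the sample program at the end of this article uses a constructor overload that performs the same loading):

//Sketch: load a PCM WAVE file from disk with the method above.
PcmData pcm = new PcmData();
using (Stream source = new FileStream("original.wav", FileMode.Open, FileAccess.Read))
{
//pass true to have NormalizeAmplitude called as part of loading
pcm.ReadWaveData(source, normalizeAmplitude: true);
}
Console.WriteLine("Channels: {0}, SampleRate: {1}, BitsPerSample: {2}",
pcm.ChannelCount, pcm.SampleRate, pcm.BitsPerSample);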
Mono vs. Stereo

In a monaural (single channel) file the samples are ordered one after another; there is no mystery there. For a stereo file, the data stream contains the first sample for channel 0, then the first sample for channel 1, then the second sample for channel 0, the second sample for channel 1, and so on; every other sample belongs to the left or the right channel. The sample data is stored in memory the same way, in the array named SoundData. To work specifically with one channel or the other, there is also a property named Channels (an array of PcmChannel) that can be used to access a single channel.
public class PcmChannel
{
internal PcmChannel(PcmData parent, int channel)
{
Channel = channel;
Parent = parent;
}
protected PcmData Parent { get; set; }
public int Channel { get; protected set; }
public int Length
{
get { return (int)(Parent.SoundData.Length/Parent.ChannelCount); }
}
public double this[int index]
{
get { return Parent.SoundData[index*Parent.ChannelCount + Channel]; }
set { Parent.SoundData[index*Parent.ChannelCount + Channel] = value; }
}
}
//The following is a simplified interface definition showing how the PcmChannel
//data type relates to our PCM data. The actual PcmData class has
//more members than what follows.
public class PcmData
{
public double[] SoundData { get; set; }
public int ChannelCount { get; set; }
public PcmChannel[] Channels { get; set; }
}
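As a usage sketch (assuming a loaded stereo PcmData named pcm), the indexer makes it easy to change one channel without touching the other:

//Halve the volume of channel 0 (the left channel of a stereo file).
//The indexer maps index i onto the interleaved SoundData array.
PcmChannel left = pcm.Channels[0];
for (int i = 0; i < left.Length; ++i)
{
left[i] = left[i] * 0.5d;
}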
Where's the 24-Bit Support?

Yes, 24-bit WAVE files really do exist. I am not supporting them (for now) because more code is needed to handle them, and most of the scenarios I have in mind will use 8-bit and 16-bit files. Adding support for 32-bit files only took about five lines of code. I will handle 24-bit files in an upcoming post.
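The extra code comes from the fact that there is no 3-byte primitive type to Buffer.BlockCopy into; each sample would have to be assembled and sign-extended by hand. A minimal sketch of what that decoding might look like (an illustration, not part of the article's download):

//Sketch: decode 24-bit little-endian PCM samples into doubles.
//Each sample occupies 3 bytes and the sign bit must be extended by hand.
double[] Decode24Bit(byte[] dataBuffer)
{
double[] soundData = new double[dataBuffer.Length / 3];
for (int i = 0; i < soundData.Length; ++i)
{
int sample = dataBuffer[i * 3]
| (dataBuffer[i * 3 + 1] << 8)
| (dataBuffer[i * 3 + 2] << 16);
if ((sample & 0x800000) != 0) //sign-extend from 24 to 32 bits
sample |= unchecked((int)0xFF000000);
soundData[i] = (double)sample;
}
return soundData;
}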
Altering the Sound Data

Changes to the values in the SoundData[] array change the sound data. There are some constraints on how the data can be modified: since I write it out to a 16-bit WAVE file, the maximum and minimum values that can be written are 32,767 and -32,768. The double data type has a range far greater than that. When the data is ready to be written back to a file, the properties AdjustmentFactor and AdjustmentOffset can be used to alter the sound data; they perform a linear transformation on it (remember y = mx + b?). The NormalizeAmplitude method finds the proper values for you. Call it after changing the sound data and appropriate values will be selected. By default the method tries to normalize the sound data to 99% of the maximum amplitude; you can pass a value between 0 and 1 to get a different amplitude.
public void NormalizeAmplitude(double percentMax = 0.99d)
{
var max = SoundData.Max();
var min = SoundData.Min();
//the +1 keeps rangeSize non-zero for constant (silent) data
double rangeSize = max - min + 1;
AdjustmentFactor = ((percentMax * (double)short.MaxValue) -
percentMax * (double)short.MinValue) / (double)rangeSize;
AdjustmentOffset = (percentMax * (double)short.MinValue) - (min * AdjustmentFactor);
//the extremes the transform will produce; handy when debugging, otherwise unused
int maxExpected = (int)(max * AdjustmentFactor + AdjustmentOffset);
int minExpected = (int)(min * AdjustmentFactor + AdjustmentOffset);
}
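A short usage note, as a sketch assuming a loaded PcmData named pcm: NormalizeAmplitude does not modify SoundData itself; it only selects the AdjustmentFactor (m) and AdjustmentOffset (b) that Write applies later.

pcm.NormalizeAmplitude(); //target peaks at 99% of the 16-bit range
pcm.NormalizeAmplitude(0.25d); //or target 25% for a much quieter file

//After editing the samples, call it again so the transform
//reflects the new minimum and maximum.
pcm.SoundData[0] = 0d; //(any edit to the sample data)
pcm.NormalizeAmplitude();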
Saving the WAVE Data

To save the WAVE data, I use a variation of the code I had for saving a stream from the microphone. The original form of that code had a bug that would matter when processing streams with more than one channel; the phone's microphone produces a single-channel stream and was unaffected by it, but it has been fixed here. The code for writing the wave generates a header based on the parameters given and then writes out the WAVE data. The WAVE data must be converted from a double[] array to a byte[] array containing 16-bit integers in little-endian format.
public class PcmData
{
public void Write(Stream destinationStream)
{
byte[] writeData = new byte[SoundData.Length*2];
short[] conversionData = new short[SoundData.Length];
//convert the double[] data back to int16[] data
for(int i=0;i<SoundData.Length;++i)
{
double sample = ((SoundData[i]*AdjustmentFactor)+AdjustmentOffset);
//if the value goes outside of range then clip it
sample = Math.Min(sample, (double) short.MaxValue);
sample = Math.Max(sample, short.MinValue);
conversionData[i] = (short) sample;
}
//peak values after conversion; handy when debugging, otherwise unused
int max = conversionData.Max();
int min = conversionData.Min();
//put the int16[] data into a byte[] array
Buffer.BlockCopy(conversionData, 0, writeData, 0, writeData.Length);
WaveHeaderWriter.WriteHeader(destinationStream,writeData.Length,
ChannelCount,SampleRate);
destinationStream.Write(writeData,0,writeData.Length);
}
}
public class WaveHeaderWriter
{
static byte[] RIFF_HEADER = new byte[] { 0x52, 0x49, 0x46, 0x46 };
static byte[] FORMAT_WAVE = new byte[] { 0x57, 0x41, 0x56, 0x45 };
static byte[] FORMAT_TAG = new byte[] { 0x66, 0x6d, 0x74, 0x20 };
static byte[] AUDIO_FORMAT = new byte[] { 0x01, 0x00 };
static byte[] SUBCHUNK_ID = new byte[] { 0x64, 0x61, 0x74, 0x61 };
private const int BYTES_PER_SAMPLE = 2;
public static void WriteHeader(
System.IO.Stream targetStream,
int byteStreamSize,
int channelCount,
int sampleRate)
{
int byteRate = sampleRate * channelCount * BYTES_PER_SAMPLE;
//BlockAlign is the size of one sample frame:
//bytes per sample times the channel count (per the WAVE spec)
int blockAlign = BYTES_PER_SAMPLE * channelCount;
targetStream.Write(RIFF_HEADER, 0, RIFF_HEADER.Length);
targetStream.Write(PackageInt(byteStreamSize + 36, 4), 0, 4);
targetStream.Write(FORMAT_WAVE, 0, FORMAT_WAVE.Length);
targetStream.Write(FORMAT_TAG, 0, FORMAT_TAG.Length);
targetStream.Write(PackageInt(16, 4), 0, 4);//Subchunk1Size
targetStream.Write(AUDIO_FORMAT, 0, AUDIO_FORMAT.Length);//AudioFormat
targetStream.Write(PackageInt(channelCount, 2), 0, 2);
targetStream.Write(PackageInt(sampleRate, 4), 0, 4);
targetStream.Write(PackageInt(byteRate, 4), 0, 4);
targetStream.Write(PackageInt(blockAlign, 2), 0, 2);
targetStream.Write(PackageInt(BYTES_PER_SAMPLE * 8, 2), 0, 2);//BitsPerSample
//targetStream.Write(PackageInt(0,2), 0, 2);//Extra param size
targetStream.Write(SUBCHUNK_ID, 0, SUBCHUNK_ID.Length);
targetStream.Write(PackageInt(byteStreamSize, 4), 0, 4);
}
static byte[] PackageInt(int source, int length = 2)
{
if ((length != 2) && (length != 4))
throw new ArgumentException("length must be either 2 or 4", "length");
var retVal = new byte[length];
retVal[0] = (byte)(source & 0xFF);
retVal[1] = (byte)((source >> 8) & 0xFF);
if (length == 4)
{
retVal[2] = (byte)((source >> 0x10) & 0xFF);
retVal[3] = (byte)((source >> 0x18) & 0xFF);
}
return retVal;
}
}
Using the Code

Once everything is in place, only a few lines of code are needed to do the work. In the example program I download a spoken phrase from the Microsoft Translator service, amplify it, and write both the original and the amplified versions to files.
static void Main(string[] args)
{
PcmData pcm;
//Download the WAVE stream
MicrosoftTranslatorService.LanguageServiceClient client = new LanguageServiceClient();
string waveUrl = client.Speak(APP_ID, "this is a volume test", "en", "audio/wav","");
WebClient wc = new WebClient();
var soundData = wc.DownloadData(waveUrl);
//Load the WAVE stream and let its amplitude be adjusted to 99% of maximum
using (var ms = new MemoryStream(soundData))
{
pcm = new PcmData(ms, true);
}
//Write the amplified stream to a file
using (Stream s = new FileStream("amplified.wav", FileMode.Create, FileAccess.Write))
{
pcm.Write(s);
}
//write the original unaltered stream to a file
using (Stream s = new FileStream("original.wav", FileMode.Create, FileAccess.Write))
{
s.Write(soundData,0,soundData.Length);
}
}
The End Result

The code works as designed, but I have found a few scenarios that can render it ineffective. One is that the speakers on different phones do not all have the same frequency response; a frequency that is loud and clear on one phone may sound quieter on another. Another is that a sample in the source file may hit the maximum or minimum reading even when most of the other samples are nowhere near the same amplitude; when that happens, the spurious samples limit how much amplification gets applied to the file. I opened the original and the amplified WAVE files in Audacity to view the results, and was happy to see that the amplified WAVE does indeed look louder when viewed in Audacity.
Part 2 - Overlaying WAVE Files

Another problem this code can address is combining wave files together in various ways. I will present that in my next article. Between now and then I will be presenting at this week's Atlanta Windows Phone Developers meeting (if you are in the Atlanta area, come on out!), and I will return to this code after my presentation.
License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).