Joe's CWaveFile Example Code Set
I'm writing this guide because I couldn't find a good tutorial or guide on CWaveFile, which comes with the DirectX SDK. There were guides, but they were really advanced and required a lot of background knowledge. So I'm reverse-engineering the code in order to figure out how it works. I also know so many APIs that if I need to do .wav file processing, I'll stare at code for 30 minutes before I figure it out again. Writing a tutorial gives me a nice reference document so that I don't have to do the staring.
This is straight from SDKwavefile.h:
class CWaveFile
{
public:
    WAVEFORMATEX* m_pwfx;        // Pointer to WAVEFORMATEX structure
    HMMIO m_hmmio;               // MM I/O handle for the WAVE
    MMCKINFO m_ck;               // Multimedia RIFF chunk
    MMCKINFO m_ckRiff;           // Use in opening a WAVE file
    DWORD m_dwSize;              // The size of the wave file
    MMIOINFO m_mmioinfoOut;
    DWORD m_dwFlags;
    BOOL m_bIsReadingFromMemory;
    BYTE* m_pbData;
    BYTE* m_pbDataCur;
    ULONG m_ulDataSize;
    CHAR* m_pResourceBuffer;
protected:
    HRESULT ReadMMIO();
    HRESULT WriteMMIO( WAVEFORMATEX* pwfxDest );
public:
    CWaveFile();
    ~CWaveFile();
    HRESULT Open( LPWSTR strFileName, WAVEFORMATEX* pwfx, DWORD dwFlags );
    HRESULT OpenFromMemory( BYTE* pbData, ULONG ulDataSize, WAVEFORMATEX* pwfx, DWORD dwFlags );
    HRESULT Close();
    HRESULT Read( BYTE* pBuffer, DWORD dwSizeToRead, DWORD* pdwSizeRead );
    HRESULT Write( UINT nSizeToWrite, BYTE* pbData, UINT* pnSizeWrote );
    DWORD GetSize();
    HRESULT ResetFile();
    WAVEFORMATEX* GetFormat()
    {
        return m_pwfx;
    };
};
This class is part of Direct3D's utilities (DXUT). I decided to use it because it nicely wraps .wav reading and writing. You will have to find and compile the source to generate the library files, and add those directories to Visual Studio's include and library paths. For me, they were in D:\Program Files (x86)\Microsoft DirectX SDK (August 2009)\Samples\C++\DXUT\Core and D:\Program Files (x86)\Microsoft DirectX SDK (August 2009)\Samples\C++\DXUT\Optional. Luckily, I just double-clicked the .sln file, it opened in Visual Studio 2008, and it compiled without any headaches. Hopefully you will have the same experience.
Assumptions about the reader
Now, I won't go through this line by line; however, I will try to get you to open and modify .wav files and give you a few examples of how to use this API.
I'll assume you know:
* C++ basics
* Basic API usage
* Basic Microsoft coding terminology
* How to use Visual Studio
* PCM WAV files and their characteristics (bits per
sample/channels/sample rate)
* Hexadecimal to a degree
I'll try to help out a little on the Microsoft code side as well since
it can be quite confusing.
Tips for using CWaveFile itself:
#1: You may need to include windows.h and mmsystem.h
#2: You may need winmm.lib, DXUTOpt.lib, and DXUT.lib. DXUTOpt.lib and
DXUT.lib are the results of compiling the DirectX Samples (see above)
#3: As said above, you may have to add those build directories to Visual Studio's include and library paths
#4: They renamed the old .h file to SDKwavefile.h, so the file might not be named that in a few years. Microsoft, if you rename it again, PLEASE CALL IT something like CWaveFile.h! There are no other classes in those two files (SDKwavefile.h and SDKwavefile.cpp), so there's absolutely no reason to confuse us.
Windows data types
Feel free to skip this part if you already know it.
BYTE: 8-bit unsigned integer. This is used in a lot of places, especially in read and write buffers.
WORD: word, a 16-bit unsigned integer
DWORD: double word, a 32-bit unsigned integer
HRESULT: a 32-bit signed value that a function returns to report success or failure. You can use the SUCCEEDED() and FAILED() macros to check it. Example: if ( SUCCEEDED(hResult) )
A lot of the other all-caps names in the .h file are multimedia API types and structures (like MMIOINFO). MSDN explains those better than I can. Why did Microsoft make this mess? Because the C++ standard doesn't fix the exact sizes of the built-in types, so they can change on you: int went from 16-bit (DOS and 16-bit Windows) to 32-bit, and on 64-bit Windows long stays 32-bit while pointers become 64-bit. So WORD, DWORD, and the other type names were created to guarantee 8-bit, 16-bit, and 32-bit types. Microsoft needed a way to guarantee the size of those values, especially if they create a new OS on a different architecture. There are already reports that Microsoft is looking at 128-bit processors, and we are still in the 32-bit to 64-bit transition!
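For comparison, here's a portable sketch of what those typedefs guarantee, written with the fixed-width types from <cstdint>. The names MyByte, MyWord, MyDword, MyHresult, mySucceeded and myFailed are made up for this example; on Windows you would just use the real types and macros from windows.h. The success test mirrors what SUCCEEDED()/FAILED() check: failure codes have the top bit set, so they are negative as a signed 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-ins for the Windows typedefs. On Windows, use
// the real BYTE, WORD, DWORD and HRESULT from windows.h instead.
typedef std::uint8_t  MyByte;     // like BYTE:    8-bit unsigned
typedef std::uint16_t MyWord;     // like WORD:   16-bit unsigned
typedef std::uint32_t MyDword;    // like DWORD:  32-bit unsigned
typedef std::int32_t  MyHresult;  // like HRESULT: 32-bit signed

// These fail to compile if a type isn't the size we expect.
static_assert(sizeof(MyByte)    == 1, "BYTE should be 1 byte");
static_assert(sizeof(MyWord)    == 2, "WORD should be 2 bytes");
static_assert(sizeof(MyDword)   == 4, "DWORD should be 4 bytes");
static_assert(sizeof(MyHresult) == 4, "HRESULT should be 4 bytes");

// Same test SUCCEEDED()/FAILED() perform: success codes are >= 0,
// failure codes have the sign bit set.
inline bool mySucceeded(MyHresult hr) { return hr >= 0; }
inline bool myFailed(MyHresult hr)    { return hr < 0; }
```

For example, S_OK (0) and S_FALSE (1) both count as success, while E_FAIL (0x80004005) counts as failure.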
PCM .WAV file notice
Microsoft designed the PCM .WAV format so that the 8-bit kind is unsigned and the 16-bit kind is signed. If you look at an 8-bit .wav file in a hex editor (which I would highly recommend; XVI32 is one I have used in the past), you'll notice a lot of values around 127-128 (0x7F and 0x80). All 8-bit does is add 0x80 to the signed value, so a -1 is stored as 0x80 - 1 = 0x7F. If you are generating audio from scratch and you are not tight on space, you might as well try 16-bit first because there's less that can go wrong.
Recap:
8-bit PCM .WAV is unsigned: all data is shifted up by exactly 128 (0x80 in hex), so silent audio is 128
16-bit PCM .WAV is signed, so silent audio is 0
Anything above 16-bit is usually signed too, though you may want to check with an audio editor or audio player first to make sure.
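Going between the two layouts is just a matter of removing or adding that 128 offset and rescaling. Here's a minimal sketch, assuming plain truncation (no rounding or dithering); u8ToS16 and s16ToU8 are hypothetical helper names, not part of CWaveFile.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit PCM: unsigned, silence = 128. 16-bit PCM: signed, silence = 0.
// Remove the 128 bias, then scale up by 256 (8 extra bits).
inline std::int16_t u8ToS16(std::uint8_t s)
{
    return static_cast<std::int16_t>((static_cast<int>(s) - 128) * 256);
}

// Shift the signed range up to 0..65535 and keep the top 8 bits.
// The low 8 bits are simply dropped (no rounding or dithering).
inline std::uint8_t s16ToU8(std::int16_t s)
{
    return static_cast<std::uint8_t>((static_cast<int>(s) + 32768) >> 8);
}
```

Silence maps to silence: u8ToS16(128) is 0 and s16ToU8(0) is 128, and any 8-bit value survives a round trip through 16-bit unchanged.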
PCM .WAV stereo data
Basically, a PCM .WAV file interleaves the channel data, alternating between left and right samples. So it looks like this:
=================================================
| L | R | L | R | L | R | L | R | L | R | L | R |
=================================================
---0--- ---1--- ---2--- ---3--- ---4--- ---5---
I haven't played with more than 2 channels yet, but it works the same way: each numbered group above is one frame holding one sample per channel, and frames follow each other in time. In a standard stereo .WAV the left sample comes first in each frame, then the right one, so it's (left, right, left, right). I numbered the samples above to show which ones play at the same time.
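Here's a small sketch of splitting interleaved 16-bit stereo into separate left and right buffers and merging them back, assuming the left sample comes first in each frame. deinterleave and interleave are made-up helper names; you would run them on the samples you got from Read().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved stereo (L R L R ...) into two separate channel buffers.
// Assumes 16-bit samples and an even number of values.
void deinterleave(const std::vector<std::int16_t>& interleaved,
                  std::vector<std::int16_t>& left,
                  std::vector<std::int16_t>& right)
{
    assert(interleaved.size() % 2 == 0);
    const std::size_t frames = interleaved.size() / 2;
    left.resize(frames);
    right.resize(frames);
    for (std::size_t i = 0; i < frames; ++i)
    {
        left[i]  = interleaved[2 * i];      // first sample of the frame
        right[i] = interleaved[2 * i + 1];  // second sample of the frame
    }
}

// The reverse: merge two equal-length channels into L R L R ... order.
std::vector<std::int16_t> interleave(const std::vector<std::int16_t>& left,
                                     const std::vector<std::int16_t>& right)
{
    assert(left.size() == right.size());
    std::vector<std::int16_t> out(left.size() * 2);
    for (std::size_t i = 0; i < left.size(); ++i)
    {
        out[2 * i]     = left[i];
        out[2 * i + 1] = right[i];
    }
    return out;
}
```

The same pattern extends to more channels: frame i starts at index i * nChannels.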
24-bit audio
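This was tricky. 24-bit audio stores each sample in 3 BYTEs, least significant byte first, so you read in 3 BYTEs at a time and combine them into a 32-bit (or wider) int, using either bitshifts or a union, and then split it back into 3 BYTEs when you write. If you combine them with bitshifts, remember to sign-extend: when the top bit of the third byte is set, the sample is negative, so the top byte of your 32-bit int has to be filled with 1s. Below is one way I did it, which might not be that efficient. Note that the union trick relies on x86's little-endian byte order, and reading a different union member than the one you last wrote is something MSVC supports but standard C++ doesn't guarantee. I also freaking love the Microsoft sized integer types __int8, __int16, __int32 and __int64. They are synonyms for char, short, int, and long long, respectively.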
union spread_int32
{
    __int32 i;                      // definitely make sure to limit ints to 32-bit in case it gets compiled for x64
    unsigned __int8 eight_bit [4];  // and I prefer using int8 over char, even though I know it's just another name for char
    __int16 sixteen_bit [2];
};
.
.
.
spread_int32 si;
unsigned __int8 i8;
si.i = messaround;
i8 = si.eight_bit[0];
messbuffer[(3 * r)    ] = i8;
i8 = si.eight_bit[1];
messbuffer[(3 * r) + 1] = i8;
i8 = si.eight_bit[2];
messbuffer[(3 * r) + 2] = i8;
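If you'd rather not depend on the union, here's a sketch of the bitshift approach. read24 and write24 are hypothetical helpers that assume the little-endian, 3-bytes-per-sample layout .WAV files use; read24 sign-extends from bit 23 so negative samples come out negative, and neither depends on the CPU's own byte order.

```cpp
#include <cassert>
#include <cstdint>

// Unpack one little-endian 24-bit sample (3 bytes) into a 32-bit int,
// sign-extending from bit 23.
inline std::int32_t read24(const std::uint8_t* p)
{
    assert(p != nullptr);
    std::uint32_t u = static_cast<std::uint32_t>(p[0])
                    | (static_cast<std::uint32_t>(p[1]) << 8)
                    | (static_cast<std::uint32_t>(p[2]) << 16);
    if (u & 0x800000u)        // sign bit of the 24-bit value is set
        u |= 0xFF000000u;     // fill the top byte with 1s
    return static_cast<std::int32_t>(u);
}

// Pack the low 24 bits of a 32-bit int back into 3 little-endian bytes.
inline void write24(std::int32_t value, std::uint8_t* p)
{
    assert(p != nullptr);
    const std::uint32_t u = static_cast<std::uint32_t>(value);
    p[0] = static_cast<std::uint8_t>(u & 0xFF);          // lowest byte first
    p[1] = static_cast<std::uint8_t>((u >> 8) & 0xFF);
    p[2] = static_cast<std::uint8_t>((u >> 16) & 0xFF);
}
```

In the loop above, write24(messaround, &messbuffer[3 * r]) would do the same job as the three union assignments.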
Function List that I have figured out
HRESULT Open( LPWSTR strFileName, WAVEFORMATEX* pwfx, DWORD dwFlags );
Returns
This function returns an HRESULT. Use SUCCEEDED(hresult) and FAILED(hresult) to determine the outcome. If there's an error, it returns a DirectX error code. Unfortunately, the error codes weren't very informative. I ended up copying over SDKwavefile.cpp and putting wcout << L"Here I AM!" in it in order to figure out what I was doing wrong.
The same function opens a file for either reading or writing, depending on dwFlags. It's messy, but workable.
For reading:
For reading, you need a filename in wide-character format (an LPWSTR, e.g. wchar_t stuff[12345]), and to set dwFlags to WAVEFILE_READ. I tried setting pwfx to NULL and it still worked. It turns out that Open() doesn't seem to use pwfx at all in this case.
hresult = source.Open(file, NULL, WAVEFILE_READ);
I expected it to fill pwfx with the stats of the .WAV file, but when I passed in a structure, it just ended up with garbage (Open() seems to leave it untouched in read mode). Don't fret, because you can get that info afterwards using GetFormat()
For writing:
For writing, you need the filename again, but pwfx must point to a filled-in WAVEFORMATEX, otherwise Open() returns an error. Here dwFlags is set to 0: any value other than WAVEFILE_READ seems to open the file for writing, creating it (or overwriting it) as needed.
hresult = destination.Open(file, pwfx, 0);
WAVEFORMATEX is best described here:
http://msdn.microsoft.com/en-us/library/dd757720%28VS.85%29.aspx
(if the link is busted, go to msdn.net and search for WAVEFORMATEX)
Notes about WAVEFORMATEX:
nSamplesPerSec is poorly described. It's simply the sampling rate of the audio in Hz, so you convert kHz to Hz: 44.1 kHz is 44100, 22.05 kHz is 22050, and so forth. Here's a bit of configuration code I snagged from Microsoft at http://msdn.microsoft.com/en-us/library/ee419050%28VS.85%29.aspx which shows how to set up an uncompressed PCM .wav file:
WAVEFORMATEX wfxInput;
ZeroMemory( &wfxInput, sizeof(wfxInput));
wfxInput.wFormatTag = WAVE_FORMAT_PCM;
wfxInput.nSamplesPerSec = 22050;  // replace this with 11025, 44100, 48000 or 96000 if you like
wfxInput.wBitsPerSample = 8;      // 8 = 8-bit / 16 = 16-bit. MSDN says WAVE_FORMAT_PCM should use 8 or 16; deeper formats are meant for WAVEFORMATEXTENSIBLE
wfxInput.nChannels = 1;           // 1 mono / 2 stereo
wfxInput.nBlockAlign = wfxInput.nChannels * (wfxInput.wBitsPerSample / 8);   // bytes per frame; don't change this because it's calculated very nicely
wfxInput.nAvgBytesPerSec = wfxInput.nBlockAlign * wfxInput.nSamplesPerSec;  // bytes per second; same for this one as well
This makes our life easier, since the tricky parts are already figured out. You don't need to set cbSize: ZeroMemory already set it to 0, and for WAVE_FORMAT_PCM it's ignored anyway.
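To double-check the two derived fields, here's a sketch of the same arithmetic as a standalone function. PcmFormat and makePcmFormat are hypothetical names; the struct only mirrors the WAVEFORMATEX fields involved (the real structure also has wFormatTag and cbSize), so it compiles without windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable mirror of the WAVEFORMATEX fields used here.
struct PcmFormat
{
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
};

// Fill in the derived fields the same way the MSDN snippet does.
PcmFormat makePcmFormat(std::uint16_t channels, std::uint32_t sampleRate,
                        std::uint16_t bitsPerSample)
{
    assert(bitsPerSample % 8 == 0);   // whole bytes per sample
    PcmFormat f{};
    f.nChannels       = channels;
    f.nSamplesPerSec  = sampleRate;
    f.wBitsPerSample  = bitsPerSample;
    f.nBlockAlign     = static_cast<std::uint16_t>(channels * (bitsPerSample / 8)); // bytes per frame
    f.nAvgBytesPerSec = f.nBlockAlign * sampleRate;                                // bytes per second
    return f;
}
```

For CD-quality audio (44100 Hz, 16-bit, stereo) that gives nBlockAlign = 4 bytes per frame and nAvgBytesPerSec = 176400.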
WAVEFORMATEX* GetFormat()
This function gets the .WAV file's format. It returns a pointer to the WAVEFORMATEX structure that the class is currently using (m_pwfx). The class definition is pretty self-explanatory since the implementation is right there, but it may still confuse some people. Here are a few tricks for this function:
WAVEFORMATEX destWaveFormat;
memcpy( &destWaveFormat, source.GetFormat(), sizeof(destWaveFormat));
wcout << L"Channels = " << source.GetFormat()->nChannels << endl;
wcout << L"Samples/sec = " << destWaveFormat.nSamplesPerSec << endl;
This copies the format into another structure, which is handy if you want to take a source .wav and store the results as another one. It also means you aren't holding on to a pointer that belongs to the source CWaveFile object.
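Since GetFormat() hands you whatever format the file actually has, it may be worth checking it before feeding the data to code that assumes plain PCM. Here's a sketch of such a check. PcmFormat and isPlainPcm are hypothetical names and the struct only mirrors the WAVEFORMATEX fields used; WAVE_FORMAT_PCM is 1, while files that use WAVEFORMATEXTENSIBLE (common for 24-bit or multichannel audio) report wFormatTag 0xFFFE instead, which this check rejects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable mirror of the WAVEFORMATEX fields (not the real struct).
struct PcmFormat
{
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
};

const std::uint16_t kFormatPcm = 1;  // value of WAVE_FORMAT_PCM

// True if the format is plain PCM and its derived fields are consistent.
bool isPlainPcm(const PcmFormat& f)
{
    if (f.wFormatTag != kFormatPcm) return false;
    if (f.nChannels == 0 || f.wBitsPerSample % 8 != 0) return false;
    const std::uint32_t expectedAlign = f.nChannels * (f.wBitsPerSample / 8u);
    return f.nBlockAlign == expectedAlign
        && f.nAvgBytesPerSec == expectedAlign * f.nSamplesPerSec;
}
```

With the real structure you would copy the fields over from source.GetFormat() (or write the same checks directly against WAVEFORMATEX).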
DWORD GetSize();
Simply returns the size, in BYTEs, of the data portion of the audio. I
haven't tested this part for accuracy yet, and something seems off
about the calculation.
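One way to sanity-check GetSize() is to convert the byte count into frames and seconds using the format's nBlockAlign and nAvgBytesPerSec. This is a sketch with made-up helper names, assuming GetSize() really does return the size of the data chunk in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Number of frames (one sample per channel) in a data chunk of dataBytes.
inline std::uint32_t bytesToFrames(std::uint32_t dataBytes, std::uint16_t blockAlign)
{
    assert(blockAlign != 0);
    return dataBytes / blockAlign;
}

// Playing time in seconds for a data chunk of dataBytes.
inline double bytesToSeconds(std::uint32_t dataBytes, std::uint32_t avgBytesPerSec)
{
    assert(avgBytesPerSec != 0);
    return static_cast<double>(dataBytes) / avgBytesPerSec;
}
```

For example, 3 seconds of 16-bit mono audio at 44100 Hz is 264600 bytes, which works out to 132300 frames and 3.0 seconds.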
HRESULT CWaveFile::Read( BYTE* pBuffer, DWORD dwSizeToRead, DWORD* pdwSizeRead )
//-----------------------------------------------------------------------------
// Name: CWaveFile::Read()
// Desc: Reads section of data from a wave file into pBuffer and returns
//       how much read in pdwSizeRead, reading not more than dwSizeToRead.
//       This uses m_ck to determine where to start reading from. So
//       subsequent calls will be continue where the last left off unless
//       Reset() is called.
//-----------------------------------------------------------------------------
Whoa, it's actually commented. So, yeah, it works like the UNIX read() function. Your array doesn't have to be a BYTE array; just typecast it to BYTE* (so an array of short int works if you are working with 16-bit audio). Make sure you use sizeof() when specifying dwSizeToRead, because it's a count of bytes, not samples. pdwSizeRead gets filled with the number of bytes actually read, so divide it by the size of your sample type to get the number of samples.
Assuming you are storing an array of short ints (aka 16-bit signed audio):
DWORD sizeread;
HRESULT hr;
.
.
.
hr = waveFile.Read((BYTE*) buffer, buffer_size * sizeof(short int), &sizeread);
where 8-bit would be
DWORD sizeread;
HRESULT hr;
.
.
.
hr = waveFile.Read((BYTE*) buffer, buffer_size * sizeof(unsigned char), &sizeread);
Simple as that. I'll put some example code later to help you understand
this function.
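Casting a short int array to BYTE* works because x86 is little-endian, which happens to match the byte order of .WAV data. If you'd rather read into a plain BYTE buffer and decode it yourself, here's a sketch; bytesToS16 is a hypothetical helper, and bytesRead stands in for the value Read() returns through pdwSizeRead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Turn raw little-endian 16-bit PCM bytes into samples.
// An odd trailing byte (half a sample) is ignored.
std::vector<std::int16_t> bytesToS16(const std::uint8_t* bytes, std::size_t bytesRead)
{
    assert(bytes != nullptr || bytesRead == 0);
    std::vector<std::int16_t> samples(bytesRead / 2);   // bytes -> sample count
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        const std::uint16_t u = static_cast<std::uint16_t>(
            bytes[2 * i] | (bytes[2 * i + 1] << 8));      // low byte comes first
        samples[i] = static_cast<std::int16_t>(u);
    }
    return samples;
}
```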
HRESULT CWaveFile::Write( UINT nSizeToWrite, BYTE* pbSrcData, UINT* pnSizeWrote )
//-----------------------------------------------------------------------------
// Name: CWaveFile::Write()
// Desc: Writes data to the open wave file
//-----------------------------------------------------------------------------
It's less documented than Read(), but it's very similar. Again, if you want to write a short int buffer to disk, you would use:
UINT writtenSize;   // Write() takes a UINT*, not a DWORD*
HRESULT hResult;
.
.
.
hResult = waveFile.Write(buffer_size * sizeof(short int), (BYTE *) soundBuffer, &writtenSize);
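If you'd rather hand Write() a plain BYTE buffer instead of casting a short int array, here's a sketch that lays 16-bit samples out as little-endian bytes (low byte first), the byte order .WAV files use. s16ToBytes is a hypothetical helper name; nSizeToWrite would then be the size of the returned vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Serialize 16-bit samples as little-endian bytes, ready to be written.
std::vector<std::uint8_t> s16ToBytes(const std::vector<std::int16_t>& samples)
{
    std::vector<std::uint8_t> bytes(samples.size() * 2);
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        const std::uint16_t u = static_cast<std::uint16_t>(samples[i]);
        bytes[2 * i]     = static_cast<std::uint8_t>(u & 0xFF);  // low byte
        bytes[2 * i + 1] = static_cast<std::uint8_t>(u >> 8);    // high byte
    }
    return bytes;
}
```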
Generate a sine wave (for those good at math)
#pragma comment(lib, "winmm.lib")
#pragma comment(lib, "DXUTOpt.lib")
#pragma comment(lib, "DXUT.lib")
#include <windows.h>
#include <mmsystem.h>
#include <math.h>
#include <stdlib.h>
#include "SDKwavefile.h"
// Note that this is a UNICODE build.
//
// Update: I found out that int main(int argc, wchar_t **argv) isn't a valid
// signature for main(). If you want to pass file names on the command line,
// use wmain(int argc, wchar_t **argv), or take char **argv and convert it to
// UNICODE (the copy example below does the latter).
int main()
{
    // declare the important stuff
    CWaveFile sinwave;
    WAVEFORMATEX sinWaveFormat;
    HRESULT hr;
    const double soundLength = 3;          // seconds
    const double PI = 3.14159265358979;
    wchar_t fileName[] = L"Sinwave.wav";   // Open() takes a non-const LPWSTR

    // fill the struct with all 0s
    ZeroMemory(&sinWaveFormat, sizeof(WAVEFORMATEX));

    // now configure a 16-bit, 44100 Hz, mono, PCM (uncompressed) .WAV file
    sinWaveFormat.wFormatTag = WAVE_FORMAT_PCM;
    sinWaveFormat.nSamplesPerSec = 44100;
    sinWaveFormat.wBitsPerSample = 16;
    sinWaveFormat.nChannels = 1;
    sinWaveFormat.nBlockAlign = sinWaveFormat.nChannels * (sinWaveFormat.wBitsPerSample / 8);
    sinWaveFormat.nAvgBytesPerSec = sinWaveFormat.nBlockAlign * sinWaveFormat.nSamplesPerSec;

    // open the file for writing (any flag other than WAVEFILE_READ)
    hr = sinwave.Open(fileName, &sinWaveFormat, 0);

    if ( SUCCEEDED(hr) )
    {
        // all sorts of math I figured out
        // how much to advance theta per sample (theta = 2 * PI * time in seconds)
        double increment = 2 * PI / sinWaveFormat.nSamplesPerSec;
        // counts where in the array to put stuff
        unsigned int buffercount = 0;
        // and how big the buffer is
        const unsigned int buffer_size = 1024;
        // and the buffer. Since it's 16-bit, we use a signed short int
        short int soundBuffer[buffer_size];
        // keeps track of how much data is written. I really should check it
        // after it runs
        UINT writtenSize = 0;

        // clear the sound buffer just in case. This is a handy trick for
        // debugging if something went wrong as well
        ZeroMemory(soundBuffer, buffer_size * sizeof(short int));

        // generate the sine wave. The equation is y(t) = A * sin(angFreq * t + phase)
        // A = amplitude / t is time / phase is the phase offset / angFreq is the
        // angular frequency (2 * PI * frequency)
        // source: Wikipedia
        // Since theta = 2 * PI * t, sin(theta * 880) is an 880 Hz tone
        for ( double theta = 0; theta < (2 * PI * soundLength); theta += increment )
        {
            // multiply first, then convert: (short int)10000 * sin(...) would
            // cast 10000 instead of the result
            soundBuffer[buffercount] = (short int)(10000 * sin(theta * 880));
            buffercount++;

            // dump the buffer when it's full
            if ( buffercount == buffer_size )
            {
                hr = sinwave.Write(buffer_size * sizeof(short int), (BYTE*)soundBuffer, &writtenSize);
                if ( FAILED(hr) )
                    break;
                buffercount = 0;
            }
        }

        // dump anything left
        if ( SUCCEEDED(hr) && buffercount > 0 )
        {
            hr = sinwave.Write(buffercount * sizeof(short int), (BYTE*)soundBuffer, &writtenSize);
        }
    } // end if ( SUCCEEDED(hr) )

    sinwave.Close();   // Close() writes the final chunk sizes into the header, so don't skip it
    system("pause");
    return FAILED(hr) ? 1 : 0;
}
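One note is that I originally tried to use floats, but there turned out to be precision errors! A float only has about 7 significant digits, so adding a tiny increment to theta over and over loses accuracy quickly. Doubles (about 15-16 significant digits) worked out far better.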
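Here's a portable sketch of the same 880 Hz tone that sidesteps the drift problem a different way: instead of accumulating theta, it computes each sample's time from its index (t = n / sampleRate), so rounding errors can't add up. makeSine is a hypothetical helper that only builds the samples; you would still hand the buffer to CWaveFile::Write() as in the program above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generate a 16-bit mono sine wave: y[n] = A * sin(2 * pi * f * n / rate).
std::vector<std::int16_t> makeSine(double freqHz, double seconds,
                                   std::uint32_t sampleRate, double amplitude)
{
    assert(amplitude >= 0.0 && amplitude <= 32767.0);  // must fit in a short
    const double pi = 3.14159265358979323846;
    const std::size_t count = static_cast<std::size_t>(seconds * sampleRate);
    std::vector<std::int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n)
    {
        const double t = static_cast<double>(n) / sampleRate;  // time in seconds
        samples[n] = static_cast<std::int16_t>(amplitude * std::sin(2.0 * pi * freqHz * t));
    }
    return samples;
}
```

makeSine(880.0, 3.0, 44100, 10000.0) gives the 132300 samples of the program above in one go; writing them in 1024-sample chunks works the same way.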
Copy
A good way to figure out an API like this one is to write a copy program: you open one file for reading and another for writing, and copy the contents. If done correctly, the audio data in the two files will be byte-for-byte identical. However, anything in the .wav file beyond the format and the audio data (extra chunks, such as metadata) may be discarded in the copy, so the files as a whole can differ. I found that out from using ding.wav as a test file.
#pragma comment(lib, "winmm.lib")
#pragma comment(lib, "DXUTOpt.lib")
#pragma comment(lib, "DXUT.lib")
#pragma comment(lib, "dxerr.lib")
#include <windows.h>
#include <mmsystem.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include "SDKwavefile.h"
#include <Dxerr.h>
#define MAX_BUFFER_SIZE 10240
using std::cout;
using std::endl;
using std::wcout;

// Prints what went wrong, using the DirectX error helpers from dxerr.lib
static void printError(const char* what, HRESULT hresult)
{
    cout << what;
    wcout << DXGetErrorString(hresult) << endl;
    wcout << DXGetErrorDescription(hresult) << endl;
}

int main(int argc, char **argv)
{
    if ( argc < 3 )
    {
        cout << "Please specify source and destination files.\n";
        return 1; // this cuts out of the function early. I'm doing this to avoid bracket hell
    }
    HRESULT hresult;
    CWaveFile source;
    CWaveFile dest;
    WAVEFORMATEX sourceWaveFormat;
    WAVEFORMATEX destWaveFormat;
    BYTE buffer[MAX_BUFFER_SIZE];
    wchar_t file[MAX_PATH];   // note: MAX_PATH is a constant defined in windows.h,
    wchar_t file2[MAX_PATH];  // the maximum number of characters in a file path. Neat, huh?
    DWORD amountRead = 0;
    UINT amountWrote = 0;     // Write() takes a UINT*

    // Open the file for reading
    //
    // note: the only flag I found in SDKwavefile.cpp is WAVEFILE_READ,
    // which tells the function to read the file.
    // Otherwise, not specifying WAVEFILE_READ looks like it will create
    // a new file.
    //
    // convert the data in argv[1] to UNICODE
    mbstowcs(file, argv[1], MAX_PATH);
    file[MAX_PATH - 1] = L'\0';   // mbstowcs won't terminate a name that's too long
    hresult = source.Open(file, NULL, WAVEFILE_READ);
    if ( FAILED(hresult) )
    {
        printError("Error when trying to open file for reading.\n", hresult);
        system("pause");
        return 1;
    }

    memcpy(&sourceWaveFormat, source.GetFormat(), sizeof(sourceWaveFormat));
    wcout << L"Opened file with stats:\n\n";
    wcout << L"sourceWaveFormat.cbSize = " << sourceWaveFormat.cbSize << endl;
    wcout << L"sourceWaveFormat.nAvgBytesPerSec = " << sourceWaveFormat.nAvgBytesPerSec << endl;
    wcout << L"sourceWaveFormat.nBlockAlign = " << sourceWaveFormat.nBlockAlign << endl;
    wcout << L"Channels = " << source.GetFormat()->nChannels << endl;
    wcout << L"Samples/sec = " << sourceWaveFormat.nSamplesPerSec << endl;
    wcout << L"Bits/sample = " << sourceWaveFormat.wBitsPerSample << endl;

    // copy the contents between the two WAVEFORMATEX structures
    memcpy(&destWaveFormat, source.GetFormat(), sizeof(destWaveFormat));

    // Open the file for writing. Seems like it'll truncate the file to 0 if it exists.
    // In other words, consider it overwritten.
    //
    // Note that I didn't specify a flag (0 instead of WAVEFILE_READ)
    mbstowcs(file2, argv[2], MAX_PATH);
    file2[MAX_PATH - 1] = L'\0';
    hresult = dest.Open(file2, &destWaveFormat, 0);

    // file couldn't open for some reason. Exit now but give the reason why
    if ( FAILED(hresult) )
    {
        printError("Error when trying to open file for writing.\n", hresult);
        source.Close();
        system("pause");
        return 1;
    }

    // We got to this point. Now begin copying: stop when a read fails or
    // returns no data, so we never issue a 0-byte write
    while ( SUCCEEDED(hresult = source.Read(buffer, MAX_BUFFER_SIZE, &amountRead))
            && amountRead > 0 )
    {
        // write to disk
        hresult = dest.Write((UINT) amountRead, buffer, &amountWrote);
        if ( FAILED(hresult) || amountWrote != amountRead )
        {
            cout << "Error when writing to the destination file.\n";
            hresult = E_FAIL;
            break;
        }
    }

    source.Close();
    dest.Close();   // Close() writes the final chunk sizes into dest's header
    return FAILED(hresult) ? 1 : 0;
}
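The important part of the copy program is the loop: stop as soon as a read fails or returns 0 bytes, and never issue a 0-byte write. Here's a sketch of that loop shape that runs without DirectX; MemReader and copyAll are hypothetical in-memory stand-ins for CWaveFile::Read() and Write(), not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory reader that behaves like Read(): it reports how many
// bytes it delivered, and 0 once the data runs out.
struct MemReader
{
    const std::vector<std::uint8_t>& data;
    std::size_t pos;
    bool read(std::uint8_t* buf, std::size_t want, std::size_t& bytesRead)
    {
        bytesRead = std::min(want, data.size() - pos);
        std::copy(data.begin() + pos, data.begin() + pos + bytesRead, buf);
        pos += bytesRead;
        return true;   // a real reader would return false on an error
    }
};

// Copy everything from 'in' to 'out' in chunks; returns the byte count.
std::size_t copyAll(MemReader& in, std::vector<std::uint8_t>& out, std::size_t chunk)
{
    assert(chunk > 0);
    std::vector<std::uint8_t> buffer(chunk);
    std::size_t bytesRead = 0, total = 0;
    while (in.read(buffer.data(), buffer.size(), bytesRead) && bytesRead > 0)
    {
        out.insert(out.end(), buffer.begin(), buffer.begin() + bytesRead);
        total += bytesRead;
    }
    return total;
}
```

With a 10240-byte chunk (MAX_BUFFER_SIZE above), a 25000-byte source takes three reads with data plus one final read that returns 0 and ends the loop.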
Anyway, I hope this guide helps you figure out this nice set of code. Personally, having this guide helps me later down the road, when I haven't used the code for a while and need to do some audio manipulation, even if it's just for one of my own projects.
Made with Mozilla SeaMonkey
Copyright (C) 2009 by Joe Plante. Code extracted from SDKwavefile.cpp
and SDKwavefile.h are copyright by Microsoft
You are free to use any code in this page for your projects. If you
post this page on another website (which I don't mind since it's less
for me to host), please give the creators credit.
Other sources:
http://msdn.microsoft.com/en-us/library/ee419050%28VS.85%29.aspx