Joe's CWaveFile Example Code set

I'm writing this guide because I couldn't find a good tutorial or guide on CWaveFile, which comes with the DirectX SDK. There were guides, but they were really advanced and required a lot of background knowledge. So, in turn, I'm reverse-engineering the code in order to figure out how it works. At the same time, I juggle so many APIs that if I need to do .wav file processing, I'll stare at code for 30 minutes before I figure it out again. Making a tutorial gives me a nice reference document so that I don't have to do the staring.

This is straight from SDKwavefile.h:

class CWaveFile
{
public:
    WAVEFORMATEX* m_pwfx;        // Pointer to WAVEFORMATEX structure
    HMMIO m_hmmio;       // MM I/O handle for the WAVE
    MMCKINFO m_ck;          // Multimedia RIFF chunk
    MMCKINFO m_ckRiff;      // Use in opening a WAVE file
    DWORD m_dwSize;      // The size of the wave file
    MMIOINFO m_mmioinfoOut;
    DWORD m_dwFlags;
    BOOL m_bIsReadingFromMemory;
    BYTE* m_pbData;
    BYTE* m_pbDataCur;
    ULONG m_ulDataSize;
    CHAR* m_pResourceBuffer;

protected:
    HRESULT ReadMMIO();
    HRESULT WriteMMIO( WAVEFORMATEX* pwfxDest );

public:
            CWaveFile();
            ~CWaveFile();

    HRESULT Open( LPWSTR strFileName, WAVEFORMATEX* pwfx, DWORD dwFlags );
    HRESULT OpenFromMemory( BYTE* pbData, ULONG ulDataSize, WAVEFORMATEX* pwfx, DWORD dwFlags );
    HRESULT Close();

    HRESULT Read( BYTE* pBuffer, DWORD dwSizeToRead, DWORD* pdwSizeRead );
    HRESULT Write( UINT nSizeToWrite, BYTE* pbData, UINT* pnSizeWrote );

    DWORD   GetSize();
    HRESULT ResetFile();
    WAVEFORMATEX* GetFormat()
    {
        return m_pwfx;
    };
};

This class is part of Direct3D's utilities (DXUT). I decided to use this because it nicely wraps .wav reading and writing. You will have to find and compile the source to generate the library files, and add those directories to Visual Studio. For me, they were in D:\Program Files (x86)\Microsoft DirectX SDK (August 2009)\Samples\C++\DXUT\Core and D:\Program Files (x86)\Microsoft DirectX SDK (August 2009)\Samples\C++\DXUT\Optional. Luckily, I just double-clicked the .sln file, it opened in Visual Studio 2008, and compiled without any headaches. Hopefully you will have the same experience as I did.

Assumptions about the reader

Now, I won't go through this line by line. However, I will try to get you opening and modifying .wav files, and provide a few examples of how to use this API.

I'll assume you know:
* C++ basics
* Basic API usage
* Basic Microsoft coding terminology
* How to use Visual Studio
* PCM WAV files and their characteristics (bits per sample/channels/sample rate)
* Hexadecimal to a degree

I'll try to help out a little on the Microsoft code side as well since it can be quite confusing.

Tips for using CWaveFile itself:

#1: You may need to include windows.h and mmsystem.h
#2: You may need winmm.lib, DXUTOpt.lib, and DXUT.lib. DXUTOpt.lib and DXUT.lib are the results of compiling the DirectX Samples (see above)
#3: As said above, you may have to add those build directories to Visual Studio's include and library paths
#4: They changed the old .h file to SDKwavefile.h, so the file might not be named that in a few years. Microsoft, if you change it again, PLEASE CALL IT something like CWaveFile.h! There are no other classes in those two files, so there's absolutely no reason why you should confuse us.

Windows data types

Feel free to skip this part if you already know

BYTE: 8-bit unsigned integer. This is used in a lot of places, especially in read and write buffers.
WORD: word, a 16-bit unsigned integer
DWORD: double word, a 32-bit unsigned integer
HRESULT: 32-bit data type that returns a result of a function. You can use the SUCCEEDED() macro to check for success. example: if ( SUCCEEDED(hResult) )

A lot of the other all-caps items in the .h file are multimedia API types (like MMIOINFO). MSDN explains those better than I can. Why did Microsoft make this mess? It's because C++'s data types sometimes change on you. int has gone from 16-bit to 32-bit, and the size of long depends on the platform. So BYTE, WORD, DWORD, and other type names were created to guarantee 8-bit, 16-bit, and 32-bit sizes. Microsoft needed a way to guarantee the size of those values, especially if they create a new OS on a different architecture. There are already reports that Microsoft is looking at 128-bit processors, and we are still in the 32-bit to 64-bit transition!
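To make the sizes concrete, here is a minimal portable sketch. It assumes the Windows typedefs have the widths listed above. Byte, Word, DWord, HResult, succeeded(), and failed() are hypothetical stand-in names (not Windows API) so that it compiles without windows.h. The two helpers mirror what SUCCEEDED() and FAILED() test: an HRESULT is a signed 32-bit value, and a negative value means failure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Windows typedefs, so this compiles anywhere.
typedef std::uint8_t  Byte;    // BYTE:    8-bit unsigned
typedef std::uint16_t Word;    // WORD:    16-bit unsigned
typedef std::uint32_t DWord;   // DWORD:   32-bit unsigned
typedef std::int32_t  HResult; // HRESULT: 32-bit signed result code

// The whole point of these names: the sizes never change.
static_assert(sizeof(Byte)    == 1, "Byte must be 8 bits");
static_assert(sizeof(Word)    == 2, "Word must be 16 bits");
static_assert(sizeof(DWord)   == 4, "DWord must be 32 bits");
static_assert(sizeof(HResult) == 4, "HResult must be 32 bits");

// Same test the SUCCEEDED() macro performs: non-negative means success.
inline bool succeeded(HResult hr) { return hr >= 0; }

// Same test the FAILED() macro performs: negative means failure.
inline bool failed(HResult hr) { return hr < 0; }

// Two well-known values from winerror.h, reproduced here for illustration.
const HResult kOk   = 0;                                  // S_OK
const HResult kFail = static_cast<HResult>(0x80004005u);  // E_FAIL
```

In real code you would include windows.h and use the actual typedefs and macros. The sketch only shows why checking the sign is enough.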

PCM .WAV file notice

Microsoft designed the PCM .WAV format so that the 8-bit kind is unsigned and the 16-bit kind is signed. Basically, if you look at an 8-bit .wav file in a hex editor (which I would highly recommend you do. XVI32 is one I have used in the past), you'll notice a lot of 127-128 numbers (0x7F and 0x80 respectively). All the 8-bit format does is add 0x80 to whatever value. So, a -1 would be 0x80 - 1, which is 0x7F. If you are generating audio from scratch and you are not tight on space, you might as well try 16-bit first because there's less stuff that can go wrong.

Recap:

8-bit PCM .WAV is unsigned, all data is shifted up by 128 (0x80 in hex). Silent audio is 128
16-bit PCM .WAV is signed, so silent audio is a 0

Anything above 16-bit is usually signed, though you may want to check with an audio editor or audio player first to make sure.
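The recap above can be sketched as a pair of conversion helpers. This is only an illustration of the offset-by-128 rule, not code from SDKwavefile. The function names sample8to16 and sample16to8 are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit PCM is unsigned with silence at 128.
// 16-bit PCM is signed with silence at 0.

// Widen an unsigned 8-bit sample to a signed 16-bit sample.
inline std::int16_t sample8to16(std::uint8_t s)
{
    // Remove the 128 offset, then scale up to the 16-bit range.
    return static_cast<std::int16_t>((static_cast<int>(s) - 128) * 256);
}

// Narrow a signed 16-bit sample to an unsigned 8-bit sample.
inline std::uint8_t sample16to8(std::int16_t s)
{
    // Shift into the range 0..65535 first, then keep the top 8 bits.
    return static_cast<std::uint8_t>((static_cast<int>(s) + 32768) >> 8);
}
```

Silence maps to silence in both directions (128 becomes 0, and 0 becomes 128), which is a quick way to check that you got the offset right.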

PCM .WAV stereo data

Basically, a PCM .WAV file stores the channel data interleaved, alternating between left and right. So it looks like this:

=================================================
| L | R | L | R | L | R | L | R | L | R | L | R |
=================================================
 ---0--- ---1--- ---2--- ---3--- ---4--- ---5---

I haven't played with more than 2 channels yet, but I expect it to be similar. For stereo, the left sample comes first (left, right, left, right), which you can confirm easily enough by playing with the code. I numbered the samples above to show which ones play at the same time.
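Here is a small sketch of pulling the two channels apart and weaving them back together, assuming 16-bit stereo data with the left sample first. StereoChannels, deinterleave(), and interleave() are hypothetical names for this example and are not part of CWaveFile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for the two separated channels.
struct StereoChannels {
    std::vector<std::int16_t> left;
    std::vector<std::int16_t> right;
};

// Split interleaved 16-bit stereo (L R L R ...) into separate channels.
// A trailing unpaired sample is ignored.
inline StereoChannels deinterleave(const std::vector<std::int16_t>& interleaved)
{
    StereoChannels out;
    const std::size_t frames = interleaved.size() / 2;
    out.left.reserve(frames);
    out.right.reserve(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        out.left.push_back(interleaved[2 * i]);       // even index: left
        out.right.push_back(interleaved[2 * i + 1]);  // odd index: right
    }
    return out;
}

// Weave two channels back into L R L R ... order.
// If the channels differ in length, the shorter one decides the frame count.
inline std::vector<std::int16_t> interleave(const StereoChannels& ch)
{
    const std::size_t frames =
        ch.left.size() < ch.right.size() ? ch.left.size() : ch.right.size();
    std::vector<std::int16_t> out;
    out.reserve(frames * 2);
    for (std::size_t i = 0; i < frames; ++i) {
        out.push_back(ch.left[i]);
        out.push_back(ch.right[i]);
    }
    return out;
}
```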

24-bit audio

This was tricky. Basically you read in 3 BYTEs at a time. You can then combine the 3 BYTEs into a 32-bit or larger int using bit shifts, or use a union to do it, and then convert back when you write. This is one way I did it, which might not be that efficient. Note that the union trick relies on the little-endian byte order of x86. I also freaking love the Microsoft sized integer types __int8, __int16, __int32, and __int64. On Visual C++ they correspond to char, short int, int (long int is also 32-bit on Windows), and long long int, respectively.

union spread_int32
{
            __int32   i;   // definitely make sure to limit ints to 32-bit in case it gets compiled for x64
   unsigned __int8    eight_bit [4]; // and I prefer using int8 over char, even though I know it's a typecast
            __int16   sixteen_bit [2];
};
.
.
.
spread_int32 si;
unsigned __int8 i8;

si.i = messaround;
i8 = si.eight_bit[0];
messbuffer[(3 * r)    ] = i8;
i8 = si.eight_bit[1];
messbuffer[(3 * r) + 1] = i8;
i8 = si.eight_bit[2];
messbuffer[(3 * r) + 2] = i8;
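For comparison, here is a sketch of the bit-shift route mentioned above. It is not the code I used. It assumes the three bytes are stored least significant byte first (which is how .WAV stores them), and it also handles sign extension on the way in, which the union snippet does not show. unpack24() and pack24() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Read one little-endian 24-bit sample from three bytes and sign-extend it.
inline std::int32_t unpack24(const std::uint8_t* p)
{
    std::int32_t v = static_cast<std::int32_t>(p[0])
                   | (static_cast<std::int32_t>(p[1]) << 8)
                   | (static_cast<std::int32_t>(p[2]) << 16);
    if (v & 0x800000)     // bit 23 is the sign bit of a 24-bit value
        v -= 0x1000000;   // subtract 2^24 to get the negative value
    return v;
}

// Write the low 24 bits of v as three little-endian bytes.
inline void pack24(std::int32_t v, std::uint8_t* out)
{
    // Work on an unsigned copy so the shifts are well defined for negatives.
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    out[0] = static_cast<std::uint8_t>(u & 0xFF);
    out[1] = static_cast<std::uint8_t>((u >> 8) & 0xFF);
    out[2] = static_cast<std::uint8_t>((u >> 16) & 0xFF);
}
```

Because the shifts spell out the byte order explicitly, this version does not depend on the byte order of the machine it runs on.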

Function List that I have figured out

HRESULT Open( LPWSTR strFileName, WAVEFORMATEX* pwfx, DWORD dwFlags );

Returns

This function returns an HRESULT. Use SUCCEEDED(hresult) and FAILED(hresult) to determine the outcome. If there's an error, it returns DirectX error codes. Unfortunately, the error codes didn't tell me much. I ended up copying over SDKwavefile.cpp and putting wcout << L"Here I AM!" in it in order to figure out what I was doing wrong.

This function opens a file for either reading or writing. It's messy, but workable.

For reading:

For reading, you need a filename (in wide character format, wchar_t stuff[12345] for example), and to set dwFlags to WAVEFILE_READ. I tried setting pwfx to NULL and it still seemed to work. It turns out that it really doesn't seem to use pwfx in this case.

hresult = source.Open(file, NULL, WAVEFILE_READ);

I expected it to set pwfx to the stats of the .WAV file, but it turns out that pwfx ended up with garbage. Don't fret because you can extract that info later using GetFormat()

For writing:

For writing, you need the filename again, but pwfx has to point to a filled-in WAVEFORMATEX, otherwise Open() returns an error. Here dwFlags is set to NULL (that is, 0) to mark it for read/write mode.

hresult = destination.Open(file, pwfx, NULL);

WAVEFORMATEX is best described here: http://msdn.microsoft.com/en-us/library/dd757720%28VS.85%29.aspx (if the link is busted, go to msdn.net and search for WAVEFORMATEX)

Notes about WAVEFORMATEX:

nSamplesPerSec is horribly described. It's basically the sampling rate of the audio, converted from kHz to Hz. So 44.1 kHz is 44100, 22.05 kHz is 22050, and so forth. Here's a bit of configuration code I snagged from Microsoft at http://msdn.microsoft.com/en-us/library/ee419050%28VS.85%29.aspx which shows how to set up an uncompressed PCM .wav file:

WAVEFORMATEX wfxInput;

ZeroMemory( &wfxInput, sizeof(wfxInput));
wfxInput.wFormatTag = WAVE_FORMAT_PCM;
wfxInput.nSamplesPerSec = 22050; // replace this with 11025 44100 96000 48000 if you like
wfxInput.wBitsPerSample = 8; // 8 = 8-bit / 16 = 16-bit. 24 would probably work as well
wfxInput.nChannels = 1; // 1 mono / 2 stereo
wfxInput.nBlockAlign =
    wfxInput.nChannels * (wfxInput.wBitsPerSample / 8); // don't change this because it's calculated very nicely
wfxInput.nAvgBytesPerSec =
    wfxInput.nBlockAlign * wfxInput.nSamplesPerSec; // same for this one as well

Which makes our life easier since the tricky parts are figured out. You don't need to set cbSize because ZeroMemory already set it to 0, and plain PCM ignores it anyway.
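To see what those two calculated fields come out to, here is a portable sketch of the same arithmetic. PcmFormat is a hypothetical stand-in that only mirrors the WAVEFORMATEX fields used above (so it compiles without windows.h), and makePcmFormat() is a made-up helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable mirror of the WAVEFORMATEX fields used above.
struct PcmFormat {
    std::uint16_t nChannels;        // 1 mono / 2 stereo
    std::uint32_t nSamplesPerSec;   // sampling rate in Hz
    std::uint16_t wBitsPerSample;   // 8, 16, 24, ...
    std::uint16_t nBlockAlign;      // bytes per frame (one sample per channel)
    std::uint32_t nAvgBytesPerSec;  // bytes per second of audio
};

// Fill in a PCM format, deriving the two calculated fields.
inline PcmFormat makePcmFormat(std::uint16_t channels,
                               std::uint32_t sampleRate,
                               std::uint16_t bitsPerSample)
{
    // This sketch only handles whole-byte sample sizes.
    assert(channels > 0 && bitsPerSample > 0 && bitsPerSample % 8 == 0);

    PcmFormat f;
    f.nChannels       = channels;
    f.nSamplesPerSec  = sampleRate;
    f.wBitsPerSample  = bitsPerSample;
    f.nBlockAlign     = static_cast<std::uint16_t>(channels * (bitsPerSample / 8));
    f.nAvgBytesPerSec = f.nBlockAlign * sampleRate;
    return f;
}
```

For example, 16-bit stereo at 44100 Hz gives a block align of 4 bytes and 176400 bytes per second.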

WAVEFORMATEX* GetFormat()

This function gets the .WAV file format. It returns a pointer to the WAVEFORMATEX structure that's currently being used in the class. Actually the class definition is pretty self-explanatory since the implementation is right there. However, this may confuse some people. Here are a few tricks for this function.

WAVEFORMATEX destWaveFormat;
memcpy( &destWaveFormat, source.GetFormat(), sizeof(destWaveFormat));

wcout << L"Channels = " << source.GetFormat()->nChannels << endl;
wcout << L"Samples/sec = " << destWaveFormat.nSamplesPerSec << endl;

This copies the format to another structure, which is handy if you want to take a source .wav and store the results as another one.

DWORD   GetSize();

Simply returns the size, in BYTEs, of the data portion of the audio. I haven't tested this part for accuracy yet, and something seems off about the calculation.
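One way to sanity-check the number you get back is to convert it into a frame count and a duration and compare that with what an audio editor reports. This is only a sketch of the arithmetic. frameCount() and durationSeconds() are hypothetical helpers, and the inputs would come from GetSize() and from the nBlockAlign and nAvgBytesPerSec fields returned by GetFormat().

```cpp
#include <cassert>
#include <cstdint>

// Number of sample frames (one sample per channel) in dataBytes of PCM audio.
inline std::uint32_t frameCount(std::uint32_t dataBytes, std::uint16_t blockAlign)
{
    assert(blockAlign > 0);
    return dataBytes / blockAlign;
}

// Playing time in seconds of dataBytes of PCM audio.
inline double durationSeconds(std::uint32_t dataBytes, std::uint32_t avgBytesPerSec)
{
    assert(avgBytesPerSec > 0);
    return static_cast<double>(dataBytes) / avgBytesPerSec;
}
```

For example, 264600 bytes of 16-bit mono audio at 44100 Hz (block align 2, 88200 bytes per second) should work out to 132300 frames and 3 seconds.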

HRESULT CWaveFile::Read( BYTE* pBuffer, DWORD dwSizeToRead, DWORD* pdwSizeRead )

//-----------------------------------------------------------------------------
// Name: CWaveFile::Read()
// Desc: Reads section of data from a wave file into pBuffer and returns
//       how much read in pdwSizeRead, reading not more than dwSizeToRead.
//       This uses m_ck to determine where to start reading from.  So
//       subsequent calls will be continue where the last left off unless
//       Reset() is called.
//-----------------------------------------------------------------------------

Whoa, it's actually commented. (The comment says Reset(), but the method in the class is ResetFile().) So, yeah, it works similarly to the UNIX read() function. Basically you take your array (doesn't have to be a BYTE array) and typecast it to BYTE* (so an array of short int would work if you are working with 16-bit audio). Make sure you are using sizeof() when specifying dwSizeToRead. pdwSizeRead points to a variable that gets filled with the number of bytes actually read.

Assuming you are storing an array of short ints (aka 16-bit signed audio):

DWORD sizeread;
HRESULT hr;
.
.
.
hr = waveFile.Read( (BYTE*) buffer, buffer_size * sizeof(short int), &sizeread);

where 8-bit would be

DWORD sizeread;
HRESULT hr;
.
.
.
hr = waveFile.Read( (BYTE*) buffer, buffer_size * sizeof(unsigned char), &sizeread);

Simple as that. I'll put some example code later to help you understand this function.
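One detail that is easy to miss: the count you get back is in bytes, and the last chunk is usually only partly filled. Here is a portable sketch of a read loop that accounts for both. FakeWaveSource is a hypothetical in-memory stand-in that only imitates the calling convention of Read() (it is not CWaveFile and it returns bool instead of HRESULT), so the loop can be tried without the DirectX SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in that imitates the Read() calling convention:
// fill pBuffer with up to sizeToRead bytes and report the count in *pSizeRead.
class FakeWaveSource {
public:
    explicit FakeWaveSource(const std::vector<std::uint8_t>& data)
        : data_(data), pos_(0) {}

    bool Read(std::uint8_t* pBuffer, std::uint32_t sizeToRead, std::uint32_t* pSizeRead)
    {
        const std::size_t remaining = data_.size() - pos_;
        const std::size_t n = sizeToRead < remaining ? sizeToRead : remaining;
        if (n > 0)
            std::memcpy(pBuffer, &data_[pos_], n);
        pos_ += n;  // the next call continues where this one left off
        *pSizeRead = static_cast<std::uint32_t>(n);
        return true;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_;
};

// Read every 16-bit sample from the source in fixed-size chunks.
inline std::vector<std::int16_t> readAllSamples(FakeWaveSource& source)
{
    const std::size_t buffer_size = 4;  // samples per chunk (tiny, for illustration)
    std::int16_t buffer[buffer_size];
    std::vector<std::int16_t> samples;
    std::uint32_t sizeread = 0;

    do {
        source.Read(reinterpret_cast<std::uint8_t*>(buffer),
                    static_cast<std::uint32_t>(buffer_size * sizeof(std::int16_t)),
                    &sizeread);

        // sizeread is in bytes, so convert it to a sample count before use.
        const std::size_t got = sizeread / sizeof(std::int16_t);
        samples.insert(samples.end(), buffer, buffer + got);
    } while (sizeread != 0);

    return samples;
}
```

With the real class you would also check the HRESULT from each Read() call before trusting the buffer.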

HRESULT CWaveFile::Write( UINT nSizeToWrite, BYTE* pbSrcData, UINT* pnSizeWrote )
//-----------------------------------------------------------------------------
// Name: CWaveFile::Write()
// Desc: Writes data to the open wave file
//-----------------------------------------------------------------------------

Less documented than Read(), but it's very similar. Note that the size comes first here, and that pnSizeWrote is a UINT*, not a DWORD*. Again, if you want to write a short int buffer to disk, you would use:

UINT writtenSize;
HRESULT hResult;

.
.
.
hResult = waveFile.Write(buffer_size * sizeof(short int), (BYTE *) soundBuffer, &writtenSize);

 
Generate a sine wave (for those good at math)

#pragma comment(lib, "winmm.lib")
#pragma comment(lib, "DXUTOpt.lib")
#pragma comment(lib, "DXUT.lib")

#include <windows.h>
#include <SDKWavefile.h>
#include <math.h>

// notice that this is a UNICODE build
//
// Update: main(int argc, wchar_t **argv) is not a standard entry point, so a
//         wide argv might not work. Visual C++ offers wmain for wide arguments,
//         or you can take char **argv and convert it to UNICODE yourself if you
//         want to rig the program to use different file names.
//         This program ignores its arguments, so plain main() is enough.
int main()
{
   // declare the important stuffs
   CWaveFile sinwave;
   WAVEFORMATEX sinWaveFormat;
   HRESULT hr;

   const double soundLength = 3;
   const double PI = 3.1415926;

   // fill the struct with all 0s
   ZeroMemory((void*) &sinWaveFormat, sizeof(WAVEFORMATEX));

   // now configure a 16-bit, 44100 hz, mono. PCM (uncompressed .WAV file)
   sinWaveFormat.wFormatTag = WAVE_FORMAT_PCM;
   sinWaveFormat.nSamplesPerSec = 44100;  
   sinWaveFormat.wBitsPerSample =  16;    
   sinWaveFormat.nChannels = 1;           
   sinWaveFormat.nBlockAlign =
        sinWaveFormat.nChannels * (sinWaveFormat.wBitsPerSample / 8);
   sinWaveFormat.nAvgBytesPerSec =
        sinWaveFormat.nBlockAlign * sinWaveFormat.nSamplesPerSec;

   // open the file for writing
   hr = sinwave.Open(L"Sinwave.wav", &sinWaveFormat, NULL);

   if ( SUCCEEDED(hr) == TRUE )
   {
      // all sorts of math I figured out
     
      // how much to increment the period by
      double increment   = 2 * PI / sinWaveFormat.nSamplesPerSec;
     
      // counts where in the array to put stuff
      unsigned int buffercount = 0;

      // and how big the buffer you want
      const unsigned int buffer_size = 1024;

      // and the buffer. Since it's 16-bit, we use a signed short int
      short int    soundBuffer [buffer_size];

      // and this keeps track of how much data is written. I really should check it
      // after it runs
      unsigned int writtenSize;

      // clear the sound buffer just in case. This is a handy trick to debug if something went wrong as well
      ZeroMemory(&soundBuffer, buffer_size * sizeof(short int));
     
      // generate the sin wave. The equation is y(t) = A * sin ( angFreq * t + phase)
      // A = amplitude / t is  time / phase is phase offset / ang freq is the angular frequency
      // source: wikipedia
      for ( double theta = 0; theta < (2 * PI * soundLength); theta += increment )
      {
         // cast the whole product, not just the 10000
         soundBuffer[buffercount] = (short int) (10000 * sin( theta * 880 ));

         buffercount++;

         // dump the buffer
         if ( buffercount == buffer_size )
         {
            sinwave.Write(buffer_size * sizeof(short int), (BYTE *) &soundBuffer, &writtenSize);
            buffercount = 0;
         }
      }

      // dump anything left
      if ( buffercount > 0 )
      {
         sinwave.Write(buffercount * sizeof(short int), (BYTE *) &soundBuffer, &writtenSize);
      }
     
   } // end if ( SUCCEEDED(hresult) == TRUE )

   sinwave.Close();

   system("pause");

   return 42;
}

One note is that I originally tried to use floats, but there turned out to be precision errors! So, doubles worked out far better.
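I can't say for certain why floats failed, but one plausible culprit is rounding error piling up in theta, since the loop adds a tiny increment over a hundred thousand times. Here is a portable sketch of an alternative that computes each sample from its index instead of accumulating. It is a variation on the program above, not the same code. makeSine() is a hypothetical helper and writes nothing to disk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generate a mono 16-bit sine wave. Each sample is computed from its index,
// so rounding error cannot build up the way it can with theta += increment.
inline std::vector<std::int16_t> makeSine(double frequencyHz, double seconds,
                                          unsigned sampleRate, double amplitude)
{
    // Keep the peak inside the signed 16-bit range.
    assert(amplitude >= 0.0 && amplitude <= 32767.0);

    const double PI = 3.14159265358979323846;
    const std::size_t total = static_cast<std::size_t>(seconds * sampleRate);

    std::vector<std::int16_t> samples(total);
    for (std::size_t n = 0; n < total; ++n) {
        const double t = static_cast<double>(n) / sampleRate;  // time in seconds
        samples[n] = static_cast<std::int16_t>(
            std::lround(amplitude * std::sin(2.0 * PI * frequencyHz * t)));
    }
    return samples;
}
```

Calling makeSine(880.0, 3.0, 44100, 10000.0) produces the same kind of tone as the program above, and the resulting vector could be handed to Write() in chunks.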

Copy

A good way to figure out an API like this one is to do a copy. Basically you open a file for reading and a file for writing, and copy the contents. If done correctly, the data parts will be 100% identical, byte for byte. However, if the .wav file contains any extra header information, it may be discarded in the copy. I found that out from using ding.wav as a test file.

#pragma comment(lib, "winmm.lib")
#pragma comment(lib, "DXUTOpt.lib")
#pragma comment(lib, "DXUT.lib")
#pragma comment(lib, "dxerr.lib")
#include <windows.h>
#include <SDKWavefile.h>
#include <Dxerr.h>
#include <math.h>

#define MAX_BUFFER_SIZE 10240

#include <iostream>
 using std::cout;
 using std::endl;
 using std::wcout;

int main(int argc, char **argv)
{
   if ( argc < 3 )
   {
      cout << "Please specify source and destination files.\n";
      return 0; // this cuts out of the function early. I'm doing this to avoid bracket hell
   }

   HRESULT hresult;

   CWaveFile source;
   WAVEFORMATEX sourceWaveFormat;
   WAVEFORMATEX destWaveFormat;

   CWaveFile dest;

   BYTE buffer[MAX_BUFFER_SIZE];

   wchar_t file  [MAX_PATH]; // note: MAX_PATH is a constant defined in windows.h
   wchar_t file2 [MAX_PATH]; //       which is the maximum amount of characters in a file path. Neat, huh?

   size_t convertedChars = 0;

   DWORD amountRead = 0;
   unsigned int amountWrote = 0;

   // Open the file for reading
   //
   // note: the only flag I found in the SDKWavefile.cpp is WAVEFILE_READ
   //       which tells the function to read the file
   //  
   //       Otherwise, not specifying WAVEFILE_READ looks like it will create
   //                  a new file
  
   // convert the data in argv[1] to UNICODE
   mbstowcs( file, argv[1], MAX_PATH);
   hresult = source.Open(file, NULL, WAVEFILE_READ);

   if ( FAILED(hresult) == TRUE )
   {
      wchar_t error [10240];

      cout << "Error when trying to open file for reading.\n";

      wcscpy(error, DXGetErrorString(hresult));
      wcout << error << endl;

      wcscpy(error, DXGetErrorDescription(hresult));
      wcout << error << endl;

      system("pause");
      return hresult;
   }
  
   memcpy( &sourceWaveFormat, source.GetFormat(), sizeof(sourceWaveFormat));

   wcout << L"Opened file with stats:\n\n";
   wcout << L"sourceWaveFormat.cbSize = " << sourceWaveFormat.cbSize << endl;
   wcout << L"sourceWaveFormat.nAvgBytesPerSec = " << sourceWaveFormat.nAvgBytesPerSec << endl;
   wcout << L"sourceWaveFormat.nBlockAlign = " << sourceWaveFormat.nBlockAlign << endl;
   wcout << L"Channels = " << source.GetFormat()->nChannels << endl;
   wcout << L"Samples/sec = " << sourceWaveFormat.nSamplesPerSec << endl;
   wcout << L"Bits/sample = " << sourceWaveFormat.wBitsPerSample << endl;

   // copy the contents between the two WAVEFORMATEX structures
   memcpy( &destWaveFormat, source.GetFormat(), sizeof(sourceWaveFormat));
  
   // Open the file for writing. Seems like it'll truncate the file to 0 if it exists
   // In other words, consider it overwritten
   //
   // Note that I didn't specify a flag
   mbstowcs( file2, argv[2], MAX_PATH);
   hresult = dest.Open(file2, &destWaveFormat, NULL);
  
   // file couldn't open for some reason. Exit now but give the reason why
   if ( SUCCEEDED(hresult) == FALSE )
   {
      cout << "Error when trying to open file for writing.\n";
     
      wchar_t error [10240];
      wcscpy(error, DXGetErrorString(hresult));
      wcout << error << endl;

      wcscpy(error, DXGetErrorDescription(hresult));
      wcout << error << endl;
      system("pause");
      return hresult;
   }

   // We got to this point. Now begin copying
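   do
   {
      // read into the buffer
      hresult = source.Read( buffer, MAX_BUFFER_SIZE, &amountRead );

      // stop on a read error or when there is nothing left to copy
      if ( FAILED(hresult) || amountRead == 0 )
         break;

      // write to disk
      hresult = dest.Write( (unsigned int) amountRead, buffer, &amountWrote );

   } while ( SUCCEEDED(hresult) );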

   source.Close();
   dest.Close();

   return 42;
}

Anyways, I hope this guide helps you figure out this nice set of code. Personally, having this guide helps me later down the road after not using it for a while, and I need to do audio manipulation, even if it is for one of my own projects.


Copyright (C) 2009 by Joe Plante. Code extracted from SDKwavefile.cpp and SDKwavefile.h is copyright by Microsoft
You are free to use any code in this page for your projects. If you post this page on another website (which I don't mind since it's less for me to host), please give the creators credit.

Other sources:
http://msdn.microsoft.com/en-us/library/ee419050%28VS.85%29.aspx