There are two things you need for this:
- The HDF5 library
- The C# wrapper that makes calls to this library
I won’t go into the details of .NET installation. Let’s get started. The HDF5 library can be installed on a Mac in two ways:
Method 1: Using Homebrew (brew install hdf5). I don’t recommend it, but there is a sample install log if you still follow this option. On my Mac, the library got installed under /opt/homebrew/lib. You will need this path when running the C# program.
Method 2: From the official HDF Group site. This will give you more than Method 1 does. You will have to register and create an account first. Alternatively, you could try downloading the package directly:
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.1/bin/unix/hdf5-1.14.1-2-Std-macos11m1_64-clang.tar.gz
The package above is for Macs with an Apple Silicon (arm64, i.e., non-Intel) CPU. You can find packages for other platforms at https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.1/bin/unix/.
Steps to install:
gunzip hdf5-1.14.1-2-Std-macos11m1_64-clang.tar.gz
tar -xvf hdf5-1.14.1-2-Std-macos11m1_64-clang.tar
./hdf/HDF5-1.14.1-Darwin.sh
After that, I recommend moving the package to ~/Library/HDF5. The libraries, including the shared library libhdf5.dylib that the C# program loads at runtime, will be under the ~/Library/HDF5/1.14.1/lib folder. Double-check that they are there.
To use the C# wrapper, add a reference to it in your .csproj project like this:
<PackageReference Include="HDF.PInvoke.1.10" Version="1.10.612" />
When you run dotnet build, the .NET SDK will download (restore) the package from NuGet if it’s not already there. One tip: I wouldn’t recommend using other C# HDF5 libraries. I tried some of them and they didn’t work out very well for me.
When the C# program runs, HDF.PInvoke.1.10 makes calls into libhdf5.dylib, so the program must be able to find this file on your system. On Mac, I had to set the DYLD_LIBRARY_PATH environment variable to the directory where libhdf5.dylib is located (the folder, not the path to the file itself), e.g., /opt/homebrew/lib for Method 1 or the ~/Library/HDF5/1.14.1/lib folder for Method 2. If you use VS Code, set this environment variable in launch.json like so:
{
    "name": ".NET Core Launch (console)",
    "type": "coreclr",
    "request": "launch",
    "preLaunchTask": "build",
    "program": "${workspaceFolder}/bin/Debug/net7.0/my-program.dll",
    "args": [],
    "cwd": "${workspaceFolder}",
    "console": "internalConsole",
    "stopAtEntry": false,
    "env": {
        "DYLD_LIBRARY_PATH": "/path/to/directory/containing/libhdf5.dylib"
    }
}
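A common mistake is pointing DYLD_LIBRARY_PATH at the .dylib file instead of its folder. If the native library can't be found, .NET typically raises a DllNotFoundException (possibly wrapped in another exception) on the first HDF5 call. The sketch below is a hypothetical diagnostic helper, not part of HDF.PInvoke: it assumes a colon-separated search path (macOS/Linux style) and simply reports which listed directory, if any, contains libhdf5.dylib.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of HDF.PInvoke): finds which directory on a
// colon-separated search path contains a given native library file.
public static class LibraryPathCheck
{
    // Returns the first directory in searchPath that contains fileName,
    // or null if none does. Empty entries are skipped.
    public static string? FindLibraryDirectory(string? searchPath, string fileName = "libhdf5.dylib")
    {
        if (string.IsNullOrEmpty(searchPath)) return null;
        foreach (var dir in searchPath.Split(':', StringSplitOptions.RemoveEmptyEntries))
        {
            // Each entry must be a directory; a path to the .dylib itself will not match.
            if (File.Exists(Path.Combine(dir, fileName))) return dir;
        }
        return null;
    }

    // Convenience wrapper that checks the current process's DYLD_LIBRARY_PATH.
    public static string? FindInDyldLibraryPath(string fileName = "libhdf5.dylib") =>
        FindLibraryDirectory(Environment.GetEnvironmentVariable("DYLD_LIBRARY_PATH"), fileName);
}
```

For example, you could call LibraryPathCheck.FindInDyldLibraryPath() at the start of Main and print a warning if it returns null, before any HDF5 call is made.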
Next, we come to the code itself. Before writing the code, it would be a good idea to familiarize yourself with the HDF5 data model (files, datasets, dataspaces, and datatypes). This can be done by reading the documentation on the official HDF5 website.
Reading an HDF5 File
Here are a few code snippets to help you out. This is not complete code, but it should get you most of the way there:
using HDF.PInvoke;
...
H5.open();
long fileId = H5F.open(fileName, H5F.ACC_RDONLY);
var train = Hdf5Utils.Read2DTensor<float>(fileId, "train");
var test = Hdf5Utils.Read2DTensor<float>(fileId, "test");
H5F.close(fileId);
H5.close();
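where Hdf5Utils is a helper class I have written. Its generic Read<T> method, which reads a dataset of any rank into a flat array, looks like this:
// Requires: using System.Runtime.InteropServices; (for GCHandle)
public static ReadResult<T> Read<T>(long fileId, String dataset) {
    // Open the dataset (HDF5 calls return a negative id on failure)
    long datasetId = H5D.open(fileId, dataset);
    if (datasetId < 0) throw new ArgumentException($"Could not open dataset '{dataset}'");
    // Get the file datatype, its native in-memory equivalent, and the dataspace
    long fileTypeId = H5D.get_type(datasetId);
    long memTypeId = H5T.get_native_type(fileTypeId, H5T.direction_t.ASCEND);
    long dataspaceId = H5D.get_space(datasetId);
    // Get the number of dimensions in the dataspace
    int rank = H5S.get_simple_extent_ndims(dataspaceId);
    // Get the dimensions of the dataspace
    ulong[] dimensions = new ulong[rank];
    H5S.get_simple_extent_dims(dataspaceId, dimensions, null);
    // Total number of elements = product of all dimensions
    ulong n = 1;
    for (int i = 0; i < dimensions.Length; i++) {
        n *= dimensions[i];
    }
    // Read into a pinned managed buffer; T must match the dataset's element type
    T[] data = new T[n];
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try {
        if (H5D.read(datasetId, memTypeId, H5S.ALL, H5S.ALL, H5P.DEFAULT, handle.AddrOfPinnedObject()) < 0)
            throw new Exception($"Failed to read dataset '{dataset}'");
    } finally {
        handle.Free();
        // Close the datatypes, dataspace, and dataset (the caller closes the file)
        H5T.close(memTypeId);
        H5T.close(fileTypeId);
        H5S.close(dataspaceId);
        H5D.close(datasetId);
    }
    return new ReadResult<T> {
        dimensions = dimensions,
        data = data
    };
}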
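Read<T> returns a flat buffer plus its dimensions, while the calling code above works with 2D tensors. One way a method like Read2DTensor could be layered on top is to reshape that buffer into a jagged array. The sketch below is hypothetical (the original Read2DTensor isn't shown) and relies only on the fact that HDF5 stores datasets in C (row-major) order, so row i starts at offset i * cols.

```csharp
using System;

// Hypothetical helper (not from the original Hdf5Utils): reshapes the flat,
// row-major buffer returned by a read into a jagged T[][].
public static class TensorShape
{
    public static T[][] Unflatten<T>(T[] data, ulong[] dimensions)
    {
        if (dimensions.Length != 2)
            throw new ArgumentException($"Expected a 2D dataset, got rank {dimensions.Length}.");
        // checked: throws OverflowException if a dimension doesn't fit in an int
        int rows = checked((int)dimensions[0]);
        int cols = checked((int)dimensions[1]);
        if ((long)rows * cols != data.Length)
            throw new ArgumentException("Buffer length does not match the dimensions.");

        var result = new T[rows][];
        for (int i = 0; i < rows; i++)
        {
            result[i] = new T[cols];
            // Row-major layout: row i occupies data[i * cols .. i * cols + cols - 1]
            Array.Copy(data, i * cols, result[i], 0, cols);
        }
        return result;
    }
}
```

Usage would look like var r = Hdf5Utils.Read<float>(fileId, "train"); followed by float[][] train = TensorShape.Unflatten(r.data, r.dimensions);.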
Writing an HDF5 File
Again, here are the most relevant code snippets, which should get you most of the way there:
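// HDF5 ids are 64-bit (hid_t is a long) in HDF.PInvoke 1.10
long outFileId = H5F.create(outFileName, H5F.ACC_TRUNC);
ulong[] dimensions = new ulong[] {(ulong)n, (ulong)m};
Hdf5Utils.WriteDataset<int>(outFileId, "labels", labels, dimensions);
Hdf5Utils.WriteDataset<float>(outFileId, "distances", distances, dimensions);
H5F.close(outFileId);
WriteDataset is not shown here, but the Write2DTensor method below, which takes a jagged array and derives the dimensions from it, illustrates the core steps of writing a dataset: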
// Uses an unsafe block: requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the .csproj.
// The unmanaged constraint is needed to take a T* pointer in generic code.
public static void Write2DTensor<T>(long fileId, String dataset, T[][] data) where T : unmanaged {
    // HDF5 expects one contiguous, row-major buffer
    var array = flatten<T>(data);
    ulong[] dimensions = { (ulong)data.Length, (ulong)data[0].Length }; // 2D array
    // Copy the HDF5 native type corresponding to T
    long dataType = H5T.copy(GetDatatype(typeof(T)));
    long dataspaceId = H5S.create_simple(dimensions.Length, dimensions, null); // Create dataspace
    // Create the dataset under the caller-supplied name
    long datasetId = H5D.create(fileId, dataset, dataType, dataspaceId, H5P.DEFAULT, H5P.DEFAULT, H5P.DEFAULT);
    unsafe {
        fixed (T* dataPtr = array) {
            H5D.write(datasetId, dataType, H5S.ALL, H5S.ALL, H5P.DEFAULT, new IntPtr(dataPtr));
        }
    }
    H5D.close(datasetId);
    H5S.close(dataspaceId);
    H5T.close(dataType); // https://github.com/HDFGroup/HDF.PInvoke/wiki/Cookbook-:-Strings
}
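Write2DTensor calls a flatten<T> helper that isn't shown. A minimal sketch of what such a helper might look like follows; the name TensorFlatten and the validation are my own assumptions. It copies a jagged array into one contiguous row-major buffer (the layout H5D.write reads when both dataspaces are H5S.ALL) and rejects ragged input, since a jagged C# array can have rows of different lengths but an HDF5 dataset cannot.

```csharp
using System;

// Hypothetical sketch of a flatten helper (the original is not shown):
// copies a rectangular jagged array into one contiguous row-major buffer.
public static class TensorFlatten
{
    public static T[] Flatten<T>(T[][] rows)
    {
        if (rows.Length == 0) return Array.Empty<T>();
        int cols = rows[0].Length;
        // checked: throw instead of silently overflowing on huge inputs
        var flat = new T[checked(rows.Length * cols)];
        for (int i = 0; i < rows.Length; i++)
        {
            // Jagged arrays can be ragged; HDF5 datasets cannot.
            if (rows[i].Length != cols)
                throw new ArgumentException($"Row {i} has {rows[i].Length} elements, expected {cols}.");
            // Row i goes to flat[i * cols .. i * cols + cols - 1]
            Array.Copy(rows[i], 0, flat, i * cols, cols);
        }
        return flat;
    }
}
```

Validating up front means a ragged input fails with a clear message in managed code, rather than writing a buffer whose length doesn't match the dimensions passed to H5S.create_simple.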
Hope you find it useful. Let me know what you think.