How to read/write HDF5 files in C#

There are 2 things you need for this:

I won’t go into the details of .NET installation. Let’s get started. The HDF5 library can be installed on a Mac in two ways:

Method 1: Using Homebrew, i.e. brew install hdf5 (I don’t recommend it). If you still follow this option: on my Mac, the library was installed under /opt/homebrew/lib. You will need this path when running the C# program.

Method 2: From the official site. This will give you more than Method 1. You will have to register first and create an account. Alternatively, you could try downloading the package directly:

wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.1/bin/unix/hdf5-1.14.1-2-Std-macos11m1_64-clang.tar.gz

The above is for a Mac with an Apple Silicon (arm64, i.e., non-Intel) CPU. You can see other packages at https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.1/bin/unix/.

Steps to install:

gunzip hdf5-1.14.1-2-Std-macos11m1_64-clang.tar.gz
tar -xvf hdf5-1.14.1-2-Std-macos11m1_64-clang.tar
./hdf/HDF5-1.14.1-Darwin.sh

After that, I recommend moving the package to ~/Library/HDF5. The libraries will then be under the ~/Library/HDF5/1.14.1/lib folder. Double-check that libhdf5.dylib is actually there.

To use the C# wrapper, add a reference to it in your .csproj project file like this:

<PackageReference Include="HDF.PInvoke.1.10" Version="1.10.612" />

When you run dotnet build, the NuGet restore step will download and install the dependency if it’s not already there. A tip: I wouldn’t recommend using other C# HDF5 libraries. I tried some of them and they didn’t work out very well for me.

When the C# program runs, HDF.PInvoke.1.10 makes calls into libhdf5.dylib, so the program must be able to find this file on your system. On Mac, I had to set the DYLD_LIBRARY_PATH environment variable to the directory where libhdf5.dylib is located. If you use VS Code, set this environment variable in launch.json like so:

{
    "name": ".NET Core Launch (console)",
    "type": "coreclr",
    "request": "launch",
    "preLaunchTask": "build",
    "program": "${workspaceFolder}/bin/Debug/net7.0/my-program.dll",
    "args": [],
    "cwd": "${workspaceFolder}",
    "console": "internalConsole",
    "stopAtEntry": false,
    "env": {
        "DYLD_LIBRARY_PATH": "/path/to/hdf5/lib"
    }
}
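
If the native library cannot be found, the failure typically shows up only at the first HDF5 call, so it can help to check the environment variable up front. The following is a minimal sketch of such a check; Hdf5LibraryCheck and ContainsLibrary are hypothetical names of my own, not part of HDF.PInvoke, and the sketch assumes the usual colon-separated format of DYLD_LIBRARY_PATH.

```csharp
using System;
using System.IO;

public static class Hdf5LibraryCheck
{
    // Hypothetical helper (not part of HDF.PInvoke): returns true if any directory
    // in a colon-separated search path contains the given library file.
    public static bool ContainsLibrary(string searchPath, string libraryFileName = "libhdf5.dylib")
    {
        if (string.IsNullOrEmpty(searchPath))
        {
            return false;
        }

        foreach (string dir in searchPath.Split(':', StringSplitOptions.RemoveEmptyEntries))
        {
            // The variable must list directories, not the .dylib file itself.
            if (File.Exists(Path.Combine(dir, libraryFileName)))
            {
                return true;
            }
        }
        return false;
    }
}
```

At program start you could call Hdf5LibraryCheck.ContainsLibrary(Environment.GetEnvironmentVariable("DYLD_LIBRARY_PATH") ?? "") and print a warning when it returns false.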

Next we come to the code itself. Before writing the code, it is a good idea to familiarize yourself with the HDF5 data format by reading the documentation on the official HDF5 website.

Reading HDF5 File

I give a few code snippets to help you out. This is not complete code but should get you there for the most part:

using HDF.PInvoke;

...
H5.open();
long fileId = H5F.open(fileName, H5F.ACC_RDONLY);
var train = Hdf5Utils.Read2DTensor<float>(fileId, "train");
var test = Hdf5Utils.Read2DTensor<float>(fileId, "test");
H5F.close(fileId);
H5.close();

where Hdf5Utils is a class I have written (in Hdf5Utils.cs). Here is its generic Read method:

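// Requires: using System.Runtime.InteropServices; (for GCHandle)
// ReadResult<T> is a small container class with 'dimensions' and 'data' members.
public static ReadResult<T> Read<T>(long fileId, string dataset) where T : struct {
    // Open the dataset (HDF5 returns a negative id on failure)
    long datasetId = H5D.open(fileId, dataset);
    if (datasetId < 0) {
        throw new ArgumentException($"Cannot open dataset '{dataset}'");
    }

    // Get the datatype and dataspace
    long datatypeId = H5D.get_type(datasetId);
    long dataspaceId = H5D.get_space(datasetId);

    // Get the number of dimensions in the dataspace
    int rank = H5S.get_simple_extent_ndims(dataspaceId);

    // Get the dimensions of the dataspace
    ulong[] dimensions = new ulong[rank];
    H5S.get_simple_extent_dims(dataspaceId, dimensions, null);

    // The total number of elements is the product of all dimensions
    ulong n = 1;
    for (int i = 0; i < dimensions.Length; i++) {
        n *= dimensions[i];
    }

    // Read the data into a pinned managed array. The file's datatype is used as the
    // memory type, so T must match the stored element type (e.g. float for 32-bit floats).
    T[] data = new T[n];
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try {
        H5D.read(datasetId, datatypeId, H5S.ALL, H5S.ALL, H5P.DEFAULT, handle.AddrOfPinnedObject());
    } finally {
        handle.Free();
    }

    // Close the datatype, dataspace, and dataset (the caller closes the file)
    H5T.close(datatypeId);
    H5S.close(dataspaceId);
    H5D.close(datasetId);

    return new ReadResult<T> {
        dimensions = dimensions,
        data = data
    };
}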
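
The calling snippet uses Read2DTensor, which is not listed here. One plausible way to build it is to call Read and then reshape the flat buffer, since HDF5 hands the data back in row-major (C) order. Below is a minimal sketch of that reshape step only; TensorShape and To2D are hypothetical names of my own, and returning a jagged T[][] is an assumption on my part.

```csharp
using System;

public static class TensorShape
{
    // Hypothetical helper: turns a flat row-major buffer into a rows x cols jagged array.
    public static T[][] To2D<T>(T[] flat, ulong[] dimensions)
    {
        if (dimensions.Length != 2)
        {
            throw new ArgumentException("Expected a rank-2 dataset.", nameof(dimensions));
        }

        int rows = checked((int)dimensions[0]);
        int cols = checked((int)dimensions[1]);
        if (flat.Length != (long)rows * cols)
        {
            throw new ArgumentException("Buffer length does not match the dimensions.", nameof(flat));
        }

        var result = new T[rows][];
        for (int r = 0; r < rows; r++)
        {
            // Row r occupies the slice [r * cols, (r + 1) * cols) of the flat buffer.
            result[r] = new T[cols];
            Array.Copy(flat, r * cols, result[r], 0, cols);
        }
        return result;
    }
}
```

With this in place, a Read2DTensor could simply return TensorShape.To2D(result.data, result.dimensions) for the result of Read.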

Writing HDF5 File

Again, I give the most relevant code snippets, which should get you there for the most part:

long outFileId = H5F.create(outFileName, H5F.ACC_TRUNC);
ulong[] dimensions = new ulong[] {(ulong)n, (ulong)m};
Hdf5Utils.WriteDataset<int>(outFileId, "labels", labels, dimensions);
Hdf5Utils.WriteDataset<float>(outFileId, "distances", distances, dimensions);
H5F.close(outFileId);

A related helper from the same class, Write2DTensor, shows the HDF5 calls involved:

// The 'unsafe' block requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the .csproj.
// flatten and GetDatatype are other helpers in the same class (not shown).
public static void Write2DTensor<T>(long fileId, string dataset, T[][] data) where T : unmanaged {
    T[] array = flatten<T>(data); // 1D copy of the 2D data
    ulong[] dimensions = { (ulong)data.Length, (ulong)data[0].Length }; // 2D array
    long dataType = H5T.copy(GetDatatype(typeof(T)));
    long dataspaceId = H5S.create_simple(dimensions.Length, dimensions, null); // Create dataspace

    // Create the dataset under the name passed in by the caller
    long datasetId = H5D.create(fileId, dataset, dataType, dataspaceId, H5P.DEFAULT, H5P.DEFAULT, H5P.DEFAULT);
    unsafe {
        fixed (T* dataPtr = array) {
            H5D.write(datasetId, dataType, H5S.ALL, H5S.ALL, H5P.DEFAULT, new IntPtr(dataPtr));
        }
    }

    H5D.close(datasetId);
    H5S.close(dataspaceId);
    H5T.close(dataType); // https://github.com/HDFGroup/HDF.PInvoke/wiki/Cookbook-:-Strings
}
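
The flatten helper used by Write2DTensor is not part of the listing. Since HDF5 expects the buffer in row-major (C) order, it presumably copies the rows one after another into a single array. Here is a minimal sketch under that assumption; TensorFlatten and Flatten are hypothetical names of my own, and rejecting rows of unequal length is my choice, not something the original code is known to do.

```csharp
using System;

public static class TensorFlatten
{
    // Hypothetical helper: copies a rectangular jagged array into one row-major buffer.
    public static T[] Flatten<T>(T[][] data)
    {
        int rows = data.Length;
        int cols = rows == 0 ? 0 : data[0].Length;
        var flat = new T[checked(rows * cols)];

        for (int r = 0; r < rows; r++)
        {
            // Every row must have the same length, otherwise the 2D shape is ill-defined.
            if (data[r].Length != cols)
            {
                throw new ArgumentException("All rows must have the same length.", nameof(data));
            }
            Array.Copy(data[r], 0, flat, r * cols, cols);
        }
        return flat;
    }
}
```

For example, flattening the rows {1, 2, 3} and {4, 5, 6} yields a six-element buffer in that order, which matches dimensions of 2 x 3.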

Hope you find it useful. Let me know what you think.
