How to read/write HDF5 files in Java

There are two things you need for this: the native HDF5 library and its Java (JNI) wrapper jar.

WARNING: You will need JDK 20 because the Java wrapper was compiled with JDK 20 (at the time I wrote this blog post). If you have an earlier version of the JDK you will get a runtime error. At the time of this writing, JDK 20 was not available in the Debian package repository, so I could not install it directly with sudo apt-get install; I first downloaded the .deb package from https://download.oracle.com/java/20/latest/jdk-20_linux-x64_bin.deb and then ran sudo apt-get install with the .deb file as the argument. You will also need Maven, and the JAVA_HOME environment variable must be set.

Installing Pre-requisites

  1. First I installed the HDF5 package from the official site. It requires you to create an account. Update: alternatively, download the package directly (the link below is for arm64, i.e., Apple silicon rather than Intel):
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.1/bin/unix/hdf5-1.14.1-2-Std-macos11m1_64-clang.tar.gz

Run the following commands to extract the pre-built binaries and libraries for arm64:

gunzip hdf5-1.14.1-2-Std-macos11m1_64-clang.tar.gz
tar -xvf hdf5-1.14.1-2-Std-macos11m1_64-clang.tar
./hdf/HDF5-1.14.1-Darwin.sh
  2. I moved the package from Downloads to ~/Library/HDF5.
  3. Verify you have a ~/Library/HDF5/1.14.1/lib folder with the file libhdf5_java.dylib in it (it's actually a symlink).
  4. Verify you have the following file: ~/Library/HDF5/1.14.1/lib/jarhdf5-1.14.1.jar
  5. Install it into your local Maven repository by running the following command:
mvn install:install-file -Dfile=/Users/siddjain/Library/HDF5/1.14.1/lib/jarhdf5-1.14.1.jar -DgroupId=hdf.hdf5lib.H5 -DartifactId=hdf5lib -Dversion=1.14.1 -Dpackaging=jar

You can use any values for -DgroupId, -DartifactId and -Dversion; you just have to use the same values when declaring the dependency in pom.xml.

  6. Verify there is a jar in ~/.m2/repository/hdf/hdf5lib/H5/hdf5lib/1.14.1
  7. Add a reference to the hdf5 jar in pom.xml:

<dependency>
    <groupId>hdf.hdf5lib.H5</groupId>
    <artifactId>hdf5lib</artifactId>
    <version>1.14.1</version>
</dependency>

Make sure you use the same values of groupId, artifactId and version that you used in step 5; otherwise Maven will not be able to resolve the dependency.

When running your Java program, the JVM must be able to find the native HDF5 JNI library (libhdf5_java.dylib on macOS, libhdf5_java.so on Linux, hdf5_java.dll on Windows) in addition to having the jar on the classpath. On Mac, I was able to do this by setting JAVA_LIBRARY_PATH to the directory containing the library, but on Ubuntu setting that variable did not help; instead I had to set LD_LIBRARY_PATH. On Windows you have to set PATH. To the best of my knowledge and experience, setting the java.library.path system property makes no difference and will not help. What makes this frustrating is that you are likely to read different things on different websites about these variables and which one is right to set. I once ran a program on Ubuntu in AWS where I did not set LD_LIBRARY_PATH but did set JAVA_LIBRARY_PATH, and it worked. When I ran the same program on WSL, it did not work until I set LD_LIBRARY_PATH. In both cases I used the same version of the JDK (JDK 21). If things don't work as expected, try setting all three of them; it cannot hurt. If you are still stuck, refer to the wrapper's source code that loads the native library.
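
As a quick sanity check, a minimal program along the lines of the sketch below (the class name Hdf5Check is just a placeholder I made up) will fail with an UnsatisfiedLinkError if the JVM cannot locate the native library, and otherwise prints the HDF5 version it linked against:

import hdf.hdf5lib.H5;

public class Hdf5Check {
    public static void main(String[] args) throws Exception {
        // Referencing H5 triggers loading of the native hdf5_java library;
        // this fails if the library cannot be found on the library path.
        H5.H5open();

        // Query the native library version to confirm the JNI bridge works.
        int[] version = new int[3];
        H5.H5get_libversion(version);
        System.out.println("Using HDF5 " + version[0] + "." + version[1] + "." + version[2]);

        H5.H5close();
    }
}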

Next, let us come to the code. You can find a collection of code snippets here and here. There is also a collection of examples in ~/Library/HDF5/1.14.1/share/HDF5Examples/JAVA/ (replace the path with the location where you installed HDF5).

Reading HDF5 file

I don't provide complete code, but the snippet below, in combination with the references above, should get you most of the way there:

import hdf.hdf5lib.H5;
import hdf.hdf5lib.HDF5Constants;

private static float[][] read2DTensor(long fileId, String datasetName) {
    long datasetId = H5.H5Dopen(fileId, datasetName, HDF5Constants.H5P_DEFAULT);
    long dataspaceId = H5.H5Dget_space(datasetId);

    // Get the number of dimensions in the dataspace (rank of the tensor)
    int rank = H5.H5Sget_simple_extent_ndims(dataspaceId);

    // Get the length of the tensor along each dimension
    long[] dimensions = new long[rank];
    H5.H5Sget_simple_extent_dims(dataspaceId, dimensions, null);

    // Compute the total number of elements and read the data into a flat buffer
    int n = 1;
    for (int i = 0; i < dimensions.length; i++) {
        n *= dimensions[i];
    }

    float[] data = new float[n];
    H5.H5Dread(datasetId, HDF5Constants.H5T_NATIVE_FLOAT,
               HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
               HDF5Constants.H5P_DEFAULT, data);

    // Now convert the 1D array into a 2D array
    int rows = (int) dimensions[0];
    int cols = (int) dimensions[1];
    float[][] dest = new float[rows][];

    for (int i = 0; i < rows; i++) {
        dest[i] = new float[cols];
        System.arraycopy(data, cols * i, dest[i], 0, cols);
    }

    H5.H5Sclose(dataspaceId); // release the dataspace
    H5.H5Dclose(datasetId);   // end access to the dataset and release its resources

    return dest;
}
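
For completeness, here is a minimal usage sketch for the reader above; the file name data.h5 and dataset path /my_dataset are made-up placeholders:

long fileId = H5.H5Fopen("data.h5", HDF5Constants.H5F_ACC_RDONLY, HDF5Constants.H5P_DEFAULT);
try {
    float[][] tensor = read2DTensor(fileId, "/my_dataset");
    System.out.println("Read a " + tensor.length + " x " + tensor[0].length + " tensor");
} finally {
    H5.H5Fclose(fileId); // always release the file handle
}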

Writing HDF5 File

Code snippet:

private static long createHdf5File(String filename) {
    return H5.H5Fcreate(filename, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
}

private static void writeIntArray(long fileId, String datasetName, int[][] data) {
    int rows = data.length;
    int cols = data[0].length;
    long[] dims2D = { rows, cols };
    int[] flattenedArray = convert2Dto1D(data); // see the helper sketch below
    var dataspaceId = H5.H5Screate_simple(dims2D.length, dims2D, null);
    var datasetId = H5.H5Dcreate(fileId, datasetName,
                    HDF5Constants.H5T_NATIVE_INT32, dataspaceId,
                    HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
    H5.H5Dwrite_int(datasetId, HDF5Constants.H5T_NATIVE_INT32,
                    HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, flattenedArray);
    H5.H5Dclose(datasetId);
    H5.H5Sclose(dataspaceId);
}
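
The snippet above calls a convert2Dto1D helper that is not shown; a straightforward version, together with a minimal usage sketch (output.h5 and /my_dataset are made-up names), would look something like this:

// Flatten a rectangular 2D int array row by row into a 1D array.
private static int[] convert2Dto1D(int[][] data) {
    int rows = data.length;
    int cols = data[0].length;
    int[] flat = new int[rows * cols];
    for (int i = 0; i < rows; i++) {
        System.arraycopy(data[i], 0, flat, cols * i, cols);
    }
    return flat;
}

// Usage sketch:
long fileId = createHdf5File("output.h5");
try {
    writeIntArray(fileId, "/my_dataset", new int[][] { { 1, 2, 3 }, { 4, 5, 6 } });
} finally {
    H5.H5Fclose(fileId); // flush and close the file
}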

That’s it for this post! Let me know what you think.


1 Response to How to read/write HDF5 files in Java

  1. Allen Byrne says:

    Just found this after a lot of bad searches! Finally, an article about using the latest HDF5 Java package.

    As one of the HDF5 Java developers, I would like to use this post to help improve our documentation. Please contact us at https://forum.hdfgroup.org/ with any new information/issues.

    Allen
