Refactor JNI code in C++ into Java code with JavaCPP #18
karllessard merged 6 commits into tensorflow:master
Conversation
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java
```java
  throw new IndexOutOfBoundsException("MetaGraphDef is too large to serialize into a byte[] array");
} else {
  byte[] jmetagraph_def = new byte[(int) metagraph_def.length()];
  new BytePointer(metagraph_def.data()).get(jmetagraph_def);
```
I suggest we move this array-copy stunt into AbstractTF_Buffer as a utility method like toBytes(), since TF_Buffer is meant to be generic.

Alternatively, we could retain a reference to metaGraphDef and return its data to the user as a ByteBuffer instead of an array, since it is only used for deserializing the proto message... But I'm wondering why we don't just import those proto classes in the core API and return typed objects like MetaGraphDef to the user, instead of leaving them the burden of serializing/deserializing the proto messages.

What do you think @sjamesr? Was the original reason for leaving the protos outside the Java client only to avoid an extra dependency on grpc?
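A utility along those lines might look like the following; this is a minimal pure-Java sketch of the bounds-check-then-copy idea, where the name toBytes() is hypothetical and java.nio.ByteBuffer stands in for the native TF_Buffer data:

```java
import java.nio.ByteBuffer;

public class BufferCopy {
  // Hypothetical toBytes() utility: the native TF_Buffer reports its size
  // as a long, so guard against sizes a Java array cannot hold before copying.
  static byte[] toBytes(ByteBuffer data, long length) {
    if (length > Integer.MAX_VALUE) {
      throw new IndexOutOfBoundsException("Buffer is too large to serialize into a byte[] array");
    }
    byte[] bytes = new byte[(int) length];
    data.get(bytes, 0, (int) length);
    return bytes;
  }

  public static void main(String[] args) {
    byte[] copy = toBytes(ByteBuffer.wrap(new byte[] {1, 2, 3}), 3);
    System.out.println(copy.length);  // prints 3
  }
}
```

The real method would take the generated TF_Buffer wrapper rather than a ByteBuffer, but the shape of the check and the copy would be the same.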
@saudet, this PR is still marked as WIP; it looks like it's ready to be merged now?

I was thinking we could do all of it together before merging? It shouldn't take me more than a few days to do...

As you wish, we can do it iteratively with smaller PRs as well.

Let's talk about it at the meeting tomorrow :)
Force-pushed from dd6373b to 55b7b5b.
karllessard left a comment:
Looks good @saudet, I just left a few questions and remarks here and there.
I haven't yet compared the old JNI code with the new Java implementations, but I think we can also rely on the JUnit coverage for this.
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/EagerOperation.java
```java
  int length = TFE_OpGetOutputLength(handle, name, status);
  status.throwExceptionIfNotOK();
  return length;
}
```
Instead of starting new scopes in all of these methods, could it be simpler and more efficient to just create the status in a try-with-resources block when no other resources are allocated?

```java
try (TF_Status status = TF_Status.newStatus()) {
  ...
}
```
Yes, it would be more efficient, but it would also make it more error-prone when we start creating other objects in there that may start doing temporary allocations.
Yeah... I would still prefer we go with the most efficient approach but it's up to you if you want to make the changes or not, we can merge it like this too.
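For illustration, the try-with-resources pattern under discussion can be sketched with a stand-in Status class; the real TF_Status comes from the generated bindings, so everything below is a simplified mock:

```java
public class StatusDemo {
  // Stand-in for the generated TF_Status wrapper: any AutoCloseable resource
  // whose close() releases the underlying native allocation.
  static class Status implements AutoCloseable {
    boolean closed = false;
    int code = 0;  // 0 means OK
    void throwExceptionIfNotOK() {
      if (code != 0) throw new IllegalStateException("status code " + code);
    }
    @Override public void close() { closed = true; }
  }

  static Status last;  // kept around only so the demo can inspect it

  // The pattern from the review comment: allocate the status in a
  // try-with-resources block so it is released as soon as the call returns.
  static int getOutputLength() {
    try (Status status = new Status()) {
      last = status;
      int length = 42;  // stands in for TFE_OpGetOutputLength(handle, name, status)
      status.throwExceptionIfNotOK();
      return length;
    }
  }

  public static void main(String[] args) {
    int length = getOutputLength();
    System.out.println(length + " closed=" + last.closed);  // prints 42 closed=true
  }
}
```

The trade-off raised above is that a surrounding scope (such as JavaCPP's PointerScope) also catches temporary allocations made by other objects inside the block, while try-with-resources only releases the resources explicitly named in its header.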
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Graph.java
```java
  outputTensorHandles[i] = outputValues.get(TF_Tensor.class, i);
}

return runMetadata != null ? runMetadata.get() : null;
```
Question: did you ever run a benchmark showing that all these context switches between the JVM and the native code (i.e., one at each JavaCPP-generated method) end up being as performant as when only one call was made (run, in this case)?
Calls from the JVM to native code are very efficient, on the order of 30 ns. On the other hand, calls from native code to the JVM are typically very expensive, on the order of 300 ns, so it's almost certain that the new code here is going to be faster. I'll run a simple benchmark and post the results here just to confirm, but if you're worried about performance, we should think about writing a whole set of benchmarks to make sure there is never any regression in performance anywhere.
I've started to write some of them as well (like this one). I think that with high-performance projects like TF we should have a set of benchmarks: not only is it useful for detecting regressions, it is also a very good indicator of where we should spend time on optimization.
If you can add some, that would be great, but we can merge without them if you want to create these later.
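As a rough illustration of what such a micro-benchmark could look like, here is a toy pure-Java timing harness. Real measurements of JNI/JavaCPP call overhead would use JMH and the actual generated bindings, so this is only a sketch of the measurement loop:

```java
public class CallOverheadBench {
  // Times many repetitions of a call and reports the mean cost per call.
  // A real harness (e.g. JMH) also defeats dead-code elimination and
  // measures across multiple forks; this version only does warmup.
  static long timePerCall(Runnable body, int reps) {
    long start = System.nanoTime();
    for (int i = 0; i < reps; i++) {
      body.run();
    }
    return (System.nanoTime() - start) / reps;
  }

  public static void main(String[] args) {
    // Warm up so the JIT compiles the loop before measuring.
    timePerCall(() -> {}, 1_000_000);
    long ns = timePerCall(() -> {}, 10_000_000);
    System.out.println("~" + ns + " ns/call (empty body, JVM-side only)");
  }
}
```

Swapping the empty body for a single JavaCPP-generated native call versus one coarse-grained JNI call would give the comparison discussed in this thread.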
tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/Tensor.java
I ran the benchmark. So it does look like this refactoring also increases performance!
karllessard left a comment:
> So it does look like this refactoring also increases performance!

Great!
karllessard left a comment:
All right, I'm merging this. There are a few unresolved conversations left that are not mandatory to close right now and could be part of another PR if need be.
This is a WIP to remove the manually written JNI code from tensorflow-core-api. In the first commit for SavedModelBundle.java, I replaced about 100 lines of JNI code in C++ with about 40 lines of code in Java, and the unit tests still pass. We need to do the same with the rest of the JNI code, which shouldn't take me that long to do by myself, but if anyone is interested in helping, please let me know and I will give you write access to my fork!
There is also a lot of code in the Java section of the wrappers that is redundant with JavaCPP and that we will be able to remove in a second phase.
/cc @tzolov