<groupId>org.tensorflow</groupId>
<artifactId>nio-utils</artifactId>
<version>${project.version}</version>
</dependency>
That feels strange. Why do we need this dependency here? Is it just because we can't have a separate module until we move away from the code generator in C++?
Tensors are now based on NdArray for mapping their memory and allowing direct access to it, so the core API must depend on nio-utils. I'm not sure I understand what is strange; do you mean that memory mapping should occur outside the core API?
I think it's just the names that are confusing me. When I read something like tensorflow-core, I think it means the actual "core" without dependencies on other modules. Maybe it's just that tensorflow-nio should really be named something else and not "TensorFlow"? That might also help with adoption from other projects. That's probably something we should continue on the mailing list you created about that and one of the first things to clear up with guys from MXNet, DL4J, etc.
Exactly, that's why I renamed the artifact tensorflow-nio-utils to nio-utils (though I'm not a huge fan of that name either, so if anyone comes up with a better one...). So the tensorflow you've noticed there just mentions our organization in the group and does not name the artifact itself.
ndarray-core? I think nio-utils is likely to make people think of java.nio.* and as the language moves away from those classes (and they are 15+ years old at this point), calling them "new" is something of a misnomer.
I agree, we will group artifacts if there is a need but right now, I cannot think of any other utility library than this one.
I think there should be a functional need to group things together. If it's just to put them in abstract categories, it will make it hard to decide which category each module belongs to, possibly without any actual benefits. Maven doesn't even put the parent name in artifact coordinates, so unless we do it explicitly like with tensorflow-core -> tensorflow-core-api, etc then the benefits are even more marginal.
Ok, here is what I ended up doing: I preserved the tensorflow-utils name but now it contains the code of the library itself, including the DataBuffer and NdArray APIs.
So if in the future we have more small utility classes like these that we would like to share with the world and that are independent of the TF runtime, they will end up in this library as well (kind of a Guava for ML).
There are a lot of presumptions and questions that make it hard to pick the right name for this library right away, so let's simply rename it in the future if we need to.
Sounds good, but let's name it tensorflow-util just to be consistent with the package name? Or conversely, let's name the package org.tensorflow.utils? I don't have a preference for either, I just think that consistency is a good thing to have whenever it makes sense.
I think it is fine to have them mismatch here: tensorflow-utils sounds more natural for the name of a library (as there is more than one utility class in it) while package names are often singular by convention (e.g. java.util.*).
In addition, we don't have an org.tensorflow.core package in our core artifacts either.
<compilerOption>${project.basedir}/src/main/native/server_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/session_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/tensorflow_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/tensor_buffers_jni.cc</compilerOption>
I thought everyone was OK with moving away from writing JNI manually. Why do we need this?
Because I need to create an instance of ByteBuffer out of a tensor native address and size in case Unsafe is not available, and the only way I know of is to do it from JNI. Do you have another way to do it?
This is going to do the same as the JNI code in that file:
TF_TensorData(nativeTensor).capacity(TF_TensorByteSize(nativeTensor)).asByteBuffer()
Along with a couple of null checks there too if needed.
Ok I'll give it a try, thanks
Great, that works, bye bye JNI...
@@ -0,0 +1,3 @@
package org.tensorflow.types.family;

public interface TType {}
Why do we need empty interfaces like this? That's always a code smell to me... They probably have some functionality in common. BTW, we have default methods too starting with Java SE 8. Can we use those?
This is just to make sure that not every Java type can be passed as the generic type of a Tensor, as enforced here:
public static <T extends TType> Tensor<T> create(Object obj, DataType<T> dtype) {...}
It is very soft compile-time type safety, but better than nothing, I think.
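To illustrate the marker-interface bound described in this comment, here is a minimal sketch. All nested types are simplified, hypothetical stand-ins that only borrow the names used in the discussion; they are not the classes from this PR. It assumes the bound is checked once, in the factory method, while the Tensor type parameter itself stays unbounded.

```java
class MarkerBoundDemo {
  // Empty marker interface: it carries no behaviour, it only restricts type arguments.
  interface TType {}

  // One supported tensor type (hypothetical stand-in).
  static final class TInt32 implements TType {}

  // A data type descriptor can only be declared for a TType.
  static final class DataType<T extends TType> {
    final String name;
    DataType(String name) { this.name = name; }
  }

  // The class-level parameter is left unbounded, as discussed in the thread.
  static final class Tensor<T> {
    final DataType<?> dtype;
    private Tensor(DataType<?> dtype) { this.dtype = dtype; }

    // The bound is enforced here, at creation time only.
    static <T extends TType> Tensor<T> create(DataType<T> dtype) {
      return new Tensor<>(dtype);
    }
  }

  static final DataType<TInt32> INT32 = new DataType<>("INT32");

  static String createdTypeName() {
    Tensor<TInt32> t = Tensor.create(INT32);
    // A Tensor<String> cannot be obtained through create(), because
    // DataType<String> is rejected by the compiler.
    return t.dtype.name;
  }
}
```

Since the private constructor is the only other way in, every tensor produced by this sketch is tied to a TType, even though later code can mention `Tensor<T>` without repeating the bound.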
In a future version of Java we will be able to seal this interface so it only has the implementations that we specify, which will make it more useful & reliable (as we'll be able to switch on the subclasses with exhaustiveness checking, and users won't be able to do their own implementations that our code can't work with).
Question for you @Craigacp: will sealed interfaces work in Java the same way as in Kotlin, meaning that only interfaces from the same package (or the same file for Kotlin) will be allowed to extend them, or will there be a different safety mechanism?
This is the current JEP for the sealed interfaces feature - https://openjdk.java.net/jeps/360. It requires them to be in the same package or module, and listed in the permits clause of the interface/class. If the subclasses are all in the same java file, then they don't have to be listed in the permits clause (but that's probably not how we'll do it).
  Class.forName("sun.misc.Unsafe");
  unsafeAvailable = true;
} catch (ClassNotFoundException e) {
  unsafeAvailable = false;
This is misleading. sun.misc.Unsafe is often available in other implementations of Java SE, but it doesn't always contain all the methods from OpenJDK. We need to make sure it has all the methods we want like this:
https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/indexer/UnsafeRaw.java
Thanks, I'll revise this
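A stricter availability check along the lines suggested above could look like the following sketch. It uses reflection to confirm that the class exists and that a handful of accessor methods are declared. The class name UnsafeSupportCheck and the choice of which four methods to require are assumptions made for illustration; the actual list would depend on what the buffer implementation calls.

```java
class UnsafeSupportCheck {
  // True if `type` declares a method with exactly this name and parameter list.
  static boolean hasMethod(Class<?> type, String name, Class<?>... params) {
    try {
      type.getDeclaredMethod(name, params);
      return true;
    } catch (NoSuchMethodException | SecurityException e) {
      return false;
    }
  }

  // True only if sun.misc.Unsafe is present AND declares the accessors we assume we need.
  static boolean unsafeLooksUsable() {
    Class<?> unsafe;
    try {
      unsafe = Class.forName("sun.misc.Unsafe");
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
    // Hypothetical minimal set; extend it to cover every method actually used.
    return hasMethod(unsafe, "getByte", long.class)
        && hasMethod(unsafe, "putByte", long.class, byte.class)
        && hasMethod(unsafe, "getLong", long.class)
        && hasMethod(unsafe, "putLong", long.class, long.class);
  }
}
```

The point of the design is that a runtime which ships a partial sun.misc.Unsafe makes the check fail up front, rather than surfacing a NoSuchMethodError later on the first buffer access.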
What's the copyright on the 1500x916.jpg file? I didn't see it mentioned anywhere, but that needs to be clear to put anything in a repository from Google.
On the I was inspired by this: if you think that
Oops! I had a reminder to change this one for one of my own pictures but clearly I forgot, thanks for pointing that out!
Ah, so the
Ok. Like I said, we could rename those in
BTW group, I was thinking of pushing a new version that renames the following methods (and their siblings):
Do you have any preference?
@karllessard No preferences, but we should try to go with unwritten conventions people are used to:
I'm fine with those renamings, they seem fairly idiomatic.
Just to let you know that I've just pushed a new version of the library with all your recommendations. |
Yes, so what about org.tensorflow.ndarray? That would make it more consistent in a way.
Craigacp
left a comment
Other general points:
- Should a scalar ndarray have a tag interface?
- Might want to use the code gen pattern from Java Vector API (https://github.com/openjdk/panama/blob/vectorIntrinsics/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/gen-src.sh) to emit the buffers/ndarrays of a particular kind (raw, jdk, etc) from a single source file. Reduces the amount of things that can go wrong in a refactor.
- Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.
- Is there a relationship between buffer.Validator and ndarray.Validator? They feel like they have a lot of duplicated code.
- Needs more javadoc (in the impl classes mainly). I understand how irritating it is to say that too, trust me, but bringing in new people to work in this codebase will be easier with it.
import org.tensorflow.types.TInt64;
import org.tensorflow.types.TString;
import org.tensorflow.types.TUInt8;
import org.tensorflow.types.family.TType;

/**
 * A statically typed multi-dimensional array whose elements are of a type described by T.
Can we more strongly type Tensor to Tensor<T extends TType>?
That is what I tried at first, only to discover that this enforcement quickly became viral and needs to be everywhere in our methods, classes, wrappers, frameworks and, ultimately, user code.
But since TType does not expose any method and since we enforce it at the beginning of the chain (i.e. at the creation of the Tensor), I realized that there is no real gain in carrying it afterward. We just want to make sure that tensors are created using one of our supported data types.
I think our users will appreciate that we relax this rule a bit.
The advantage of doing it is that the erased type bound of Tensor becomes Tensor<TType>, which makes it trickier for people to skirt the generics to do things that they shouldn't be doing. I agree it seems like it's everywhere in the codebase, and it does ripple a little into user code, but there is some benefit to it.
I'm aware of the benefits of enforcing these safety checks, I've always been a defender of them, but I think we can make an exception here. I don't care that much about the verbosity in our code base but I'm concerned about the user code.
I guess writing down a quick example may help us make the right decision. Meanwhile, I'm still not sure I understand what can go wrong if we leave the generic type parameter unbounded after creating the tensor, other than not being a good practice. Do you have a specific scenario in mind?
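To make the erasure argument in this thread concrete, the sketch below compares the bound that the compiler records for an unbounded and a bounded type parameter. The Unbounded and Bounded classes are hypothetical stand-ins for the two candidate declarations of Tensor, not code from the PR.

```java
import java.lang.reflect.TypeVariable;

class ErasureBoundDemo {
  interface TType {}

  // Candidate 1: Tensor<T>, no bound on the class itself.
  static class Unbounded<T> {}

  // Candidate 2: Tensor<T extends TType>.
  static class Bounded<T extends TType> {}

  // Returns the first declared bound of the first type parameter,
  // which is the type the parameter erases to.
  static Class<?> erasedBound(Class<?> genericClass) {
    TypeVariable<?> param = genericClass.getTypeParameters()[0];
    return (Class<?>) param.getBounds()[0];
  }
}
```

With the unbounded declaration the parameter erases to Object, so raw or wildcard usage can hold anything; with the bounded one it erases to TType. Whether that extra guarantee is worth the additional `extends TType` clauses in user code is exactly the trade-off being debated here.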
  t.buffer().put(data);
  return t;
}

public static <T extends TType> Tensor<T> allocate(DataType<T> dtype, Shape shape) {
Needs some javadoc. Looks like this makes an empty buffer, but what does that mean for types like String? Are they populated by the null reference or the empty string etc?
I have to admit that adding proper Javadoc to all endpoints affected by this PR is on my TODO list. I was waiting for some people to try it out first, but then the SIG decided we should merge it and fix it afterward. Are you still OK with this?
Yeah, I'm fine with you merging it without the extra javadoc. We need to make sure it's there before there is a release with this code in though, so maybe file an issue for it?
good idea, I'll create an issue right after merging
@@ -287,21 +287,20 @@
// Helper function to allocate a Tensor for the create() methods that create a Tensor from
// a java.nio.Buffer.
// Requires: dataType matches T
private static <T> Tensor<T> allocateForBuffer(DataType dataType, long[] shape, int nBuffered) {
final int nflattened = numElements(shape);
private static <T extends TType> Tensor<T> allocateForBuffer(DataType dataType, long[] dimSizes, int nBuffered) {
Can we type DataType with <T> here? And in a bunch of other places. Or alternatively <? extends TType>.
It should definitely be typed with <T>; those are leftovers from the previous implementation. I'm surprised my IDE is not giving me any warning about it... I'll fix it, thanks!
DataType(int value, int byteSize) {
this.value = value;
this.byteSize = byteSize;
public static <T> DataType<T> create(String name, int value, int byteSize, TensorMapper<T> tensorMapper) {
Stronger type? <T extends TType>.
Mmh, I don't remember if that one was as hemorrhagic as with Tensor, I'll give it a try and let you know.
 * @param shape the shape of the tensor
 * @return data structure of elements of this type
 */
T apply(TF_Tensor nativeTensor, Shape shape);
This exposes the JavaCPP TF_Tensor type into the public API. Can we fix that?
It is not "really" exposed. This interface is only used internally but it has to be public unfortunately because it is shared between packages.
What I can do is to move it under org.tensorflow.internal.buffer so at least it is known to be internal. And since it does not return a TF_Tensor but accepts one, there is no major "leak" either.
It's times like this that I understand why C++ has friend classes.
  this.start = start;
}

private long start;
These fields should be final, it signals to the JIT that it can make more optimisations (though I dunno if it actually will do).
 * that select which elements on a given dimension should be included/excluded
 * from that view.
 */
public interface Index {
Is there a way to make this interface functional or more constant? I'm a little worried that the JVM won't be able to optimise all these small objects properly (until they become value types). Maybe we could make them ValueBasedClasses? The indices provide conceptually similar functionality to VarHandle from Java 9, which Paul wrote, so maybe his input would be useful on these classes.
Ok, let's do this as a follow up.
@Override
public boolean equals(Object obj) {
  return false; // All unknown dimensions are distinct
Shouldn't they have reference equality? The hashcode does.
This class is a remnant of a previous implementation and is not used anymore, I'll remove it
 */
@Override
public int hashCode() {
  return (int) numElements();
If we care about hashcode for this we should apply some mixing function first. Integer's hashcode is unfortunately terrible but I think it's dictated by the spec.
Totally agree, I'll fix it
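As one possible shape for the fix agreed on above, the sketch below folds the 64-bit element count into 32 bits and then applies the MurmurHash3 32-bit finalizer as the mixing step. The choice of that particular finalizer and the names ShapeHashDemo, mixedHash and truncatedHash are assumptions for illustration; the PR does not say which mixing function was eventually used.

```java
class ShapeHashDemo {
  // The version under review: a plain narrowing cast, which drops the upper 32 bits.
  static int truncatedHash(long numElements) {
    return (int) numElements;
  }

  // Fold the high half into the low half, then scramble the bits.
  static int mixedHash(long numElements) {
    int h = (int) (numElements ^ (numElements >>> 32));
    // MurmurHash3 fmix32 finalizer: every step is invertible on 32-bit ints.
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }
}
```

With the plain cast, a shape of 2^32 elements and an empty shape both hash to 0. After folding, the two inputs differ before mixing, and because the finalizer is a bijection they stay different afterwards, while small consecutive counts get spread across the int range.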
@Override
public double getDouble(long... indices) {
  return buffer().getDouble(positionOf(indices, true));
Some of these methods go through buffer(), some access the buffer field directly. It should pick one way, or document that they behave differently (as DoubleDenseNdArray isn't final, so could be subclassed).
buffer() is only there to let the superclass access the buffer stored in the subclass. Again, this has changed so many times... I'll sanitize it, thanks!
@saudet, I still think that the current naming is better. Right now, the package tree looks like this: It makes sense because the buffer API can actually be used with no ndarrays. Also, I have moved the Also, just to clarify my previous post, I brought the counterexample of
@Craigacp : might be hard to enforce in all cases. For example, for a 2x2 matrix, both
Sounds interesting, I'll take a look, thanks
I guess the difference can be subtle, I will check
Yes, like I said in a previous comment, I postponed that task for later as this work is still known as "in progress" but I agree it should become a priority. |
Ah, yeah I'd forgotten that the accessor took a varargs so you couldn't have a sharper return type. Not that the type system is quite powerful enough to do the best thing there anyway. As scalars don't have dimension then I feel like that should be represented in the type system so you can specialise when being passed a scalar (i.e. immediately unboxing it), but it's hard to get methods that emit the right types given the way the library is set up.
I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or another access control people will still see them in the javadoc and use them for things you weren't expecting.
I think the extra package is fine (it allows to split the module into |
Yes, I like ...and we would like to thank all our supporters, friends and family :) |
It would be possible to add a different method (something like
I can shuffle things a bit to mark these methods as |
Sure
Protected would work. I don’t consider this a security thing, more an api cleanliness one. We don’t want people to start incorporating UnsafeReference as a field or argument for their classes as then we can’t remove it without it being a breaking change, and it’s really an internal implementation detail.
Ah, I remember what you said. You're basically considering the |
Well it's only the DataBuffer classes that need to be implemented separately for things like The
The idea of passing the We all know that there is no perfect solution here either. I talked with Maurizio from Panama and on their side, they focus on enforcing the safety of native memory mapping from the Now I'm experimenting with something new: having a non-final
That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools for any libraries not explicitly supported, including all those distributed with the JavaCPP Presets: One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't...If I'm missing something though, please enlighten me. |
It does not prevent any of this, libraries just need to extend from this factory class and make use of the
class TensorRawDataBufferFactory extends RawDataBufferFactory {
  static ByteDataBuffer mapTensorToBytes(Pointer tensorMemory) {
    return mapNativeBytes(tensorMemory.address(), tensorMemory.capacity(), false);
  }
  static IntDataBuffer mapTensorToInts(Pointer tensorMemory) {
    return mapNativeInts(tensorMemory.address(), tensorMemory.capacity(), false);
  }
  static LongDataBuffer mapTensorToLongs(Pointer tensorMemory) {
    return mapNativeLongs(tensorMemory.address(), tensorMemory.capacity(), false);
  }
  static FloatDataBuffer mapTensorToFloats(Pointer tensorMemory) {
    return mapNativeFloats(tensorMemory.address(), tensorMemory.capacity(), false);
  }
  static DoubleDataBuffer mapTensorToDoubles(Pointer tensorMemory) {
    return mapNativeDoubles(tensorMemory.address(), tensorMemory.capacity(), false);
  }
  static StringTensorBuffer mapTensorToStrings(Pointer tensorMemory, long numElements) {
    long offsetByteSize = numElements * Long.BYTES;
    LongDataBuffer offsets = mapNativeLongs(tensorMemory.address(), numElements, false);
    ByteDataBuffer data = mapNativeBytes(
        tensorMemory.address() + offsetByteSize,
        tensorMemory.capacity() - offsetByteSize,
        false);
    return new StringTensorBuffer(offsets, data);
  }
}
BTW, I'm not sure why you are referring to NumPy in this case, the actual discussion is about how native memory should be mapped initially from Java and has no impact on how data can be shared thereafter or how we can manipulate it.
The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray into the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).
All the Python libraries use NumPy arrays, or implement the same interface (via duck typing). So the libraries are built on numpy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way to how the current implementation does (which should at most be the buffer interface) so I'm not sure what the difference is?
Ok, I might have misunderstood the original question then. FYI, it is also possible to change the layout of your data in memory just by providing a |
But for things like GPU memory vs CPU memory (where you need to apply a native hook to get the pointer into the right address space) it's probably best that they implement a new |
No, that's not true. It's very easy to create a NumPy ndarray on top of any buffer: |
But the equivalent of the buffer protocol is what this PR provides? Java doesn't have a low level buffer which has multidimensional indexing, that's why Karl wrote one. It's as much effort to implement the Python buffer as it is to implement this one so I'm not sure how it's easier in Python (beside the fact that everyone has already written the code in Python, and we're just starting to write it here). |
FYI, I just pushed a new version with all requested changes except the bounded types in For your concerns @saudet , I'm still not sure what TF tools cannot handle that you are proposing; with the buffer and data layout interfaces, you can customize pretty much the backing store of an ndarray the way you like. But again it might be my misunderstanding, so I suggest that once the code is merged, we can try to map some buffers of the C libraries you were referring to and then it might become more clear if something is missing and what it is. |
Ok, I've bound types in generics to So if everyone agrees on those changes, I'll merge the branch by the end of this week, thanks |
The closest equivalent to the buffer protocol in Java is
It's not important for TensorFlow for Java itself because we have access to all the internal APIs. It just makes it hard to use for anything outside this library since all the glue code is not made available publicly, that's all. But that's something we can update later on. If no one ends up using this API outside TensorFlow because it's hard to use, then that's fine too. Something else will eventually come around. |
PR has been merged based on other members' reviews.
Update after losses merge
* Move NdArray library to subfolder
* Add missing dependencies
* Fix settings.xml path
* Kotlin friendly names (Shape.get)
* Fix bug when slicing on a segmented dimension (#2)
* Sparse tensor (#3)
* Allow SparseNdArray impls to be inherited (#5)
* Better examples in Sparse array documentation (#6)
* Build on JDK11 by default (#7)
* Add missing export
* Adding toString to AbstractDenseNdArray and AbstractSparseNdArray (#8)
* Test Java copyFrom Ok
* Test Java copyFrom - trying to replicate Scala error
* Test Java copyFrom - trying to replicate Scala error v2
* Added basic index tests (rank 2)
* Added module-info to tests
* Module-info for tests use the same module name of src
* Value streaming for NdArrays (#15)
* Release 0.4.0
* Prepare next iteration
* Viewing arrays with different shapes (#18)
* Rename read/write to copyTo/From (#19)
* Releasing 1.0.0-rc.1
* Increase version for next iteration
* Move ndarray to tensorflow-java
* Apply spotless
---------
Co-authored-by: Ryan Nett <JNett96@gmail.com>
Co-authored-by: Jim Clarke <JimClarke5@me.com>
Co-authored-by: Adam Pocock <craigacp@gmail.com>
Co-authored-by: hmf <hugo6ferreira@gmail.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
This PR merges the first draft of Tensor NIO into our master repository so that it can be tried out and tested by a larger audience. Here is a minimalist example of usage (more examples can be also found in the unit tests):
Some additional notes:
- Tensor class does not carry a Java boxed type, like Integer, but a custom tensor type, like TInt32. This allows us to support more types in Java while still guaranteeing compile-time safety check of operand type compatibility. A lot of the changes seen in this PR are related to this.
- c_api package has been moved to org.tensorflow.internal, next to the new package buffer that holds new Tensor buffer mapping utilities.
- @Operator annotation for generating *Ops classes. For this reason, methods are moved within the classes at each compilation, as you can observe in this PR.
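The first note above, on custom tensor types replacing boxed Java types, can be sketched as follows. Java has no unsigned byte, so a boxed Byte alone cannot distinguish an int8 tensor from a uint8 one; separate marker types can. Everything in this block (TensorTypeDemo, its nested TInt8, TUInt8 and Operand, and the helper methods) is a hypothetical simplification and not the API introduced by the PR.

```java
class TensorTypeDemo {
  interface TType {}
  static final class TInt8 implements TType {}
  static final class TUInt8 implements TType {}

  // Both tensor types share the same Java storage (byte), but not the same type argument.
  static final class Operand<T extends TType> {
    final byte[] raw;
    Operand(byte... raw) { this.raw = raw; }
  }

  // Both operands must share the same T, so mixing TInt8 with TUInt8
  // is rejected by the compiler rather than at runtime.
  static <T extends TType> Operand<T> concat(Operand<T> a, Operand<T> b) {
    byte[] out = new byte[a.raw.length + b.raw.length];
    System.arraycopy(a.raw, 0, out, 0, a.raw.length);
    System.arraycopy(b.raw, 0, out, a.raw.length, b.raw.length);
    return new Operand<>(out);
  }

  // The tensor type, not the storage type, decides how a raw byte is interpreted.
  static int firstSigned(Operand<TInt8> x) { return x.raw[0]; }
  static int firstUnsigned(Operand<TUInt8> x) { return x.raw[0] & 0xFF; }
}
```

In this sketch the same stored byte reads as a negative value through a TInt8 operand and as a value above 127 through a TUInt8 operand, and `concat` only compiles when both arguments carry the same tensor type.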