Generalizes #260 from @karpathy to accept arbitrary L_p norms.
A few remarks:
Maybe LpNormalize or Normalize would be a better name?
Only accepts 1D/2D inputs. @nicholas-leonard proposed adding a dim parameter to make it equivalent to torch.norm. Is it worth it given that we have SpatialBatchNormalization?
updateGradInput is quite memory-consuming for large d.
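For concreteness, here is a rough sketch of what the forward pass computes for a 2D (batch x d) input; the function name and the eps guard are illustrative, not the PR's actual code:

```lua
require 'torch'

-- Illustrative only: divide each row of a B x d input by its L_p norm.
-- `eps` (an assumption, not part of the PR) guards against division by zero.
local function lpNormalizeForward(input, p, eps)
   eps = eps or 1e-10
   -- row-wise norm: (sum_j |x_ij|^p)^(1/p), size B x 1
   local norm = input:clone():abs():pow(p):sum(2):pow(1/p):add(eps)
   return torch.cdiv(input, norm:expandAs(input))
end
```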
@szagoruyko Maybe it's not too difficult to extend it to work for other dimensions, by viewing the input as 2D with the last dimension being the one to normalize, but one also has to take care of batched vs. non-batched inputs. Maybe we could add a setNumInputDims function, as in nn.View?
I'll spend some more time trying to make it more generic.
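To picture the reshaping idea mentioned above, a hedged sketch (the helper name is made up, and the setNumInputDims / non-batched handling is deliberately left out):

```lua
require 'torch'

-- Sketch: fold all leading dimensions into a batch dimension, normalize the
-- last dimension, then restore the original shape. Assumes a dense input.
local function normalizeLastDim(input, p)
   local d    = input:size(input:dim())
   local rows = input:nElement() / d
   local flat = input:contiguous():view(rows, d)            -- view as 2D
   local norm = flat:clone():abs():pow(p):sum(2):pow(1/p)   -- rows x 1
   local out  = torch.cdiv(flat, norm:expandAs(flat))
   return out:viewAs(input)
end
```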
Why is it obvious that the name should be Normalize and not Norm? As far as I can tell, almost all operations from torch are ported to nn layers without name changes. E.g., analogous to the max operation there is nn.Max, so why is it obvious that the norm operation should be nn.Normalize? Shouldn't we have nn.Maximize then? My first instinct would be to stick with the current naming for consistency.
Good point. I'd almost suggest that there should be both an nn.Norm that does exactly what norm does, and then an nn.Normalize that also does a div right afterwards. But perhaps that gets a bit too hairy then :)
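Just to spell out the distinction being discussed (a throwaway sketch for a single 1D vector, not a proposed API):

```lua
require 'torch'

-- nn.Norm-style: just the scalar norm, i.e. what torch.norm already gives you.
local function norm(x, p)      return x:norm(p) end
-- nn.Normalize-style: the norm followed by a division, yielding a unit-L_p-norm vector.
local function normalize(x, p) return x / x:norm(p) end
```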
It's good. I just need to squash the commits.
I didn't have time to add a dimension parameter though; it would complicate the logic a bit because of the setNumInputDims function. But it could be added later if needed.
The backprop of this module needs a lot of memory. One reason might be creating the eyeExpand matrix and then doing the multiplication. In my case, with a batch size of 64 and an input dimension of 4800, two Normalize layers run out of memory on a 4 GB GPU. Any ideas for implementing a more space-efficient Normalize layer?
@ffmpbgrnn here is a version of Normalize which uses much less memory (the extra memory no longer depends on the batch size). It should be slower on GPU. The tests pass, so it should be fine. Use it with fastMode(false). fmassa@015ba9c
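For intuition about where the savings can come from: for p = 2 the backward pass has a closed form that only needs B x d and B x 1 temporaries, so nothing of size B x d x d (the eyeExpand path) has to be materialized. This is only a sketch of that identity, not the linked patch:

```lua
require 'torch'

-- Sketch (p = 2 only): for y = x / ||x||_2, row-wise,
--   gradInput = gradOutput / n - x * <x, gradOutput> / n^3,   with n = ||x||_2.
-- Everything here is B x d or B x 1; no B x d x d Jacobian is ever built.
local function l2NormalizeBackward(input, gradOutput)
   local n   = input:norm(2, 2)                      -- B x 1 row norms
   local dot = torch.cmul(input, gradOutput):sum(2)  -- B x 1 row-wise <x, gradOutput>
   local gradInput = torch.cdiv(gradOutput, n:expandAs(input))
   local scale = torch.cdiv(dot, n:clone():pow(3))   -- B x 1
   gradInput:add(-1, torch.cmul(input, scale:expandAs(input)))
   return gradInput
end
```

For general p the middle term becomes sign(x) * |x|^(p-1) * <x, gradOutput> / n^(p+1), but the memory argument is the same.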
@ffmpbgrnn so, does this patch work fine for you? Is it much slower than the previous version?
Maybe we could push this simplified version to master (taking out the faster mode to keep things simple)?
cc @soumith
cc: @bamos