JIL execution in the VM is responsible for a small but important subset of node execution. While most nodes today are executed via the FFI (Foreign Function Interface) mechanisms, which invoke C# code, some basic utility methods, especially in the Math category, are written in native DesignScript. This type of function is handled by the JILFunctionEndPoint. While many of these enhancements target this execution path, they also yield net improvements to the general execution of all EndPoint types. This is especially true of the POP_Handler, DEP_Handler, SetupExecutive, and RestoreFromCall methods within the VM's Executive. For a specific test graph with a mixture of geometry and mathematical operations, these enhancements reduce overall UpdateGraph run time by 35% (17s to 13s). They also dramatically reduce temporary memory allocation: for the same test graph, the temporary memory allocation associated with UpdateGraph went from 11.3 GB to 3.6 GB. In summary, this PR optimizes the execution of functions handled via the JIL endpoint but yields a net improvement for all node types.
Specifically, this PR does the following:
Clean up the dependency on the CurrentStackFrame property of RuntimeMemory. This getter creates a new StackFrame object to reference a subset of items at a specific location in the VM's stack. The CurrentStackFrame property is used 99% of the time from the IsGlobalScope method. The issue is that IsGlobalScope can be called tens of millions of times during a graph execution run, and a new StackFrame object is created every time the CurrentStackFrame property is referenced. This optimization removes the need to allocate a temporary StackFrame object when the required data can be read directly from the stack. It accounts for the majority of the gigabytes of temporary allocation saved in the sample graph described above.
Refactor RestoreFromCall to not allocate an empty list until after the required check of runtimeCore.Options.RunMode == InterpreterMode.Expression. Previously this allocation was done repeatedly with no items ever added to the collection. The later check of the list via Any() can then be refactored to a null check.
Add a fast path to GetGraphNodesAtScope for requests with an invalid ClassIndex and ProcessIndex (i.e. -1). This is another function that can be called millions of times during an UpdateGraph run. Many calls routed through this function are looking for the same item in the graphNodeMap dictionary. This optimization creates a shortcut for the specific case of the invalid ClassIndex and ProcessIndex that avoids fetching the object from the dictionary. Note that caching the previous lookup would not speed up this function, as its usage typically alternates between values.
Refactor UpdateGraph to not allocate a temporary list of GraphNodes.
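The RestoreFromCall and GetGraphNodesAtScope items above can be illustrated with a minimal sketch. Every name here (AllocationSketch, ScopedNodeTable, CollectEven, the even-number filter, the cached globalNodes field) is hypothetical and chosen only for illustration. The actual ProtoCore code is not shown on this page, and in particular, holding the (-1, -1) entry in a dedicated field is an assumption about how such a shortcut could be built, not a description of what the PR does.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; these are not the real ProtoCore types.
public static class AllocationSketch
{
    // Deferred allocation: the list is created only after the mode check
    // passes, so the common path allocates nothing and callers test for
    // null instead of calling Any() on an always-empty list.
    public static List<int> CollectEven(IEnumerable<int> candidates, bool expressionMode)
    {
        if (!expressionMode)
        {
            return null; // nothing allocated on this path
        }

        var result = new List<int>();
        foreach (var candidate in candidates)
        {
            if (candidate % 2 == 0) // arbitrary stand-in for the real filter
            {
                result.Add(candidate);
            }
        }
        return result;
    }
}

public sealed class ScopedNodeTable
{
    public const int InvalidIndex = -1;

    private readonly Dictionary<(int ClassIndex, int ProcIndex), List<string>> map =
        new Dictionary<(int ClassIndex, int ProcIndex), List<string>>();

    // Assumption: the (-1, -1) entry is also held in a field so the hot
    // lookup can skip hashing the key and probing the dictionary.
    private List<string> globalNodes;

    public void Add(int classIndex, int procIndex, string node)
    {
        var key = (classIndex, procIndex);
        if (!map.TryGetValue(key, out var nodes))
        {
            nodes = new List<string>();
            map[key] = nodes;
            if (classIndex == InvalidIndex && procIndex == InvalidIndex)
            {
                globalNodes = nodes;
            }
        }
        nodes.Add(node);
    }

    public List<string> GetNodesAtScope(int classIndex, int procIndex)
    {
        // Fast path for the most frequently requested scope.
        if (classIndex == InvalidIndex && procIndex == InvalidIndex)
        {
            return globalNodes;
        }
        return map.TryGetValue((classIndex, procIndex), out var nodes) ? nodes : null;
    }
}
```

Unlike a "last lookup" cache, a dedicated slot for one known-hot key stays valid when callers alternate between scopes, which is the access pattern noted above.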
@jasonstratton @aparajit-pratap Closing #12153 in favor of this. This removes the commit refactoring JILFunctionEndPoint to allow caching of the Interpreter, which also had a failing test. This PR now consists exclusively of allocation optimizations and should have no impact on execution.
I don't want to initialize the StackFrame object unless you actually need a StackFrame object. That is the big time penalty here. All of these essentially copy the implementation from StackFrame here to avoid the getter for CurrentStackFrame. On my sample graph, that getter is called 21 million times.
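As a rough illustration of the trade-off described in this comment, the sketch below contrasts a getter that builds a wrapper object on every access with a direct read from the underlying stack. FrameView, ToyRuntimeMemory, the int-based stack, and the slot layout are all hypothetical stand-ins. They are not the actual StackFrame or RuntimeMemory implementation, and the real frame layout is not shown here.

```csharp
using System;

// Hypothetical stand-in for a frame wrapper; not the real StackFrame.
public sealed class FrameView
{
    // Assumed layout: slot 0 of a frame holds the function index,
    // and -1 marks the global scope.
    public const int FunctionIndexSlot = 0;
    public const int GlobalFunctionIndex = -1;

    private readonly int[] slots;

    public FrameView(int[] stack, int framePointer, int frameSize)
    {
        // Copies the frame's slots, so every construction allocates.
        slots = new int[frameSize];
        Array.Copy(stack, framePointer, slots, 0, frameSize);
    }

    public int FunctionIndex => slots[FunctionIndexSlot];
}

// Hypothetical stand-in for the runtime memory; not the real RuntimeMemory.
public sealed class ToyRuntimeMemory
{
    private readonly int[] stack;
    private readonly int framePointer;
    private readonly int frameSize;

    public ToyRuntimeMemory(int[] stack, int framePointer, int frameSize)
    {
        this.stack = stack;
        this.framePointer = framePointer;
        this.frameSize = frameSize;
    }

    // Builds a new FrameView on every access.
    public FrameView CurrentFrame => new FrameView(stack, framePointer, frameSize);

    // Allocating version: one FrameView (plus its array) per call.
    public bool IsGlobalScopeViaFrame() =>
        CurrentFrame.FunctionIndex == FrameView.GlobalFunctionIndex;

    // Allocation-free version: reads the same slot straight from the stack.
    public bool IsGlobalScopeDirect() =>
        stack[framePointer + FrameView.FunctionIndexSlot] == FrameView.GlobalFunctionIndex;
}
```

Both methods return the same answer. The direct version duplicates the wrapper's indexing logic, which is the cost accepted in exchange for avoiding an allocation on a path that runs millions of times.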
Declarations
Check these if you believe they are true
*.resx files
Reviewers
@sm6srw @aparajit-pratap
FYIs
@jasonstratton @mjkkirschner