Releases: ispc/ispc
=== v1.29.1 === (19 December 2025)
A minor ISPC update that reverts support for blend stores for structures, which caused a stability regression on SSE2/SSE4/PS4 targets.
=== v1.29.0 === (17 December 2025)
ISPC release featuring sample-based profile-guided optimization, optimized dispatcher, new avx512gnr targets for Intel Granite Rapids, and numerous bug fixes and performance improvements. Based on a patched LLVM 20.1.8.
Compiler Switches:
- Added the `--profile-sample-use=<file>` flag to enable sample-based profile-guided optimization (PGO). When it is provided, the ISPC compiler loads sample profile data to guide optimization decisions during compilation. Use it together with the `--sample-profiling-debug-info` flag, which enables debug info suitable for sample-based profiling. Sample-based PGO can provide up to 30% performance gains thanks to aggressive loop unrolling, optimized memory access patterns, and specialized hot code paths guided by actual branch frequencies.
- Added `--[no-]internal-export-functions` to control generation of internal (ISPC-callable) versions of exported functions. The flag is enabled by default. When it is disabled (`--no-internal-export-functions`), only external versions are generated, and calling exported functions from ISPC code results in a compilation error.
- Added the `--stack-protector[=<level>]` flag to enable Stack Smash Protection (SSP) for ISPC functions, providing runtime detection of stack buffer overflows:
  - `--stack-protector` (equivalent to `--stack-protector=on`) enables stack protectors for functions vulnerable to stack smashing.
  - `--stack-protector=strong` enables stack protectors for functions that contain arrays of any size or take addresses of local variables.
  - `--stack-protector=all` enables stack protectors for all functions.
  - `--stack-protector=none` disables stack protectors (default).
- The default DWARF version has been updated to match the LLVM default (DWARF 5 on most platforms).
Behavioral Changes:
- A new warning has been introduced when an exported function without the `external_only` attribute is called from ISPC code. This warning prepares for an upcoming behavior change in ISPC 1.30, where `export` functions will by default generate only external (C/C++-callable) versions instead of both internal and external versions. To address this warning, use a non-exported function for ISPC-to-ISPC calls, add the `external_only` attribute, or use the `--no-internal-export-functions` flag.
Language Changes:
- `soa<>` types can now be used as struct members. Previously, `soa<>` members in structs were not supported by the grammar.
- The compiler now assumes that all loops with non-constant conditions will make forward progress and eventually terminate, which enables additional optimizations. Infinite loops with constant conditions, such as `for (;;)` or `while (1)`, are treated specially and do not have this assumption applied.
Dispatcher:
- The dispatcher has been made more efficient by adding a caching mechanism and enabling LLVM optimization passes, making dispatch approximately 50% faster.
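The caching idea can be illustrated with a minimal C sketch. It assumes a dispatcher that picks a target-specific variant on the first call and reuses that choice afterwards. `detect_isa`, `kernel_sse4`, `kernel_avx2`, and `dispatch_kernel` are hypothetical names, and this is not the code ISPC generates.

```c
#include <stddef.h>

/* Hypothetical per-target variants of one exported function. */
typedef int (*kernel_fn)(int);
static int kernel_sse4(int x) { return x + 1; }
static int kernel_avx2(int x) { return x + 2; }

/* Counts how often ISA detection runs, to make the cache visible. */
static int detect_calls = 0;

/* Hypothetical stand-in for a CPUID-based check; always reports AVX2 here. */
static int detect_isa(void) {
    ++detect_calls;
    return 1; /* 1 = AVX2 available, 0 = fall back to SSE4 */
}

/* Cached selection: detection runs once, later calls reuse the pointer.
   Not thread-safe as written; a real dispatcher would publish the pointer
   atomically or use a once-initialization primitive. */
static kernel_fn cached_kernel = NULL;

int dispatch_kernel(int x) {
    if (cached_kernel == NULL) {
        cached_kernel = detect_isa() ? kernel_avx2 : kernel_sse4;
    }
    return cached_kernel(x);
}
```

Caching the selected function pointer means the ISA check runs once per process instead of once per call.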
Targets:
- New `avx512gnr-x4`, `avx512gnr-x8`, `avx512gnr-x16`, `avx512gnr-x32`, and `avx512gnr-x64` targets have been added for Intel Granite Rapids processors. These targets support AVX-512 with AMX-FP16 capabilities.
- The `avx10.2` targets have been renamed to `avx10.2dmr` to align with the Diamond Rapids (DMR) codename.
- Fixed the `--opt=disable-zmm` option to work correctly on `avx512skx-x16` and `avx512icl-x16` targets. This option avoids ZMM registers, which can benefit workloads sensitive to frequency throttling on some processors.
Removed Targets:
- The `gen9-x8` and `gen9-x16` GPU targets have been removed.
Deprecated Targets:
- The `sse2-i32x4` and `sse2-i32x8` targets are now deprecated and will be removed in a future release.
Predefined Macros:
- New predefined macros `ISPC_TARGET_HAS_FP16_SUPPORT` and `ISPC_TARGET_HAS_FP64_SUPPORT` have been added, following the naming convention used by other target capability macros. The old macro names `ISPC_FP16_SUPPORTED` and `ISPC_FP64_SUPPORTED` remain available for backward compatibility but are now deprecated.
- The `ISPC_TARGET_AVX10_2` macro has been replaced with `ISPC_TARGET_AVX10_2DMR` to match the target renaming.
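A header shim that accepts both the new and the deprecated spellings could look like the following sketch. It assumes each capability macro is defined only when the capability is present, so `defined(...)` is the right test. `FP16_AVAILABLE`, `FP64_AVAILABLE`, and `TARGET_IS_AVX10_2DMR` are hypothetical project-local names. The directives are plain C preprocessor syntax, which ISPC source also uses. When the snippet is compiled as ordinary C, none of the ISPC macros are defined and every flag falls back to 0.

```c
/* Prefer the new capability macro names, fall back to the deprecated ones
   so the same source also builds with older ISPC releases. */
#if defined(ISPC_TARGET_HAS_FP16_SUPPORT) || defined(ISPC_FP16_SUPPORTED)
#define FP16_AVAILABLE 1
#else
#define FP16_AVAILABLE 0
#endif

#if defined(ISPC_TARGET_HAS_FP64_SUPPORT) || defined(ISPC_FP64_SUPPORTED)
#define FP64_AVAILABLE 1
#else
#define FP64_AVAILABLE 0
#endif

/* The AVX10.2 target macro was renamed together with the target. */
#if defined(ISPC_TARGET_AVX10_2DMR) || defined(ISPC_TARGET_AVX10_2)
#define TARGET_IS_AVX10_2DMR 1
#else
#define TARGET_IS_AVX10_2DMR 0
#endif
```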
Performance:
- Optimized the `popcnt` (population count) implementation for AVX512ICL and newer targets, achieving up to 3.5x speedup.
- Improved code generation for `avx512-x16` and `avx10.2-x16` targets, with a ~10% geomean improvement on benchmarks. This includes better shuffle instruction generation and improved optimization pass ordering that prevents suboptimal masked load transformations from blocking SROA registerization.
- Improved masked store promotion to blend stores for structures, providing up to 53% improvement on targets without hardware masked stores (such as NEON and SSE4).
- Fixed inefficient loop code generation when using unsigned loop counters.
- Fixed incorrect full loop unrolling behavior that caused partial unrolling for loops with unknown trip counts.
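To make the masked-store-to-blend-store promotion concrete, the following scalar C model compares the two strategies for one gang. `WIDTH`, `masked_store`, and `blend_store` are hypothetical names. The real transformation operates on LLVM vector IR, not on C loops.

```c
#include <stdbool.h>
#include <stddef.h>

#define WIDTH 4 /* hypothetical gang width */

/* Masked store: only active lanes write to memory. */
static void masked_store(int *dst, const int *src, const bool *mask) {
    for (size_t i = 0; i < WIDTH; ++i) {
        if (mask[i]) dst[i] = src[i];
    }
}

/* Blend store: load the whole destination, select per lane, then write every
   lane back. Inactive lanes are rewritten with their old values, so the result
   matches masked_store only if no other thread writes those lanes meanwhile. */
static void blend_store(int *dst, const int *src, const bool *mask) {
    int tmp[WIDTH];
    for (size_t i = 0; i < WIDTH; ++i) tmp[i] = dst[i];                    /* full load */
    for (size_t i = 0; i < WIDTH; ++i) tmp[i] = mask[i] ? src[i] : tmp[i]; /* blend */
    for (size_t i = 0; i < WIDTH; ++i) dst[i] = tmp[i];                    /* full store */
}
```

Both functions leave the same values in memory. The blend variant replaces per-lane conditional writes with one full-width load, select, and store, which maps well to targets that lack hardware masked stores. The trade-off is that it also writes the inactive lanes.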
Build System:
- Optimized stdlib compilation by implementing a width family system that reduces bitcode duplication and shrinks the ISPC binary by approximately 30%. This also allows new targets to be added to ISPC with a minimal increase in binary size.
Bug Fixes:
- Fixed crashes when casting SOA (slice) pointers to non-SOA pointer types.
- Fixed handling of enum negation in constant folding.
- Fixed slice pointer handling in pointer-to-integer casts.
- Fixed type checking of expressions wrapped by TypeCastExpr.
- Fixed indexing into function call results that return pointer types.
- Fixed uniform bool return values that could incorrectly return 255 instead of 1.
- Fixed shuffle-related optimization issues.
- Fixed enum fields missing from generated C/C++ headers.
- Fixed VNNI intrinsic validation on the SKX target.
- Fixed rounding operations for float16 on SSE2 targets by adding emulation.
New Example:
- Added an AMX (Advanced Matrix Extensions) example demonstrating tile matrix operations.
Experimental RISC-V Support:
- Initial support for the RISC-V 64-bit (riscv64) architecture has been added with the RISC-V Vector Extension (RVV) ISA, introducing the `rvv-x4` target for 4-wide vectorization. This support is experimental and not included in official ISPC binaries. To use it, build ISPC from source with the `RISCV_ENABLED=ON` CMake option or use pre-release binaries. Feedback and contributions are welcome.
Recommended versions of Runtime Dependencies when targeting GPU:
Linux:
- Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/25.13.33276.16
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2
- Threading Building Blocks (TBB)
Alternatively, you can use a validated gfx driver stack supporting Intel Arc(TM) available at https://dgpu-docs.intel.com/driver/installation.html
Windows:
- Intel(R) Graphics Windows(R) DCH Drivers 32.0.101.8250 https://www.intel.com/content/www/us/en/download/785597/869290/intel-arc-graphics-windows.html
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2
- OpenCL(TM) Offline Compiler (OCLOC) https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html (this is needed for AoT compilation on Windows only)
- Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core processor graphics
Components revisions used in GPU-enabled build:
- KhronosGroup/SPIRV-LLVM-Translator@6dd8f2a
- intel/vc-intrinsics@b980474
- https://github.com/oneapi-src/level-zero/commit/v1.20.2
- llvm/llvm-project@87f0227 (llvmorg-20.1.8) + patches from llvm_patches folder
=== v1.28.2 === (24 September 2025)
A minor ISPC update with bug fixes:
- Fixed `enum` fields missing from generated headers.
- Fixed boolean return value representation in exported functions.
=== v1.28.1 === (21 August 2025)
A minor ISPC update with bug fixes:
- Fixed compiler crash when indexing with pointers returned by function calls
- Fixed compiler crash when assigning null pointers to struct members
=== v1.28.0 === (13 August 2025)
ISPC release with enhanced struct operator support, the ability to use ISPC as a library with JIT support, simplified integration with Python via nanobind wrappers, enhanced standard library functions, and numerous stability and performance improvements. Based on a patched LLVM 20.1.8.
Language Changes
- Struct operator overloading has been extended. Added support for overloading unary (`++`, `--`, `-`, `!`, `~`), binary (`*`, `/`, `%`, `+`, `-`, `>>`, `<<`, `==`, `!=`, `<`, `>`, `<=`, `>=`, `&`, `|`, `^`, `&&`, `||`), and assignment (`=`, `+=`, `-=`, `*=`, `/=`, `%=`, `<<=`, `>>=`, `&=`, `|=`, `^=`) operators for struct types.
- Integer literal rules are now stricter:
  - The number of `[uUlL]` symbols is limited (e.g., `ulll`, `uul`, and `lulu` are no longer valid).
  - The value modification suffix (`[kMG]`) must precede the type modification suffix (`[uUlL]`).
  - As in C/C++, the mixed-case suffixes `lL` and `Ll` are no longer accepted as spellings of `LL`.
- NaN to `bool` conversion now matches C/C++ behavior.
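The suffix rules above can be summarized with a small C checker. This sketch assumes that ISPC's type suffixes follow the C grammar exactly: an optional `u`/`U` combined, in either order, with an optional `l`, `L`, `ll`, or `LL`. It also assumes that at most one `k`, `M`, or `G` may come first. `valid_literal_suffix` is a hypothetical helper, not part of ISPC.

```c
#include <stdbool.h>
#include <string.h>

/* True if s is a C-style integer type suffix: optional u/U combined, in either
   order, with an optional l, L, ll, or LL. Mixed-case lL/Ll is not listed. */
static bool valid_type_suffix(const char *s) {
    static const char *const allowed[] = {
        "", "u", "U", "l", "L", "ll", "LL",
        "ul", "uL", "Ul", "UL", "ull", "uLL", "Ull", "ULL",
        "lu", "lU", "Lu", "LU", "llu", "llU", "LLu", "LLU",
    };
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; ++i) {
        if (strcmp(s, allowed[i]) == 0) return true;
    }
    return false;
}

/* Checks the text after the digits of an integer literal: an optional value
   modifier (k, M, or G) must come first, followed by a type suffix. */
bool valid_literal_suffix(const char *suffix) {
    if (*suffix == 'k' || *suffix == 'M' || *suffix == 'G') ++suffix;
    return valid_type_suffix(suffix);
}
```

For example, the checker accepts `Mu` (as in `16Mu`), but rejects `uk`, `ulll`, and `lL`.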
ISPC as a Library
- ISPC can now be used as a C++ library (`libispc`) for embedding ISPC compilation directly into applications. It also provides CMake configuration files for easy integration into other CMake projects. The library includes experimental Just-In-Time (JIT) compilation for runtime code generation and execution. See the documentation and the new `simple_lib` and `simple_jit` examples demonstrating the `ISPCEngine` API.
Python Integration and Nanobind Support
- ISPC can now generate nanobind wrappers for ISPC modules, enabling easy and lightweight integration with Python. The generated wrappers can be built into native Python modules and imported into Python code. The `--nanobind-wrapper=<filename>` option enables this feature.
- Three new examples show ISPC integration with Python:
  - point_transform_ctypes - calling an ISPC function via ctypes with three different input types.
  - point_transform_nanobind - using nanobind to wrap an ISPC function for geometric transformations on 2D points, integrating with NumPy for high-performance vectorized processing.
  - attention - single-head attention as used in transformer networks, featuring:
    - Multiple matrix multiplication methods (GOTO-based, tiled)
    - Memory pool management for efficient intermediate storage
    - Task-based parallelism for multi-core scaling
    - Softmax with optimized memory access patterns
Float16
- Added a new command-line option `--include-float16-conversions` to include float16 conversion functions in the compiled module. This is useful for targets without native float16 conversion instructions, such as x86 prior to AVX2. Disabled by default.
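The following portable C sketch illustrates the kind of work such a conversion function does. It decodes an IEEE 754 binary16 bit pattern into a `float` without any hardware half-precision support. `half_bits_to_float` is a hypothetical name, and this is not the code ISPC emits for `--include-float16-conversions`.

```c
#include <stdint.h>
#include <string.h>

/* Decodes an IEEE 754 binary16 bit pattern into a float using only integer
   and float operations (no native half-precision instructions needed). */
float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: value = mant * 2^-24, exact in float. */
        f = (float)mant * (1.0f / 16777216.0f);
        return sign ? -f : f;
    }
    if (exp == 0x1Fu) {
        /* Infinity or NaN: maximum exponent, mantissa bits carried over. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else {
        /* Normal number: rebias the exponent from 15 (half) to 127 (float). */
        bits = sign | ((exp - 15u + 127u) << 23) | (mant << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Only the half-to-float direction is shown. The float-to-half direction also has to handle round-to-nearest-even, overflow to infinity, and the production of subnormals.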
Standard Library Changes
- `select` now supports unsigned integer types `uint8`, `uint16`, `uint32`, and `uint64`, as well as uniform short vectors.
- Added new functions: `isinf`, `isfinite`, `srgb8_to_float`.
- Short vector standard library functions have been moved to `short_vec.isph`. They are no longer implicitly available and must be included explicitly.
- Added short vector type support for: `fmod`, `isnan`, `rsqrt_fast`, `clamp`.
- Added an `include/intrinsics` directory with SSE intrinsic headers, useful for porting intrinsics-based code to ISPC.
Performance
- Optimized `shuffle`/`shift`/`rotate` and `reduce_equal` on AVX-512 targets, with up to 90% speedup.
New Targets
- Added CPU targets for AMD Zen4 and Zen5 architectures.
Crash Fixes
- Fixed crashes related to atomic type handling.
- Resolved nested `foreach` assertion failures.
- Fixed variable scoping crashes in single-statement blocks.
- Addressed unhandled atomic type crashes.
- Fixed crashes related to `signed` keyword usage.
- Resolved a crash caused by extra index access in aggregate initialization.
- Fixed crashes from templates without arguments.
- Fixed crashes in short vector initialization and indexing.
- Fixed a type signature mismatch crash with `uniform bool<N>` parameters.
Function Call Enhancements
- Added struct access support on function call returns.
- Improved handling of functions returning pointer types.
- Enhanced lvalue handling for function return expressions.
Ecosystem Improvements and Community Engagement
- Modernized the ISPC website with a refreshed look and structure.
- Open-sourced the ISPC VS Code plugin (https://github.com/ispc/vscode-ispc). Although it has not been maintained for some time, we are committed to improving it and releasing a new version. Community contributions are welcome.
- We have set up a Discord server to better connect with our community and make it easier for ISPC users to support one another. Join the ISPC Discord server: https://discord.com/invite/9e7E7sFe2D
Recommended versions of Runtime Dependencies when targeting GPU
Linux:
- Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/25.13.33276.16
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2
- Threading Building Blocks (TBB)
Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html
Windows:
- Intel(R) Graphics Windows(R) DCH Drivers 32.0.101.6083 https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2
- OpenCL™ Offline Compiler (OCLOC) https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html (this is needed for AoT compilation on Windows only)
- Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core processor graphics
Components revisions used in GPU-enabled build:
- KhronosGroup/SPIRV-LLVM-Translator@6dd8f2a1
- intel/vc-intrinsics@b980474c
- oneapi-src/level-zero@d7a44e0 (v1.20.2)
- llvm/llvm-project@87f0227 (llvmorg-20.1.8) + patches from llvm_patches folder
=== v1.27.0 === (15 May 2025)
This release introduces AVX10.2 support, extended standard library coverage for small integer types, full support for element-wise functions on short vectors in the standard library, and numerous bug fixes. It is based on a patched LLVM 20.1.4.
New targets
New targets have been added for platforms supporting Intel® Advanced Vector Extensions 10.2: avx10.2-x4, avx10.2-x8, avx10.2-x16, avx10.2-x32, and avx10.2-x64.
Standard library
- Cross-lane operations (`broadcast`, `rotate`, `shift`, and `shuffle`) are now supported for unsigned types.
- Reduction functions now support signed and unsigned `int8` and `int16` types.
- Support for `packed_load` and `packed_store` has been extended to include `int8` and `int16` (signed and unsigned), `float16`, `float`, and `double`.
- The cube root function `cbrt` has been added to the standard library for `float` and `double` types.
- Dot product functionality has been enhanced with mixed signedness support for 16-bit integers. The following input combinations are now supported: u16 x u16 (unsigned x unsigned), i16 x i16 (signed x signed), and u16 x i16 (mixed signedness). For consistency with other naming conventions, the function `dot2add_i16_packed` has been renamed to `dot2add_i16i16_packed`.
- Support for short vector types has been added to the following element-wise functions: `min`, `max`, `abs`, `round`, `floor`, `ceil`, `trunc`, `rcp`, `rcp_fast`, `sqrt`, `rsqrt`, `sin`, `asin`, `cos`, `acos`, `tan`, `atan`, `exp`, `log`, `atan2`, `pow`, and `cbrt`.
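The signedness combinations need separate functions because the same bits produce different sums depending on how each 16-bit lane is interpreted. The following C model is a sketch. It assumes that "packed" means two 16-bit lanes per 32-bit word, with each product added to an accumulator. `dot2_u16u16`, `dot2_i16i16`, and `dot2_u16i16` are hypothetical names, not the ISPC standard library signatures. Accumulation is widened to 64 bits, so hardware overflow behavior (wrapping or saturation) is not modeled.

```c
#include <stdint.h>

/* Lane i (0 = low half, 1 = high half) of a 32-bit word, read as unsigned. */
static int32_t lane_u16(uint32_t w, int i) {
    return (int32_t)((w >> (16 * i)) & 0xFFFFu);
}

/* The same lane reinterpreted as a two's-complement signed 16-bit value. */
static int32_t lane_i16(uint32_t w, int i) {
    int32_t v = lane_u16(w, i);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* acc + a.lo*b.lo + a.hi*b.hi, both operands unsigned. */
int64_t dot2_u16u16(uint32_t a, uint32_t b, int64_t acc) {
    for (int i = 0; i < 2; ++i) acc += (int64_t)lane_u16(a, i) * lane_u16(b, i);
    return acc;
}

/* Both operands signed. */
int64_t dot2_i16i16(uint32_t a, uint32_t b, int64_t acc) {
    for (int i = 0; i < 2; ++i) acc += (int64_t)lane_i16(a, i) * lane_i16(b, i);
    return acc;
}

/* Mixed signedness: a unsigned, b signed. */
int64_t dot2_u16i16(uint32_t a, uint32_t b, int64_t acc) {
    for (int i = 0; i < 2; ++i) acc += (int64_t)lane_u16(a, i) * lane_i16(b, i);
    return acc;
}
```

With `a = 0xFFFF0001` and `b = 0x00020003`, the unsigned form computes 1*3 + 65535*2 = 131073. The signed form computes 1*3 + (-1)*2 = 1.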
Language changes
- The `aligned(N)` attribute is now available to specify the alignment of variables and struct types.
- Fixed a bug where unsigned array indices, or pointer arithmetic with unsigned offsets, could overflow because of sign extension when promoted to pointer size. The compiler now correctly handles unsigned integer indexing and pointer arithmetic.
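This class of bug can be reproduced in plain C. Widening an unsigned 32-bit index through a signed type sign-extends it, while widening it as unsigned zero-extends it. The function names are hypothetical, and the example is not ISPC's internal code.

```c
#include <stdint.h>

/* Buggy pattern: the index passes through a signed 32-bit type before being
   widened, so values >= 2^31 become negative offsets. The uint32_t -> int32_t
   conversion is implementation-defined before C23; mainstream compilers wrap. */
int64_t offset_sign_extended(uint32_t idx) {
    return (int64_t)(int32_t)idx;
}

/* Correct pattern for unsigned indices: zero-extension preserves the value. */
int64_t offset_zero_extended(uint32_t idx) {
    return (int64_t)idx;
}
```

For small indices both functions agree. From `0x80000000` upward, the sign-extended offset points before the base pointer instead of after it.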
Compiler Switches Behavior
- The `-dD` and `-dM` flags are now supported, aiding in debugging the preprocessor and inspecting defined macros.
Template support bug fixes
- Fixed instantiation of template functions when assigned to function pointers.
- Improved implicit template argument deduction.
- Fixed a crash occurring when a nested template function did not use a templated argument.
Performance improvements
- Improved the performance of masked loads and stores for AVX-512 x32 and x64 targets by an order of magnitude (~10x on microbenchmarks).
- Improved `packed_store_active2` on AVX2: ~65% speedup for `int32`, ~45% speedup for `int64`.
Other bug fixes
- Fixed a crash during integer division by ensuring it occurs only on active lanes, improving stability and performance.
- Resolved crashes related to:
- Incomplete struct types
- Use of enum fields in structs
- Pointer declarations to function types
- Unsupported binary operations on pointer types
- Casting to unsized arrays in malformed code
- Accessing array elements through pointers
- Structure member access within pointer arithmetic
- Improved compiler warnings for incomplete types.
- Corrected address calculations involving unsigned indices in array accesses and pointer arithmetic.
Ecosystem improvements
- ISPC is now supported by GitHub's Linguist, enabling proper syntax highlighting for `.ispc` files on GitHub.
- ISPC syntax support has been added to the following editors, thanks to community contributions:
  - CudaText - Alexey-T/CudaText#5944
  - ECoder - SpartanJ/ecode#436
- If you are integrating ISPC with Python, we recommend using nanobind. Examples are available, and we plan to generate nanobind-compatible headers in a future release.
Recommended versions of Runtime Dependencies when targeting GPU
Linux:
- Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/25.13.33276.16
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2
- Threading Building Blocks (TBB)
Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html
Windows:
- Intel(R) Graphics Windows(R) DCH Drivers 32.0.101.6083 https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2
- OpenCL™ Offline Compiler (OCLOC) https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html (this is needed for AoT compilation on Windows only)
- Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core processor graphics
Components revisions used in GPU-enabled build:
- KhronosGroup/SPIRV-LLVM-Translator@6dd8f2a1
- intel/vc-intrinsics@b980474c
- oneapi-src/level-zero@d7a44e0 (v1.20.2)
- llvm/llvm-project@ec28b8f (llvmorg-20.1.4) + patches from llvm_patches folder
=== v1.26.0 === (6 February 2025)
ISPC release featuring improved ARM support, new "generic" targets that simplify ISPC's internal design and streamline the addition of new targets, improved code generation across x86 and ARM, and multiple stability fixes. This release is based on a patched LLVM 18.1.8.
ARM Support Changes:
- The `--arch=arm` flag, which previously mapped to ARMv7 (32-bit), now maps to ARMv8 (32-bit). There are no changes to `--arch=aarch64`, which continues to map to ARMv8 (64-bit).
- The CPU definitions for the ARMv7 architecture have been removed: `cortex-a9` and `cortex-a15`.
- New CPU definitions have been introduced, including `cortex-a55`, `cortex-a78`, `cortex-a510`, and `cortex-a520`, along with support for new Apple devices.
- New double-pumped targets have been introduced: `neon-i16x16` and `neon-i8x32`.
- Dot product operations are now supported using native ARM instructions (`sdot`/`udot`).
- Performance on ARMv8 has been improved by 13% on average.
Generic Targets:
In this release, generic targets were introduced in ISPC. Their main goal is to simplify ISPC target management and serve as the foundation for hardware-specific targets, requiring only selective tuning when performance expectations are not met.
ARM targets have been refactored to use generic targets as a baseline, resulting in cleaner code and improved performance. This change also makes it easier to add support for new architectures, such as RISC-V or any other LLVM-supported target.
Generic targets can also be used as standalone targets in cases where no native target exists with the required width for a particular CPU (e.g., a 32-wide target for SSE4). This can be done by specifying the following options in ISPC:
--target=generic-i1x32 --cpu=penryn
A complete list of all generic targets and the architectures they support can be found in the output of:
ispc --support-matrix
Code Generation:
- The `-O1` optimization pipeline has been further optimized for size: loop unrolling and function inlining have been adjusted accordingly.
- Improved generated code for the `count_leading_zeros` and `count_trailing_zeros` functions by producing native instructions (e.g. `vplzcntq`).
- Improved generated code for masked loads/stores of int8/int16 types on AVX512 by generating native instructions (`vmovdqu8`, `vmovdqu16`).
- Improved code generation when returning structs from functions by eliminating unnecessary `mov` instructions.
Language Changes:
- Enhanced support for LLVM intrinsics when the `--enable-llvm-intrinsics` flag is used, including support for intrinsics with no arguments and overloaded intrinsics.
- Added user-visible macro definitions for the LLVM version that ISPC is based on.
- The `__attribute__((deprecated))` attribute can now be applied to functions, generating a warning when the function is called.
Deprecated Targets:
- The KNL (`avx512knl-x16`) target has been removed.
Compiler Switches Behavior:
- The `--darwin-version-min` option has been added to specify the minimum deployment target version for macOS and iOS applications. This addresses a new linker behavior introduced in Xcode 15.0, which issues a warning when no version is provided.
- The `--nocpp` command-line flag is now deprecated and will be removed in a future release.
Dispatch Behavior:
- The behavior of user programs when the auto-dispatch code detects no supported ISA has changed. Instead of raising the `SIGABRT` signal, the dispatcher now raises `SIGILL`. This affects users who rely on `SIGABRT` in their signal handlers for error handling or recovery; such users must update their code to handle `SIGILL` instead. This change improves predictability and removes the dispatcher's reliance on the C standard library.
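The following minimal C sketch shows the handler update. It assumes the application previously installed a `SIGABRT` handler for this case. `install_dispatch_handlers` and `on_dispatch_failure` are hypothetical names. Registering the same handler for both signals keeps the code working with ISPC releases before and after this change.

```c
#include <signal.h>

/* Written from the handler; volatile sig_atomic_t is the object type C
   guarantees is safe to assign inside a signal handler. */
static volatile sig_atomic_t no_isa_signal = 0;

/* Records which signal arrived. A production handler should terminate
   (for example with _Exit) instead of returning: returning from a SIGILL
   that was not produced by raise() or kill() is undefined behavior. */
static void on_dispatch_failure(int sig) {
    no_isa_signal = sig;
}

/* Install the handler for SIGILL (current behavior) and SIGABRT (older
   releases). Returns 0 on success, -1 if either registration fails. */
int install_dispatch_handlers(void) {
    if (signal(SIGILL, on_dispatch_failure) == SIG_ERR) return -1;
    if (signal(SIGABRT, on_dispatch_failure) == SIG_ERR) return -1;
    return 0;
}
```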
Bug Fixes:
- Fixed a crash for functions returning pointers.
- Fixed incorrect values for some predefined macros.
- Fixed a crash when using sizeof as a global variable initializer.
- Fixed function template overload resolution issues.
- Fixed incorrect behavior in short vector casts inside templates.
- Fixed incorrect zero handling in the `ldexp` standard library function.
Recommended versions of Runtime Dependencies when targeting GPU:
Linux:
- Intel(R) Graphics Compute Runtime https://github.com/intel/compute-runtime/releases/tag/24.35.30872.22
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.17.28
- Threading Building Blocks (TBB)
Alternatively, you can use a validated gfx driver stack supporting Intel® Arc™ available at https://dgpu-docs.intel.com/driver/installation.html
Windows:
- Intel(R) Graphics Windows(R) DCH Drivers 32.0.101.6083 https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
- Level Zero Loader https://github.com/oneapi-src/level-zero/releases/tag/v1.17.28
- OpenCL™ Offline Compiler (OCLOC) https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html (this is needed for AoT compilation on Windows only)
- Supported GPU platforms: Intel(R) Arc Graphics, 11th-13th Gen Intel(R) Core processor graphics
Components revisions used in GPU-enabled build:
- KhronosGroup/SPIRV-LLVM-Translator@43fb73fe
- intel/vc-intrinsics@4f5bc1bb
- oneapi-src/level-zero@c1f6e28 (v1.17.28)
- https://github.com/llvm/llvm-project/commit/3b5b5c1 (llvmorg-18.1.8) + patches from llvm_patches folder
=== trunk-artifacts ===
Automatically updated trunk artifacts
=== v1.25.3 === (8 November 2024)
A minor ISPC update with a fix for the `--vectorcall` calling convention mode on Windows.
=== v1.25.2 === (2 November 2024)
A minor ISPC update with several bug fixes:
- Fixed broken `--vectorcall` calling convention mode on Windows.
- Fixed a build error on FreeBSD.
- Removed in-memory ISPC headers (`/core.isph`, `/stdlib.isph`) from the dependencies emitted by the `-M` switch for Windows binaries.
- Fixed linker errors on Windows for multi-target compilation.