Mon . 20 Aug 2020

Streaming SIMD Extensions

In computing, Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of processors as a reaction to AMD's 3DNow!. SSE contains 70 new instructions, most of which work on single-precision floating point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

Intel's first IA-32 SIMD effort was the MMX instruction set. MMX had two main problems: it re-used the existing floating point registers, making the CPU unable to work on both floating point and SIMD data at the same time, and it only worked on integers. SSE floating point instructions operate on a new, independent register set (the XMM registers), and SSE adds a few integer instructions that work on MMX registers.

SSE was subsequently expanded by Intel to SSE2, SSE3, SSSE3, and SSE4. Because it supports floating point math, it had a wider application than MMX and became more popular. The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations.

SSE was originally called Katmai New Instructions (KNI), Katmai being the code name for the first Pentium III core revision. During the Katmai project, Intel sought to distinguish it from its earlier product line, particularly its flagship Pentium II. It was later renamed Intel Streaming SIMD Extensions (ISSE), then SSE. AMD eventually added support for SSE instructions, starting with its Athlon XP and Duron (Morgan core) processors.

Contents

  • 1 Registers
  • 2 SSE instructions
    • 2.1 Floating point instructions
    • 2.2 Integer instructions
    • 2.3 Other instructions
  • 3 Example
  • 4 Later versions
  • 5 Software and hardware issues
  • 6 Identifying
  • 7 References
  • 8 External links

Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The AMD64 extensions from AMD (originally called x86-64) added a further eight registers, XMM8 through XMM15, and this extension is duplicated in the Intel 64 architecture. There is also a new 32-bit control/status register, MXCSR. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode.

SSE used only a single data type for XMM registers:

  • four 32-bit single-precision floating point numbers

SSE2 would later expand the usage of the XMM registers to include:

  • two 64-bit double-precision floating point numbers or
  • two 64-bit integers or
  • four 32-bit integers or
  • eight 16-bit short integers or
  • sixteen 8-bit bytes or characters
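
As a rough illustration of these register layouts, the sketch below uses the C intrinsic types that GCC, Clang and MSVC expose for XMM-sized values. The function name xmm_views is hypothetical, and the example assumes an x86 compiler with SSE2 enabled (the default on x86-64).

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE type __m128 */
#include <stdint.h>

/* Builds three 128-bit values and copies their lanes into plain arrays.
   Index 0 of each array receives the lowest-order lane of the register. */
void xmm_views(float f[4], double d[2], int32_t i[4])
{
    /* The _mm_set_* intrinsics take arguments from highest lane to lowest. */
    __m128  vf = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* four 32-bit floats   */
    __m128d vd = _mm_set_pd(2.0, 1.0);                /* two 64-bit doubles   */
    __m128i vi = _mm_set_epi32(40, 30, 20, 10);       /* four 32-bit integers */

    _mm_storeu_ps(f, vf);                   /* unaligned 128-bit stores */
    _mm_storeu_pd(d, vd);
    _mm_storeu_si128((__m128i *)i, vi);
}
```

All three types occupy 16 bytes; only the interpretation of the bits differs between them.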

Because these 128-bit registers are additional machine state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, the extended pair of instructions that can save and restore the x87 FPU, MMX, and SSE register state all at once. This support was quickly added to all major IA-32 operating systems.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the FPU. While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating point operations to be mixed without the performance hit from explicit MMX/floating point mode switching.

SSE instructions

SSE introduced both scalar and packed floating point instructions.
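
The difference between the two forms can be sketched with the square-root intrinsics, which compilers typically lower to SQRTPS and SQRTSS. The function names sqrt_packed and sqrt_scalar are made up for this illustration, and an x86 compiler providing <xmmintrin.h> is assumed.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Packed form: the operation is applied to all four lanes. */
void sqrt_packed(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
}

/* Scalar form: only the lowest lane is computed; the upper three
   lanes are passed through unchanged from the source operand. */
void sqrt_scalar(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_sqrt_ss(_mm_loadu_ps(in)));
}
```

For the input {4, 9, 16, 25}, the packed version yields {2, 3, 4, 5}, while the scalar version yields {2, 9, 16, 25}.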

Floating point instructions

  • Memory-to-register/register-to-memory/register-to-register data movement
    • Scalar – MOVSS
    • Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS, MOVMSKPS
  • Arithmetic
    • Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
    • Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
  • Compare
    • Scalar – CMPSS, COMISS, UCOMISS
    • Packed – CMPPS
  • Data shuffle and unpacking
    • Packed – SHUFPS, UNPCKHPS, UNPCKLPS
  • Data-type conversion
    • Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
    • Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
  • Bitwise logical operations
    • Packed – ANDPS, ORPS, XORPS, ANDNPS
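
The compare, mask-extraction and bitwise groups above are often combined to avoid branches. The following is a minimal sketch of that idea using intrinsics that usually map to CMPPS, MOVMSKPS, ANDPS, ANDNPS and ORPS; the helper names less_than_mask and select_min are hypothetical, and for a plain minimum MINPS would of course be the direct choice.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Returns a 4-bit mask: bit i is set when a[i] < b[i]. */
int less_than_mask(const float a[4], const float b[4])
{
    __m128 m = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)); /* CMPPS */
    return _mm_movemask_ps(m);          /* MOVMSKPS: collect the sign bits */
}

/* Branch-free per-lane selection of the smaller value. */
void select_min(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 m  = _mm_cmplt_ps(va, vb);            /* all ones where a < b  */
    __m128 r  = _mm_or_ps(_mm_and_ps(m, va),     /* keep a where mask set */
                          _mm_andnot_ps(m, vb)); /* keep b elsewhere      */
    _mm_storeu_ps(out, r);
}
```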

Integer instructions

  • Arithmetic
    • PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
  • Data movement
    • PEXTRW, PINSRW
  • Other
    • PMOVMSKB, PSHUFW

Other instructions

  • MXCSR management
    • LDMXCSR, STMXCSR
  • Cache and Memory management
    • MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE
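
A minimal sketch of how the cache-control instructions might be used from C follows. The function stream_copy is a hypothetical helper, the prefetch distance of 16 floats is an arbitrary choice for illustration, and the destination is assumed to be 16-byte aligned because the non-temporal store requires it.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copies n floats to dst with non-temporal stores, hinting that the data
   should bypass the cache. Assumes dst is 16-byte aligned. */
void stream_copy(float *dst, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        if (i + 16 < n)  /* stay inside the source array */
            _mm_prefetch((const char *)(src + i + 16), _MM_HINT_NTA); /* PREFETCHNTA */
        _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));                /* MOVNTPS */
    }
    for (; i < n; ++i)   /* scalar tail for the remaining 0..3 elements */
        dst[i] = src[i];
    _mm_sfence();        /* SFENCE: order the streaming stores before later stores */
}
```

Whether non-temporal stores actually help depends on the workload; they tend to pay off only for large buffers that will not be read again soon.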

Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single-precision, four-component vectors together using x86 requires four floating-point addition instructions:

    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

This corresponds to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions:

    movaps xmm0, [v1]       ;xmm0 = v1.w | v1.z | v1.y | v1.x
    addps  xmm0, [v2]       ;xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps [vec_res], xmm0  ;store the four sums
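
The same comparison can be written in C with compiler intrinsics. This is only a sketch: the vec4 type and both function names are assumptions made for the example, and unaligned loads and stores are used so that the caller does not have to guarantee the 16-byte alignment that MOVAPS requires.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

typedef struct { float x, y, z, w; } vec4;  /* four contiguous floats */

/* Scalar version: four separate additions. */
vec4 vec4_add_scalar(const vec4 *a, const vec4 *b)
{
    vec4 r;
    r.x = a->x + b->x;
    r.y = a->y + b->y;
    r.z = a->z + b->z;
    r.w = a->w + b->w;
    return r;
}

/* SSE version: one packed addition covers all four components. */
vec4 vec4_add_sse(const vec4 *a, const vec4 *b)
{
    vec4 r;
    __m128 va = _mm_loadu_ps((const float *)a);      /* x in lane 0 ... w in lane 3 */
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)&r, _mm_add_ps(va, vb));  /* typically ADDPS */
    return r;
}
```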

Later versions

  • SSE2, Willamette New Instructions (WNI), introduced with the Pentium 4, is a major enhancement to SSE. SSE2 adds two major features: double-precision (64-bit) floating point for all SSE operations, and MMX integer operations on 128-bit XMM registers. In the original SSE instruction set, conversion to and from integers placed the integer data in the 64-bit MMX registers. SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers. It offers an orthogonal set of instructions for dealing with common data types.
  • SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.
  • SSSE3, Merom New Instructions (MNI), is an upgrade to SSE3, adding 16 new instructions, which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4, as this term was used during the development of the Core microarchitecture.
  • SSE4, Penryn New Instructions (PNI), is another major enhancement, adding a dot product instruction, additional integer instructions, a popcnt instruction, and more.
  • XOP, FMA4 and CVT16 are new iterations announced by AMD in August 2007[1][2] and revised in May 2009.[3]
  • AVX (Advanced Vector Extensions), Gesher New Instructions (GNI), is an advanced version of SSE announced by Intel, featuring a widened data path (from 128 bits to 256 bits) and 3-operand instructions (up from 2). Intel released processors in early 2011 with AVX support.[4] AVX requires support from the operating system.
  • AVX2
  • AVX-512 (3.1 and 3.2)

Software and hardware issues

With all x86 instruction set extensions, it is up to the BIOS, operating system and application programmer to test and detect their existence and proper operation.

  • Intel and AMD offer applications to detect what extensions a CPU supports.
  • The CPUID opcode is a processor supplementary instruction (its name derived from CPU IDentification) for the x86 architecture. It was introduced by Intel in 1993 when it introduced the Pentium and SL-Enhanced 486 processors.
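
As a sketch of how an application might query CPUID directly, the code below reads leaf 1 and tests the feature bits documented for the SSE family. It assumes GCC or Clang, whose <cpuid.h> header provides __get_cpuid (MSVC offers a different interface); the sse_features struct and the detect_sse function are hypothetical names. It reports only what the processor advertises and does not check that the operating system has enabled SSE state saving.

```c
#include <cpuid.h>  /* GCC/Clang helper for the CPUID instruction */

typedef struct {
    int sse, sse2, sse3, ssse3, sse41, sse42;  /* 1 if advertised, else 0 */
} sse_features;

/* Queries CPUID leaf 1 and decodes the SSE-related feature flags. */
sse_features detect_sse(void)
{
    sse_features f = {0, 0, 0, 0, 0, 0};
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {  /* returns 0 if leaf 1 is unavailable */
        f.sse   = (edx >> 25) & 1;  /* EDX bit 25: SSE    */
        f.sse2  = (edx >> 26) & 1;  /* EDX bit 26: SSE2   */
        f.sse3  = (ecx >> 0)  & 1;  /* ECX bit 0:  SSE3   */
        f.ssse3 = (ecx >> 9)  & 1;  /* ECX bit 9:  SSSE3  */
        f.sse41 = (ecx >> 19) & 1;  /* ECX bit 19: SSE4.1 */
        f.sse42 = (ecx >> 20) & 1;  /* ECX bit 20: SSE4.2 */
    }
    return f;
}
```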

User application uptake of the x86 extensions has been slow, with even bare minimum baseline MMX and SSE support in some cases not being supported by applications some 10 years after these extensions became commonly available. Distributed computing has accelerated the use of these extensions in the scientific community, and many scientific applications refuse to run unless the CPU supports SSE2 or SSE3.

The use of multiple revisions of an application to cope with the many different sets of extensions available is the simplest way around the x86 extension optimization problem. Software libraries and some applications have begun to support multiple extension types, hinting that full use of available x86 instructions may finally become common some 5 to 15 years after the instructions were initially introduced.
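
One common way to support several extension levels in a single binary is to select an implementation at run time. The sketch below assumes GCC or Clang on x86, whose __builtin_cpu_supports builtin and target function attribute are used here; all function names are invented for the example, and real libraries usually cover more instruction sets than this.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

typedef void (*add_fn)(const float *, const float *, float *, size_t);

/* Baseline version with no intrinsics. */
static void add_arrays_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* SSE version: four elements per iteration, scalar loop for the tail. */
__attribute__((target("sse")))
static void add_arrays_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Chooses an implementation once, based on what the CPU reports. */
add_fn pick_add_arrays(void)
{
    __builtin_cpu_init();  /* initialise the compiler's CPU feature data */
    return __builtin_cpu_supports("sse") ? add_arrays_sse : add_arrays_scalar;
}
```

A caller would typically store the returned pointer once at start-up and call through it afterwards. Note that on x86-64 the compiler already assumes SSE and SSE2 for all code, so dispatch of this kind matters mainly for later extensions or for 32-bit builds.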

Identifying

Processor ID applications

  • Intel Processor Identification Utility[5]
  • CPU-Z - CPU, motherboard, and memory identification utility

References

  1. "AMD plots single thread boost with x86 extensions". The Register. 30 August 2007. Retrieved 1 February 2008.
  2. http://developer.amd.com/sse5.jsp
  3. "AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions" (PDF). AMD. 1 May 2009.
  4. Girkar, Milind (2013-10-01). "Intel Instruction Set Architecture Extensions | Intel® Developer Zone". Software.intel.com. Retrieved 2013-10-23.
  5. Intel Processor Identification Utility

External links

  • Intel Intrinsics Guide


There are excerpts from wikipedia on this article and video
