Sat . 19 Sep 2019

HPCC
HPCC (High-Performance Computing Cluster), also known as DAS (Data Analytics Supercomputer), is an open source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data.[1] The HPCC platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie).[2] The HPCC platform also includes a data-centric declarative programming language for parallel data processing called ECL.[3]

The public release of HPCC was announced in 2011, after ten years of in-house development, according to LexisNexis. It is an alternative to Hadoop[4] and other big data platforms.[5]


  • 1 System architecture
  • 2 Software architecture
  • 3 See also
  • 4 References
  • 5 External links

System architecture

Figure 2. Thor processing cluster

The HPCC system architecture includes two distinct cluster processing environments, each of which can be optimized independently for its parallel data processing purpose. The first of these platforms is called a data refinery. Its overall purpose is the general processing of massive volumes of raw data of any type, and it is typically used for data cleansing and hygiene; extract, transform, load (ETL) processing of the raw data; record linking and entity resolution; large-scale ad hoc complex analytics; and creation of keyed data and indexes to support high-performance structured queries and data warehouse applications. The data refinery is also referred to as Thor, a reference to the mythical Norse god of thunder, whose large hammer is symbolic of crushing large amounts of raw data into useful information. A Thor cluster is similar in its function, execution environment, filesystem, and capabilities to the Google and Hadoop MapReduce platforms.

Figure 2 shows a representation of a physical Thor processing cluster, which functions as a batch job execution engine for scalable data-intensive computing applications. In addition to the Thor master and slave nodes, auxiliary and common components are needed to implement a complete HPCC processing environment.
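The batch, data-parallel style of work described for the data refinery can be illustrated with a minimal Python sketch. It is not ECL and does not use any HPCC API; the function names (`partition`, `clean`, `run_batch`), the record fields, and the exact-match deduplication rule are all assumptions chosen for illustration. Worker threads stand in for cluster nodes: each cleanses its own slice of the raw records, and the results are merged and deduplicated afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Split records into n_workers slices (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]


def clean(record):
    """Normalize one raw record: trim whitespace, fix letter case."""
    return {
        "name": record["name"].strip().lower(),
        "city": record["city"].strip().title(),
    }


def process_partition(part):
    """Work done independently on each (hypothetical) worker node."""
    return [clean(r) for r in part]


def run_batch(records, n_workers=3):
    """Cleanse partitions in parallel, then merge and deduplicate."""
    parts = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        cleaned_parts = list(pool.map(process_partition, parts))
    merged = {}
    for part in cleaned_parts:
        for rec in part:
            # Simplistic record linking: exact match on the cleaned fields.
            merged[(rec["name"], rec["city"])] = rec
    return sorted(merged.values(), key=lambda r: (r["name"], r["city"]))


raw = [
    {"name": " Alice ", "city": "boston"},
    {"name": "alice", "city": "Boston "},
    {"name": "Bob", "city": "miami"},
]
result = run_batch(raw)
```

Here the two differently formatted "Alice" records collapse into one after cleansing. Real record linking and entity resolution are far more involved than this exact-match rule.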

Figure 3. Roxie processing cluster

The second of the parallel data processing platforms is called Roxie and functions as a rapid data delivery engine. This platform is designed as an online high-performance structured query and analysis platform, or data warehouse, that meets the parallel data access requirements of online applications through Web services interfaces, supporting thousands of simultaneous queries and users with sub-second response times. Roxie utilizes a distributed indexed filesystem to provide parallel processing of queries, using an execution environment and filesystem optimized for high-performance online processing. A Roxie cluster is similar in its function and capabilities to Hadoop with HBase and Hive capabilities added, and provides near-real-time, predictable query latencies. Both Thor and Roxie clusters utilize the ECL programming language for implementing applications, increasing continuity and programmer productivity.
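The idea of answering queries from an index that is spread across nodes can be sketched in Python as follows. This is only an illustration under stated assumptions: the classes `IndexNode` and `QueryCluster`, the CRC32-based placement of keys, and the sample fields are hypothetical and are not taken from Roxie's actual implementation.

```python
import zlib


class IndexNode:
    """One node holding a slice of a keyed index (key -> list of records)."""

    def __init__(self):
        self.index = {}

    def put(self, key, record):
        self.index.setdefault(key, []).append(record)

    def get(self, key):
        return self.index.get(key, [])


class QueryCluster:
    """Routes each key to the node that owns it (assumed hash placement)."""

    def __init__(self, n_nodes):
        self.nodes = [IndexNode() for _ in range(n_nodes)]

    def _owner(self, key):
        # Deterministic hash so the same key always maps to the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def load(self, records, key_field):
        for rec in records:
            self._owner(rec[key_field]).put(rec[key_field], rec)

    def query(self, key):
        # Only the owning node is consulted; no full scan of the data.
        return self._owner(key).get(key)


cluster = QueryCluster(n_nodes=4)
cluster.load(
    [
        {"last": "smith", "id": 1},
        {"last": "jones", "id": 2},
        {"last": "smith", "id": 3},
    ],
    key_field="last",
)
smith_ids = [r["id"] for r in cluster.query("smith")]
```

Because a lookup touches only the node that owns the key, the cost of a query does not grow with the total volume of data, which is the property that makes predictable latencies plausible in an indexed design of this kind.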

Figure 3 shows a representation of a physical Roxie processing cluster, which functions as an online query execution engine for high-performance query and data warehousing applications. A Roxie cluster includes multiple nodes with server and worker processes for processing queries; an additional auxiliary component called an ESP server, which provides interfaces for external client access to the cluster; and additional common components that are shared with a Thor cluster in an HPCC environment. Although a Thor processing cluster can be implemented and used without a Roxie cluster, an HPCC environment that includes a Roxie cluster should also include a Thor cluster. The Thor cluster is used to build the distributed index files used by the Roxie cluster and to develop online queries, which are then deployed with the index files to the Roxie cluster.
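The division of labor described above, where indexes are built in batch and then deployed together with the queries that use them, can be sketched in Python. The names `build_index`, `people_in_city`, and `QueryService` are hypothetical and only illustrate the workflow; they do not correspond to HPCC components or to ECL.

```python
from collections import defaultdict


def build_index(records, key_field):
    """Batch step (the role Thor plays): group records by key."""
    index = defaultdict(list)
    for rec in records:
        index[rec[key_field]].append(rec)
    return dict(sorted(index.items()))


def people_in_city(index, city):
    """A query developed against the index: sorted names for one city."""
    return sorted(rec["name"] for rec in index.get(city, []))


class QueryService:
    """Online step (the role Roxie plays): serves deployed queries."""

    def __init__(self):
        self._published = {}

    def deploy(self, name, query_fn, index):
        # A query is deployed together with the index it reads.
        self._published[name] = (query_fn, index)

    def call(self, name, *args):
        query_fn, index = self._published[name]
        return query_fn(index, *args)


records = [
    {"name": "carol", "city": "Boston"},
    {"name": "alice", "city": "Boston"},
    {"name": "bob", "city": "Miami"},
]
service = QueryService()
service.deploy("people_in_city", people_in_city, build_index(records, "city"))
boston = service.call("people_in_city", "Boston")
```

The online service never builds an index itself in this sketch, which mirrors why an environment with a query cluster also needs a batch cluster.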

Figure 4. HPCC software architecture

Software architecture

The HPCC software architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client interfaces that provide both end-user services and system management tools, and auxiliary components to support monitoring and to facilitate loading and storing of filesystem data from external sources. An HPCC environment can include only Thor clusters, or both Thor and Roxie clusters. The overall HPCC software architecture is shown in Figure 4.
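The rule about which cluster combinations make sense can be expressed as a small validation check. The following Python sketch is purely illustrative: the `Environment` class and its fields are assumptions and do not reflect the format of any real HPCC configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of an environment's processing clusters."""

    thor_clusters: int = 0
    roxie_clusters: int = 0

    def problems(self):
        """Return a list of human-readable issues (empty if none found)."""
        issues = []
        if self.thor_clusters < 0 or self.roxie_clusters < 0:
            issues.append("cluster counts must be non-negative")
        if self.thor_clusters == 0 and self.roxie_clusters == 0:
            issues.append("environment has no processing clusters")
        if self.roxie_clusters > 0 and self.thor_clusters == 0:
            # Index files served by Roxie are built on Thor.
            issues.append("Roxie needs a Thor cluster to build its index files")
        return issues


thor_only = Environment(thor_clusters=1)
both = Environment(thor_clusters=1, roxie_clusters=1)
roxie_only = Environment(roxie_clusters=1)
```

In this sketch a Thor-only environment and a combined environment pass the check, while a Roxie-only environment is flagged.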

See also

  • Apache Hadoop
  • Apache Spark
  • Aster Data Systems
  • MapReduce
  • Sector/Sphere
  • HPCC Systems

References


  1. A. M. Middleton, "Data-Intensive Technologies for Cloud Computing", Handbook of Cloud Computing, Springer, 2010.
  2. "HPCC Systems: Introduction to HPCC (High-Performance Computing Cluster)". CiteSeerX. 24 May 2011. Retrieved 29 October 2015.
  3. A. M. Middleton, "ECL/HPCC: A Unified Approach to Big Data", Handbook of Data Intensive Computing, Springer, 2011.
  4. "LexisNexis Will Open-Source Its Hadoop Alternative for Handling Big Data". ReadWrite. 15 June 2011. Retrieved 20 November 2014.
  5. "9 Useful Open Source Big Data Tools". EnterpriseAppsToday. 11 November 2015. Retrieved 18 November 2015.

External links

  • Sandia sees data management challenges spiral
  • Sandia National Laboratories Leverages the Data Analytics Supercomputer (DAS) by LexisNexis Risk & Information Analytics Group, Which Offers Breakthrough High Performance Computing to Address Data Management and Analysis Challenges
  • Programming models for the LexisNexis High Performance Computing Cluster
  • LexisNexis Data Analytics Supercomputer
  • LexisNexis HPCC Systems
  • Reference to the term BORPS (Billions of Records Per Second)
  • LexisNexis Brings Its Data Management Magic To Bear on Scientific Data
  • High Performance Computing Clusters (HPCC) and Big Data Analytics Certificate - Stand-Alone

This article and video include excerpts from Wikipedia.