Sector/Sphere: High Performance Distributed File System and Parallel Data Processing Engine

 

1. Overview
Sector/Sphere was created by Dr. Yunhong Gu in 2006 and is now maintained by a group of open source developers. It is available from: http://sector.sourceforge.net/
Sector: distributed file system
Sphere: parallel data processing framework
According to one test, Sector/Sphere is about twice as fast as Hadoop in some cases.

2. Sector
Sector system architecture:


The figure shows the overall architecture of the Sector system, which consists of three parts:
Security Server
: maintains user accounts, user passwords, file access information, and the IP addresses of the authorized slave nodes
Master
: maintains the metadata of the files stored in the system, controls the running of all slave nodes, and responds to users' requests
Slaves
: the nodes that store the files managed by the system and process the data upon the request of a Sector client
The clients include:
1. Sector file system client API: access Sector files in applications using the C++ API
2. Sector system tools
3. FUSE: mount the Sector file system as a local directory
4. Sphere programming API
A more detailed figure:

Features:
1. Unlike Hadoop, Sector does not split user files into blocks; instead, every Sector slice is stored as one single file in the native file system
2. Sector runs an independent security server. This design allows different security service providers to be deployed. In addition, multiple Sector masters can use the same security service
3. Topology aware and application aware
4. Uses UDP for message passing and UDT for data transfer

Replication:
1. Provides software-level fault tolerance (no hardware RAID is required)
2. All files are replicated to a specific number of copies by default
3. By default, a replica is created on the furthest node
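The "furthest node" rule can be pictured with a small sketch. This is not Sector's actual code: it assumes, purely for illustration, that each slave is described by a topology path such as /dc1/rack1/s1, and it measures distance as the number of hops between two paths. The function names are hypothetical.

```python
def topology_distance(a, b):
    """Hops between two nodes given as paths like '/dc1/rack2/s5' (assumed format)."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Climb from each node up to the deepest shared level.
    return (len(pa) - common) + (len(pb) - common)


def pick_replica_target(source, candidates):
    """Choose the candidate farthest from the source; ties go to the smallest name."""
    return max(sorted(candidates), key=lambda c: topology_distance(source, c))


slaves = ["/dc1/rack1/s2", "/dc1/rack2/s3", "/dc2/rack1/s4"]
target = pick_replica_target("/dc1/rack1/s1", slaves)
print(target)
```

Placing the copy far away in the topology means that losing one rack or one site is less likely to destroy every replica of a file.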

UDT:
A high performance data transfer protocol designed for transferring large volumetric datasets over high speed wide area networks. Such settings are typically disadvantageous for the more common TCP protocol.
UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. In such settings, the protocol can transfer data at a much higher speed than TCP does.
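To give a feel for what "its own reliability control" on top of UDP means, here is a toy simulation. It is a heavily simplified illustration and not UDT's real algorithm or API: packets carry sequence numbers, some are lost on the first attempt, the receiver reports the gaps, and the sender resends only those. Congestion control is left out entirely, and all names are hypothetical.

```python
def send_with_retransmit(packets, lost_on_first_try):
    """Simulate delivery over a lossy datagram channel.

    packets: list of payloads, the index is the sequence number.
    lost_on_first_try: set of sequence numbers dropped on the first send.
    Returns the reassembled data and the sequence numbers that were resent.
    """
    received = {}
    # First pass: everything is sent once, some datagrams never arrive.
    for seq, payload in enumerate(packets):
        if seq not in lost_on_first_try:
            received[seq] = payload
    # The receiver notices gaps in the sequence numbers and reports them.
    missing = sorted(set(range(len(packets))) - set(received))
    # The sender retransmits only the reported packets.
    for seq in missing:
        received[seq] = packets[seq]
    data = [received[seq] for seq in range(len(packets))]
    return data, missing


packets = [b"p0", b"p1", b"p2", b"p3"]
data, resent = send_with_retransmit(packets, lost_on_first_try={1, 3})
print(resent)
```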

Limitations:
1. File size is limited by the available space on individual storage nodes
2. Users may need to split their datasets into proper sizes
3. Sector is designed to provide high throughput on large datasets, rather than extremely low latency on small files
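Because a file has to fit on a single storage node, a large dataset may need to be cut into several files before upload. The sketch below shows one way to do that without cutting a record in half. It is only an illustration under the assumption that the dataset is a sequence of byte records; the function name and the size limit are hypothetical and not part of Sector.

```python
def split_records(records, max_piece_bytes):
    """Group records into pieces whose total size stays within max_piece_bytes."""
    pieces, current, size = [], [], 0
    for rec in records:
        if len(rec) > max_piece_bytes:
            raise ValueError("record larger than piece limit")
        # Start a new piece when adding this record would exceed the limit.
        if current and size + len(rec) > max_piece_bytes:
            pieces.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        pieces.append(current)
    return pieces


pieces = split_records([b"aaaa", b"bb", b"ccc", b"d"], max_piece_bytes=6)
print(len(pieces))
```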

3. Sphere
Sphere is a parallel data processing engine integrated in Sector; it can be used to process data stored in Sector in parallel.
Sphere uses a stream processing computing paradigm. A stream is an abstraction in Sphere that represents either a dataset or a part of a dataset (a Sector dataset consists of one or more physical files).
This figure illustrates how Sphere processes the segments in a stream.
SPE: Sphere Processing Engine
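The idea of SPEs working through the segments of a stream can be sketched as follows. This is a conceptual model only, not the Sphere programming API: a stream is modeled as a list of segments, each SPE is modeled as a worker thread, and the same user-defined function is applied to every segment independently. All names here are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stream(segments, udf, num_spes=4):
    """Apply udf to every segment using num_spes workers; results keep segment order."""
    with ThreadPoolExecutor(max_workers=num_spes) as pool:
        return list(pool.map(udf, segments))


def count_words(segment):
    """Example user-defined function: count the words in one segment."""
    return sum(len(line.split()) for line in segment)


stream = [["a b c", "d e"], ["f"], ["g h", "i j k l"]]
results = run_stream(stream, count_words, num_spes=2)
print(results)  # [5, 1, 6]
```

In the real system the segments live on the slave nodes, so the work is ideally done where the data already is instead of moving the data to the worker.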


This figure illustrates the basic model that Sphere supports. Sphere also supports some extensions of this model, which occur quite frequently:
1. Processing multiple input streams.
2. Shuffling input streams.
Interested readers can refer to: "Sector and Sphere: The Design and Implementation of a High Performance Data Cloud"
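Shuffling can be pictured as routing every output record to a bucket chosen by its key, so that records with the same key end up together no matter which segment produced them. The sketch below illustrates that idea in plain Python; it is not the Sphere API, and the choice of CRC32 as the hash function is an assumption made only to keep the example deterministic.

```python
import zlib
from collections import defaultdict


def bucket_of(key, num_buckets):
    """Map a string key to a bucket id with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def shuffle(segment_outputs, num_buckets):
    """Route (key, value) records from all segments into buckets by key."""
    buckets = defaultdict(list)
    for output in segment_outputs:
        for key, value in output:
            buckets[bucket_of(key, num_buckets)].append((key, value))
    return dict(buckets)


outputs = [[("apple", 1), ("pear", 2)], [("apple", 3)]]
buckets = shuffle(outputs, num_buckets=3)
print(sum(len(records) for records in buckets.values()))  # 3
```

Each bucket can then be treated as a new input stream for the next processing stage.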

4. References
Sector and Sphere: The Design and Implementation of a High Performance Data Cloud
http://sector.sourceforge.net/
http://en.wikipedia.org/wiki/Sector/Sphere
http://dongxicheng.org/mapreduce/streaming-mapreduce-sphere/
http://en.wikipedia.org/wiki/UDP-based_Data_Transfer_Protocol
http://udt.sourceforge.net/
