Posts in the 'Hadoop' category

Column

Hadoop 2017. 11. 18. 20:42

Strings should use ENCODING STRING,

while numbers are faster with ENCODING BINARY.

CREATE HBASE TABLE enc_str (
   ti TINYINT, si SMALLINT, i INT,
   bi BIGINT, b BOOLEAN, f FLOAT,
   d  DOUBLE, s1 STRING, s2 STRING,
   s3 STRING, s4 STRING, s5 STRING
) 
COLUMN MAPPING ( 
   KEY MAPPED BY (s1,ti,bi,i) ENCODING STRING,
   f1:a MAPPED BY (si,s2,f) ENCODING STRING, 
   f1:b MAPPED BY (d,b,s3) ENCODING STRING,
   f1:c MAPPED BY (s4,s5) ENCODING STRING
);
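
To illustrate why the encoding choice matters, here is a minimal Python sketch. It is not the SQL engine's actual encoder: the helper names are hypothetical, and the 4-byte big-endian packing is only an assumed stand-in for a binary format. It does not measure speed; it only shows that a binary encoding is fixed-width and keeps non-negative numbers in numeric order under byte-wise comparison, while a text encoding does neither.

```python
import struct


def encode_as_string(n):
    """Text encoding: one byte per decimal digit (hypothetical helper)."""
    return str(n).encode("ascii")


def encode_as_binary(n):
    """Fixed-width encoding: 4-byte big-endian signed int (assumed format)."""
    return struct.pack(">i", n)


values = [9, 10, 123456789]  # non-negative sample values only

string_keys = [encode_as_string(v) for v in values]
binary_keys = [encode_as_binary(v) for v in values]

# Keys in an HBase-style store are ordered by comparing raw bytes.
sorted_as_strings = [int(k) for k in sorted(string_keys)]
sorted_as_binary = [struct.unpack(">i", k)[0] for k in sorted(binary_keys)]

string_sizes = [len(k) for k in string_keys]
binary_sizes = [len(k) for k in binary_keys]

print(sorted_as_strings)           # text order differs from numeric order
print(sorted_as_binary)            # numeric order is preserved
print(string_sizes, binary_sizes)  # variable width vs. fixed width
```

Negative numbers would need extra handling (for example flipping the sign bit) to sort correctly, which this sketch leaves out.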


Posted by yongary


Phoenix

Hadoop 2017. 11. 16. 10:18

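Whereas Hive is batch-oriented,

Phoenix is a fast layer built to run real-time queries as well.

- Queries are turned into HBase scans so that they run quickly.

- The HBase API can also be used directly.

==> In practice it is mostly used as little more than a JDBC driver plus SQL support.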

-----------------------------------

Phoenix: an RDB layer over HBase.

- Query engine + metadata store.

- Provides a JDBC driver.

- Provides joins and the like, as Hive does.

- Write-only plus append model.

- Global indexes: for read-heavy mutable data

- Local indexes: for write-heavy mutable or immutable data

All kinds of joins are supported.

Phoenix Calcite ?

---- Creating a view from an HBase table, when the table is named transactions and the column family is transactions

CREATE VIEW "TransactionHistory" (k VARCHAR primary key, "Transactions"."transactionId" VARCHAR);

=> Select "transactionId" from "Transactions"

Columns can then be added with the ALTER VIEW command.

REF

====== Using the shell via sqlline ======

$cd /usr/hdp/current/phoenix-client/bin/

$./sqlline.py

pnix> !tables

pnix> select * from MY.MY_TABLE

0: jdbc:phoenix:> help

!all Execute the specified SQL against all the current connections

!autocommit Set autocommit mode on or off

!batch Start or execute a batch of statements

!brief Set verbose mode off

!call Execute a callable statement

!close Close the current connection to the database

!closeall Close all current open connections

!columns List all the columns for the specified table

!commit Commit the current transaction (if autocommit is off)

!connect Open a new connection to the database.

!dbinfo Give metadata information about the database

!describe Describe a table

!dropall Drop all tables in the current database

!exportedkeys List all the exported keys for the specified table

!go Select the current connection

!help Print a summary of command usage

!history Display the command history

!importedkeys List all the imported keys for the specified table

!indexes List all the indexes for the specified table

!isolation Set the transaction isolation for this connection

!list List the current connections

!manual Display the SQLLine manual

!metadata Obtain metadata information

!nativesql Show the native SQL for the specified statement

!outputformat Set the output format for displaying results

(table,vertical,csv,tsv,xmlattrs,xmlelements)

!primarykeys List all the primary keys for the specified table

!procedures List all the procedures

!properties Connect to the database specified in the properties file(s)

!quit Exits the program

!reconnect Reconnect to the database

!record Record all output to the specified file

!rehash Fetch table and column names for command completion

!rollback Roll back the current transaction (if autocommit is off)

!run Run a script from the specified file

!save Save the current variabes and aliases

!scan Scan for installed JDBC drivers

!script Start saving a script to a file

!set Set a sqlline variable

=> !set maxWidth 200 : sets the terminal output width.

!sql Execute a SQL command

!tables List all the tables in the database

!typeinfo Display the type map for the current connection

!verbose Set verbose mode on


HBase & Hive

Hadoop 2017. 11. 14. 17:35

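HBase is a byte-array-based NoSQL store and also a [columnar DBMS] RefY

- Data is stored in the MemStore and in HFiles.

- A Bloom filter is stored at the end of each HFile and speeds up lookups. (optional)

- There is a concept called ColumnFamily. (Columns that share a prefix are grouped into a column family, and things like a timestamp are reportedly recorded automatically.)

sameID: (cf:name timestamp value=kim)
sameID: (cf:score timestamp value=50) Internally the data is said to be recorded like this, with the row ID duplicated for each column.. hmm. REFY

A ColumnFamily is used to separate storage and to control compression. (Similar types should be grouped together, because compression is turned on or off per column family.)

- Extremely low latency.

- It can run separately from MapReduce or at the same time as MapReduce. (not mutually exclusive)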
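
The cell layout noted above (the row ID repeated for every column, each value carrying a timestamp) can be modeled with a small in-memory Python sketch. This is only an illustration of the data model, not HBase code; the `cells` dictionary and the `put`/`get` helpers are hypothetical names.

```python
import time
from collections import defaultdict

# (row_id, "family:qualifier") -> list of (timestamp, value), newest first
cells = defaultdict(list)


def put(row_id, column, value, timestamp=None):
    """Store one cell version; fill in a timestamp when none is given."""
    ts = timestamp if timestamp is not None else int(time.time() * 1000)
    versions = cells[(row_id, column)]
    versions.append((ts, value))
    # Keep the version with the highest timestamp at the front.
    versions.sort(key=lambda version: version[0], reverse=True)


def get(row_id, column):
    """Return the newest value of a cell, or None if it was never written."""
    versions = cells.get((row_id, column))
    return versions[0][1] if versions else None


# The row ID "sameID" is repeated for each column, as in the notes above.
put("sameID", "cf:name", "kim", timestamp=1)
put("sameID", "cf:score", 50, timestamp=1)
put("sameID", "cf:score", 70, timestamp=2)  # a newer version of the same cell

print(get("sameID", "cf:name"))
print(get("sameID", "cf:score"))
```

Because every cell carries its own row ID, column name, and timestamp, an update is just another version of the cell rather than an in-place overwrite.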

Run) $sudo /etc/init.d/hadoop-hbase-master start

====hbase shell=============

$hbase shell

hbase> list (lists all HBase tables)

hbase> desc 'MY:TABLE'

hbase> scan 'MY:TABLE' (retrieves the entire table)

hbase>

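Design rules:

- Create as few tables as possible.

- Invest plenty of time in up-front design!

- Keep data loading to a minimum (batch it up).

(Java API supported)

Put put = new Put(Bytes.toBytes("row1"));  // org.apache.hadoop.hbase.client.Put; "row1" is a placeholder row key

Get get = new Get(Bytes.toBytes("row1"));  // org.apache.hadoop.hbase.client.Get; Bytes is org.apache.hadoop.hbase.util.Bytes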

---------------

Hive is an ETL (Extract, Transform, Load) tool. -> Use it when query time is not critical.

HBase is a database that supports online access. -> Use it when query time matters.

Hive example, after running $hive: REFY

hive> add jar file:///usr/lib/hive-0.13.1-bin/lib/hive-contrib-0.13.1.jar;

hive> load data inpath 'hdfs:/wiki-access.log' into table access;

hive> select * from access;

- Creating a table mapped between HBase and Hive

Create the hbase_table_access table in Hive and the log table in HBase.


hortonworks sandbox

Hadoop 2017. 11. 8. 17:31

Hands-on REF REF

===> hadoopexam.com has many examples.

Also, on Windows you may need to enable Virtualization Technology in the BIOS setup.

The Hortonworks sandbox is a VM with HDP (Hortonworks Data Platform) and all related software preinstalled.

Just download it and run it in an environment such as VirtualBox, and every HDP service is available right away.

(Is it meant for testing or for production? Production use might be possible too.. still studying this part.)

It is provided on CentOS, and once it is running,

several ports are mapped to localhost,

so you can open http://localhost:8888 or 8080 (Ambari) (maria_dev/maria_dev) right away,

and you can also connect with ssh root@localhost -p 2222 (root/hadoop).


Running hdfs and report

Hadoop 2015. 2. 28. 20:34

<First run after installation>

sudo /usr/local/hadoop-2.5.0/sbin/start-dfs.sh

<Status check>

$hdfs dfsadmin -report

<Check in a web browser>

http://localhost:50070/

$hdfs dfs -ls /

$hdfs dfs -mkdir /myTest

$hdfs dfs -put test.txt /

$hadoop fs -mkdir /wordCount

$hadoop fs -copyFromLocal wordCount.jar /wordCount

$hadoop fs -ls

$hadoop jar wordCount.jar wordCount[main class] /wordCount[folder] /wordCount/output [out folder]


Hadoop

Hadoop 2015. 2. 26. 14:15

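Hadoop: combines the strengths of both MapReduce and parallel DBs.

(Whereas a DB is schema-on-write, Hadoop is schema-on-read.)

HDFS: data-centric computing. Because the data is large, the computation moves to where the data is.

: write-once, read-many, no update but append.
: The default is 3 replicas (on different racks); the client reads from the nearest one.

: Metadata (held by the NameNode, also called the Master; it is managed as a single node, so it is critical) is kept entirely in memory.
: DataNode = Slave (each also keeps its own metadata).

- Files are made up of blocks, 128 MB by default.

- Distributed data is verified with CRC32 checks.

- Checksums are kept per 512 bytes and checked every time.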
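
The block and checksum figures above can be made concrete with a short Python sketch. It is not HDFS code: the function names are hypothetical, and plain `zlib.crc32` is used as a stand-in without modeling the exact CRC variant or on-disk layout that HDFS uses.

```python
import zlib

BLOCK_SIZE = 128 * 1024 * 1024  # default block size from the notes above
BYTES_PER_CHECKSUM = 512        # checksum chunk size from the notes above
REPLICATION = 3                 # default replica count from the notes above


def block_count(file_size):
    """Number of blocks needed for a file of file_size bytes."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE


def chunk_checksums(data):
    """One CRC32 value per 512-byte chunk of the data."""
    return [
        zlib.crc32(data[i:i + BYTES_PER_CHECKSUM])
        for i in range(0, len(data), BYTES_PER_CHECKSUM)
    ]


def find_corrupt_chunks(data, expected):
    """Indexes of chunks whose checksum no longer matches the stored one."""
    actual = chunk_checksums(data)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# A 300 MB file needs 3 blocks, so 9 block replicas are stored in total.
blocks = block_count(300 * 1024 * 1024)
stored_replicas = blocks * REPLICATION

# 1536 bytes of sample data form three 512-byte chunks.
original = bytes(range(256)) * 6
checksums = chunk_checksums(original)

# Flip one byte inside the second chunk and verify again.
damaged = bytearray(original)
damaged[600] ^= 0xFF
corrupt = find_corrupt_chunks(bytes(damaged), checksums)

print(blocks, stored_replicas, corrupt)
```

Keeping one checksum per small chunk means a reader can tell which part of a block is damaged and fetch that block from another replica.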

? What about the Secondary NameNode?

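- Map & Reduce YouTube REF (explained with playing cards)

: Map: the program goes to where the data is and runs there.

: Reduce: once the map computation finishes, the intermediate results are gathered and combined.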
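
As a rough illustration of the two phases, here is a single-process word count in Python. It only mimics the map, shuffle, and reduce steps in memory; it is not Hadoop code, and the function names and the input lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


lines = ["big data big cluster", "code moves to data"]  # hypothetical input

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)
```

In a real cluster each `map_phase` call would run on the node that holds its input block, and only the much smaller grouped pairs would travel over the network to the reducers.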

YouTube intro REF: can store up to 25 petabytes and run on up to 4,500 machines.

Pig - compiler

Hive - SQL-like interface

HBase - top-level Apache project - messenger messages can be stored as objects

HCatalog - metadata server (split out from Hive)

Other) Mahout - machine learning library for MapReduce

Ambari, Ganglia, Nagios - cluster analysis tools

Sqoop - interface tool for RDBs

Cascading - translating tool for Pig..?

Oozie - scheduler. Workflow coordination, such as when to run things.

Flume - streaming input for Hadoop

Supports Protobuf, Avro, and Thrift.

Fuse-DFS: OS-level access support.
