
Data Replication Overview

Replication

Background


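In the simplest setup, a single server is paired with a single database, as in the figure below.

As the number of users grows, however, that one database eventually struggles to keep up with the volume of queries.

Replication emerged as a way to take some of the load off SELECT statements, which make up most of the queries.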

 

 

 

 

What is Replication?


  • Replication stores the same data on two or more DBMS instances, divided into Master and Slave roles.
  • Data replication is the process of making multiple copies of data and storing them at different locations for backup purposes, fault tolerance and to improve their overall accessibility across a network.
  • Similar to data mirroring, data replication can be applied to both individual computers and servers.
  • The replicas can be stored within the same system, on on-site and off-site hosts, and on cloud-based hosts.
  • Common database technologies today either have built-in capabilities or use third-party tools to accomplish data replication. While Oracle Database and Microsoft SQL Server actively support data replication, some traditional technologies may not include this feature out of the box.
  • Data replication can be either synchronous, meaning a write is acknowledged only after the replicas have confirmed it, or asynchronous, meaning the source commits and acknowledges the write first and the replicas receive the change afterwards, so they may briefly lag behind.
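  • Replication is the copying of data from one host computer to another, in connection with data storage and backup.
  • The other computer does not have to be in a distant location. Over a computer network, data can be stored in a way that is completely separate from the local physical storage device.
  • Popular relational database management systems (RDBMS) provide replication as an additional feature, or as a means of balancing load across multiple database servers.
  • Replication concerns redundant resources, such as software or hardware components, and improves reliability, fault tolerance, and performance.
  • It typically refers to 'replication in space', in which the same data is stored on multiple storage devices or the same computation is carried out on multiple devices. 'Replication in time', by contrast, means that a computation is executed repeatedly on a single device.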
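The difference between synchronous and asynchronous replication can be made concrete with a small in-memory model. This is only a sketch: the `Primary` and `Replica` classes and the `ship_pending` method are hypothetical names invented for illustration, and a real DBMS ships changes over the network in the background rather than through an explicit call.

```python
from collections import deque


class Replica:
    """Hypothetical in-memory replica: a plain dict of key -> value."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Hypothetical primary that replicates synchronously or asynchronously."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # committed locally, not yet shipped

    def write(self, key, value):
        self.data[key] = value  # commit on the primary
        if self.synchronous:
            # Synchronous: every replica applies the change before the ack.
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # Asynchronous: acknowledge now, ship the change later.
            self.pending.append((key, value))
        return "ack"

    def ship_pending(self):
        """Deliver queued changes; stands in for a background sender."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(key, value)
```

With `synchronous=False`, a read from the replica between `write` and `ship_pending` returns stale data, which is the replication lag that asynchronous setups have to tolerate.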

 

How it works

- Data modifications are applied to the Master DBMS.

- Replication then copies the actual data to the Slave DBMS.
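The two steps above can be sketched as a Master that records every change in an ordered log and a Slave that remembers how far into that log it has read. The class names, the `pull` method, and the `position` field are assumptions made for this illustration, not the API of any particular DBMS.

```python
class Master:
    """Hypothetical Master: applies writes and appends them to a change log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Hypothetical Slave: replays the Master's log from its last position."""

    def __init__(self):
        self.data = {}
        self.position = 0  # index of the next log entry to apply

    def pull(self, master):
        new_entries = master.log[self.position:]
        for key, value in new_entries:
            self.data[key] = value
        self.position += len(new_entries)
        return len(new_entries)  # number of changes applied in this pull
```

Because the Slave tracks its own position, it can disconnect and later resume from where it stopped instead of copying everything again.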

 

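Log-based replication (Binary Log)


  • Statement Based : replicates by copying the SQL statements themselves.
    • issue : some statements produce a different result each time they run (Timestamp, UUID, …)
  • Row Based : records only the rows that the SQL actually changed.
    • issue : when a lot of data changes, the log inevitably grows large.
  • Mixed : uses Statement Based by default and switches to Row Based when necessary.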
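The Statement Based issue is easy to reproduce with two independent databases. The sketch below uses Python's built-in `sqlite3` module with in-memory databases standing in for a Master and a Slave; it illustrates the idea only and is not how MySQL's binary log is implemented. The table `t` and the helper names are hypothetical.

```python
import sqlite3

# A statement whose result differs on every execution.
STATEMENT = "INSERT INTO t (id, token) VALUES (1, hex(randomblob(8)))"


def new_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, token TEXT)")
    return db


def rows(db):
    return db.execute("SELECT id, token FROM t ORDER BY id").fetchall()


def replicate_statement_based():
    master, slave = new_db(), new_db()
    master.execute(STATEMENT)
    slave.execute(STATEMENT)  # the slave re-runs the same SQL text
    return rows(master), rows(slave)


def replicate_row_based():
    master, slave = new_db(), new_db()
    master.execute(STATEMENT)
    changed = rows(master)  # the log carries the resulting row values
    slave.executemany("INSERT INTO t (id, token) VALUES (?, ?)", changed)
    return rows(master), rows(slave)
```

`replicate_statement_based()` returns two different row sets because each side generated its own random token, while `replicate_row_based()` returns identical ones because the values were copied rather than recomputed.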

 

 

Advantages of Replication


- Most queries are SELECT statements.

- Creating several Slave databases to absorb this load improves read (SELECT) performance.

- Logs can be analyzed without affecting the Master database.

  1. Improve the availability of data
  2. Increase the speed of data access
  3. Enhance server performance
  4. Accomplish disaster recovery

 

Improve the availability of data

When a particular system experiences a technical glitch due to malware or a faulty hardware component, the data can still be accessed from a different site or node. Data replication enhances the resilience and reliability of systems by storing data at multiple nodes across the network.

Increase data access speed

In organizations where there are multiple branch offices spread across the globe, users may experience some latency while accessing data from one country to another. Placing replicas on local servers provides users with faster data access and query execution times.

Enhance server performance

Database replication reduces the load on the primary server by spreading it across other nodes in the distributed system, thereby improving overall performance. By routing all read operations to a replica database, IT administrators can reserve the primary server for write operations, which demand more processing power.
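Routing reads to replicas and writes to the primary is often done in the application or in a proxy. A minimal sketch of such a router follows, assuming a hypothetical `ReadWriteRouter` class that only inspects the first keyword of the SQL text. Real routers also have to handle transactions, `SELECT ... FOR UPDATE`, and reads that must not see stale data.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._read_targets)  # round-robin across replicas
        return self.primary
```

For example, with `ReadWriteRouter("primary", ["replica1", "replica2"])`, consecutive SELECT statements alternate between the two replicas while an UPDATE is always sent to the primary.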

Accomplish Disaster recovery

Businesses are often susceptible to data loss due to a data breach or hardware malfunction. During such a catastrophe, the employees' valuable data, along with client information can be compromised. Data replication facilitates the recovery of data which is lost or corrupted by maintaining accurate backups at well-monitored locations, thereby contributing to enhanced data protection. 

 

 

 

Types of data replication


Depending on the data replication tools employed, businesses today practice several types of replication. Some of the popular replication modes are as follows:

  1. Full table replication
  2. Transactional replication
  3. Snapshot replication
  4. Merge replication
  5. Key-based incremental replication

Full table replication

Full table replication means that

- the entire data is replicated.

This includes new, updated as well as existing data that is copied from source to the destination. This method of replication is generally associated with higher costs since the processing power and network bandwidth requirements are high.

However, full table replication can be beneficial when it comes to the recovery of hard-deleted data, as well as data that do not possess replication keys - discussed further down this article.
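A minimal sketch of full table replication, using Python's built-in `sqlite3` with in-memory databases as source and destination. The `users` table and the helper names are hypothetical. Every run wipes the destination and reloads all rows, which is why the cost grows with table size but also why hard-deleted rows disappear from the copy.

```python
import sqlite3


def make_db(rows=()):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
    db.commit()
    return db


def full_table_replicate(source, destination):
    """Copy every row, new or old, from source to destination."""
    all_rows = source.execute("SELECT id, name FROM users").fetchall()
    with destination:  # one transaction: wipe, then reload
        destination.execute("DELETE FROM users")
        destination.executemany(
            "INSERT INTO users (id, name) VALUES (?, ?)", all_rows
        )
    return len(all_rows)


def dump(db):
    return db.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

No replication key is needed here, since the function never has to decide which rows changed.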

Transactional replication

In this method, the data replication software

- makes full initial copies of data from origin to destination following which the subscriber database receives updates whenever data is modified.

This is a more efficient mode of replication, since fewer rows are copied each time data changes. Transactional replication is usually found in server-to-server environments.

Snapshot replication

In Snapshot replication,

- data is replicated exactly as it appears at any given time.

Unlike other methods, Snapshot replication does not track the changes made to data. This mode of replication is used when changes to the data tend to be infrequent, or for performing the initial synchronization between publishers and subscribers.
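The point-in-time nature of a snapshot can be shown with the `Connection.backup` method of Python's built-in `sqlite3` module (available since Python 3.7). This is an analogy rather than the mechanism any specific replication product uses, and the `stock` table and helper names are hypothetical.

```python
import sqlite3


def make_source():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    db.execute("INSERT INTO stock VALUES ('apple', 10)")
    db.commit()
    return db


def take_snapshot(source):
    """Copy the whole database exactly as it is at this moment."""
    snapshot = sqlite3.connect(":memory:")
    source.backup(snapshot)
    return snapshot


def qty(db, item):
    row = db.execute("SELECT qty FROM stock WHERE item = ?", (item,)).fetchone()
    return row[0]
```

Changes made to the source after `take_snapshot` returns are not visible in the snapshot until a new one is taken.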

Merge replication

This type of replication is commonly found in server-to-client environments and

- allows both the publisher and subscriber to make changes to data dynamically.

In merge replication, data from two or more databases are combined to form a single database thereby contributing to the complexity of using this technique.
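Much of the complexity comes from conflicts: the publisher and the subscriber may both change the same row before they synchronize. The sketch below assumes a "latest modification wins" rule with the publisher winning ties, which is only one possible policy chosen for illustration. The `merge` function and the `(value, modified_at)` layout are hypothetical.

```python
def merge(publisher, subscriber):
    """Merge two {key: (value, modified_at)} stores; the newer change wins.

    On a tie the publisher's version is kept (an assumed policy).
    """
    merged = dict(publisher)
    for key, (value, modified_at) in subscriber.items():
        if key not in merged or modified_at > merged[key][1]:
            merged[key] = (value, modified_at)
    return merged
```

For example, if both sides edited key `"a"` and the subscriber's edit is newer, the merged store keeps the subscriber's value, while rows that exist on only one side are carried over unchanged.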

Key-based incremental replication

Also called key-based incremental data capture,

- this technique only copies data changed since the last update.

A replication key is a column in the source table, typically a timestamp or an auto-incrementing integer, whose value shows which rows have changed since the last run. Since only a few rows are copied during each update, the costs are significantly lower.

However, the drawback is that this replication mode cannot be used to recover hard-deleted data, since the key value is deleted along with the record.
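Both the low cost and the hard-delete drawback show up in a small sketch using Python's built-in `sqlite3`. The `orders` table, the `updated_at` replication key, and the function names are assumptions made for this example.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def incremental_replicate(source, destination, last_key):
    """Copy rows whose replication key exceeds last_key; return the new bookmark."""
    changed = source.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_key,),
    ).fetchall()
    destination.executemany(
        "INSERT OR REPLACE INTO orders (id, status, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    destination.commit()
    return changed[-1][2] if changed else last_key


def ids(db):
    return [row[0] for row in db.execute("SELECT id FROM orders ORDER BY id")]
```

A row that is hard-deleted from the source never matches `updated_at > ?` again, so the destination keeps its stale copy. Soft deletes (marking a row as deleted and bumping the key) are a common way around this.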

 

 

 

 

Reference

 

MySQL :: MySQL 5.7 Reference Manual :: 16 Replication (dev.mysql.com)

Database의 리플리케이션(Replication)이란? (nesoy.github.io)

레플리케이션 - 위키백과, 우리 모두의 백과사전 (ko.wikipedia.org)

 
