1. Consistency
· Although UNIX FS maintains consistency for directories and for concurrent writes to a file, GFS goes further with its notion of a defined state: a file region is defined when it is consistent and clients can see, in full, what every write or record append has written.
· However, a region in GFS can still end up consistent but undefined, for example when concurrent writes all succeed but leave the region containing mingled fragments from different clients.
2. Scalable Performance:
· To achieve performance, GFS separates the control plane from the data plane (a sketch follows this list).
1. The Master server handles the control plane. When clients need to know which chunkservers hold the replicas they want, they ask the Master.
2. Chunkservers handle the data plane. Once the Master has told a client which chunkservers hold the chunk, the client transfers data directly to and from those chunkservers, without going through the Master.
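· A minimal sketch of that read path, with in-process stand-ins for the Master and chunkservers; the 64 MB chunk size comes from the GFS paper, but every class and method name here is invented for illustration:

    # Sketch of the GFS read path: control messages go to the master,
    # bulk data moves directly between client and chunkservers.
    CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses 64 MB chunks

    class Master:
        """Control plane: maps (file, chunk index) -> chunk handle and replica locations."""
        def __init__(self):
            self.chunk_table = {}  # (path, chunk_index) -> (handle, [chunkserver ids])

        def lookup(self, path, chunk_index):
            return self.chunk_table[(path, chunk_index)]

    class Chunkserver:
        """Data plane: stores chunk contents keyed by chunk handle."""
        def __init__(self):
            self.chunks = {}  # handle -> bytes

        def read(self, handle, offset, length):
            return self.chunks[handle][offset:offset + length]

    def client_read(master, chunkservers, path, file_offset, length):
        # 1. Control plane: ask the master which chunk holds the offset and where its replicas live.
        chunk_index = file_offset // CHUNK_SIZE
        handle, locations = master.lookup(path, chunk_index)
        # 2. Data plane: fetch the bytes from one of the replicas, not from the master.
        replica = chunkservers[locations[0]]
        return replica.read(handle, file_offset % CHUNK_SIZE, length)

    # Tiny demo with one chunk replicated on two chunkservers.
    master = Master()
    cs = {"cs1": Chunkserver(), "cs2": Chunkserver()}
    master.chunk_table[("/logs/a", 0)] = ("h42", ["cs1", "cs2"])
    cs["cs1"].chunks["h42"] = b"hello gfs"
    cs["cs2"].chunks["h42"] = b"hello gfs"
    print(client_read(master, cs, "/logs/a", 0, 5))  # b'hello'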
· To achieve scalability, GFS maintains replication at several levels: the Master's state, and for each chunk a primary replica and secondary replicas spread across chunkservers, so that an enormous number of clients can request the same content from different chunkservers.
· Because every update must reach several replicas on different chunkservers, GFS pipelines the data push to keep update performance high (a sketch follows this list).
1. To fully utilize each machine's network bandwidth, the data is pushed linearly along a chain of machines rather than being sliced and sent to multiple receivers at once, and each machine starts forwarding as soon as data arrives instead of waiting for the whole transfer.
2. To avoid high-latency links and network bottlenecks, each machine forwards the data to the nearest machine in the network topology that has not yet received it.
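· A rough sketch of that forwarding rule. The distance() function is a made-up stand-in for network topology (the real system estimates distance from IP addresses and streams over TCP as data arrives); this toy version only computes the forwarding order:

    # Sketch of GFS-style data push: starting from the client, each machine
    # forwards the data to the closest machine that has not received it yet.
    def distance(a, b):
        # Stand-in for network distance; invented for this example.
        return abs(a - b)

    def push_chain(client, replicas):
        """Return the order in which data is forwarded, as a list of hops."""
        chain = []
        current = client
        remaining = set(replicas)
        while remaining:
            nxt = min(remaining, key=lambda r: distance(current, r))
            chain.append((current, nxt))
            remaining.remove(nxt)
            current = nxt
        return chain

    # Client at "position" 0 pushing to replicas at positions 7, 3, and 12:
    # data flows 0 -> 3 -> 7 -> 12 instead of 0 fanning out to all three,
    # so each machine's full outbound bandwidth serves a single transfer.
    print(push_chain(0, [7, 3, 12]))  # [(0, 3), (3, 7), (7, 12)]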
· UNIX FS shows little of this scalability: when it was designed in the 1970s, networking was nowhere near as widespread as it is now, so there was little motivation to pursue the kind of scalable performance GFS aimed for in 2003.
3. Reliability
· Unlike UNIX FS, which keeps no built-in, visible backup of files, GFS replicates each chunk on many chunkservers. When one replica fails, clients can request the same chunk from another chunkserver (sketched at the end of this section).
· Internally, GFS replicates not only the data chunks but also the Master's own metadata, keeping copies of its state on other machines. Therefore, even if the central manager, the Master server, crashes, clients can still obtain the needed information from the Master's replicas, so GFS achieves high reliability.
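· A minimal sketch of that client-side failover, assuming a hypothetical read() call on each chunkserver object that raises when the replica is unreachable:

    # Sketch: when one replica fails, the client simply tries the next location
    # returned by the master instead of giving up.
    def read_with_failover(replicas, handle, offset, length):
        last_error = None
        for server in replicas:                              # in the order the master returned them
            try:
                return server.read(handle, offset, length)   # hypothetical chunkserver call
            except Exception as err:                         # dead, slow, or corrupt replica
                last_error = err
        raise RuntimeError("all replicas failed") from last_error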
4. Availability
· GFS assumes that server failures are the norm rather than the exception, so its creators designed GFS for high availability through the techniques of fast recovery, chunk replication, and master replication.
· For fast recovery, the Master and chunkservers are designed to restore their state and restart their services in seconds no matter how they terminated, so routine shutdowns and abnormal terminations are handled the same way.
· For chunk replication, chunk replicas are placed on chunkservers in different racks, so clients can still obtain every part of the file namespace from some server even when some chunk replicas have failed (a placement sketch follows).
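· A toy sketch of that rack-aware placement, with invented server names; the real Master also weighs disk utilization and recent creation activity on each chunkserver, which this ignores:

    from itertools import cycle

    # Sketch: spread a chunk's replicas across racks so that losing one rack
    # (a shared switch or power unit) cannot take out every copy at once.
    def place_replicas(chunkservers_by_rack, n_replicas=3):
        placement = []
        pools = {rack: list(servers) for rack, servers in chunkservers_by_rack.items()}
        for rack in cycle(pools):                  # round-robin over racks
            if len(placement) == n_replicas or not any(pools.values()):
                break
            if pools[rack]:
                placement.append(pools[rack].pop(0))
        return placement

    servers = {"rackA": ["a1", "a2"], "rackB": ["b1"], "rackC": ["c1", "c2"]}
    print(place_replicas(servers))                 # ['a1', 'b1', 'c1'] -- one per rack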
· Master replication helps not only reliability but also availability. GFS runs shadow masters whose metadata may lag slightly behind the primary Master's. Although shadow masters provide read-only access, they can still answer many client requests when the primary is unavailable (sketched below).
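· A small sketch of the shadow-master idea: a read-only copy of the Master's metadata that catches up by replaying the primary's log, possibly a little behind. All class names and the log format here are invented for illustration:

    # Sketch: a shadow master replays the primary's operation log (possibly a bit
    # behind) and serves read-only metadata lookups while the primary is down.
    class ShadowMaster:
        def __init__(self):
            self.metadata = {}        # path -> list of chunk handles
            self.applied = 0          # how far into the log we have replayed

        def apply_log(self, log):
            # Catch up on records we have not seen yet; may lag the primary slightly.
            for op, path, handles in log[self.applied:]:
                if op == "set":
                    self.metadata[path] = handles
                elif op == "delete":
                    self.metadata.pop(path, None)
            self.applied = len(log)

        def lookup(self, path):
            return self.metadata.get(path)     # reads are allowed

        def mutate(self, *args):
            raise PermissionError("shadow master is read-only")

    log = [("set", "/logs/a", ["h42"]), ("set", "/logs/b", ["h43", "h44"])]
    shadow = ShadowMaster()
    shadow.apply_log(log)
    print(shadow.lookup("/logs/b"))   # ['h43', 'h44'] even if the primary is down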
5. Fault Tolerance
· GFS achieves fault tolerance through three techniques: master and chunk replication, the operation log, and data integrity.
· With master and chunk replication, clients do not have to worry about a failure of the Master or a chunkserver, because GFS restores state and restarts service from the replicas kept on other servers.
· For the operation log, GFS records every metadata operation performed after the last checkpoint. When a server fails for any reason, it restarts by loading the latest checkpoint and replaying each operation recorded in the log after that checkpoint (sketched below).
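· A minimal sketch of that recovery path, restoring hypothetical master metadata from the latest checkpoint and then replaying only the logged operations that came after it; the checkpoint layout and operation names are made up for this example:

    # Sketch: master recovery = load last checkpoint, then replay the operation
    # log entries recorded after that checkpoint, in order.
    def recover(checkpoint, op_log):
        state = dict(checkpoint["metadata"])          # snapshot of metadata at checkpoint time
        for seq, op, path, value in op_log:
            if seq <= checkpoint["seq"]:
                continue                              # already reflected in the checkpoint
            if op == "create":
                state[path] = value
            elif op == "delete":
                state.pop(path, None)
        return state

    checkpoint = {"seq": 2, "metadata": {"/logs/a": ["h42"]}}
    op_log = [
        (1, "create", "/logs/a", ["h42"]),            # older than the checkpoint, skipped
        (3, "create", "/logs/b", ["h43"]),
        (4, "delete", "/logs/a", None),
    ]
    print(recover(checkpoint, op_log))                # {'/logs/b': ['h43']}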
· For data integrity, GFS uses checksums to verify each chunk, kept per 64 KB block within a chunk. Because appending is the dominant file operation, the checksum mechanism is optimized for appends: the checksum of the last, partially filled block is updated incrementally with the newly appended bytes rather than recomputed from scratch (sketched below).
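· A small sketch of the incremental idea using CRC32 (the paper does not pin down the exact checksum function, and this ignores 64 KB block boundary handling): extending the running checksum with just the appended bytes gives the same result as re-checksumming the whole block.

    import zlib

    # Sketch: appending to the last (partial) checksum block only needs the old
    # checksum and the new bytes -- no need to re-read and re-checksum the block.
    block = b"existing data in the last partial 64 KB block"
    old_crc = zlib.crc32(block)

    appended = b", plus a freshly appended record"
    new_crc = zlib.crc32(appended, old_crc)          # incremental update

    assert new_crc == zlib.crc32(block + appended)   # matches a full recompute
    print(hex(new_crc))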
6. Transparency
· GFS achieves transparency for fault tolerance and load balancing through the location-independent namespace stored in the Master server: file names say nothing about where the chunks physically live, so replicas can be moved freely without clients noticing (sketched below).
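· A tiny sketch of what that location independence looks like in the metadata, with invented structures: the namespace maps a pathname only to chunk handles, while a separate table maps each handle to whichever chunkservers currently hold it, so replicas can move without the file's name or handles changing.

    # Sketch: the namespace never mentions chunkserver addresses, so GFS can
    # re-replicate or rebalance chunks without touching the file's metadata.
    namespace = {"/web/index.html": ["h7", "h8"]}             # path -> chunk handles
    locations = {"h7": ["cs1", "cs3"], "h8": ["cs2", "cs3"]}  # handle -> current replicas

    # A chunkserver dies and h7 is re-replicated onto cs4: only the location
    # table changes; the namespace entry for the file is untouched.
    locations["h7"] = ["cs3", "cs4"]
    print(namespace["/web/index.html"])   # still ['h7', 'h8']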