A distributed database is a database that consists of two or more files located in different sites, either on the same network or on entirely different networks. A centralized distributed database management system (DDBMS) integrates the data logically so that it can be managed as if it were all stored in the same location.
Distributed File System (DFS) is a set of client and server services that allow an organization using Microsoft Windows servers to organize many distributed SMB file shares into a distributed file system.
A distributed database is a database in which not all storage devices are attached to a common processor. It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.
System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on organized network servers or decentralized independent computers on the Internet, on corporate intranets or extranets, or on other organization networks. Because distributed databases store data across multiple computers, distributed databases may improve performance at end-user worksites by allowing transactions to be processed on many machines, instead of being limited to one.
A distributed file system (DFS), or network file system, is any file system that allows files to be accessed from multiple hosts over a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources.
Distributed file systems differ in their performance, mutability of content, handling of concurrent writes, handling of permanent or temporary loss of nodes or storage, and their policy of storing content.
The differences between distributed databases and distributed file systems are outlined below:
01. Distributed databases:
In a distributed database, there are a number of databases that may be geographically distributed all over the world.
A distributed DBMS manages the distributed database in a manner so that it appears as one single database to users.
A distributed database is a collection of multiple interconnected databases, which are spread physically across various locations that communicate via a computer network.
Portions of the database are stored in multiple physical locations and processing is distributed among multiple database nodes.
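A minimal Python sketch of how a distributed DBMS can make several sites look like one database. It assumes simple hash partitioning by key, which is only one of several possible placement schemes; the classes Site and DistributedDB are hypothetical, and each site's local DBMS is stood in for by a dictionary. Networking, replication and transactions are left out.

```python
import hashlib


class Site:
    """One site holding a fragment of the data; its local store is a plain dict here."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class DistributedDB:
    """Hypothetical coordinator that presents several sites as one logical database."""

    def __init__(self, sites):
        self.sites = list(sites)

    def _site_for(self, key):
        # Stable hash partitioning: the same key always maps to the same site.
        # (Python's built-in hash() of a str changes between runs, so it is not used.)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.sites[int.from_bytes(digest[:8], "big") % len(self.sites)]

    def put(self, key, value):
        self._site_for(key).rows[key] = value

    def get(self, key):
        return self._site_for(key).rows.get(key)

    def count(self):
        # A global query fans out to every site and combines the partial answers.
        return sum(len(site.rows) for site in self.sites)


sites = [Site("london"), Site("tokyo"), Site("new-york")]
db = DistributedDB(sites)
for i in range(9):
    db.put(f"customer-{i}", {"id": i})

print(db.get("customer-4"))  # {'id': 4}
print(db.count())  # 9
```

The caller only ever talks to the coordinator, which is the sense in which the distributed database "appears as one single database to users".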
Features:
Databases in the collection are logically interrelated with each other. Often they represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with a transaction processing system.
Advantages of Distributed databases:
Following are the advantages of distributed databases.
Modular Development − In centralized database systems, expanding the system to new locations or new units requires substantial effort and disrupts existing operations. In distributed databases, however, the work simply requires adding new computers and local data at the new site and connecting them to the distributed system, with no interruption to current operations.
More Reliable − When a centralized database fails, the entire system comes to a halt. In a distributed system, however, when a component fails, the system continues to function, possibly at reduced performance. Hence a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local data itself, thus providing faster response. On the other hand, in centralized systems, all queries have to pass through the central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is mostly used, then the communication costs for data manipulation can be minimized. This is not feasible in centralized systems.
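The locality argument in the last two points can be illustrated with a small model. In this sketch (all names hypothetical), each data fragment lives at one site, a query issued at that site is answered locally, and every query that has to reach another site increments a counter used as a rough stand-in for communication cost. Real systems weigh bandwidth, latency and data volume, not just request counts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    fragments: dict = field(default_factory=dict)  # fragment name -> rows stored at this site


class Cluster:
    """Hypothetical model that counts remote requests as a rough proxy for communication cost."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self.remote_requests = 0

    def query(self, from_site, fragment):
        local = self.nodes[from_site]
        if fragment in local.fragments:
            return local.fragments[fragment]  # served from local data: no network round trip
        for node in self.nodes.values():
            if fragment in node.fragments:
                self.remote_requests += 1  # data had to be fetched from another site
                return node.fragments[fragment]
        raise KeyError(fragment)


paris = Node("paris", {"eu_orders": [("o1", 120), ("o2", 75)]})
chicago = Node("chicago", {"us_orders": [("o3", 300)]})
cluster = Cluster([paris, chicago])

for _ in range(100):
    cluster.query("paris", "eu_orders")  # Paris mostly uses European orders
cluster.query("paris", "us_orders")  # an occasional cross-site lookup
print(cluster.remote_requests)  # 1
```

Because the data Paris uses most is stored in Paris, 100 of its 101 queries never leave the site.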
02. Distributed file system:
The Distributed File System (DFS) functions provide the ability to logically group shares on multiple servers and to transparently link shares into a single hierarchical namespace. DFS organizes shared resources on a network in a treelike structure.
DFS supports stand-alone DFS namespaces, which have one host server, and domain-based namespaces, which have multiple host servers and high availability. The DFS topology data for domain-based namespaces is stored in Active Directory. The data includes the DFS root, DFS links, and DFS targets.
A distributed file system (DFS) is a file system whose data is stored on a server but is accessed and processed as if it were stored on the local client machine. The DFS makes it convenient to share information and files among users on a network in a controlled and authorized way. The server allows client users to share files and store data just as if they were storing it locally. However, the servers retain full control over the data and decide which clients are granted access.
Distributed file system (DFS) is a method of storing and accessing files based on a client/server architecture. In a distributed file system, one or more central servers store files that can be accessed, with proper authorization rights, by any number of remote clients in the network.
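The client/server arrangement described above, where the server owns the files and decides who may use them, can be sketched as follows. FileServer and its simple per-file access list are hypothetical; in a real DFS the client's calls would travel over a network protocol such as SMB or NFS, which is omitted here.

```python
class FileServer:
    """Hypothetical central file server: it stores the files and decides who may access them."""

    def __init__(self):
        self.files = {}  # path -> contents
        self.acl = {}  # path -> set of users allowed to access the file

    def _check(self, user, path):
        if user not in self.acl.get(path, set()):
            raise PermissionError(f"{user} is not authorized for {path}")

    def write(self, user, path, data):
        if path not in self.acl:
            self.acl[path] = {user}  # the creator of a new file is authorized for it
        self._check(user, path)
        self.files[path] = data

    def read(self, user, path):
        self._check(user, path)
        return self.files[path]

    def grant(self, user, path, other):
        self._check(user, path)  # only an authorized user may share the file further
        self.acl[path].add(other)


server = FileServer()
server.write("alice", "/projects/plan.txt", b"draft")
try:
    server.read("bob", "/projects/plan.txt")
except PermissionError as err:
    print(err)  # bob is not authorized for /projects/plan.txt
server.grant("alice", "/projects/plan.txt", "bob")
print(server.read("bob", "/projects/plan.txt"))  # b'draft'
```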
Each DFS tree structure has one or more root targets. The root target is a host server that runs the DFS service. A DFS tree structure can contain one or more DFS links. Each DFS link points to one or more shared folders on the network. You can add, modify and delete DFS links from a DFS namespace. When you remove the last target associated with a DFS link, DFS deletes the DFS link in the DFS namespace. (In earlier documentation, DFS links were called junction points.)
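The relationship between a namespace root, its links and their targets can be modelled as a small data structure. This is a toy Python sketch, not the Windows DFS management API; the class name, method names and server paths are hypothetical. It does mirror the rule stated above that removing a link's last target deletes the link.

```python
class DfsNamespace:
    """Toy model of a DFS namespace: a root plus links, each link pointing to one or more targets."""

    def __init__(self, root, root_targets):
        self.root = root  # namespace path, e.g. r"\\example.com\public"
        self.root_targets = list(root_targets)  # host servers running the DFS service
        self.links = {}  # link name -> list of target shared folders

    def add_target(self, link, target):
        targets = self.links.setdefault(link, [])
        if target not in targets:
            targets.append(target)

    def remove_target(self, link, target):
        targets = self.links[link]
        targets.remove(target)
        if not targets:
            del self.links[link]  # removing the last target deletes the link itself

    def resolve(self, link):
        return list(self.links[link])


ns = DfsNamespace(r"\\example.com\public", ["FS-ROOT1"])
ns.add_target("Reports", r"\\FS2\reports")
ns.add_target("Reports", r"\\FS3\reports")
ns.remove_target("Reports", r"\\FS2\reports")
print("Reports" in ns.links)  # True
ns.remove_target("Reports", r"\\FS3\reports")
print("Reports" in ns.links)  # False
```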
A DFS link can point to one or more shared folders; the folders are called targets. When users access a DFS link, the DFS server selects a set of targets based on a client's site information. The client accesses the first available target in the set. This helps to distribute client requests across the possible targets and can provide continued accessibility for users even when some servers fail.
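A simplified version of this target-selection behaviour: targets in the client's own site are tried first, and the client uses the first one that responds. The real DFS ordering rules are more detailed than this single same-site check, and the function names, server names and sites below are hypothetical.

```python
def order_targets(targets, client_site):
    """Put targets in the client's own site first; sorted() is stable, so order is otherwise kept."""
    return sorted(targets, key=lambda target: target[1] != client_site)


def open_link(targets, client_site, is_available):
    """Return the first reachable target from the ordered set, as a client would."""
    for path, _site in order_targets(targets, client_site):
        if is_available(path):
            return path
    raise OSError("no target for this DFS link is reachable")


targets = [
    (r"\\NY-FS1\docs", "NewYork"),
    (r"\\LDN-FS1\docs", "London"),
    (r"\\LDN-FS2\docs", "London"),
]
down = {r"\\LDN-FS1\docs"}  # simulate a failed server in London
print(open_link(targets, "London", lambda path: path not in down))  # \\LDN-FS2\docs
```

A London client skips the failed London server and still stays in its own site; only if both London targets were down would it fall back to New York.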
Features:
1. Transparency:
Four types of transparency are desirable:
Structure Transparency: Although it is not strictly necessary, a distributed file system normally uses multiple file servers for performance, scalability, and reliability reasons. Each file server is normally a user process, or sometimes a kernel process, that is responsible for controlling a set of secondary storage devices on the node on which it runs. When there are multiple file servers, their multiplicity should be transparent to the clients of the distributed file system.
Access Transparency: Both local and remote files should be accessible in the same way. That is, the file system interface should not distinguish between local and remote files, and the file system should automatically locate an accessed file and arrange for the transport of the data to the client’s site.
Naming Transparency: The name of a file should give no hint as to where the file is located. Furthermore, a file should be allowed to move from one node to another in a distributed system without having to change the name of the file.
Replication Transparency: If a file is replicated on multiple nodes, both the existence of multiple copies and their locations should be hidden from the clients.
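Naming, access and replication transparency can all be seen in a single indirection layer between the names clients use and the places the data actually lives. In the following hypothetical sketch, a NameService maps a location-independent file name to its replicas; moving a replica changes only the mapping, and reads go through the same call wherever the data is. Storage nodes are simulated with nested dictionaries.

```python
import random


class NameService:
    """Hypothetical name service mapping location-independent names to replica locations."""

    def __init__(self):
        self._locations = {}  # file name -> list of (node, local path) replicas

    def register(self, name, replicas):
        self._locations[name] = list(replicas)

    def resolve(self, name):
        # Any replica will do; the client never learns how many copies exist or where.
        return random.choice(self._locations[name])

    def move(self, name, old_node, new_node, storage):
        # Migrating a replica updates the mapping only; the name clients use is unchanged.
        for i, (node, path) in enumerate(self._locations[name]):
            if node == old_node:
                storage.setdefault(new_node, {})[path] = storage[old_node].pop(path)
                self._locations[name][i] = (new_node, path)


def read_file(ns, storage, name):
    # The same call is used whatever node holds the data (access transparency).
    node, path = ns.resolve(name)
    return storage[node][path]


storage = {"nodeA": {"/disk1/r.txt": b"v1"}, "nodeB": {"/disk7/r.txt": b"v1"}}
ns = NameService()
ns.register("/projects/report.txt", [("nodeA", "/disk1/r.txt"), ("nodeB", "/disk7/r.txt")])
print(read_file(ns, storage, "/projects/report.txt"))  # b'v1'
ns.move("/projects/report.txt", "nodeA", "nodeC", storage)
print(read_file(ns, storage, "/projects/report.txt"))  # b'v1'
```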
2. User Mobility:
In a distributed system, a user should not be forced to work on a specific node, but should have the flexibility to work on different nodes at different times. Furthermore, the performance characteristics of the file system should not discourage users from accessing their files from workstations other than the one at which they usually work.
Advantages of Distributed file system:
Following are the advantages of distributed file system:
Distributed file systems can be advantageous because they make it easier to distribute documents to multiple clients and they provide a centralized storage system so that client machines are not using their resources to store files.
There are a number of potential advantages to using a distributed system. One of the easiest to understand is redundancy and resiliency.
If a company is serving its website from a distributed set of servers, rather than a single server, it may be able to stay up even if one server physically fails.
If data is distributed between multiple servers or disks, a common occurrence in modern distributed systems, there may not be any data loss even if a storage device ceases to work.
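The redundancy argument can be sketched with a toy store that writes each item to two disks and reads from whichever copy survives. ReplicatedStore, its placement rule and the simulated disk failure are assumptions made for illustration; production systems use more careful placement and re-replicate lost copies.

```python
class ReplicatedStore:
    """Hypothetical store that keeps each item on several disks so one failure loses nothing."""

    def __init__(self, n_disks, copies):
        if not 1 <= copies <= n_disks:
            raise ValueError("copies must be between 1 and n_disks")
        self.disks = [{} for _ in range(n_disks)]
        self.failed = set()
        self.copies = copies

    def _placement(self, key):
        # Deterministic placement on `copies` consecutive, distinct disks.
        start = sum(key.encode("utf-8")) % len(self.disks)
        return [(start + i) % len(self.disks) for i in range(self.copies)]

    def write(self, key, data):
        for d in self._placement(key):
            if d not in self.failed:
                self.disks[d][key] = data

    def read(self, key):
        for d in self._placement(key):
            if d not in self.failed and key in self.disks[d]:
                return self.disks[d][key]
        raise OSError(f"every replica of {key!r} is lost")

    def fail_disk(self, d):
        self.failed.add(d)
        self.disks[d].clear()  # simulate the device's contents becoming unreadable


store = ReplicatedStore(n_disks=4, copies=2)
store.write("index.html", b"<html>...</html>")
store.fail_disk(store._placement("index.html")[0])  # lose the disk holding the first copy
print(store.read("index.html"))  # b'<html>...</html>'
```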
To me, this is a very strange question that boils down to whether or not there is a difference between file systems and databases. And there certainly is. This is my very high-level view on the matter: file systems deal with hierarchically grouped, named chunks of binary data (files), while databases (or rather, database management systems, DBMS) operate on named and typed data items. This leads to very different APIs for accessing the data within a DBMS (e.g. SQL) and within a file system (e.g. POSIX). Because a DBMS has more information about the data it stores, it can do more sophisticated things with it, such as indexing.
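The contrast between named chunks of bytes and named, typed data items can be illustrated with Python's built-in sqlite3 module. SQLite is a single-node embedded DBMS, not a distributed one, and is used here only because it ships with Python: with the file, the application has to parse every record itself, whereas the DBMS knows the table's columns and can build an index on one of them.

```python
import sqlite3

# File-system view: a file is a named chunk of bytes; any structure lives in application code.
raw = b"1,alice,34\n2,bob,27\n3,carol,41\n"


def names_with_age(blob, age):
    hits = []
    for line in blob.decode("utf-8").splitlines():  # the application parses every record
        _id, name, record_age = line.split(",")
        if int(record_age) == age:
            hits.append(name)
    return hits


# DBMS view: the schema gives each item a name and a type, so the system can index it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "alice", 34), (2, "bob", 27), (3, "carol", 41)],
)
conn.execute("CREATE INDEX idx_people_age ON people (age)")

print(names_with_age(raw, 27))  # ['bob']
print(conn.execute("SELECT name FROM people WHERE age = ?", (27,)).fetchall())  # [('bob',)]
```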
The distribution aspect in both domains has undergone slight changes in the last decade. While a distributed file system used to be more or less a synonym for a remote file system (a file system on a different host), I don't think this holds any more. At least to me, distributed file systems are file systems that can potentially span more than one (server) host. This includes e.g. Ceph and HDFS, but not NFS (ignoring things like pNFS). Similarly, IMHO distributed DBMS are DBMS that span several servers.
Due to the differentiation we have been seeing in the last 10-15 years in the field of distributed DBMS, cf. the rise of NoSQL DBMS, the difference between DBMS and file systems has been blurred in some places. This is particularly true for pure key-value stores that do not provide more functionality than a simple file system.
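To illustrate how thin a pure key-value interface can be, here is a hypothetical key-value store built directly on an ordinary directory, with one file per key. It offers get, put and delete and essentially nothing a file system does not already provide; it is a local sketch, not a distributed store.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote


class FileKV:
    """Minimal key-value store that is little more than a directory of files."""

    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Percent-encode the key (including "/") so each key maps to exactly one file name.
        return self.dir / quote(key, safe="")

    def put(self, key, value):
        self._path(key).write_bytes(value)

    def get(self, key):
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            return None

    def delete(self, key):
        self._path(key).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    kv = FileKV(tmp)
    kv.put("user/42", b'{"name": "alice"}')
    print(kv.get("user/42"))  # b'{"name": "alice"}'
    kv.delete("user/42")
    print(kv.get("user/42"))  # None
```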
Final note: DBMS do not necessarily require files (cf. caches and in-memory databases) and file systems do not necessarily require disks (cf. temp file systems).