Distributed Filesystem Forensics-XtreemFS: A Case Study


Discuss Distributed Filesystem Forensics-XtreemFS as a Case Study.

The amount of data captured in recent times has grown at a very high rate, and most of it is stored and disseminated in electronic formats. Distributed file systems are used to store high volumes of data at a steady pace, and a wide variety of information can be stored efficiently with the help of technologies such as big data, cloud computing and the other technologies associated with them. These technologies are capable of providing legal advantages for their users. As a result, big data has been ranked among the top ten technology trends of the recent past (Chua et al., 2013). Reports indicated that big data would generate about 232 billion dollars from 2011 to 2016 (Casonato et al., 2013). Big data is generally defined as “high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. Many kinds of procedures are carried out with the help of big data technologies, and organizations that carry great responsibilities, such as large businesses and national governments, rely on them to work efficiently and deliver on time. Cloud computing systems are closely associated with the use of these technologies. Many of these technologies have been subjected to unethical exploitation and used by malicious actors for wrong purposes (Butler & Choo, 2013). Hence, digital evidence has to be gathered in order to track criminal activities and restrict them. An investigation follows a few steps. Firstly, evidence has to be gathered about the use of electronic devices in performing the criminal activity. Then the device has to be tracked and stopped, so that the criminal no longer has access to it and cannot carry out the intended criminal activity. The process of identifying the devices used for criminal activity is known as the digital forensic procedure.

This report aims to address the gap in readers' knowledge of digital forensic techniques. In addition, the need for digital forensics is described.

Experiment Environment

This report presents an in-depth forensic experiment performed on the XtreemFS project. It details the experiment environment, the directory service, the metadata and replica catalog, the object storage devices and the collection of evidence from the distributed file system.

To describe the experiment environment, an overview of the XtreemFS architecture is needed first. XtreemFS is a virtual, network-provisioned file system used to deliver the file storage services of a cloud solution. It provides key services such as replication and striping. Comparable systems include GlusterFS, Ceph and BeeGFS (Gluster, 2014) (Ceph, 2014) (Fraunhofer, 2014). It is important to draw a strict distinction between the back-end and front-end processes of storage systems. Front-end systems are generally not involved with the storage itself; they collect the data that users provide. Back-end processes are connected to the storage systems: once data has been collected at the front end, the back end stores it in the databases. There are also various online storage services, such as Dropbox, Google Drive and SkyDrive. Cloud providers also offer several types of service to their users: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). Finally, XtreemFS has three main components: the Directory Service (DIR), the Metadata and Replica Catalog(s) (MRC) and the Object Storage Device(s) (OSD).

The experiment environment itself also needs an overview. The systems used with XtreemFS can have security issues: a user with root privileges can access a large volume of the data stored in the file system and, having gained that access, can do considerable harm. The environment consists of a number of virtual machines that provide the XtreemFS roles, such as the DIR, the OSD and the client, depending on the type of experiment being performed (Almulla, Iraqi & Jones, 2013). Virtual machines simplify forensic disk image collection and the simulation of cloud configurations, and they can host both the storage functions and distributed computations. The clients were also hosted as virtual machines.

Directory Services

The first XtreemFS component to examine is the Directory Service. It stores the data that the system needs to define and locate its various technical components (Casonato et al., 2013). It is beneficial for the investigator to start from the components of the system and work outwards to the installation as a whole.

Three types of data on the DIR server are of potential value to a forensic investigation:

Volatile environment metadata: This data generally includes the logical addresses of the different locations, information about the various nodes and the unique identifier of each node.

Non-volatile environment metadata: Environment metadata is subject to change, and some of it may be committed to the non-volatile storage of the system. Logging data and backup data generally fall into this category.

Configuration files: Configuration files contain data that helps in understanding the system. This generally includes network information, authentication lists and operational information about the system.

Evidence source identification and preservation is an important step both for identifying anomalies in the system and for the forensic report. The DIR is used mainly for identification and to assist in preserving the distributed storage data, so the data available for investigation in XtreemFS is identified in this step. Before the value of the DIR can be assessed, its location must be identified. The server is used to identify the file system mounted on it, and after this verification the file system goes through further verification steps. Once access to the system has been gained and the location of the file system is known, the configuration data can be obtained easily.

The collection, examination and analysis of the collected data should also be performed on a forensic testing system (Quick, Martini & Choo, 2014). Here the practitioner collects the data and identifies the directories present on the system. The address mappings, the service registry and the configuration data should then be traced to establish their value to the investigation. The system time and the operation logs should also be checked.

Metadata and Replica Catalog

The Metadata and Replica Catalog stores a wide range of metadata about XtreemFS itself and about the files and directories stored in it (Contrail, 2013). This data can be used in the forensic report to locate components and to understand the files in the XtreemFS system. Three types of data on the MRC server are of potential value to a forensic investigation:

Volatile and non-volatile construct metadata: This is the metadata that defines the internal constructs of the XtreemFS file system.

Volatile and non-volatile file metadata: This includes both high-level metadata, such as the number of OSDs, and low-level metadata, such as the file name, size and modification date of each file.

Configuration files: Configuration files contain data that helps in understanding the system. This generally includes network information, authentication lists and operational information about the system.

For evidence preservation and source identification, the data gathered from the DIR server should be used to identify the location of the servers (Dukaric & Juric, 2013). The MRC status page lists a number of configuration directives that are of use for the forensic report. Among other things, they determine the authentication service for the virtual file systems. “org.xtreemfs.common.auth.NullAuthProvider” is the default authentication provider, and it provides authentication based on the local operating system.

The collection, examination and analysis of the MRC data is a critical procedure: it not only provides evidence but also allows the XtreemFS environment to be reconstructed if it is no longer operating or fully accessible. It is also essential if a practitioner tries to reconstruct files from physical extractions of the relevant OSD components (Dykstra & Sherman, 2013). There are two methods for collecting the MRC databases. In the experiment, the broad structure of the MRC XML file was found to be as follows:

The root element is “FILESYSTEM”, where “DBVERSION” is a numerical identifier of the database version. The next element is “VOLUME”, where “ID” is the UUID generated for the volume by the XtreemFS system, “NAME” is the volume name entered by the user who created the volume, and “ACPOLICY” is the numerical identifier of the “Authorization Policy”. For a directory, “ID” is the file ID assigned by XtreemFS. “LOCATION” is the UUID of the OSD that stores a stripe/replica of the file.
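As a rough illustration of how such a dump could be triaged, the following Python sketch reads the element and attribute names listed above (FILESYSTEM, DBVERSION, VOLUME, ID, NAME, ACPOLICY, LOCATION). The nesting, the FILE element, the assumption that LOCATION carries the OSD UUID as element text, and all sample values are hypothetical and chosen only for the example; they are not taken from a real XtreemFS dump.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample dump: the names follow the text, but the nesting,
# the FILE element and every value are invented for illustration.
SAMPLE_DUMP = """\
<FILESYSTEM DBVERSION="3">
  <VOLUME ID="vol-uuid-1" NAME="evidence" ACPOLICY="2">
    <FILE ID="7" NAME="report.txt">
      <LOCATION>osd-uuid-a</LOCATION>
      <LOCATION>osd-uuid-b</LOCATION>
    </FILE>
  </VOLUME>
</FILESYSTEM>
"""


def summarize_mrc_dump(xml_text):
    """Return the database version, the volumes and the OSD UUIDs in a dump."""
    root = ET.fromstring(xml_text)
    volumes = [
        {"id": v.get("ID"), "name": v.get("NAME"), "acpolicy": v.get("ACPOLICY")}
        for v in root.iter("VOLUME")
    ]
    # Assumption: each LOCATION element holds one OSD UUID as its text.
    uuids = {(loc.text or "").strip() for loc in root.iter("LOCATION")}
    uuids.discard("")
    return {
        "db_version": root.get("DBVERSION"),
        "volumes": volumes,
        "osd_uuids": sorted(uuids),
    }


print(summarize_mrc_dump(SAMPLE_DUMP))
```

The list of OSD UUIDs is the part of the summary that matters most for the next stage, since it tells the practitioner which storage servers hold stripes or replicas of the files of interest.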

Object Storage Devices

The OSDs are the main component for forensic analysis, as they store the data stripes that allow the practitioner to reconstruct files. An OSD also allows a practitioner to recover deleted file fragments using existing forensic techniques on the underlying file system (Federici, 2014). The identification and examination of the previous components (DIR and MRC) give the practitioner the information needed to determine which OSDs in the system contain the data they are seeking to collect, and to recognize the relevant individual files on those OSDs. Two types of data on the OSD server are of potential value to an investigation:

Non-volatile file metadata: This provides information about the content of the files in the file system. Local metadata and log data also fall into this category.

Configuration files: Configuration files contain data that helps in understanding the system. This generally includes network information, authentication lists and operational information about the system. They also contain authentication data, such as user identifiers and passwords.

The most important directive for forensic analysis is “object_dir” (Hale, 2013). Its default value is “/var/lib/xtreemfs/objs/”. This directive can be used to locate the data needed for the forensic report. The “object_dir” location can be mounted on a system where the operating system provides remote access to the storage (Hooper, Martini & Choo, 2013). The OSD HTTP status page provides runtime statistics for the system, expressed in terms of system usage.
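A minimal Python sketch of how the object directory could be read from an OSD configuration file is given here. It assumes the file uses plain `key = value` lines with `#` comments; the sample text, the `example_setting` key and the helper names are hypothetical, and only the “object_dir” directive and its default value come from the discussion above.

```python
DEFAULT_OBJECT_DIR = "/var/lib/xtreemfs/objs/"  # default value stated in the text


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def read_directive(text, name, default=None):
    """Return one directive from configuration text, or the default if absent."""
    return parse_properties(text).get(name, default)


# Hypothetical configuration excerpt, invented for this example.
SAMPLE_CONFIG = """\
# example OSD configuration excerpt
example_setting = 1
object_dir = /srv/xtreemfs/objs/
"""

print(read_directive(SAMPLE_CONFIG, "object_dir", DEFAULT_OBJECT_DIR))
print(read_directive("", "object_dir", DEFAULT_OBJECT_DIR))
```

Falling back to the default when the directive is absent mirrors the situation in which an administrator never changed the setting, so the practitioner still knows where to look.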

The collection, examination and analysis of OSD data may vary according to the requirements of the experiment being performed. The data is generally collected by mounting the file system and copying the relevant files. The OSD operation log is stored at “/var/log/xtreemfs/osd.log”, and data from it is collected and used for analysis. The usefulness of the information obtained is very case specific; when the system is deployed, an appropriate level of logging should be configured so that evidence is available later when required.
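When files such as the OSD log are copied for analysis, it is common forensic practice to record a cryptographic hash so that the copy can later be shown to be unchanged. The sketch below illustrates this with Python's standard `hashlib` module; the function name `hash_evidence_file` and the chunk size are choices made for the example and are not part of XtreemFS or of the experiment described here.

```python
import hashlib
import os


def hash_evidence_file(path, chunk_size=1024 * 1024):
    """Return the size and SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large evidence files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": path,
        "size": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }
```

A practitioner could call `hash_evidence_file` on a collected copy of the log and store the returned record alongside the case notes.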

Digital Forensic Techniques

Collection of evidence from a distributed file system

This section gives an overview of the generic procedure used to collect evidence from the different components of the XtreemFS file system and defines the methods for acquiring data from it (Patel et al., 2013). A defined process must be followed to ensure that data and metadata are collected to the furthest possible extent from a distributed file system environment. If a practitioner followed existing practice and attempted to acquire a bit-stream image of the storage devices (in this case the OSDs), a considerable amount of metadata (available at the MRC) could clearly be missed. Metadata stored by the DIR may likewise be essential as part of evidence collection or environment reconstruction (Kruger et al., 2014). Various situations can arise during these processes. The process consists of the following three sub-processes:

Directory service: Data is collected and examined to determine the file system in use and to learn the location of its components. There are various ways of implementing this type of procedure (Shartono, Setiawan & Irwanto, 2014). Regardless, the data for the forensic report should be obtained from the directory of the metadata.

Metadata storage: The directory service data is used to locate the metadata server and to identify the files that are of use. Examining this data reduces the quantity of data that has to be analyzed.

Data storage: This step uses the environment metadata collected from the directory service together with the useful information obtained from the metadata storage (Relvas et al., 2013). After the data has been examined, the practitioner has what is needed to perform the analysis, and the forensic report can be prepared.
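The three sub-processes can be pictured as a chain of lookups in which each step narrows the next. The Python sketch below is only an illustration under stated assumptions: the dictionaries stand in for data that a practitioner would already have collected from the DIR and the MRC, and every name, UUID and address in it is hypothetical.

```python
# Step 1 (directory service): hypothetical registry of service UUID -> address.
dir_registry = {
    "osd-uuid-a": "10.0.0.11",
    "osd-uuid-b": "10.0.0.12",
    "osd-uuid-c": "10.0.0.13",
}

# Step 2 (metadata storage): hypothetical file records naming the OSDs
# that hold a stripe or replica of each file.
mrc_files = [
    {"name": "report.txt", "osd_uuids": ["osd-uuid-a", "osd-uuid-b"]},
    {"name": "holiday.jpg", "osd_uuids": ["osd-uuid-c"]},
]


def plan_collection(registry, files, names_of_interest):
    """Return {osd_address: [file names]} for the files relevant to the case."""
    plan = {}
    for record in files:
        if record["name"] not in names_of_interest:
            continue
        for uuid in record["osd_uuids"]:
            # Keep unresolved UUIDs visible instead of silently dropping them.
            address = registry.get(uuid, "unknown:" + uuid)
            plan.setdefault(address, []).append(record["name"])
    return plan


# Step 3 (data storage): only the OSDs named in the plan need to be visited.
print(plan_collection(dir_registry, mrc_files, {"report.txt"}))
```

In this example only two of the three storage servers appear in the plan, which is the reduction in the quantity of data to be analyzed that the metadata examination is meant to achieve.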


In conclusion, the opportunities for exploiting datasets have increased with advances in technologies such as big data and cloud computing. Many of these technologies have been subjected to unethical exploitation and used by malicious actors for wrong purposes. Hence, digital evidence has to be gathered in order to track criminal activities and restrict them. An investigation follows a few steps; firstly, evidence has to be gathered about the use of electronic devices in performing the criminal activity. The process of identifying the devices used for criminal activity is known as the digital forensic procedure. This report has described the need for digital forensics and has aimed to address the gap in readers' knowledge of digital forensic techniques. It has presented an in-depth forensic experiment performed on the XtreemFS project, a distributed file system deployed in such environments (Kleineweber, Reinefeld & Schutt, 2014), and has detailed the experiment environment, the directory service, the metadata and replica catalog, the object storage devices and the collection of evidence from the distributed file system. The report has also set out the technical and process-related issues involved in collecting data as evidence from digital systems and their file systems, and has explained key technical terms such as directory services, metadata storage and data storage. Finally, it has shown the importance of forensic procedures and the various types of data that can be presented before a court of law.
Further research on the topic involves validating the system on which the processes are conducted and determining the process that would be used if the research were conducted on other file systems. The forensic approach for cloud services is also a field that requires thorough research.


