
[IEEE 2010 International Conference on Machine Vision and Human-machine Interface - Kaifeng, China (2010.04.24-2010.04.25)] 2010 International Conference on Machine Vision and Human-machine


Research and Implementation of Data Recovery Technology Based on Windows FAT

Yao Qingshan

Dept. of Computer Science and Engineering Henan Institute of Engineering

Zhengzhou, China

E-mail: [email protected]

Gu Chunying

Dept. of Computer Science and Engineering Henan Institute of Engineering

Zhengzhou, China

E-mail: [email protected]

Abstract—This article presents the basic theory of data recovery technology and the relevant technical background and, on the basis of the principles of file recovery, uses WinHex to recover a mistakenly deleted file under the FAT32 file system. The basic theory of data recovery covers the basic concept of data recovery, its basic levels and classification, and the causes and symptoms of data loss. The state of data recovery technology at home and abroad is also analyzed and compared. The principles of disk storage and of the FAT file system are described in detail, focusing on the overall structure of disk data storage, the master boot sector, and the directory structure under the FAT32 file system. On the basis of the principles and a feasibility analysis of file restoration under the Windows FAT32 file system, WinHex is used to recover lost data manually. Data recovery programming is given a preliminary discussion: a basic programming approach to recovering files deleted under the FAT file system is proposed and, after the possible difficulties and problems are analyzed, corresponding solutions and improvements are put forward.

Keywords-data recovery; storage principle; file allocation table; master boot record

I. INTRODUCTION

Domestic research on software-based recovery is well developed, but a large gap remains compared with professional companies abroad. The key to software recovery is a thorough understanding of storage structures, especially those of Linux and Apple systems and of server operating systems; these are weak points of the domestic data recovery industry. Most domestic data recovery businesses grew out of hard drive maintenance, so their technical strength is limited, which often leads to low recovery rates. Catching up with foreign providers in this field will require substantial investment; even so, given the prosperity of the domestic data recovery industry, the outlook for faster technological development at home remains relatively optimistic. Pure software recovery has serious limitations: it requires a hard disk that still works. A disk with minor defects can be repaired slightly so that it runs normally again and then be processed with data recovery software; for a disk that cannot be made to run at all, software is powerless, and a relatively costly combination of hardware and software recovery methods is needed. The key to combined hardware and software recovery is the recovery equipment; according to statistics, only two domestic companies, located in Beijing and Guangdong, own such equipment and can perform combined recovery.

In hardware recovery and in the recovery of databases and unusual systems, the United States and Russia are the leaders, because the main technologies are controlled by Western countries. The biggest difference lies in how many overwrites a recovery can see through: the U.S. military is said to be able to recover data that has been overwritten 6 to 9 times, Russia data overwritten 3 to 4 times, and IBM, after research costing 600 million U.S. dollars, data overwritten 2 or 3 times. Because these capabilities touch on core national secrets, the specific details and their reliability remain unknown; however, the reported Japanese requirement that classified media be low-level formatted 6 to 9 times before being recycled may illustrate the point. Recovering data after it has been overwritten is very difficult; at present only a few countries and hard drive manufacturers can do so, and the applications are generally related to national security. In-depth research of this kind started late in China and has lacked basic funding in particular, but a professional data security research center has now been built with state investment. Combined with the existing research base and a long-term research plan, we believe that a group of research results will soon be put to use in serving the community [11].

Since the late 1990s, China's data recovery industry has flourished. The number of professional data recovery companies grew from just over a dozen in 1996 to more than 300 in 2002 (counting only officially registered companies). Their business has also developed, from simply repairing hard disks and partitions for customers at first to developing data recovery software with independent intellectual property rights. At present, a considerable number of companies both provide data recovery services and develop data recovery software.

At present, the data recovery software on the market is abundant and varied, and each product has its own strengths. Most of it, however, comes from abroad, and domestically developed data recovery software shows the following characteristics: (1) few products, with narrow functionality; (2) low recovery success rate, product stability, and other key technical indicators; (3) breakthroughs still needed in a number of key technologies; (4) few distinctive features of its own, with some products still at the stage of imitation [6].

2010 International Conference on Machine Vision and Human-machine Interface

978-0-7695-4009-2/10 $26.00 © 2010 IEEE

DOI 10.1109/MVHI.2010.214


II. STRUCTURE OF HARD DISK AND ANALYSIS OF FILE SYSTEM

A. The Logical Structure of Hard Disk

Early hard disk technology accessed the disk by head, cylinder, and sector; this access method is called three-dimensional (CHS) addressing. The head count gives the total number of heads, that is, the number of disk surfaces, with a maximum of 255 (stored in 8 bits); the cylinder count gives the number of cylinders on each surface, with a maximum cylinder number of 1023 (stored in 10 bits); the sector count gives the number of sectors per track, with a maximum of 63 (stored in 6 bits). Each sector is generally 512 B. In the entry parameters of BIOS interrupt 13H, register DH holds the head number, with values 0H-FEH (up to 255 heads); the low 6 bits of register CL hold the sector number, with values 1H-3FH (up to 63 sectors); register CH holds the low 8 bits of the cylinder number, and the high 2 bits of CL hold its high 2 bits. The cylinder number is therefore a 10-bit binary number, (1111111111)2 = (1023)10, so cylinders are numbered 0-1023, 1024 in total. This access method limits the addressable disk capacity to 255 × 1024 × 63 × 512 / 1048576 = 8032.5 MB, only about 8 GB.
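The capacity limit and the register layout can be checked with a short calculation. The following is a minimal Python sketch, not part of the original program; the function names are hypothetical, and the register assignment follows the standard INT 13H convention (CH = low 8 cylinder bits, CL = high 2 cylinder bits plus the 6-bit sector number, DH = head).

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the text


def chs_capacity_mb(heads=255, cylinders=1024, sectors_per_track=63):
    """Return the capacity addressable through CHS, in MB (2**20 bytes)."""
    total_sectors = heads * cylinders * sectors_per_track
    return total_sectors * SECTOR_SIZE / 2**20


def pack_int13_chs(cylinder, head, sector):
    """Pack a CHS address into (CH, CL, DH) register values (hypothetical helper)."""
    if not (0 <= cylinder <= 1023 and 0 <= head <= 254 and 1 <= sector <= 63):
        raise ValueError("CHS address out of range")
    ch = cylinder & 0xFF                   # low 8 bits of the cylinder number
    cl = ((cylinder >> 8) << 6) | sector   # high 2 cylinder bits, then 6-bit sector
    dh = head                              # head number
    return ch, cl, dh


print(chs_capacity_mb())               # 8032.5
print(pack_int13_chs(1023, 254, 63))   # (255, 255, 254)
```

The largest address fills every register bit that the interface provides, which is why the limit cannot be raised without a different addressing scheme.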

Hard disks larger than 8 GB are possible today because of a newer disk access technique, the extended INT 13H interface. This technique uses linear addressing, accessing the hard disk sector by sector, which breaks the 8 GB limit; it also adds support for removable media.
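The relation between the two addressing modes can be illustrated with the usual conversion formula, LBA = (C × heads + H) × sectors_per_track + (S - 1). The sketch below assumes a translated geometry of 255 heads and 63 sectors per track; that geometry and the function names are assumptions for illustration only.

```python
def chs_to_lba(cylinder, head, sector, heads=255, sectors_per_track=63):
    """Convert a CHS address to a linear sector number (sectors count from 1)."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads=255, sectors_per_track=63):
    """Convert a linear sector number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


print(chs_to_lba(0, 0, 1))       # 0
print(chs_to_lba(0, 1, 1))       # 63
print(lba_to_chs(16450559))      # (1023, 254, 63)
```

Under this geometry the last sector reachable through CHS is linear sector 16450559; linear addressing simply continues counting past it.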

B. The Overall Structure of the Hard Disk Data Storage

From a logical point of view, the first sector of the hard disk is the master boot sector, numbered 0; if sector 0 is damaged, the hard disk becomes unusable [8].

The master boot sector is followed by 62 reserved sectors. Under normal circumstances these sectors are empty (filled entirely with 00H) and are not used by the system, so some low-level software stores data or programs in them to implement special features.

Immediately after these reserved sectors normally comes the start of the first partition, that is, the boot sector of the first partition, which occupies one sector. The boot sector is again followed by reserved sectors, which likewise generally consist of 00H.

If the first partition is FAT16 or FAT32, the FAT area of the partition follows these reserved sectors. The FAT area generally holds two identical copies of the FAT, so that if one copy is damaged it can be restored from the other.

After the FAT area comes the root directory area, which consists of a number of directory entries. In FAT16 each directory entry is 32 bytes. In FAT32, in addition to the 32 B directory entry, there are directory entries that describe the file name, so FAT32 can use file names that the original FAT16 could not, for example long file names, names containing spaces, and names in mixed case. The file name description entries vary in length, but their length is always an integer multiple of 32 B.

III. RESTORE OF ACCIDENTALLY DELETED FILES

A. File Recovery Principles

Many computer users have accidentally deleted files. As early as MS-DOS 5.0, Microsoft added an Undelete command to DOS for recovering accidentally deleted files, and the command remained available through MS-DOS 6.22. In the Windows era the command is no longer supported. Instead, when a user right-clicks a file icon in Windows and selects "Delete", the file is not really deleted but moved to a system directory called the "Recycle Bin". Once the deleted files in the Recycle Bin reach a certain total size, however, further deleted files cannot enter the Recycle Bin and are really deleted. Files are also really deleted when the user empties the Recycle Bin or uses the Shift + Delete key combination on a selected file or folder. For files that have really been deleted in these ways, Windows itself offers no command to restore them.

When Windows deletes a file, it does not actually erase the file's data; it only marks the file as deleted, and the mark differs between file systems. On a FAT file system, the first byte of the file name field in the file's directory entry is changed to the hexadecimal value E5H; on FAT32, the first byte of every long file name directory entry that describes the file is also set to E5H. The clusters the file occupied are then marked as unused in the FAT area, which frees the space for other files.
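To make the deletion mark concrete, the following minimal Python sketch decodes one 32-byte directory entry and reports whether it is marked as deleted. The function name and the returned dictionary are hypothetical. The field offsets used (attribute byte at 0BH, starting cluster words at 14H and 1AH, size at 1CH) and the attribute value 0FH for long file name entries follow the standard FAT directory entry layout and are assumptions in the sense that this section has not yet introduced them.

```python
import struct

DELETED_MARK = 0xE5     # first byte of a deleted entry, as described in the text
ATTR_LONG_NAME = 0x0F   # attribute value that identifies a long file name entry


def parse_dir_entry(entry: bytes) -> dict:
    """Decode one 32-byte FAT directory entry (illustrative helper)."""
    if len(entry) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    info = {
        "deleted": entry[0] == DELETED_MARK,
        "long_name": entry[0x0B] == ATTR_LONG_NAME,
    }
    if not info["long_name"]:
        high, = struct.unpack_from("<H", entry, 0x14)   # high word of start cluster
        low, = struct.unpack_from("<H", entry, 0x1A)    # low word of start cluster
        size, = struct.unpack_from("<I", entry, 0x1C)   # file size in bytes
        raw = entry[0:11]
        if info["deleted"]:
            raw = b"?" + raw[1:]   # the first name character was overwritten by E5H
        info.update(
            name=raw[:8].decode("ascii", "replace").rstrip(),
            ext=raw[8:11].decode("ascii", "replace").rstrip(),
            start_cluster=(high << 16) | low,
            size=size,
        )
    return info
```

Note that the first character of a deleted file's short name is not recoverable from the entry itself, which is why the sketch shows it as "?".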

B. Manually Restoring Accidentally Deleted Files on FAT Volumes

If a file on a FAT volume is accidentally deleted, it can be restored as long as its directory entry still exists and its data has not been overwritten. However, when a file is deleted, the FAT file system clears the file's cluster chain in the FAT area. The directory entry retains only the file's starting cluster number and the file size, so recovery can only assume that the clusters the file occupied are contiguous and reassign contiguous clusters to it according to the file size. A FAT volume that is defragmented regularly therefore offers a much better chance of recovering accidentally deleted files than one that is never defragmented, because recovery on a FAT volume assumes that the file is not fragmented [8].
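The contiguity assumption reduces recovery to simple arithmetic. The sketch below is a minimal Python illustration on an in-memory image; `data_start` (the byte offset of cluster 2) and the function names are hypothetical, and the rule that data clusters are numbered from 2 is the standard FAT convention.

```python
def assumed_cluster_run(start_cluster, file_size, cluster_size):
    """Clusters the file is assumed to occupy if it was stored contiguously."""
    count = -(-file_size // cluster_size)   # ceiling division
    return list(range(start_cluster, start_cluster + count))


def cluster_offset(cluster, data_start, cluster_size):
    """Byte offset of a cluster; FAT data clusters are numbered from 2."""
    return data_start + (cluster - 2) * cluster_size


def recover_contiguous(image, start_cluster, file_size, data_start, cluster_size):
    """Read file_size bytes starting at the file's first cluster."""
    begin = cluster_offset(start_cluster, data_start, cluster_size)
    return image[begin:begin + file_size]


# Toy image: 1024 bytes of metadata, then clusters 2, 3, 4 filled with A, B, C.
CLUSTER = 512
image = bytes(1024) + b"A" * CLUSTER + b"B" * CLUSTER + b"C" * CLUSTER
recovered = recover_contiguous(image, 3, 600, data_start=1024, cluster_size=CLUSTER)
print(assumed_cluster_run(3, 600, CLUSTER))             # [3, 4]
print(recovered[:2], recovered[-2:], len(recovered))    # b'BB' b'CC' 600
```

If the file had actually been fragmented, the same call would silently return bytes belonging to another file, which is the weakness of this method.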

If the directory entries have all been destroyed, recovery is in theory still possible as long as the data has not been overwritten, but it requires a good understanding of the format of the files to be restored. Each file type has a specific file header; for example, .jpg files typically start with the hexadecimal bytes FFH D8H FFH E0H or FFH D8H FFH E1H, .zip files with 50H 4BH 03H 04H (commonly followed by 14H), and .exe files with 4DH 5AH. Once the starting position of the data on disk is known, restoring the data is not a big problem. This recovery method is reflected in a number of data recovery tools. For example, when FinalData recovers a mistakenly formatted volume whose files had their directory entries stored in the root directory before formatting, it can restore only specific types of files; EasyRecovery provides the method as its Raw recovery mode.
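Header-based recovery of this kind amounts to scanning cluster boundaries for known signatures. The following minimal Python sketch uses only the signatures listed above (with the zip signature shortened to its first four bytes); the function name and the 512-byte cluster size are assumptions, and a real tool would need further checks, since a two-byte signature such as 4DH 5AH produces false matches.

```python
SIGNATURES = {
    "jpg": [bytes.fromhex("FFD8FFE0"), bytes.fromhex("FFD8FFE1")],
    "zip": [bytes.fromhex("504B0304")],
    "exe": [bytes.fromhex("4D5A")],
}


def scan_signatures(image: bytes, cluster_size: int = 512):
    """Return (offset, file_type) for every cluster that begins with a known header."""
    hits = []
    for offset in range(0, len(image), cluster_size):
        head = image[offset:offset + 4]
        for file_type, magics in SIGNATURES.items():
            if any(head.startswith(magic) for magic in magics):
                hits.append((offset, file_type))
    return hits


def padded(prefix: bytes, size: int = 512) -> bytes:
    """Build one test cluster that starts with the given bytes."""
    return prefix + bytes(size - len(prefix))


demo = (bytes(512) + padded(bytes.fromhex("FFD8FFE1"))
        + padded(b"MZ") + padded(b"PK\x03\x04\x14"))
print(scan_signatures(demo))   # [(512, 'jpg'), (1024, 'exe'), (1536, 'zip')]
```

The scan finds only where a file starts; its length still has to come from the file format itself or from the contiguity assumption.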

IV. DESIGN OF DATA RECOVERY PROGRAM

A. Disk Operation in Windows Operating Systems

On Windows NT/2000/XP/2003, to operate on a physical disk, use the CreateFile function to open the file named \\.\PHYSICALDRIVE0. Once it is open, the ReadFile and WriteFile functions read and write the physical disk just as they do an ordinary file, and the SetFilePointer function moves the file pointer to the sector to be operated on. When the operation is finished, close the file with the CloseHandle function.

On Windows NT/2000/XP/2003, to operate on a logical disk, use the CreateFile function to open the file named \\.\X: (where X is the drive letter). Once it is open, the ReadFile and WriteFile functions read and write the logical disk just as they do an ordinary file, and the SetFilePointer function moves the file pointer. When the operation is finished, close the file with the CloseHandle function.
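The same open, seek, read, and close sequence can be sketched in Python, whose file objects wrap these operating system calls. This is an illustrative analogue, not the Win32 API itself: `read_sector` is a hypothetical helper, and the demonstration reads a throwaway two-sector image file rather than a real disk. On Windows, passing a path such as r"\\.\C:" would be expected to require administrator rights.

```python
import os
import tempfile

SECTOR_SIZE = 512


def read_sector(path, sector_number, sector_size=SECTOR_SIZE):
    """Read one sector from a disk device or a disk image file."""
    with open(path, "rb", buffering=0) as disk:    # open for reading, like CreateFile
        disk.seek(sector_number * sector_size)     # position, like SetFilePointer
        data = disk.read(sector_size)              # read, like ReadFile
    if len(data) != sector_size:                   # the file is closed at this point
        raise OSError("short read: sector is beyond the end of the device")
    return data


# Demonstration on a temporary image: sector 1 starts with the bytes EB 58 90.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(512) + b"\xEB\x58\x90" + bytes(509))
image_path = tmp.name
boot = read_sector(image_path, 1)
os.remove(image_path)
print(boot[:3].hex())   # eb5890
```

Reads are kept sector-aligned because raw devices generally reject offsets and lengths that are not multiples of the sector size.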

Using the method above, the following program reads the boot sector of logical disk C on a Windows NT-based system. The main code segment is as follows:

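        ; open the volume named by FileName (\\.\C:) for reading
        invoke  CreateFile, offset FileName, GENERIC_READ, \
                FILE_SHARE_READ or FILE_SHARE_WRITE, \
                NULL, OPEN_EXISTING, NULL, NULL
        mov     [hFile], eax
        cmp     eax, INVALID_HANDLE_VALUE
        jnz     read
        invoke  ShowError, offset ErrCreate     ; the volume could not be opened
        jmp     done
read:
        ; read the first 512-byte sector (the boot sector) into Buffer
        invoke  ReadFile, [hFile], offset Buffer, 512, offset readed, NULL
        cmp     eax, 0
        jnz     show
        invoke  ShowError, offset ErrRead       ; the read failed
        jmp     closefile
show:
        invoke  ShowBuffer                      ; display the sector contents
closefile:
        invoke  CloseHandle, [hFile]
done:

The program's running results are shown in Figure 1: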

After opening drive C in WinHex, one can see that the program's output is consistent with the data of the drive's boot sector.

Figure 1. Running results

B. Program Design Ideas for Restoring Accidentally Deleted Files in the FAT File System

The main function of the program to be designed is to find the data of deleted files on a FAT volume and then recover the file that the user specifies by file name and drive letter.

Based on the principles and process of manually restoring accidentally deleted files under the FAT32 file system, the general flow of the data recovery program is as follows:

(1) Ask the user for the partition that holds the file to be restored and for the file name, then check whether the bytes at offsets 52H-59H of the boot sector read "FAT32" to determine whether the partition is a FAT32 volume.

(2) If it is not a FAT32 volume, display an error message. If it is, read the partition's boot sector into memory and, from the value at offsets 2CH-2FH, find the starting cluster of the root directory; then find the root directory's cluster chain in the FAT.

(3) Following the cluster chain, read the root directory into memory and search it for the file name entered by the user.

(4) If the file is found in the root directory, check the deletion mark in its directory entry, that is, whether the first byte (offset 00H) is E5H.

(5) If the file has been deleted, it may be the file the user wants to restore; combine the low word at directory entry offsets 1AH-1BH with the high word at offsets 14H-15H to find the file's starting data cluster.

(6) Locate the file's data area and, according to the file length at directory entry offsets 1CH-1FH, read the data into memory.

The problems with this approach are:

(1) If the file to be restored is in a folder under the root directory, the program will conclude that it does not exist; that is, it can restore only files directly under the root directory.

(2) Finding the file's starting cluster may take many attempts; moreover, even if the file type is correct, the data found may not be the data to be restored. In addition, if the directory entries of other deleted files lie near the file's directory entry, finding the file's starting cluster is especially difficult.

(3) If the file's data is stored in non-contiguous clusters, the restored file will be wrong, because when a file is deleted the system marks the clusters it occupied as unused in the FAT area and the FAT chain is lost.
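Steps (1) to (5) of the flow can be sketched on in-memory buffers as follows. This is a minimal Python illustration under stated assumptions, not the authors' program: the function names are hypothetical, the root directory is assumed to have been read into `root_dir` already, and the end-of-directory marker 00H and the long-name attribute 0FH are standard FAT conventions that the text does not mention. Because the first character of a deleted name is replaced by E5H, only the remaining ten bytes of the name are compared.

```python
import struct


def is_fat32(boot: bytes) -> bool:
    """Step (1): the file system type string sits at offsets 52H-59H."""
    return boot[0x52:0x5A].startswith(b"FAT32")


def root_cluster(boot: bytes) -> int:
    """Step (2): the root directory's starting cluster sits at offsets 2CH-2FH."""
    return struct.unpack_from("<I", boot, 0x2C)[0]


def to_short_name(filename: str) -> bytes:
    """Convert 'report.txt' to the 11-byte padded upper-case form kept on disk."""
    stem, _, ext = filename.upper().partition(".")
    return stem.ljust(8)[:8].encode("ascii") + ext.ljust(3)[:3].encode("ascii")


def find_deleted(root_dir: bytes, filename: str):
    """Steps (3)-(5): return (start_cluster, size) of a deleted entry, or None."""
    wanted = to_short_name(filename)
    for offset in range(0, len(root_dir) - 31, 32):
        entry = root_dir[offset:offset + 32]
        if entry[0] == 0x00:          # no further entries in this directory
            break
        if entry[0x0B] == 0x0F:       # long file name entry, not a file record
            continue
        if entry[0] == 0xE5 and entry[1:11] == wanted[1:]:
            high, = struct.unpack_from("<H", entry, 0x14)
            low, = struct.unpack_from("<H", entry, 0x1A)
            size, = struct.unpack_from("<I", entry, 0x1C)
            return (high << 16) | low, size
    return None
```

Matching on ten of eleven name bytes means that two deleted files differing only in their first character are indistinguishable, which is one concrete form of problem (2).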

Problem (1) can be addressed by improving the program as follows: add a loop or recursion so that folders under the root directory are processed with the same steps as the root directory itself. If a directory entry in the root directory is a subdirectory, search that subdirectory for the file name, and apply the same procedure to the folders inside it.

Problems (2) and (3) can be solved by writing a small auxiliary program. Its main function is that, each time the user permanently deletes a file and before the file system clears the file's FAT chain, it saves a backup of the corresponding FAT information. The backup includes the file name, the partition the file was on before deletion, and the file's FAT chain. To restore a file, the program first looks up the backup to see whether restoration information is available.
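The core of such an auxiliary program is to walk the FAT chain and save it before the entries are cleared. The sketch below models the FAT as a Python list of 32-bit entries; the function names and the dictionary used as the backup store are hypothetical, and how the program would intercept a deletion is left open, as it is in the text. Treating values of 0FFFFFF8H and above as end-of-chain is the standard FAT32 convention.

```python
END_OF_CHAIN = 0x0FFFFFF8   # FAT32 values at or above this mark the last cluster


def follow_chain(fat, start_cluster):
    """Return the list of clusters reached from start_cluster."""
    chain = []
    cluster = start_cluster
    while 2 <= cluster < len(fat) and cluster not in chain:   # stop on bad or looping links
        chain.append(cluster)
        nxt = fat[cluster] & 0x0FFFFFFF    # only the low 28 bits are used
        if nxt >= END_OF_CHAIN:
            break
        cluster = nxt
    return chain


def backup_before_delete(fat, drive, filename, start_cluster, store):
    """Save the file's chain under (drive, filename) before the FAT is cleared."""
    store[(drive, filename)] = follow_chain(fat, start_cluster)


def free_chain(fat, chain):
    """Model what deletion does to the FAT: mark every cluster as unused."""
    for cluster in chain:
        fat[cluster] = 0


# A fragmented file occupying clusters 3 -> 7 -> 4.
fat = [0] * 12
fat[3], fat[7], fat[4] = 7, 4, 0x0FFFFFFF
store = {}
backup_before_delete(fat, "C", "REPORT.TXT", 3, store)
free_chain(fat, store[("C", "REPORT.TXT")])
print(store[("C", "REPORT.TXT")])   # [3, 7, 4]
print(follow_chain(fat, 3))         # [3]
```

After deletion the chain read from the FAT stops at the first cluster, while the backup still holds the full fragmented order needed for a correct restore.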

V. CONCLUSIONS

In today's information society, data security is increasingly important. Data recovery is among the fastest growing and most dynamic technologies in the field of computer security and maintenance, with a vast market and broad development prospects.

Data recovery technology based on Windows FAT relies on the principles of disk data storage and access and on the principles of the FAT file system; this paper mainly analyzes the relevant principles of the FAT32 system. On a FAT32 file system, accidentally deleted files can be restored manually with a disk analysis tool, although restoring files that are fragmented is more troublesome. For a text file, the content is displayed clearly when the disk is opened in WinHex, so the fragments can be selected directly and joined according to context. Newer versions of WinHex list all files in a directory window; double-clicking a file jumps to the corresponding location and simultaneously displays the file's cluster chain in a window, which is easy to use, and the file can be restored by selecting the start and end of the block and copying it. The whole process requires no manual calculation or positioning. The experiment in this paper, however, was intended to deepen understanding of the principles of data recovery under FAT32, so the detailed calculation process is presented in full to lay the groundwork for further in-depth study; this knowledge is also necessary for implementing recovery in assembly language.

REFERENCES

[1] Dai Shi Jian. Data recovery technology, and development of the basic concept of the status quo. http://www.siniot.org.cn/InformationSecurity/Author/daishijian1.html.

[2] Dai Shi Jian, Yong-Hong Chen. Data Recovery Technology [M]. Beijing: Electronic Industry Press, 2003.

[3] Dai Shi Jian, Zhang Jie, Guo Hisatake. Data Recovery Techniques [J]. Information network security technology research, 2006, (1): 47 - 49; 2006, (2): 51 - 54.

[4] Tu Yanbin, Dai Shi Jian. Data security and programming techniques [M]. Beijing: Tsinghua University Press, 2005.

[5] Wen Guanbin. Analysis of data recovery techniques [J]. China Science and Technology Information, 2008, (3): 78,80.

[6] CUI Song. Based on WINDOWS FAT Data Recovery System Design and Implementation [J]. Chengdu Institute of Education, 2006,20 (4): 74 – 77.

[7] CUI Song. WINDOWS FAT-based data recovery system design and implementation of [D]. Chengdu: Sichuan University, 2005.

[8] Jiang Yonghui. Manually edit the partition table data recovery method and its application [J]. Hainan Journal of Radio and Television University, 2007, (4): 86 - 87,90.

[9] Zhao Xiu-wen. Hard disk data recovery [J]. Computer and Network 2007, (3): 196,198.

[10] Jason Zandri. Understanding File System Options [J]. Http: // www. Server - watch .. com

[11] Thomas Kjoernes. File Allocation Table [J]. Http: // www.severwatch. Com, 2000.

[12] LI Chunwang, Shen Yong. FAT data recovery following the devastation of the table [J]. Technical Education Journal, 2003,4 (1): 26 – 31.

[13] Tang Shuofei. Computer Organization [M]. Beijing: Higher Education Press, 2003.

[14] Yan Yi. Win32 assembly language program design [M]. Beijing: Beijing Machinery Industry Press, 2004.