searchusermenu
  • 发布文章
  • 消息中心
点赞
收藏
评论
分享

file holes

2023-10-30 01:55:28
4
0

如果从远端服务器下载日志,并对日志文件进行处理,比如写到消息队列,已下载的文件存在本地占用磁盘空间,在处理完成一个文件后会对文件进行删除,以减少磁盘占用。假如是多机任务模式执行,一个日志任务可能会中途挂掉并切调度到其他机器,这时新的机器要从上次的断点继续处理,这时就需要从远端同步offset信息,并且根据offset去下载远端日志文件,而本地文件就需要使用虫洞的技术进行填充,以让日志处理进程,可以继续从文件的offset继续处理。本文转载一遍虫洞的文章介绍相关技术。

A file hole is a portion of a regular file that contains null characters and is not stored in any data block on disk. Holes are a long-standing feature of Unix files. For instance, the following Unix command creates a file in which the first bytes are a hole:

$ echo -n "X" | dd of=/tmp/hole bs=1024 seek=6

Now /tmp/hole has 6,145 characters (6,144 null characters plus an X character), yet the file occupies just one data block on disk.

File holes were introduced to avoid wasting disk space. They are used extensively by database applications and, more generally, by all applications that perform hashing on files.

The Ext2 implementation of file holes is based on dynamic data block allocation: a block is actually assigned to a file only when the process needs to write data into it. The i_size field of each inode defines the size of the file as seen by the program, including the hole, while the i_blocks field stores the number of data blocks effectively assigned to the file (in units of 512 bytes).

In our earlier example of the dd command, suppose the /tmp/hole file was created on an Ext2 partition that has blocks of size 4,096. The i_size field of the corresponding disk inode stores the number 6,145, while the i_blocks field stores the number 8 (because each 4,096-byte block includes eight 512-byte blocks). The second element of the i_block array (corresponding to the block having file block number 1) stores the logical block number of the allocated block, while all other elements in the array are null (see Figure 176).

Figure 17-6. A file with an initial hole

Figure 17-6. A file with an initial hole

File Holes

17.6.5 Allocating a Data Block

When the kernel has to locate a block holding data for an Ext2 regular file, it invokes the ext2_get_block( ) function. If the block does not exist, the function automatically allocates the block to the file. Remember that this function is invoked every time the kernel issues a read or write operation on a Ext2 regular file (see Section 15.1.1 and Section 15.1.3).

The ext2_get_block( ) function handles the data structures already described in Section 17.6.3, and when necessary, invokes the ext2_alloc_block( ) function to actually search for a free block in the Ext2 partition.

To reduce file fragmentation, the Ext2 filesystem tries to get a new block for a file near the last block already allocated for the file. Failing that, the filesystem searches for a new block in the block group that includes the file's inode. As a last resort, the free block is taken from one of the other block groups.

The Ext2 filesystem uses preallocation of data blocks. The file does not get just the requested block, but rather a group of up to eight adjacent blocks. The i_prealloc_count field in the ext2_inode_info structure stores the number of data blocks preallocated to a file that are still unused, and the i_prealloc_block field stores the logical block number of the next preallocated block to be used. Any preallocated blocks that remain unused are freed when the file is closed, when it is truncated, or when a write operation is not sequential with respect to the write operation that triggered the block preallocation.

The ext2_alloc_block( ) function receives as parameters a pointer to an inode object and a goal. The goal is a logical block number that represents the preferred position of the new block. The ext2_getblk( ) function sets the goal parameter according to the following heuristic:

1. If the block that is being allocated and the previously allocated block have consecutive file block numbers, the goal is the logical block number of the previous block plus 1; it makes sense that consecutive blocks as seen by a program should be adjacent on disk.

2. If the first rule does not apply and at least one block has been previously allocated to the file, the goal is one of these blocks' logical block numbers. More precisely, it is the logical block number of the already allocated block that precedes the block to be allocated in the file.

3. If the preceding rules do not apply, the goal is the logical block number of the first block (not necessarily free) in the block group that contains the file's inode.

The ext2_alloc_block( ) function checks whether the goal refers to one of the preallocated blocks of the file. If so, it allocates the corresponding block and returns its logical block number; otherwise, the function discards all remaining preallocated blocks and invokes ext2_new_block( ) .

This latter function searches for a free block inside the Ext2 partition with the following strategy:

1. If the preferred block passed to ext2_alloc_block( ), the goal, is free, and the

2. If the goal is busy, the function checks whether one of the next 64 blocks after the preferred block is free.

3. If no free block is found in the near vicinity of the preferred block, the function considers all block groups, starting from the one including the goal. For each block group, the function does the following:

a. Looks for a group of at least eight adjacent free blocks.

b. If no such group is found, looks for a single free block.

The search ends as soon as a free block is found. Before terminating, the ext2_new_block( ) function also tries to preallocate up to eight free blocks adjacent to the free block found and sets the i_prealloc_block and i_prealloc_count fields of the disk inode to the proper block location and number of blocks.

0条评论
0 / 1000
李****强
14文章数
0粉丝数
李****强
14 文章 | 0 粉丝

file holes

2023-10-30 01:55:28
4
0

如果从远端服务器下载日志,并对日志文件进行处理,比如写到消息队列,已下载的文件存在本地占用磁盘空间,在处理完成一个文件后会对文件进行删除,以减少磁盘占用。假如是多机任务模式执行,一个日志任务可能会中途挂掉并切调度到其他机器,这时新的机器要从上次的断点继续处理,这时就需要从远端同步offset信息,并且根据offset去下载远端日志文件,而本地文件就需要使用虫洞的技术进行填充,以让日志处理进程,可以继续从文件的offset继续处理。本文转载一遍虫洞的文章介绍相关技术。

A file hole is a portion of a regular file that contains null characters and is not stored in any data block on disk. Holes are a long-standing feature of Unix files. For instance, the following Unix command creates a file in which the first bytes are a hole:

$ echo -n "X" | dd of=/tmp/hole bs=1024 seek=6

Now /tmp/hole has 6,145 characters (6,144 null characters plus an X character), yet the file occupies just one data block on disk.

File holes were introduced to avoid wasting disk space. They are used extensively by database applications and, more generally, by all applications that perform hashing on files.

The Ext2 implementation of file holes is based on dynamic data block allocation: a block is actually assigned to a file only when the process needs to write data into it. The i_size field of each inode defines the size of the file as seen by the program, including the hole, while the i_blocks field stores the number of data blocks effectively assigned to the file (in units of 512 bytes).

In our earlier example of the dd command, suppose the /tmp/hole file was created on an Ext2 partition that has blocks of size 4,096. The i_size field of the corresponding disk inode stores the number 6,145, while the i_blocks field stores the number 8 (because each 4,096-byte block includes eight 512-byte blocks). The second element of the i_block array (corresponding to the block having file block number 1) stores the logical block number of the allocated block, while all other elements in the array are null (see Figure 176).

Figure 17-6. A file with an initial hole

Figure 17-6. A file with an initial hole

File Holes

17.6.5 Allocating a Data Block

When the kernel has to locate a block holding data for an Ext2 regular file, it invokes the ext2_get_block( ) function. If the block does not exist, the function automatically allocates the block to the file. Remember that this function is invoked every time the kernel issues a read or write operation on a Ext2 regular file (see Section 15.1.1 and Section 15.1.3).

The ext2_get_block( ) function handles the data structures already described in Section 17.6.3, and when necessary, invokes the ext2_alloc_block( ) function to actually search for a free block in the Ext2 partition.

To reduce file fragmentation, the Ext2 filesystem tries to get a new block for a file near the last block already allocated for the file. Failing that, the filesystem searches for a new block in the block group that includes the file's inode. As a last resort, the free block is taken from one of the other block groups.

The Ext2 filesystem uses preallocation of data blocks. The file does not get just the requested block, but rather a group of up to eight adjacent blocks. The i_prealloc_count field in the ext2_inode_info structure stores the number of data blocks preallocated to a file that are still unused, and the i_prealloc_block field stores the logical block number of the next preallocated block to be used. Any preallocated blocks that remain unused are freed when the file is closed, when it is truncated, or when a write operation is not sequential with respect to the write operation that triggered the block preallocation.

The ext2_alloc_block( ) function receives as parameters a pointer to an inode object and a goal. The goal is a logical block number that represents the preferred position of the new block. The ext2_getblk( ) function sets the goal parameter according to the following heuristic:

1. If the block that is being allocated and the previously allocated block have consecutive file block numbers, the goal is the logical block number of the previous block plus 1; it makes sense that consecutive blocks as seen by a program should be adjacent on disk.

2. If the first rule does not apply and at least one block has been previously allocated to the file, the goal is one of these blocks' logical block numbers. More precisely, it is the logical block number of the already allocated block that precedes the block to be allocated in the file.

3. If the preceding rules do not apply, the goal is the logical block number of the first block (not necessarily free) in the block group that contains the file's inode.

The ext2_alloc_block( ) function checks whether the goal refers to one of the preallocated blocks of the file. If so, it allocates the corresponding block and returns its logical block number; otherwise, the function discards all remaining preallocated blocks and invokes ext2_new_block( ) .

This latter function searches for a free block inside the Ext2 partition with the following strategy:

1. If the preferred block passed to ext2_alloc_block( ), the goal, is free, and the

2. If the goal is busy, the function checks whether one of the next 64 blocks after the preferred block is free.

3. If no free block is found in the near vicinity of the preferred block, the function considers all block groups, starting from the one including the goal. For each block group, the function does the following:

a. Looks for a group of at least eight adjacent free blocks.

b. If no such group is found, looks for a single free block.

The search ends as soon as a free block is found. Before terminating, the ext2_new_block( ) function also tries to preallocate up to eight free blocks adjacent to the free block found and sets the i_prealloc_block and i_prealloc_count fields of the disk inode to the proper block location and number of blocks.

文章来自个人专栏
可观测
14 文章 | 1 订阅
0条评论
0 / 1000
请输入你的评论
0
0