FUNCTIONS, CATEGORIES and POLICIES

The POSIX filesystem API is made up of a number of functions. creat, stat, chown, etc. For ease of configuration in mergerfs, most of the core functions are grouped into 3 categories: action, create, and search. These functions and categories can be assigned a policy which dictates which branch is chosen when performing that function.

Some functions, listed in the category N/A below, can not be assigned the normal policies. These functions work with file handles, rather than file paths, which were created by open or create. That said many times the current FUSE kernel driver will not always provide the file handle when a client calls fgetattr, fchown, fchmod, futimens, ftruncate, etc. This means it will call the regular, path based, versions. statfs's behavior can be modified via other options.

When using policies which are based on a branch's available space the base path provided is used. Not the full path to the file in question. Meaning that mounts in the branch won't be considered in the space calculations. The reason is that it doesn't really work for non-path preserving policies and can lead to non-obvious behaviors.

NOTE: While any policy can be assigned to a function or category, some may not be very useful in practice. For instance: rand (random) may be useful for file creation (create) but could lead to very odd behavior if used for chmod if there were more than one copy of the file.

Functions and their Category classifications

Category	FUSE Functions
action	chmod, chown, link, removexattr, rename, rmdir, setxattr, truncate, unlink, utimens
create	create, mkdir, mknod, symlink
search	access, getattr, getxattr, ioctl (directories), listxattr, open, readlink
N/A	fchmod, fchown, futimens, ftruncate, fallocate, fgetattr, fsync, ioctl (files), read, readdir, release, statfs, write, copy_file_range

In cases where something may be searched for (such as a path to clone) getattr will usually be used.

Policies

A policy is the algorithm used to choose a branch or branches for a function to work on or generally how the function behaves.

Any function in the create category will clone the relative path if needed. Some other functions (rename,link,ioctl) have special requirements or behaviors which you can read more about below.

Filtering

Most policies basically search branches and create a list of files / paths for functions to work on. The policy is responsible for filtering and sorting the branches. Filters include minfreespace, whether or not a branch is mounted read-only, and the branch tagging (RO,NC,RW). These filters are applied across most policies.

No search function policies filter.
All action function policies filter out branches which are mounted read-only or tagged as RO (read-only).
All create function policies filter out branches which are mounted read-only, tagged RO (read-only) or NC (no create), or has available space less than minfreespace.

Policies may have their own additional filtering such as those that require existing paths to be present.

If all branches are filtered an error will be returned. Typically EROFS (read-only filesystem) or ENOSPC (no space left on device) depending on the most recent reason for filtering a branch. ENOENT will be returned if no eligible branch is found.

If create, mkdir, mknod, or symlink fail with EROFS or other fundamental errors then mergerfs will mark any branch found to be read-only as such (IE will set the mode RO) and will rerun the policy and try again. This is mostly for ext4 filesystems that can suddenly become read-only when it encounters an error.

Path Preservation

Policies, as described below, are of two basic classifications. path preserving and non-path preserving.

All policies which start with ep (epff, eplfs, eplus, epmfs, eprand) are path preserving. ep stands for existing path.

A path preserving policy will only consider branches where the relative path being accessed already exists.

When using non-path preserving policies paths will be cloned to target branches as necessary.

With the msp or most shared path policies they are defined as path preserving for the purpose of controlling link and rename's behaviors since ignorepponrename is available to disable that behavior.

Policy descriptions

A policy's behavior differs, as mentioned above, based on the function it is used with. Sometimes it really might not make sense to even offer certain policies because they are literally the same as others but it makes things a bit more uniform.

Policy	Description
all	Search: For mkdir, mknod, and symlink it will apply to all branches. create works like ff.
epall (existing path, all)	For mkdir, mknod, and symlink it will apply to all found. create works like epff (but more expensive because it doesn't stop after finding a valid branch).
epff (existing path, first found)	Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found where the relative path exists.
eplfs (existing path, least free space)	Of all the branches on which the relative path exists choose the branch with the least free space.
eplus (existing path, least used space)	Of all the branches on which the relative path exists choose the branch with the least used space.
epmfs (existing path, most free space)	Of all the branches on which the relative path exists choose the branch with the most free space.
eppfrd (existing path, percentage free random distribution)	Like pfrd but limited to existing paths.
eprand (existing path, random)	Calls epall and then randomizes. Returns 1.
ff (first found)	Given the order of the branches, as defined at mount time or configured at runtime, act on the first one found.
lfs (least free space)	Pick the branch with the least available free space.
lus (least used space)	Pick the branch with the least used space.
mfs (most free space)	Pick the branch with the most available free space.
msplfs (most shared path, least free space)	Like eplfs but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
msplus (most shared path, least used space)	Like eplus but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
mspmfs (most shared path, most free space)	Like epmfs but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
msppfrd (most shared path, percentage free random distribution)	Like eppfrd but if it fails to find a branch it will try again with the parent directory. Continues this pattern till finding one.
newest	Pick the file / directory with the largest mtime.
pfrd (percentage free random distribution)	Chooses a branch at random with the likelihood of selection based on a branch's available space relative to the total.
rand (random)	Calls all and then randomizes. Returns 1 branch.

NOTE: If you are using an underlying filesystem that reserves blocks such as ext2, ext3, or ext4 be aware that mergerfs respects the reservation by using f_bavail (number of free blocks for unprivileged users) rather than f_bfree (number of free blocks) in policy calculations. df does NOT use f_bavail, it uses f_bfree, so direct comparisons between df output and mergerfs' policies is not appropriate.

Defaults

Category	Policy
action	epall
create	epmfs
search	ff

func.readdir

examples: func.readdir=seq, func.readdir=cor:4

readdir has policies to control how it manages reading directory content.

Policy	Description
seq	"sequential" : Iterate over branches in the order defined. This is the default and traditional behavior found prior to the readdir policy introduction.
cosr	"concurrent open, sequential read" : Concurrently open branch directories using a thread pool and process them in order of definition. This keeps memory and CPU usage low while also reducing the time spent waiting on branches to respond. Number of threads defaults to the number of logical cores. Can be overwritten via the syntax `func.readdir=cosr:N` where `N` is the number of threads.
cor	"concurrent open and read" : Concurrently open branch directories and immediately start reading their contents using a thread pool. This will result in slightly higher memory and CPU usage but reduced latency. Particularly when using higher latency / slower speed network filesystem branches. Unlike `seq` and `cosr` the order of files could change due the async nature of the thread pool. Number of threads defaults to the number of logical cores. Can be overwritten via the syntax `func.readdir=cor:N` where `N` is the number of threads.

Keep in mind that readdir mostly just provides a list of file names in a directory and possibly some basic metadata about said files. To know details about the files, as one would see from commands like find or ls, it is required to call stat on the file which is controlled by fuse.getattr.

ioctl

When ioctl is used with an open file then it will use the file handle which was created at the original open call. However, when using ioctl with a directory mergerfs will use the open policy to find the directory to act on.

rename and link

NOTE: If you're receiving errors from software when files are moved / renamed / linked then you should consider changing the create policy to one which is not path preserving, enabling ignorepponrename, or contacting the author of the offending software and requesting that EXDEV (cross device / improper link) be properly handled.

rename and link are tricky functions in a union filesystem. rename only works within a single filesystem or device. If a rename can't be done atomically due to the source and destination paths existing on different mount points it will return -1 with errno = EXDEV (cross device / improper link). So if a rename's source and target are on different filesystems within the pool it creates an issue.

Originally mergerfs would return EXDEV whenever a rename was requested which was cross directory in any way. This made the code simple and was technically compliant with POSIX requirements. However, many applications fail to handle EXDEV at all and treat it as a normal error or otherwise handle it poorly. Such apps include: gvfsd-fuse v1.20.3 and prior, Finder / CIFS/SMB client in Apple OSX 10.9+, NZBGet, Samba's recycling bin feature.

As a result a compromise was made in order to get most software to work while still obeying mergerfs' policies. Below is the basic logic.

If using a create policy which tries to preserve directory paths (epff,eplfs,eplus,epmfs)
Using the rename policy get the list of files to rename
For each file attempt rename:
- If failure with ENOENT (no such file or directory) run create policy
- If create policy returns the same branch as currently evaluating then clone the path
- Re-attempt rename
If any of the renames succeed the higher level rename is considered a success
If no renames succeed the first error encountered will be returned
On success:
- Remove the target from all branches with no source file
- Remove the source from all branches which failed to rename
If using a create policy which does not try to preserve directory paths
Using the rename policy get the list of files to rename
Using the getattr policy get the target path
For each file attempt rename:
- If the source branch != target branch:
- Clone target path from target branch to source branch
- Rename
If any of the renames succeed the higher level rename is considered a success
If no renames succeed the first error encountered will be returned
On success:
- Remove the target from all branches with no source file
- Remove the source from all branches which failed to rename

The removals are subject to normal entitlement checks.

The above behavior will help minimize the likelihood of EXDEV being returned but it will still be possible.

link uses the same strategy but without the removals.

statfs / statvfs

statvfs normalizes the source filesystems based on the fragment size and sums the number of adjusted blocks and inodes. This means you will see the combined space of all sources. Total, used, and free. The sources however are dedupped based on the filesystem so multiple sources on the same drive will not result in double counting its space. Other filesystems mounted further down the tree of the branch will not be included when checking the mount's stats.

The options statfs and statfs_ignore can be used to modify statfs behavior.

flush-on-close

https://lkml.kernel.org/linux-fsdevel/20211024132607.1636952-1-amir73il@gmail.com/T/

By default, FUSE would issue a flush before the release of a file descriptor. This was considered a bit aggressive and a feature added to give the FUSE server the ability to choose when that happens.

Options:

always
never
opened-for-write

For now it defaults to "opened-for-write" which is less aggressive than the behavior before this feature was added. It should not be a problem because the flush is really only relevant when a file is written to. Given flush is irrelevant for many filesystems in the future a branch specific flag may be added so only files opened on a specific branch would be flushed on close.