Package org.apache.hadoop.fs
Class HarFileSystem
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.HarFileSystem
- All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, PathCapabilities, DelegationTokenIssuer
This is an implementation of the Hadoop Archive (HAR)
filesystem. This archive filesystem has index files
of the form _index* and content files of the form
part-*. The index files store the indexes of the
real files. The index files are named _masterindex
and _index. The master index is a level of indirection
into the index file that makes lookups faster. The index
file is sorted by the hash code of the paths that it contains,
and the master index contains pointers to the positions in
the index for ranges of hash codes.
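The two-level lookup described above can be sketched in memory as follows. This is a minimal illustration only: the class TwoLevelIndexSketch, its Entry and Range types, the hashOf function and the rangeSize parameter are all hypothetical names, and the sketch does not reproduce the on-disk format of _masterindex or _index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory model of a hash-sorted index with a master index on top. */
final class TwoLevelIndexSketch {
    /** One index record: a path, its hash, and where its bytes live in a part file. */
    static final class Entry {
        final int hash;
        final String path;
        final String partName;
        final long start;
        final long length;

        Entry(String path, String partName, long start, long length) {
            this.hash = hashOf(path);
            this.path = path;
            this.partName = partName;
            this.start = start;
            this.length = length;
        }
    }

    /** One master record: a hash range and the slice [from, to) of the index it covers. */
    static final class Range {
        final int minHash;
        final int maxHash;
        final int from;
        final int to;

        Range(int minHash, int maxHash, int from, int to) {
            this.minHash = minHash;
            this.maxHash = maxHash;
            this.from = from;
            this.to = to;
        }
    }

    private final List<Entry> index = new ArrayList<>();
    private final List<Range> master = new ArrayList<>();

    /** Assumed hash: a non-negative int derived from the path string. */
    static int hashOf(String path) {
        return path.hashCode() & 0x7fffffff;
    }

    TwoLevelIndexSketch(List<Entry> entries, int rangeSize) {
        if (rangeSize <= 0) {
            throw new IllegalArgumentException("rangeSize must be positive");
        }
        index.addAll(entries);
        // The index is kept sorted by the hash of each path.
        index.sort(Comparator.comparingInt(e -> e.hash));
        // The master index records which slice of the index holds each hash range.
        for (int from = 0; from < index.size(); from += rangeSize) {
            int to = Math.min(from + rangeSize, index.size());
            master.add(new Range(index.get(from).hash, index.get(to - 1).hash, from, to));
        }
    }

    /** Returns the entry for the path, or null if the path is not in the index. */
    Entry lookup(String path) {
        int h = hashOf(path);
        for (Range r : master) {
            if (h < r.minHash || h > r.maxHash) {
                continue; // this slice of the index cannot contain the hash
            }
            for (int i = r.from; i < r.to; i++) {
                Entry e = index.get(i);
                if (e.hash == h && e.path.equals(path)) {
                    return e;
                }
            }
        }
        return null;
    }
}
```

Only the slices of the index whose hash range covers the requested hash are scanned, which is the point of the extra level of indirection.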
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.fs.FileSystem
FileSystem.DirectoryEntries, FileSystem.DirListingIterator<T extends FileStatus>, FileSystem.Statistics
-
Field Summary
Fields
Fields inherited from class org.apache.hadoop.fs.FileSystem
DEFAULT_FS, FS_DEFAULT_NAME_KEY, SHUTDOWN_HOOK_PRIORITY, statistics, TRASH_PREFIX, USER_HOME_PREFIX
Fields inherited from interface org.apache.hadoop.security.token.DelegationTokenIssuer
TOKEN_LOG
-
Constructor Summary
Constructors
Constructor | Description
HarFileSystem() | Public constructor for HarFileSystem.
HarFileSystem(FileSystem fs) | Constructor to create a HarFileSystem with an underlying filesystem.
-
Method Summary
Modifier and Type | Method | Description
FSDataOutputStream | append(Path f) | Append to an existing file (optional operation).
FSDataOutputStream | append(Path f, int bufferSize, Progressable progress) | Append to an existing file (optional operation).
FSDataOutputStreamBuilder | appendFile(Path path) | Create a Builder to append a file.
protected URI | canonicalizeUri(URI uri) | Canonicalize the given URI.
protected void | checkPath(Path path) | Check that a Path belongs to this FileSystem.
void | close() | Close this FileSystem instance.
void | completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) | Not implemented.
void | copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst) | The src files are on the local disk.
void | copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst) | Not implemented.
void | copyToLocalFile(boolean delSrc, Path src, Path dst) | Copies the file in the har filesystem to a local file.
FSDataOutputStream | create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) | Create an FSDataOutputStream at the indicated Path with write-progress reporting.
FSDataOutputStreamBuilder | createFile(Path path) | Create a new FSDataOutputStreamBuilder for the file with path.
FSDataOutputStream | createNonRecursive(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) | Opens an FSDataOutputStream at the indicated Path with write-progress reporting.
protected PathHandle | createPathHandle(FileStatus stat, Options.HandleOpt... opts) | Hook to implement support for PathHandle operations.
boolean | delete(Path f, boolean recursive) | Not implemented.
protected URI | getCanonicalUri() | Used for delegation token related functionality.
FileSystem[] | getChildFileSystems() | Used for delegation token related functionality.
Configuration | getConf() | Return the configuration used by this object.
long | getDefaultBlockSize() | Return the number of bytes that large input files should optimally be split into to minimize I/O time.
long | getDefaultBlockSize(Path f) | Return the number of bytes that large input files should optimally be split into to minimize I/O time.
short | getDefaultReplication() | Get the default replication.
short | getDefaultReplication(Path f) | Get the default replication for a path.
BlockLocation[] | getFileBlockLocations(FileStatus file, long start, long len) | Get block locations from the underlying fs and fix their offsets and lengths.
FileChecksum | getFileChecksum(Path f, long length) | Get the checksum of a file, from the beginning of the file till the specific length.
FileStatus | getFileStatus(Path f) | Return the FileStatus of a file in the HAR archive.
static int | getHarHash(Path p) | The hash of the path p inside the filesystem.
int | getHarVersion() |
Path | getHomeDirectory() | Return the top-level archive path.
Path | getInitialWorkingDirectory() | Note: with the new FileContext class, getWorkingDirectory() will be removed.
String | getScheme() | Return the protocol scheme for the FileSystem.
FsServerDefaults | getServerDefaults() | Return a set of server default configuration values.
FsServerDefaults | getServerDefaults(Path f) | Return a set of server default configuration values.
FsStatus | getStatus(Path p) | Returns a status object describing the use and capacity of the filesystem.
URI | getUri() | Returns the URI of this filesystem.
long | getUsed() | Return the total size of all files in the filesystem.
long | getUsed(Path path) | Return the total size of all files from a specified path.
Path | getWorkingDirectory() | Return the top-level archive.
boolean | hasPathCapability(Path path, String capability) | Declare that this filesystem connector is always read only.
void | initialize(URI name, Configuration conf) | Initialize a HAR filesystem per HAR archive.
FileStatus[] | listStatus(Path f) | Returns the children of a directory after looking up the index files.
Path | makeQualified(Path path) | Qualify a path to one which uses this FileSystem and, if relative, made absolute.
boolean | mkdirs(Path f, FsPermission permission) | Not implemented.
void | msync() | Synchronize client metadata state.
FSDataInputStream | open(PathHandle fd, int bufferSize) | Open an FSDataInputStream matching the PathHandle instance.
FSDataInputStream | open(Path f, int bufferSize) | Returns a har input stream which fakes end of file.
boolean | rename(Path src, Path dst) | Renames Path src to Path dst.
Path | resolvePath(Path p) | Return the fully-qualified path of path, resolving the path through any symlinks or mount point.
void | setOwner(Path p, String username, String groupname) | Not implemented.
void | setPermission(Path p, FsPermission permission) | Not implemented.
boolean | setReplication(Path src, short replication) | Not implemented.
void | setTimes(Path p, long mtime, long atime) | Set access time of a file.
void | setWorkingDirectory(Path newDir) | Set the current working directory for the given FileSystem.
Path | startLocalOutput(Path fsOutputFile, Path tmpLocalFile) | Not implemented.
boolean | truncate(Path f, long newLength) | Not implemented.
Methods inherited from class org.apache.hadoop.fs.FileSystem
access, append, append, append, areSymlinksEnabled, cancelDeleteOnExit, clearStatistics, closeAll, closeAllForUGI, concat, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createBulkDelete, createDataInputStreamBuilder, createDataInputStreamBuilder, createDataOutputStreamBuilder, createMultipartUploader, createNewFile, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, delete, deleteOnExit, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAclStatus, getAdditionalTokenIssuers, getAllStatistics, getAllStoragePolicies, getBlockSize, getCanonicalServiceName, getContentSummary, getDefaultPort, getDefaultUri, getDelegationToken, getEnclosingRoot, getFileBlockLocations, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getLength, getLinkTarget, getLocal, getName, getNamed, getPathHandle, getQuotaUsage, getReplication, getStatistics, getStatistics, getStatus, getStoragePolicy, getStorageStatistics, getTrashRoot, getTrashRoots, getXAttr, getXAttrs, getXAttrs, globStatus, globStatus, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusBatch, listStatusIterator, listXAttrs, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, open, openFile, openFile, openFileWithOptions, openFileWithOptions, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, processDeleteOnExit, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, rename, renameSnapshot, resolveLink, satisfyStoragePolicy, setAcl, setDefaultUri, setDefaultUri, setQuota, setQuotaByStorageType, setStoragePolicy, setVerifyChecksum, setWriteChecksum, setXAttr, setXAttr, supportsSymlinks, 
unsetStoragePolicy
Methods inherited from class org.apache.hadoop.conf.Configured
setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.security.token.DelegationTokenIssuer
addDelegationTokens
-
Field Details
-
METADATA_CACHE_ENTRIES_KEY
-
METADATA_CACHE_ENTRIES_DEFAULT
public static final int METADATA_CACHE_ENTRIES_DEFAULT
-
VERSION
public static final int VERSION
-
-
Constructor Details
-
HarFileSystem
public HarFileSystem()
Public constructor for HarFileSystem.
-
HarFileSystem
Constructor to create a HarFileSystem with an underlying filesystem.
- Parameters:
fs - underlying file system
-
-
Method Details
-
getScheme
Return the protocol scheme for the FileSystem.
- Overrides:
getScheme in class FileSystem
- Returns:
har
-
initialize
Initialize a HAR filesystem for a single HAR archive. The archive home directory is the top-level directory in the filesystem that contains the HAR archive. Be careful with this method: you do not want to keep creating new FileSystem instances on every call to path.getFileSystem(). The URI of a HAR filesystem is har://underlyingfsscheme-host:port/archivepath or har:///archivepath. The second form names no underlying filesystem, so the default filesystem is assumed.
- Overrides:
initialize in class FileSystem
- Parameters:
name - a URI whose authority section names the host, port, etc. for this FileSystem
conf - the configuration
- Throws:
IOException - on any failure to initialize this instance.
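To illustrate the two URI forms, the sketch below splits the authority of a har:// URI at its first hyphen to recover the underlying scheme and host:port, and falls back to a caller-supplied default when no authority is present. It uses only java.net.URI; the class HarUriDecodeSketch, the method underlyingUri and the defaultFs parameter are hypothetical names, and this is not presented as the parsing code HarFileSystem itself uses.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper that maps a har:// URI onto a URI in the underlying filesystem. */
final class HarUriDecodeSketch {
    static URI underlyingUri(URI harUri, URI defaultFs) {
        if (!"har".equals(harUri.getScheme())) {
            throw new IllegalArgumentException("not a har URI: " + harUri);
        }
        try {
            String authority = harUri.getAuthority();
            if (authority == null) {
                // har:///archivepath: no underlying filesystem named, use the default.
                return new URI(defaultFs.getScheme(), defaultFs.getAuthority(),
                        harUri.getPath(), null, null);
            }
            // har://underlyingfsscheme-host:port/archivepath
            int dash = authority.indexOf('-');
            if (dash <= 0) {
                throw new IllegalArgumentException("no '-' in authority: " + harUri);
            }
            String scheme = authority.substring(0, dash);
            String hostPort = authority.substring(dash + 1);
            return new URI(scheme, hostPort.isEmpty() ? null : hostPort,
                    harUri.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot decode " + harUri, e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://default:9000");
        System.out.println(underlyingUri(
                URI.create("har://hdfs-namenode:8020/user/a/foo.har"), defaultFs));
        System.out.println(underlyingUri(
                URI.create("har:///user/a/foo.har"), defaultFs));
    }
}
```

The host names namenode and default, the ports and the archive path are placeholder values chosen for the example.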
-
getConf
Description copied from interface: Configurable
Return the configuration used by this object.
- Specified by:
getConf in interface Configurable
- Overrides:
getConf in class Configured
- Returns:
Configuration
-
getHarVersion
- Throws:
IOException
-
getWorkingDirectory
Return the top-level archive.
- Specified by:
getWorkingDirectory in class FileSystem
- Returns:
the directory pathname
-
getInitialWorkingDirectory
Description copied from class: FileSystem
Note: with the new FileContext class, getWorkingDirectory() will be removed. The working directory is implemented in FileContext. Some FileSystems like LocalFileSystem have an initial workingDir that we use as the starting workingDir. For other file systems like HDFS there is no built-in notion of an initial workingDir.
- Overrides:
getInitialWorkingDirectory in class FileSystem
- Returns:
the initial working directory if the filesystem has a built-in notion of one; otherwise null.
-
getStatus
Description copied from class: FileSystem
Returns a status object describing the use and capacity of the filesystem. If the filesystem has multiple partitions, the use and capacity of the partition pointed to by the specified path is reflected.
- Overrides:
getStatus in class FileSystem
- Parameters:
p - Path for which status should be obtained. null means the default partition.
- Returns:
a FsStatus object
- Throws:
IOException - see specific implementation
-
getCanonicalUri
Used for delegation token related functionality. Must delegate to underlying file system.
- Overrides:
getCanonicalUri in class FileSystem
- Returns:
the URI of this filesystem.
-
canonicalizeUri
Description copied from class: FileSystem
Canonicalize the given URI. This is implementation-dependent, and may for example consist of canonicalizing the hostname using DNS and adding the default port if not specified. The default implementation simply fills in the default port if not specified and if FileSystem.getDefaultPort() returns a default port.
- Overrides:
canonicalizeUri in class FileSystem
- Parameters:
uri - url.
- Returns:
URI
-
getUri
Returns the URI of this filesystem. The URI is of the form har://underlyingfsscheme-host:port/pathintheunderlyingfs
- Specified by:
getUri in class FileSystem
- Returns:
the URI of this filesystem.
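As an illustration of that URI form, the sketch below builds a har:// URI from a URI in the underlying filesystem by joining the underlying scheme and authority with a hyphen. The class HarUriEncodeSketch and the method harUri are hypothetical names, and rejecting URIs without an authority is a simplification made for this example rather than documented behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper that wraps an underlying-filesystem URI in the har:// form. */
final class HarUriEncodeSketch {
    static URI harUri(URI underlying) {
        String scheme = underlying.getScheme();
        String authority = underlying.getAuthority();
        if (scheme == null || authority == null) {
            // Simplification: this sketch only handles URIs that name a scheme and host:port.
            throw new IllegalArgumentException("scheme and authority required: " + underlying);
        }
        try {
            // har://underlyingfsscheme-host:port/pathintheunderlyingfs
            return new URI("har", scheme + "-" + authority, underlying.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot encode " + underlying, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(harUri(URI.create("hdfs://namenode:8020/user/a/foo.har")));
    }
}
```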
-
checkPath
Description copied from class: FileSystem
Check that a Path belongs to this FileSystem. The base implementation performs case insensitive equality checks of the URIs' schemes and authorities. Subclasses may implement slightly different checks.
- Overrides:
checkPath in class FileSystem
- Parameters:
path - to check
-
resolvePath
Description copied from class: FileSystem
Return the fully-qualified path of path, resolving the path through any symlinks or mount point.
- Overrides:
resolvePath in class FileSystem
- Parameters:
p - path to be resolved
- Returns:
fully qualified path
- Throws:
FileNotFoundException - if the path is not present
IOException - for any other error
-
makeQualified
Description copied from class: FileSystem
Qualify a path to one which uses this FileSystem and, if relative, made absolute.
- Overrides:
makeQualified in class FileSystem
- Parameters:
path - to qualify.
- Returns:
this path if it contains a scheme and authority and is absolute, or a new path that includes a path and authority and is fully qualified
-
getFileBlockLocations
public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len) throws IOException
Get block locations from the underlying fs and fix their offsets and lengths.
- Overrides:
getFileBlockLocations in class FileSystem
- Parameters:
file - the input file status to get block locations
start - the start of the desired range in the contained file
len - the length of the desired range
- Returns:
block locations for this segment of file
- Throws:
IOException - raised on errors performing I/O.
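One way to picture "fixing" offsets and lengths: blocks reported by the underlying filesystem are positioned relative to the part file, while the caller asks about a range relative to the contained file. The sketch below shifts each block by the contained file's offset in the part file and clips it to the requested range. The class BlockRangeSketch, its Range type and the fileOffsetInPart parameter are hypothetical names used only for this illustration.

```java
/** Hypothetical sketch: translate part-file block ranges into contained-file ranges. */
final class BlockRangeSketch {
    /** A byte range, given as an offset and a length. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * @param partBlocks       block ranges, relative to the start of the part file
     * @param start            start of the desired range in the contained file
     * @param len              length of the desired range
     * @param fileOffsetInPart where the contained file begins inside the part file
     */
    static Range[] fix(Range[] partBlocks, long start, long len, long fileOffsetInPart) {
        long end = start + len; // one past the last byte of the desired range
        Range[] fixed = new Range[partBlocks.length];
        for (int i = 0; i < partBlocks.length; i++) {
            // Shift the block so that offset 0 is the first byte of the contained file.
            long blockStart = partBlocks[i].offset - fileOffsetInPart;
            long blockEnd = blockStart + partBlocks[i].length;
            // Clip the block to the desired range.
            long from = Math.max(blockStart, start);
            long to = Math.min(blockEnd, end);
            fixed[i] = new Range(from, Math.max(0L, to - from));
        }
        return fixed;
    }

    public static void main(String[] args) {
        Range[] blocks = { new Range(64, 64), new Range(128, 64) };
        for (Range r : fix(blocks, 0, 50, 100)) {
            System.out.println(r.offset + " +" + r.length);
        }
    }
}
```

In the usage above, a 50-byte file stored at offset 100 of the part file spans two 64-byte blocks, and the clipped lengths add up to the requested 50 bytes.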
-
getHarHash
The hash of the path p inside the filesystem.
- Parameters:
p - the path in the harfilesystem
- Returns:
the hash code of the path.
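A minimal sketch of such a path hash, assuming it is derived from the path's string form with the sign bit cleared so that the result is never negative. That derivation is an assumption of this example and is not stated in this page; the class HarHashSketch and the method hash are hypothetical names.

```java
/** Hypothetical sketch of a non-negative hash over a path string. */
final class HarHashSketch {
    static int hash(String path) {
        // Assumption: String.hashCode() with the sign bit masked off.
        return path.hashCode() & 0x7fffffff;
    }

    public static void main(String[] args) {
        System.out.println(hash("/dir/file.txt"));
    }
}
```

A non-negative hash keeps a sort by hash code consistent with simple numeric range comparisons.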
-
getFileStatus
Return the FileStatus of a file in the HAR archive. The permissions returned are those of the archive index files; permissions are not persisted when a Hadoop archive is created.
- Specified by:
getFileStatus in class FileSystem
- Parameters:
f - the path in har filesystem
- Returns:
filestatus.
- Throws:
IOException - raised on errors performing I/O.
-
msync
Description copied from class: FileSystem
Synchronize client metadata state.
In some FileSystem implementations such as HDFS, metadata synchronization is essential to guarantee consistency of read requests, particularly in an HA setting.
- Overrides:
msync in class FileSystem
- Throws:
IOException - If an I/O error occurred.
UnsupportedOperationException - if the operation is unsupported.
-
getFileChecksum
Description copied from class: FileSystem
Get the checksum of a file, from the beginning of the file till the specific length.
- Overrides:
getFileChecksum in class FileSystem
- Parameters:
f - The file path
length - The length of the file range for checksum calculation
- Returns:
null since no checksum algorithm is implemented.
-
open
Returns a har input stream which fakes end of file. It reads the index files to get the part file name and the size and start of the file.
- Specified by:
open in class FileSystem
- Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
- Returns:
input stream.
- Throws:
IOException - IO failure
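The idea of a stream that "fakes end of file" can be sketched with a wrapper that skips to the archived file's start offset inside a part file and reports end of stream once the file's length has been read, even though the part file continues. This is a simplified illustration over java.io streams, not the stream class HarFileSystem returns; BoundedPartStreamSketch and readSlice are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stream exposing only the bytes [start, start + length) of a part file. */
final class BoundedPartStreamSketch extends FilterInputStream {
    private long remaining; // bytes of the archived file not yet consumed

    BoundedPartStreamSketch(InputStream part, long start, long length) throws IOException {
        super(part);
        if (start < 0 || length < 0) {
            throw new IllegalArgumentException("start and length must be non-negative");
        }
        long toSkip = start;
        while (toSkip > 0) {
            long skipped = part.skip(toSkip);
            if (skipped <= 0) {
                throw new EOFException("part file ends before offset " + start);
            }
            toSkip -= skipped;
        }
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // fake end of file at the archived file's boundary
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0 || remaining <= 0) {
            return 0;
        }
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would bypass the remaining-byte counter
    }

    /** Convenience for the example: read one archived file out of an in-memory part file. */
    static String readSlice(byte[] part, long start, long length) {
        try (InputStream s = new BoundedPartStreamSketch(
                new ByteArrayInputStream(part), start, length)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4];
            for (int n; (n = s.read(buf, 0, buf.length)) != -1; ) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, readSlice over the bytes of "AAAAhelloBBBB" with start 4 and length 5 yields only the five bytes belonging to the archived file.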
-
createPathHandle
Description copied from class: FileSystem
Hook to implement support for PathHandle operations.
- Overrides:
createPathHandle in class FileSystem
- Parameters:
stat - Referent in the target FileSystem
opts - Constraints that determine the validity of the PathHandle reference.
- Returns:
path handle.
-
open
Description copied from class: FileSystem
Open an FSDataInputStream matching the PathHandle instance. The implementation may encode metadata in PathHandle to address the resource directly and verify that the resource referenced satisfies constraints specified at its construction.
- Overrides:
open in class FileSystem
- Parameters:
fd - PathHandle object returned by the FS authority.
bufferSize - the size of the buffer to use
- Returns:
input stream.
- Throws:
InvalidPathHandleException - If PathHandle constraints are not satisfied
IOException - IO failure
-
getChildFileSystems
Used for delegation token related functionality. Must delegate to underlying file system.
- Overrides:
getChildFileSystems in class FileSystem
- Returns:
FileSystems that are direct children of this FileSystem, or null for "no children"
-
create
public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Description copied from class: FileSystem
Create an FSDataOutputStream at the indicated Path with write-progress reporting.
- Specified by:
create in class FileSystem
- Parameters:
f - the file name to open
permission - file permission
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - block size
progress - the progress reporter
- Returns:
output stream.
- Throws:
IOException - IO failure
-
createNonRecursive
public FSDataOutputStream createNonRecursive(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
- Overrides:
createNonRecursive in class FileSystem
- Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - block size
progress - the progress reporter
- Returns:
output stream.
- Throws:
IOException - IO failure
-
append
Description copied from class: FileSystem
Append to an existing file (optional operation).
- Specified by:
append in class FileSystem
- Parameters:
f - the existing file to be appended.
bufferSize - the size of the buffer to be used.
progress - for reporting progress if it is not null.
- Returns:
output stream.
- Throws:
IOException - IO failure
-
close
Description copied from class: FileSystem
Close this FileSystem instance. Will release any held locks, delete all files queued for deletion through calls to FileSystem.deleteOnExit(Path), and remove this FS instance from the cache, if cached. After this operation, the outcome of any method call on this FileSystem instance, or any input/output stream created by it, is undefined.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class FileSystem
- Throws:
IOException - IO failure
-
setReplication
Not implemented.
- Overrides:
setReplication in class FileSystem
- Parameters:
src - file name
replication - new replication
- Returns:
true if successful, or the feature is unsupported; false if replication is supported but the file does not exist, or is a directory
- Throws:
IOException - an IO failure.
-
rename
Description copied from class: FileSystem
Renames Path src to Path dst.
- Specified by:
rename in class FileSystem
- Parameters:
src - path to be renamed
dst - new path after rename
- Returns:
true if rename is successful
- Throws:
IOException - on failure
-
append
Description copied from class: FileSystem
Append to an existing file (optional operation). Same as append(f, getConf().getInt(IO_FILE_BUFFER_SIZE_KEY, IO_FILE_BUFFER_SIZE_DEFAULT), null)
- Overrides:
append in class FileSystem
- Parameters:
f - the existing file to be appended.
- Returns:
output stream.
- Throws:
IOException - IO failure
-
truncate
Not implemented.
- Overrides:
truncate in class FileSystem
- Parameters:
f - The path to the file to be truncated
newLength - The size the file is to be truncated to
- Returns:
true if the file has been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before proceeding with further file updates.
- Throws:
IOException - IO failure
-
delete
Not implemented.
- Specified by:
delete in class FileSystem
- Parameters:
f - the path to delete.
recursive - if path is a directory and set to true, the directory is deleted else throws an exception. In case of a file the recursive can be set to either true or false.
- Returns:
true if delete is successful else false.
- Throws:
IOException - IO failure
-
listStatus
Returns the children of a directory after looking up the index files.
- Specified by:
listStatus in class FileSystem
- Parameters:
f - given path
- Returns:
the statuses of the files/directories in the given path
- Throws:
FileNotFoundException - when the path does not exist
IOException - see specific implementation
-
getHomeDirectory
Return the top-level archive path.
- Overrides:
getHomeDirectory in class FileSystem
- Returns:
the path.
-
setWorkingDirectory
Description copied from class: FileSystem
Set the current working directory for the given FileSystem. All relative paths will be resolved relative to it.
- Specified by:
setWorkingDirectory in class FileSystem
- Parameters:
newDir - Path of new working directory
-
mkdirs
Not implemented.
- Specified by:
mkdirs in class FileSystem
- Parameters:
f - path to create
permission - to apply to f
- Returns:
true if the directory was created successfully, false otherwise.
- Throws:
IOException - IO failure
-
copyFromLocalFile
public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst) throws IOException
Not implemented.
- Overrides:
copyFromLocalFile in class FileSystem
- Parameters:
delSrc - whether to delete the src
overwrite - whether to overwrite an existing file
src - path
dst - path
- Throws:
IOException - IO failure
-
copyFromLocalFile
public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst) throws IOException
Description copied from class: FileSystem
The src files are on the local disk. Add them to the filesystem at the given dst name. delSrc indicates whether the sources should be removed.
- Overrides:
copyFromLocalFile in class FileSystem
- Parameters:
delSrc - whether to delete the src
overwrite - whether to overwrite an existing file
srcs - array of paths which are source
dst - path
- Throws:
IOException - IO failure
-
copyToLocalFile
Copies the file in the har filesystem to a local file.
- Overrides:
copyToLocalFile in class FileSystem
- Parameters:
delSrc - whether to delete the src
src - path src file in the remote filesystem
dst - path local destination
- Throws:
IOException - IO failure
-
startLocalOutput
Not implemented.
- Overrides:
startLocalOutput in class FileSystem
- Parameters:
fsOutputFile - path of output file
tmpLocalFile - path of local tmp file
- Returns:
the path.
- Throws:
IOException - IO failure
-
completeLocalOutput
Not implemented.
- Overrides:
completeLocalOutput in class FileSystem
- Parameters:
fsOutputFile - path of output file
tmpLocalFile - path to local tmp file
- Throws:
IOException - IO failure
-
setOwner
Not implemented.
- Overrides:
setOwner in class FileSystem
- Parameters:
p - The path
username - If it is null, the original username remains unchanged.
groupname - If it is null, the original groupname remains unchanged.
- Throws:
IOException - IO failure
-
setTimes
Description copied from class: FileSystem
Set access time of a file.
- Overrides:
setTimes in class FileSystem
- Parameters:
p - The path
mtime - Set the modification time of this file. The number of milliseconds since Jan 1, 1970. A value of -1 means that this call should not set modification time.
atime - Set the access time of this file. The number of milliseconds since Jan 1, 1970. A value of -1 means that this call should not set access time.
- Throws:
IOException - IO failure
-
setPermission
Not implemented.
- Overrides:
setPermission in class FileSystem
- Parameters:
p - The path
permission - permission
- Throws:
IOException - IO failure
-
hasPathCapability
Declare that this filesystem connector is always read only. The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations. Unless it has a way to explicitly determine the capabilities, this method returns false.
Probe for a specific capability under the given path. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
- The capability is not known.
- The capability is known but it is not supported.
- The capability is known but the filesystem does not know if it is supported under the supplied path.
Implementors: PathCapabilitiesSupport can be used to help implement this method.
- Specified by:
hasPathCapability in interface PathCapabilities
- Overrides:
hasPathCapability in class FileSystem
- Parameters:
path - path to query the capability of.
capability - non-null, non-empty string to query the path for support.
- Returns:
true if the capability is supported under that part of the FS.
- Throws:
IOException - this should not be raised, except on problems resolving paths or relaying the call.
-
getServerDefaults
Description copied from class: FileSystem
Return a set of server default configuration values.
- Overrides:
getServerDefaults in class FileSystem
- Returns:
server default configuration values
- Throws:
IOException - IO failure
-
getServerDefaults
Description copied from class: FileSystem
Return a set of server default configuration values.
- Overrides:
getServerDefaults in class FileSystem
- Parameters:
f - path is used to identify an FS since an FS could have another FS that it could be delegating the call to
- Returns:
server default configuration values
- Throws:
IOException - IO failure
-
getUsed
Description copied from class: FileSystem
Return the total size of all files in the filesystem.
- Overrides:
getUsed in class FileSystem
- Returns:
the total size of all files in the filesystem.
- Throws:
IOException - IO failure
-
getUsed
Return the total size of all files from a specified path.
- Overrides:
getUsed in class FileSystem
- Parameters:
path - the path.
- Returns:
the total size of all files under the specified path.
- Throws:
IOException - IO failure
-
getDefaultBlockSize
public long getDefaultBlockSize()
Description copied from class: FileSystem
Return the number of bytes that large input files should optimally be split into to minimize I/O time.
- Overrides:
getDefaultBlockSize in class FileSystem
- Returns:
default block size.
-
getDefaultBlockSize
Description copied from class: FileSystem
Return the number of bytes that large input files should optimally be split into to minimize I/O time. The given path will be used to locate the actual filesystem. The full path does not have to exist.
- Overrides:
getDefaultBlockSize in class FileSystem
- Parameters:
f - path of file
- Returns:
the default block size for the path's filesystem
-
getDefaultReplication
public short getDefaultReplication()
Description copied from class: FileSystem
Get the default replication.
- Overrides:
getDefaultReplication in class FileSystem
- Returns:
the replication; the default value is "1".
-
getDefaultReplication
Description copied from class: FileSystem
Get the default replication for a path. The given path will be used to locate the actual FileSystem to query. The full path does not have to exist.
- Overrides:
getDefaultReplication in class FileSystem
- Parameters:
f - of the file
- Returns:
default replication for the path's filesystem
-
createFile
Description copied from class: FileSystem
Create a new FSDataOutputStreamBuilder for the file with path. Files are overwritten by default.
- Overrides:
createFile in class FileSystem
- Parameters:
path - file path
- Returns:
a FSDataOutputStreamBuilder object to build the file. HADOOP-14384: temporarily reduce the visibility of the method before the builder interface becomes stable.
-
appendFile
Description copied from class: FileSystem
Create a Builder to append a file.
- Overrides:
appendFile in class FileSystem
- Parameters:
path - file path.
- Returns:
a FSDataOutputStreamBuilder to build file append request.
-