Class HarFileSystem

All Implemented Interfaces:
Closeable, AutoCloseable, Configurable, BulkDeleteSource, PathCapabilities, DelegationTokenIssuer

public class HarFileSystem extends FileSystem
This is an implementation of the Hadoop Archive (HAR) filesystem. An archive contains index files named _masterindex and _index, and content files named part-*. The index files store the locations of the archived files. The master index is a level of indirection into the index file that makes lookups faster: the index file is sorted by the hash code of the paths it contains, and the master index contains pointers to the positions in the index for ranges of hash codes.
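The two-level lookup described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the on-disk HAR format: the class name TwoLevelIndexSketch, the Entry and Range types, the entriesPerRange parameter, and the hash function (the sign bit of String.hashCode() masked off) are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a master index over a hash-sorted index. */
class TwoLevelIndexSketch {
    /** One index entry: a path and its hash. */
    static final class Entry {
        final int hash;
        final String path;
        Entry(int hash, String path) { this.hash = hash; this.path = path; }
    }

    /** One master-index entry: a hash range and the index slice [from, to) holding it. */
    static final class Range {
        final int startHash, endHash, from, to;
        Range(int startHash, int endHash, int from, int to) {
            this.startHash = startHash; this.endHash = endHash;
            this.from = from; this.to = to;
        }
    }

    final List<Entry> index = new ArrayList<>();  // sorted by hash
    final List<Range> master = new ArrayList<>(); // one entry per slice of the index

    /** Assumed hash: non-negative variant of String.hashCode(). */
    static int hash(String path) { return path.hashCode() & 0x7fffffff; }

    TwoLevelIndexSketch(List<String> paths, int entriesPerRange) {
        for (String p : paths) index.add(new Entry(hash(p), p));
        index.sort((a, b) -> Integer.compare(a.hash, b.hash));
        for (int from = 0; from < index.size(); from += entriesPerRange) {
            int to = Math.min(from + entriesPerRange, index.size());
            master.add(new Range(index.get(from).hash, index.get(to - 1).hash, from, to));
        }
    }

    /** Returns the position of path in the sorted index, or -1 if it is absent. */
    int find(String path) {
        int h = hash(path);
        for (Range r : master) {
            if (h < r.startHash || h > r.endHash) continue; // skip slices that cannot match
            for (int i = r.from; i < r.to; i++) {
                Entry e = index.get(i);
                if (e.hash == h && e.path.equals(path)) return i;
            }
        }
        return -1;
    }
}
```

Only the slices whose hash range covers the requested hash are scanned, which is the point of the extra level of indirection.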
  • Constructor Details

    • HarFileSystem

      public HarFileSystem()
      Public constructor of HarFileSystem.
    • HarFileSystem

      public HarFileSystem(FileSystem fs)
      Constructor to create a HarFileSystem with an underlying filesystem.
      Parameters:
      fs - underlying file system
  • Method Details

    • getScheme

      public String getScheme()
      Return the protocol scheme for the FileSystem.

      Overrides:
      getScheme in class FileSystem
      Returns:
      har
    • initialize

      public void initialize(URI name, Configuration conf) throws IOException
      Initialize a HAR filesystem for a single HAR archive. The archive home directory is the top-level directory in the filesystem that contains the HAR archive. Be careful with this method: you do not want to keep creating new FileSystem instances on every call to path.getFileSystem(). The URI of a HAR filesystem is har://underlyingfsscheme-host:port/archivepath or har:///archivepath. The second form assumes the default underlying filesystem, since none is specified.
      Overrides:
      initialize in class FileSystem
      Parameters:
      name - a URI whose authority section names the host, port, etc. for this FileSystem
      conf - the configuration
      Throws:
      IOException - on any failure to initialize this instance.
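As an illustration of the two URI forms described for initialize, the following sketch splits the authority of a har URI into the underlying scheme and host:port using only java.net.URI. The class HarUriSketch and the method underlyingUri are hypothetical names, and the choice to split at the first hyphen is an assumption drawn from the documented har://underlyingfsscheme-host:port/archivepath form; it does not reproduce the HarFileSystem code.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper that maps a har URI to the URI of the underlying filesystem. */
class HarUriSketch {
    /**
     * Returns the underlying URI encoded in harUri, or null when the har URI
     * has no authority (har:///archivepath) and the default filesystem applies.
     */
    static URI underlyingUri(URI harUri) {
        if (!"har".equals(harUri.getScheme())) {
            throw new IllegalArgumentException("not a har URI: " + harUri);
        }
        String authority = harUri.getAuthority();
        if (authority == null) {
            return null; // caller falls back to the default filesystem
        }
        int dash = authority.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("expected <scheme>-<host> in: " + harUri);
        }
        String scheme = authority.substring(0, dash);   // e.g. "hdfs"
        String hostPort = authority.substring(dash + 1); // e.g. "namenode:8020"
        try {
            return new URI(scheme, hostPort, harUri.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build underlying URI for " + harUri, e);
        }
    }
}
```

With this sketch, har://hdfs-namenode:8020/user/data/logs.har maps to hdfs://namenode:8020/user/data/logs.har, while har:///user/data/logs.har yields null to signal the default filesystem. The host name and path in that usage are made up for the example.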
    • getConf

      public Configuration getConf()
      Description copied from interface: Configurable
      Return the configuration used by this object.
      Specified by:
      getConf in interface Configurable
      Overrides:
      getConf in class Configured
      Returns:
      Configuration
    • getHarVersion

      public int getHarVersion() throws IOException
      Throws:
      IOException
    • getWorkingDirectory

      public Path getWorkingDirectory()
      Returns the top-level archive.
      Specified by:
      getWorkingDirectory in class FileSystem
      Returns:
      the directory pathname
    • getInitialWorkingDirectory

      public Path getInitialWorkingDirectory()
      Description copied from class: FileSystem
      Note: with the new FileContext class, getWorkingDirectory() will be removed. The working directory is implemented in FileContext. Some FileSystems like LocalFileSystem have an initial workingDir that we use as the starting workingDir. For other file systems like HDFS there is no built in notion of an initial workingDir.
      Overrides:
      getInitialWorkingDirectory in class FileSystem
      Returns:
      if there is built in notion of workingDir then it is returned; else a null is returned.
    • getStatus

      public FsStatus getStatus(Path p) throws IOException
      Description copied from class: FileSystem
      Returns a status object describing the use and capacity of the filesystem. If the filesystem has multiple partitions, the use and capacity of the partition pointed to by the specified path is reflected.
      Overrides:
      getStatus in class FileSystem
      Parameters:
      p - Path for which status should be obtained. null means the default partition.
      Returns:
      a FsStatus object
      Throws:
      IOException - see specific implementation
    • getCanonicalUri

      protected URI getCanonicalUri()
      Used for delegation token related functionality. Must delegate to underlying file system.
      Overrides:
      getCanonicalUri in class FileSystem
      Returns:
      the URI of this filesystem.
    • canonicalizeUri

      protected URI canonicalizeUri(URI uri)
      Description copied from class: FileSystem
      Canonicalize the given URI. This is implementation-dependent, and may for example consist of canonicalizing the hostname using DNS and adding the default port if not specified. The default implementation simply fills in the default port if not specified and if FileSystem.getDefaultPort() returns a default port.
      Overrides:
      canonicalizeUri in class FileSystem
      Parameters:
      uri - the URI to canonicalize.
      Returns:
      URI
    • getUri

      public URI getUri()
      Returns the URI of this filesystem. The URI is of the form har://underlyingfsscheme-host:port/pathintheunderlyingfs
      Specified by:
      getUri in class FileSystem
      Returns:
      the URI of this filesystem.
    • checkPath

      protected void checkPath(Path path)
      Description copied from class: FileSystem
      Check that a Path belongs to this FileSystem. The base implementation performs case-insensitive equality checks of the URIs' schemes and authorities. Subclasses may implement slightly different checks.
      Overrides:
      checkPath in class FileSystem
      Parameters:
      path - to check
    • resolvePath

      public Path resolvePath(Path p) throws IOException
      Description copied from class: FileSystem
      Return the fully-qualified path of path, resolving the path through any symlinks or mount point.
      Overrides:
      resolvePath in class FileSystem
      Parameters:
      p - path to be resolved
      Returns:
      fully qualified path
      Throws:
      FileNotFoundException - if the path is not present
      IOException - for any other error
    • makeQualified

      public Path makeQualified(Path path)
      Description copied from class: FileSystem
      Qualify a path to one which uses this FileSystem and, if relative, made absolute.
      Overrides:
      makeQualified in class FileSystem
      Parameters:
      path - to qualify.
      Returns:
      this path if it contains a scheme and authority and is absolute, or a new path that includes a scheme and authority and is fully qualified
    • getFileBlockLocations

      public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len) throws IOException
      Get block locations from the underlying fs and fix their offsets and lengths.
      Overrides:
      getFileBlockLocations in class FileSystem
      Parameters:
      file - the input file status to get block locations
      start - the start of the desired range in the contained file
      len - the length of the desired range
      Returns:
      block locations for this segment of file
      Throws:
      IOException - raised on errors performing I/O.
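To make "fix their offsets and lengths" concrete, here is a minimal sketch of the arithmetic involved when a contained file occupies a region of a larger part file. It assumes each block is described only by an offset and a length, and that every block passed in overlaps the requested range. The names BlockRangeSketch, Block, fix and fileOffsetInPart are hypothetical and are not part of the Hadoop API.

```java
/** Hypothetical model of translating part-file block ranges to contained-file ranges. */
class BlockRangeSketch {
    /** A block described by its offset and length (initially relative to the part file). */
    static final class Block {
        long offset;
        long length;
        Block(long offset, long length) { this.offset = offset; this.length = length; }
    }

    /**
     * Rewrites each block so that its offset is relative to the contained file
     * and it covers only the requested range [start, start + len).
     */
    static void fix(Block[] blocks, long start, long len, long fileOffsetInPart) {
        long end = start + len; // one past the last requested byte
        for (Block b : blocks) {
            // Block boundaries relative to the contained file; the start may be
            // negative when the file begins in the middle of this block.
            long blockStart = b.offset - fileOffsetInPart;
            long blockEnd = blockStart + b.length;
            if (start > blockStart) {
                // Requested range begins inside this block: trim the front.
                b.offset = start;
                b.length -= (start - blockStart);
            } else {
                b.offset = blockStart;
            }
            if (blockEnd > end) {
                // Requested range ends inside this block: trim the back.
                b.length -= (blockEnd - end);
            }
        }
    }
}
```

For example, for a 300-byte file stored at offset 100 of a part file with 256-byte blocks, requesting the whole file turns the part-file blocks (0, 256) and (256, 256) into (0, 156) and (156, 144), which together cover exactly the 300 bytes of the contained file.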
    • getHarHash

      public static int getHarHash(Path p)
      Returns the hash of the path p inside the filesystem.
      Parameters:
      p - the path in the harfilesystem
      Returns:
      the hash code of the path.
    • getFileStatus

      public FileStatus getFileStatus(Path f) throws IOException
      Returns the file status of a file in the HAR archive. The permissions returned are those of the archive index files, because permissions are not persisted when a Hadoop archive is created.
      Specified by:
      getFileStatus in class FileSystem
      Parameters:
      f - the path in har filesystem
      Returns:
      the file status.
      Throws:
      IOException - raised on errors performing I/O.
    • msync

      public void msync() throws IOException, UnsupportedOperationException
      Description copied from class: FileSystem
      Synchronize client metadata state.

      In some FileSystem implementations, such as HDFS, metadata synchronization is essential to guarantee consistency of read requests, particularly in an HA setting.

      Overrides:
      msync in class FileSystem
      Throws:
      IOException - If an I/O error occurred.
      UnsupportedOperationException - if the operation is unsupported.
    • getFileChecksum

      public FileChecksum getFileChecksum(Path f, long length)
      Description copied from class: FileSystem
      Get the checksum of a file, from the beginning of the file till the specific length.
      Overrides:
      getFileChecksum in class FileSystem
      Parameters:
      f - The file path
      length - The length of the file range for checksum calculation
      Returns:
      null since no checksum algorithm is implemented.
    • open

      public FSDataInputStream open(Path f, int bufferSize) throws IOException
      Returns a HAR input stream that fakes end of file. It reads the index files to get the part file name and the start and size of the file within it.
      Specified by:
      open in class FileSystem
      Parameters:
      f - the file name to open
      bufferSize - the size of the buffer to be used.
      Returns:
      input stream.
      Throws:
      IOException - IO failure
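The idea of a stream that "fakes end of file" can be sketched with plain java.io: wrap the part file's stream, skip to the start of the contained file, and report end of stream once the contained file's size has been read. BoundedRegionStream is a hypothetical class written for illustration and is not the stream class used by HarFileSystem. The start and length arguments stand in for the values that would come from the index files.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream exposing only the region [start, start + length) of a part file. */
class BoundedRegionStream extends InputStream {
    private final InputStream in;
    private long remaining; // bytes of the contained file not yet read

    BoundedRegionStream(InputStream partFile, long start, long length) throws IOException {
        this.in = partFile;
        long toSkip = start;
        while (toSkip > 0) { // skip() may skip fewer bytes than requested
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("could not seek to offset " + start);
            }
            toSkip -= skipped;
        }
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // fake end of file at the end of the region
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A caller reading from this stream sees only the bytes of the contained file, even though the underlying part file continues past them.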
    • createPathHandle

      protected PathHandle createPathHandle(FileStatus stat, Options.HandleOpt... opts)
      Description copied from class: FileSystem
      Hook to implement support for PathHandle operations.
      Overrides:
      createPathHandle in class FileSystem
      Parameters:
      stat - Referent in the target FileSystem
      opts - Constraints that determine the validity of the PathHandle reference.
      Returns:
      path handle.
    • open

      public FSDataInputStream open(PathHandle fd, int bufferSize) throws IOException
      Description copied from class: FileSystem
      Open an FSDataInputStream matching the PathHandle instance. The implementation may encode metadata in PathHandle to address the resource directly and verify that the resource referenced satisfies constraints specified at its construction.
      Overrides:
      open in class FileSystem
      Parameters:
      fd - PathHandle object returned by the FS authority.
      bufferSize - the size of the buffer to use
      Returns:
      input stream.
      Throws:
      InvalidPathHandleException - If PathHandle constraints are not satisfied
      IOException - IO failure
    • getChildFileSystems

      public FileSystem[] getChildFileSystems()
      Used for delegation token related functionality. Must delegate to underlying file system.
      Overrides:
      getChildFileSystems in class FileSystem
      Returns:
      FileSystems that are direct children of this FileSystem, or null for "no children"
    • create

      public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Create an FSDataOutputStream at the indicated Path with write-progress reporting.
      Specified by:
      create in class FileSystem
      Parameters:
      f - the file name to open
      permission - file permission
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • createNonRecursive

      public FSDataOutputStream createNonRecursive(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Opens an FSDataOutputStream at the indicated Path with write-progress reporting. Same as create(), except fails if parent directory doesn't already exist.
      Overrides:
      createNonRecursive in class FileSystem
      Parameters:
      f - the file name to open
      overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
      bufferSize - the size of the buffer to be used.
      replication - required block replication for the file.
      blockSize - block size
      progress - the progress reporter
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • append

      public FSDataOutputStream append(Path f, int bufferSize, Progressable progress) throws IOException
      Description copied from class: FileSystem
      Append to an existing file (optional operation).
      Specified by:
      append in class FileSystem
      Parameters:
      f - the existing file to be appended.
      bufferSize - the size of the buffer to be used.
      progress - for reporting progress if it is not null.
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • close

      public void close() throws IOException
      Description copied from class: FileSystem
      Close this FileSystem instance. Will release any held locks, delete all files queued for deletion through calls to FileSystem.deleteOnExit(Path), and remove this FS instance from the cache, if cached. After this operation, the outcome of any method call on this FileSystem instance, or any input/output stream created by it is undefined.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class FileSystem
      Throws:
      IOException - IO failure
    • setReplication

      public boolean setReplication(Path src, short replication) throws IOException
      Not implemented.
      Overrides:
      setReplication in class FileSystem
      Parameters:
      src - file name
      replication - new replication
      Returns:
      true if successful, or the feature is unsupported; false if replication is supported but the file does not exist, or is a directory
      Throws:
      IOException - an IO failure.
    • rename

      public boolean rename(Path src, Path dst) throws IOException
      Description copied from class: FileSystem
      Renames Path src to Path dst.
      Specified by:
      rename in class FileSystem
      Parameters:
      src - path to be renamed
      dst - new path after rename
      Returns:
      true if rename is successful
      Throws:
      IOException - on failure
    • append

      public FSDataOutputStream append(Path f) throws IOException
      Description copied from class: FileSystem
      Append to an existing file (optional operation). Same as append(f, getConf().getInt(IO_FILE_BUFFER_SIZE_KEY, IO_FILE_BUFFER_SIZE_DEFAULT), null)
      Overrides:
      append in class FileSystem
      Parameters:
      f - the existing file to be appended.
      Returns:
      output stream.
      Throws:
      IOException - IO failure
    • truncate

      public boolean truncate(Path f, long newLength) throws IOException
      Not implemented.
      Overrides:
      truncate in class FileSystem
      Parameters:
      f - The path to the file to be truncated
      newLength - The size the file is to be truncated to
      Returns:
      true if the file has been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before proceeding with further file updates.
      Throws:
      IOException - IO failure
    • delete

      public boolean delete(Path f, boolean recursive) throws IOException
      Not implemented.
      Specified by:
      delete in class FileSystem
      Parameters:
      f - the path to delete.
      recursive - if path is a directory and set to true, the directory is deleted else throws an exception. In case of a file the recursive can be set to either true or false.
      Returns:
      true if delete is successful else false.
      Throws:
      IOException - IO failure
    • listStatus

      public FileStatus[] listStatus(Path f) throws IOException
      Returns the children of a directory after looking up the index files.
      Specified by:
      listStatus in class FileSystem
      Parameters:
      f - given path
      Returns:
      the statuses of the files/directories in the given path
      Throws:
      FileNotFoundException - when the path does not exist
      IOException - see specific implementation
    • getHomeDirectory

      public Path getHomeDirectory()
      Returns the top-level archive path.
      Overrides:
      getHomeDirectory in class FileSystem
      Returns:
      the path.
    • setWorkingDirectory

      public void setWorkingDirectory(Path newDir)
      Description copied from class: FileSystem
      Set the current working directory for the given FileSystem. All relative paths will be resolved relative to it.
      Specified by:
      setWorkingDirectory in class FileSystem
      Parameters:
      newDir - Path of new working directory
    • mkdirs

      public boolean mkdirs(Path f, FsPermission permission) throws IOException
      Not implemented.
      Specified by:
      mkdirs in class FileSystem
      Parameters:
      f - path to create
      permission - to apply to f
      Returns:
      true if the directory was created successfully; false otherwise.
      Throws:
      IOException - IO failure
    • copyFromLocalFile

      public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst) throws IOException
      Not implemented.
      Overrides:
      copyFromLocalFile in class FileSystem
      Parameters:
      delSrc - whether to delete the src
      overwrite - whether to overwrite an existing file
      src - path
      dst - path
      Throws:
      IOException - IO failure
    • copyFromLocalFile

      public void copyFromLocalFile(boolean delSrc, boolean overwrite, Path[] srcs, Path dst) throws IOException
      Description copied from class: FileSystem
      The src files are on the local disk. Add them to the filesystem at the given dst name. delSrc indicates whether the sources should be removed.
      Overrides:
      copyFromLocalFile in class FileSystem
      Parameters:
      delSrc - whether to delete the src
      overwrite - whether to overwrite an existing file
      srcs - array of paths which are source
      dst - path
      Throws:
      IOException - IO failure
    • copyToLocalFile

      public void copyToLocalFile(boolean delSrc, Path src, Path dst) throws IOException
      Copies the file in the HAR filesystem to a local file.
      Overrides:
      copyToLocalFile in class FileSystem
      Parameters:
      delSrc - whether to delete the src
      src - path src file in the remote filesystem
      dst - path local destination
      Throws:
      IOException - IO failure
    • startLocalOutput

      public Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile) throws IOException
      Not implemented.
      Overrides:
      startLocalOutput in class FileSystem
      Parameters:
      fsOutputFile - path of output file
      tmpLocalFile - path of local tmp file
      Returns:
      the path.
      Throws:
      IOException - IO failure
    • completeLocalOutput

      public void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile) throws IOException
      Not implemented.
      Overrides:
      completeLocalOutput in class FileSystem
      Parameters:
      fsOutputFile - path of output file
      tmpLocalFile - path to local tmp file
      Throws:
      IOException - IO failure
    • setOwner

      public void setOwner(Path p, String username, String groupname) throws IOException
      Not implemented.
      Overrides:
      setOwner in class FileSystem
      Parameters:
      p - The path
      username - If it is null, the original username remains unchanged.
      groupname - If it is null, the original groupname remains unchanged.
      Throws:
      IOException - IO failure
    • setTimes

      public void setTimes(Path p, long mtime, long atime) throws IOException
      Description copied from class: FileSystem
      Set access time of a file.
      Overrides:
      setTimes in class FileSystem
      Parameters:
      p - The path
      mtime - Set the modification time of this file. The number of milliseconds since Jan 1, 1970. A value of -1 means that this call should not set modification time.
      atime - Set the access time of this file. The number of milliseconds since Jan 1, 1970. A value of -1 means that this call should not set access time.
      Throws:
      IOException - IO failure
    • setPermission

      public void setPermission(Path p, FsPermission permission) throws IOException
      Not implemented.
      Overrides:
      setPermission in class FileSystem
      Parameters:
      p - The path
      permission - permission
      Throws:
      IOException - IO failure
    • hasPathCapability

      public boolean hasPathCapability(Path path, String capability) throws IOException
      Declare that this filesystem connector is always read only. The base FileSystem implementation generally has no knowledge of the capabilities of actual implementations. Unless it has a way to explicitly determine the capabilities, this method returns false. Probe for a specific capability under the given path. If the function returns true, this instance is explicitly declaring that the capability is available. If the function returns false, it can mean one of:
      • The capability is not known.
      • The capability is known but it is not supported.
      • The capability is known but the filesystem does not know if it is supported under the supplied path.
      The core guarantee which a caller can rely on is: if the predicate returns true, then the specific operation/behavior can be expected to be supported. However a specific call may be rejected for permission reasons, the actual file/directory not being present, or some other failure during the attempted execution of the operation.

      Implementors: PathCapabilitiesSupport can be used to help implement this method.

      Specified by:
      hasPathCapability in interface PathCapabilities
      Overrides:
      hasPathCapability in class FileSystem
      Parameters:
      path - path to query the capability of.
      capability - non-null, non-empty string to query the path for support.
      Returns:
      true if the capability is supported under that part of the FS.
      Throws:
      IOException - this should not be raised, except on problems resolving paths or relaying the call.
    • getServerDefaults

      public FsServerDefaults getServerDefaults() throws IOException
      Description copied from class: FileSystem
      Return a set of server default configuration values.
      Overrides:
      getServerDefaults in class FileSystem
      Returns:
      server default configuration values
      Throws:
      IOException - IO failure
    • getServerDefaults

      public FsServerDefaults getServerDefaults(Path f) throws IOException
      Description copied from class: FileSystem
      Return a set of server default configuration values.
      Overrides:
      getServerDefaults in class FileSystem
      Parameters:
      f - path is used to identify an FS since an FS could have another FS that it could be delegating the call to
      Returns:
      server default configuration values
      Throws:
      IOException - IO failure
    • getUsed

      public long getUsed() throws IOException
      Description copied from class: FileSystem
      Return the total size of all files in the filesystem.
      Overrides:
      getUsed in class FileSystem
      Returns:
      the total size of all files in the filesystem.
      Throws:
      IOException - IO failure
    • getUsed

      public long getUsed(Path path) throws IOException
      Return the total size of all files from a specified path.
      Overrides:
      getUsed in class FileSystem
      Parameters:
      path - the path.
      Returns:
      the total size of all files under the specified path.
      Throws:
      IOException - IO failure
    • getDefaultBlockSize

      public long getDefaultBlockSize()
      Description copied from class: FileSystem
      Return the number of bytes that large input files should optimally be split into to minimize I/O time.
      Overrides:
      getDefaultBlockSize in class FileSystem
      Returns:
      default block size.
    • getDefaultBlockSize

      public long getDefaultBlockSize(Path f)
      Description copied from class: FileSystem
      Return the number of bytes that large input files should optimally be split into to minimize I/O time. The given path will be used to locate the actual filesystem. The full path does not have to exist.
      Overrides:
      getDefaultBlockSize in class FileSystem
      Parameters:
      f - path of file
      Returns:
      the default block size for the path's filesystem
    • getDefaultReplication

      public short getDefaultReplication()
      Description copied from class: FileSystem
      Get the default replication.
      Overrides:
      getDefaultReplication in class FileSystem
      Returns:
      the replication; the default value is "1".
    • getDefaultReplication

      public short getDefaultReplication(Path f)
      Description copied from class: FileSystem
      Get the default replication for a path. The given path will be used to locate the actual FileSystem to query. The full path does not have to exist.
      Overrides:
      getDefaultReplication in class FileSystem
      Parameters:
      f - of the file
      Returns:
      default replication for the path's filesystem
    • createFile

      public FSDataOutputStreamBuilder createFile(Path path)
      Description copied from class: FileSystem
      Create a new FSDataOutputStreamBuilder for the file with path. Files are overwritten by default.
      Overrides:
      createFile in class FileSystem
      Parameters:
      path - file path
      Returns:
      an FSDataOutputStreamBuilder object to build the file. HADOOP-14384: temporarily reduce the visibility of the method until the builder interface becomes stable.
    • appendFile

      public FSDataOutputStreamBuilder appendFile(Path path)
      Description copied from class: FileSystem
      Create a Builder to append a file.
      Overrides:
      appendFile in class FileSystem
      Parameters:
      path - file path.
      Returns:
      a FSDataOutputStreamBuilder to build file append request.