Use the HDFS Snap to read from and write to HDFS in delimited format. A variety of delimiters can be used. Delimited files in HDFS can be consumed by the HDFS Reader component, and the HDFS Writer component can generate delimited files that can be consumed by other Hadoop tools such as Hive.
The HDFS Read/Write Snap supports reading and writing files on the Hadoop Distributed File System (HDFS) that are delimited by single characters or arbitrary strings.
- This Snap reads from and writes to delimited files on HDFS
- Delimiters can be arbitrary strings
- If a field in the source being read has no data, the HDFS Snap produces an empty string for the corresponding output field if it is of STRING type, or a null value if it is of NUMBER type
- If a field in the source being read is missing, the HDFS Snap produces a null value
- If an input record field has no data, the HDFS Snap outputs nothing for the field
- The HDFS Snap terminates each record it writes with '\n' on all platforms. It does not apply an OS-specific interpretation of '\n' (such as writing '\r\n' on Windows)
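
The field-handling rules listed above can be illustrated with a small sketch. This is not the Snap's implementation. The function names `parse_record` and `format_record`, the type labels passed as strings, and the use of Python's `None` to stand for a null value are all assumptions made for illustration. The sketch also assumes that "outputs nothing for the field" refers to the write path.

```python
from typing import List, Optional, Union

Value = Union[str, float, None]


def parse_record(line: str, delimiter: str, types: List[str]) -> List[Value]:
    """Split one delimited record and apply the empty/missing-field rules.

    Hypothetical helper: `types` holds "STRING" or "NUMBER" per output field.
    """
    # Drop a single trailing record terminator, if present.
    if line.endswith("\n"):
        line = line[:-1]
    parts = line.split(delimiter)  # the delimiter may be an arbitrary string

    out: List[Value] = []
    for i, field_type in enumerate(types):
        if i >= len(parts):
            out.append(None)  # field missing from the source -> null
        elif parts[i] == "":
            # field present but empty -> "" for STRING, null for NUMBER
            out.append("" if field_type == "STRING" else None)
        elif field_type == "NUMBER":
            out.append(float(parts[i]))
        else:
            out.append(parts[i])
    return out


def format_record(fields: List[Optional[object]], delimiter: str) -> str:
    """Join fields with the delimiter; fields with no data are written as nothing."""
    body = delimiter.join("" if f is None else str(f) for f in fields)
    return body + "\n"  # always "\n", never "\r\n"
```

For example, `parse_record("a||||3\n", "||", ["STRING", "STRING", "NUMBER"])` yields `["a", "", 3.0]`, and `format_record(["a", None, 3], "||")` yields `"a||||3\n"`. When writing such strings to a local file from Python, opening the file with `newline=""` (or in binary mode) prevents the runtime from translating `\n` to `\r\n` on Windows.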
Supported and tested versions: Apache Hadoop 1.0.3, CDH (Cloudera’s Distribution including Apache Hadoop) 4.0, CDH 4.1.x
Supported versions: Apache Hadoop 1.1.0 (beta)