GCS connector in a non-cloud environment

Aug 17 2020

I have installed the hadoop 3 version of the GCS connector and added the configuration below to core-site.xml, as described in Install.md. The goal is to migrate data from HDFS on a local cluster to cloud storage.

core-site.xml

fs.gs.project.id=<project-id>
fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
google.cloud.auth.service.account.enable=true
google.cloud.auth.service.account.json.keyfile=<path to key file>
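
The settings above are written in key=value shorthand, while core-site.xml itself stores each setting as a <property> element with <name> and <value> children. As a minimal sketch of that mapping, the Python below renders the shorthand into the XML form. The helper name to_core_site_xml is hypothetical, and the placeholders are kept exactly as they appear in the question.

```python
import xml.etree.ElementTree as ET

# Settings copied from the question; placeholders are left unresolved.
SHORTHAND = """\
fs.gs.project.id=<project-id>
fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
google.cloud.auth.service.account.enable=true
google.cloud.auth.service.account.json.keyfile=<path to key file>
"""


def to_core_site_xml(shorthand: str) -> str:
    """Render key=value lines as Hadoop <configuration> XML (hypothetical helper)."""
    root = ET.Element("configuration")
    for line in shorthand.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # A line without '=' cannot be turned into a name/value pair.
            raise ValueError(f"not a key=value line: {line!r}")
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name.strip()
        ET.SubElement(prop, "value").text = value.strip()
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_core_site_xml(SHORTHAND))
```

The angle brackets in the placeholders come out XML-escaped, so they need to be replaced with real values before the output is usable in an actual core-site.xml.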

Restarted the services.

When I try to access the bucket in the cloud to list its files, the command fails.

 hdfs --loglevel TRACE dfs -ls gs://data-store/
    20/08/17 15:44:09 DEBUG gcs.GoogleHadoopFileSystemBase: GHFS version: hadoop3-2.1.4
    20/08/17 15:44:09 DEBUG fs.FileSystem: gs:// = class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem from /usr/hdp/3.0.0.0-1634/hadoop/lib/gcs-connector-hadoop3-latest.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: file:// = class org.apache.hadoop.fs.LocalFileSystem from /usr/hdp/3.0.0.0-1634/hadoop/hadoop-common-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /usr/hdp/3.0.0.0-1634/hadoop/hadoop-common-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: har:// = class org.apache.hadoop.fs.HarFileSystem from /usr/hdp/3.0.0.0-1634/hadoop/hadoop-common-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: http:// = class org.apache.hadoop.fs.http.HttpFileSystem from /usr/hdp/3.0.0.0-1634/hadoop/hadoop-common-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: https:// = class org.apache.hadoop.fs.http.HttpsFileSystem from /usr/hdp/3.0.0.0-1634/hadoop/hadoop-common-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /usr/hdp/3.0.0.0-1634/hadoop-hdfs/hadoop-hdfs-client-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /usr/hdp/3.0.0.0-1634/hadoop-hdfs/hadoop-hdfs-client-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /usr/hdp/3.0.0.0-1634/hadoop-hdfs/hadoop-hdfs-client-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: s3n:// = class org.apache.hadoop.fs.s3native.NativeS3FileSystem from /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-aws-3.1.0.3.0.0.0-1634.jar
    20/08/17 15:44:09 DEBUG fs.FileSystem: Looking for FS supporting gs
    20/08/17 15:44:09 DEBUG fs.FileSystem: looking for configuration option fs.gs.impl
    20/08/17 15:44:09 DEBUG fs.FileSystem: Filesystem gs defined in configuration option
    20/08/17 15:44:09 DEBUG fs.FileSystem: FS for gs is class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
    20/08/17 15:44:09 DEBUG gcs.GoogleHadoopFileSystemBase: initialize(path: gs://data-store/, config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, initSuperclass: true)
    20/08/17 15:44:09 DEBUG gcs.GoogleHadoopFileSystemBase: initializeDelegationTokenSupport(config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, path: gs://data-store/)
    20/08/17 15:44:09 TRACE gcs.GoogleHadoopFileSystemBase: Failed to initialize delegation token support
    java.lang.IllegalStateException: Delegation Tokens are not configured
            at com.google.cloud.hadoop.repackaged.gcs.com.google.common.base.Preconditions.checkState(Preconditions.java:508)
            at com.google.cloud.hadoop.fs.gcs.auth.GcsDelegationTokens.init(GcsDelegationTokens.java:65)
            at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initializeDelegationTokenSupport(GoogleHadoopFileSystemBase.java:578)
            at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:555)
            at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:510)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
            at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
            at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:249)
            at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:232)
            at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
            at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
            at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
            at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
    20/08/17 15:44:09 DEBUG gcs.GoogleHadoopFileSystemBase: GHFS_ID=GHFS/hadoop3-2.1.4: configure(config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml)

I am not sure whether I missed something in the configuration. The cluster is Kerberized and there is a valid Kerberos ticket (not sure whether that is relevant in this scenario).

Is anything missing from the configuration? Any suggestions?

Answers

2 cyxxy Aug 18 2020 at 05:42

The stack trace about Delegation Tokens are not configured is actually a red herring. If you read the GCS connector code here, you will see that the connector always tries to configure delegation token support. If you have not specified a binding through the fs.gs.delegation.token.binding configuration, that step fails, but the exception you see in the trace is swallowed (it is only logged at TRACE level, as your output shows).
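
To illustrate what "swallowed" means here, below is a rough Python sketch of the pattern: an optional feature is initialized inside a try block, the failure is logged with its stack trace, and initialization carries on. This is not the connector's code (the connector is written in Java). The function names, the dict-based config, and the logger name are all hypothetical; only the property name fs.gs.delegation.token.binding and the two messages are taken from this thread.

```python
import logging

log = logging.getLogger("ghfs_sketch")  # hypothetical logger name


def init_delegation_tokens(config: dict) -> str:
    """Return the configured binding, or fail if none is set (illustrative only)."""
    binding = config.get("fs.gs.delegation.token.binding")
    if not binding:
        raise RuntimeError("Delegation Tokens are not configured")
    return binding


def initialize(config: dict) -> dict:
    """Initialize a pretend filesystem; delegation tokens are optional."""
    state = {"delegation_binding": None, "initialized": False}
    try:
        state["delegation_binding"] = init_delegation_tokens(config)
    except RuntimeError:
        # The failure is logged together with its stack trace and then ignored.
        # Python has no TRACE level, so DEBUG stands in for it in this sketch.
        log.debug("Failed to initialize delegation token support", exc_info=True)
    # Initialization continues regardless of the outcome above.
    state["initialized"] = True
    return state


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(initialize({}))
```

With verbose logging enabled the sketch prints a stack trace and still finishes initializing, which is why a trace of this kind in the log does not by itself explain a failing command.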

As for why your command fails, I wonder whether you have a typo in your configuration file:

google.cloud.auth.service.account.enable-true

A - instead of =? Or is that just a copy-paste error?
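
A slip like that is easy to check for mechanically. Here is a minimal Python sketch, assuming the settings are kept as key=value lines as in the question; find_malformed_lines is a hypothetical helper and not part of Hadoop or the GCS connector.

```python
from typing import List, Tuple


def find_malformed_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every non-empty line that is not key=value."""
    bad = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, _value = line.partition("=")
        if not sep or not name.strip():
            # Either there is no '=' at all, or nothing precedes it.
            bad.append((lineno, line))
    return bad


if __name__ == "__main__":
    sample = (
        "fs.gs.project.id=<project-id>\n"
        "google.cloud.auth.service.account.enable-true\n"
    )
    for lineno, line in find_malformed_lines(sample):
        print(f"line {lineno}: missing '=' in {line!r}")
```

Running it on the two sample lines flags only the second one, the line with - where = should be.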