PowerShell XmlDocument: save as UTF-8 without BOM

Aug 19 2020

I am building an XML object of type System.Xml.XmlDocument.

$scheme.gettype()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    XmlDocument                              System.Xml.XmlNode

I use the Save() method to write it to a file.

$scheme.save()

This saves the file as UTF-8 with a BOM. The BOM causes problems for other scripts further down the line.

When we open the XML file in Notepad++ and save it as UTF-8 (without the BOM), the downstream scripts have no problem. So I have been asked to save the file without the BOM.

The MS documentation for the Save method states:

The value of the encoding attribute is taken from the XmlDeclaration.Encoding property. If the XmlDocument does not have an XmlDeclaration, or if the XmlDeclaration does not have an encoding attribute, the saved document will not have one either.

The MS documentation on XmlDeclaration lists encoding properties of UTF-8, UTF-16 and others. It does not mention a BOM.

Does the XmlDeclaration have an encoding property that leaves out the BOM?

PS. This behavior is identical in PowerShell 5 and PowerShell 7.

Answers

2 mklement0 Aug 19 2020 at 09:39

Unfortunately, the explicit presence of an encoding="utf-8" attribute in the declaration of an XML document causes .NET to .Save() the document to a UTF-8-encoded file with a BOM if a target file path is given, which can indeed cause problems.

A request to change this was rejected for fear of breaking backward compatibility; a separate request asks to at least document the behavior.

Somewhat ironically, the absence of an encoding attribute causes .Save() to create UTF-8-encoded files without a BOM.

A simple solution is therefore to remove the encoding attribute[1]; e.g.:

# Create a sample XML document:
$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'

# Remove the 'encoding' attribute from the declaration.
# Without this, the .Save() method below would create a UTF-8 file *with* BOM.
$xmlDoc.ChildNodes[0].Encoding = $null

# Now, saving produces a UTF-8 file *without* a BOM.
$xmlDoc.Save("$PWD/out.xml")

[1] This is safe to do, because the XML W3C Recommendation effectively mandates UTF-8 as the default in the absence of both a BOM and an encoding attribute.
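To check the behavior described above on your own machine, a minimal sketch along these lines may help. Test-Utf8Bom is a hypothetical helper written for this illustration (it is not a built-in cmdlet), and the temp-file names are arbitrary assumptions:

```powershell
# Hypothetical helper (not a built-in cmdlet): reports whether a file
# starts with the three-byte UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
  param([Parameter(Mandatory)] [string] $Path)
  # .NET methods ignore PowerShell's current location, so resolve to a full path first.
  $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
  $bytes = [System.IO.File]::ReadAllBytes($fullPath)
  return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Arbitrary output locations in the temp directory (assumption for this sketch).
$tempDir  = [System.IO.Path]::GetTempPath()
$withDecl = Join-Path $tempDir 'with-encoding.xml'
$noDecl   = Join-Path $tempDir 'without-encoding.xml'

$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'

# Save once with the encoding attribute still present ...
$xmlDoc.Save($withDecl)

# ... and once after removing it.
$xmlDoc.ChildNodes[0].Encoding = $null
$xmlDoc.Save($noDecl)

Test-Utf8Bom $withDecl   # the answer above predicts True (BOM present)
Test-Utf8Bom $noDecl     # the answer above predicts False (no BOM)
```

Inspecting the raw bytes avoids relying on an editor's encoding indicator, which may guess differently for small files.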

1 MathiasR.Jessen Aug 19 2020 at 05:20

As BACON explains in the comments, the string value of the Encoding attribute in the XML declaration doesn't have any bearing on how the file containing the document is encoded.

You can control this by creating either a StreamWriter or an XmlWriter with a non-BOM UTF8Encoding, then passing that to Save($writer):

$filename = Resolve-Path path\to\output.xml

# Create UTF8Encoding instance, sans BOM
$encoding = [System.Text.UTF8Encoding]::new($false)

# Create StreamWriter instance
$writer = [System.IO.StreamWriter]::new($filename, $false, $encoding)

# Save using (either) writer
$scheme.Save($writer)

# Dispose of writer
$writer.Dispose()
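If Save() throws, the Dispose() call above is never reached and the file handle may stay open. One possible way to guard against that, sketched here with a stand-in document and an arbitrary temp-file path (both are assumptions for illustration), is to wrap the save in try/finally:

```powershell
# Stand-in for the question's $scheme document (assumption for this sketch).
$scheme = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'

# Arbitrary full path; a full path avoids surprises with .NET's working directory.
$filename = Join-Path ([System.IO.Path]::GetTempPath()) 'output.xml'

# UTF-8 encoding that does not emit a BOM.
$encoding = [System.Text.UTF8Encoding]::new($false)

# $false = overwrite rather than append.
$writer = [System.IO.StreamWriter]::new($filename, $false, $encoding)
try {
  $scheme.Save($writer)
}
finally {
  # Runs even if Save() throws, so the file handle is always released.
  $writer.Dispose()
}
```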

Alternatively, use an [XmlWriter]:

# XmlWriter Example
$writer = [System.Xml.XmlWriter]::Create($filename, @{ Encoding = $encoding })

The second argument is an [XmlWriterSettings] object, through which we can exercise greater control over formatting options in addition to explicitly setting the encoding:

$settings = [System.Xml.XmlWriterSettings]@{
  Encoding = $encoding
  Indent = $true
  NewLineOnAttributes = $true
}
$writer = [System.Xml.XmlWriter]::Create($filename, $settings)

#  <?xml version="1.0" encoding="utf-8"?>
#  <Config>
#    <Group
#      name="PropertyGroup">
#      <Property
#        id="1"
#        value="Foo" />
#      <Property
#        id="2"
#        value="Bar"
#        exclude="false" />
#    </Group>
#  </Config>
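The XmlWriter snippets above stop at creating the writer. A sketch of the full round trip might look like the following; the sample <Config> document and the temp-file path are assumptions made for illustration, not the asker's actual data:

```powershell
# Sample document loosely modeled on the output shown above (assumption).
$scheme = [xml] '<Config><Group name="PropertyGroup"><Property id="1" value="Foo" /></Group></Config>'

# Arbitrary full path in the temp directory (assumption).
$filename = Join-Path ([System.IO.Path]::GetTempPath()) 'config.xml'

$settings = [System.Xml.XmlWriterSettings]@{
  Encoding            = [System.Text.UTF8Encoding]::new($false)  # UTF-8, no BOM
  Indent              = $true
  NewLineOnAttributes = $true
}

$writer = [System.Xml.XmlWriter]::Create($filename, $settings)
try {
  # Save through the XmlWriter so its settings govern encoding and formatting.
  $scheme.Save($writer)
}
finally {
  # Dispose flushes buffered output and closes the file.
  $writer.Dispose()
}

# Show the result.
Get-Content -LiteralPath $filename
```

Nothing reaches the file reliably until the writer is disposed (or flushed), so the Dispose() step is as important here as in the StreamWriter variant.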