Notepad ++ XML - Hapus Tag Bersyarat Berdasarkan Konten Tag Anak yang tidak mengandung konten tertentu
Aug 19 2020
Saya punya file xml besar di mana saya perlu menghapus beberapa elemen anak jika mereka tidak berisi awal yang benar dari konten elemen anak bersarang.
File xml saya terlihat seperti ini:
<product>
<catalogEntry>
<idPath><![CDATA[K212/G425638/G425649/G426239/G426265/G601769]]></idPath>
<namePath><![CDATA[Web Katalog DK/Solar Plus/Solar Plus EL/Afsnit 12 - Kommunikations- & sikringsmateriel/Racks/Vægracks]]></namePath>
<ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_solarplus.jpg}{\pics\_catmandk_kampagner\sz2\ikon solar plus_el.jpg}{\pics\_catmandk_solar plus\sz2\solarplusel_afs.13.jpg}{\pics\cubic cabinet\sz2\5709832021591p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K352/G600248/G600247]]></idPath>
<namePath><![CDATA[Solar plus mini guide/Rack og tilbehør/Vægrack]]></namePath>
<ImagePath><![CDATA[K352-{}{}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K212/G425642/G444580/G444590/G444598]]></idPath>
<namePath><![CDATA[Web Katalog DK/Kommunikation/Rack, tilbehør, kabel management/Vægrack/Solar Plus Vægrack]]></namePath>
<ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_kommunikation.jpg}{\pics\_catalogmanager\sz2\kommunikation_rack-skabe_.jpg}{\pics\lk dataconnect\sz2\5703302138918p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K193/G389888/G395066/G585958/G586999/G600567]]></idPath>
<namePath><![CDATA[PRODUCTS NOT VISIBLE IN WEB KATALOG DK/Grp7 - Kabel § Føringsveje § Data/157R - Rune Agersnap/Kampagnemails/Afsluttede kampagner/Nye Solar plus vægrack - Gældende til op med d. 05.05.19]]></namePath>
<ImagePath><![CDATA[K193-{}{}{}{}{\pics\mass creation\sz2\0000101760-10he2050020med20plade20fri.jpg}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K212/G425639/G426577/G426699/G426927/G426940/G600572]]></idPath>
<namePath><![CDATA[Web Katalog DK/EL/(10.00 - 29.99) Stærkstrømsmateriel/12.00 Kapslings- og tavlemateriel/12.30 Rack-skabe inkl. tilbehør/Vægrack/Solar plus vægracks]]></namePath>
<ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_el.jpg}{\pics\_catalogmanager\sz2\10.00_29.99.jpg}{\pics\_catalogmanager\sz2\12.00.jpg}{\pics\cubic cabinet\sz2\5709832045535p.jpg}{\pics\cubic cabinet\sz2\5709832045399p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
</catalogEntry>
Saya hanya perlu menyimpan elemen di mana elemen tersebut berisi elemen <.)+</catalogEntry>\R
tapi saya mendapatkan ekspresi yang tidak valid.
Jawaban
Toto Aug 19 2020 at 15:07
- Ctrl+H
- Menemukan apa:
<(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>\R?
- Ubah dengan:
LEAVE EMPTY
- PERIKSA Kasus yang cocok
- PERIKSA Bungkus sekitar
- PERIKSA Ekspresi reguler
- MEMERIKSA
. matches newline
- Replace all
Penjelasan:
<(catalogEntry)> # open tag and capture tag name in group 1
# Tempered Greedy Token
(?: # non capture group
(?!\1) # negative lookahead, make sure we haven't catalogEntry after
(?!\[K212) # negative lookahead, make sure we haven't [K212 after
. # any character
)+ # end group, must appear 1 or more times
</\1> # close tag
\R? # optional linebreak
Tangkapan layar (sebelum):

Tangkapan layar (setelah):

Taylor Sheridan Baru Menambahkan 1 Bintang 'Yellowstone' Favoritnya ke Pemeran 'Lawmen: Bass Reeves'