Notepad++ XML - Conditionally delete tags when a child tag does not contain certain content

Aug 19 2020

I have a large XML file where I need to remove some child elements if the content of a nested child element does not start with the right value.

My XML file looks like this:

<product>
    <catalogEntry>
      <idPath><![CDATA[K212/G425638/G425649/G426239/G426265/G601769]]></idPath>
      <namePath><![CDATA[Web Katalog DK/Solar Plus/Solar Plus EL/Afsnit 12 - Kommunikations- & sikringsmateriel/Racks/Vægracks]]></namePath>
      <ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_solarplus.jpg}{\pics\_catmandk_kampagner\sz2\ikon solar plus_el.jpg}{\pics\_catmandk_solar plus\sz2\solarplusel_afs.13.jpg}{\pics\cubic cabinet\sz2\5709832021591p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K352/G600248/G600247]]></idPath>
      <namePath><![CDATA[Solar plus mini guide/Rack og tilbehør/Vægrack]]></namePath>
      <ImagePath><![CDATA[K352-{}{}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K212/G425642/G444580/G444590/G444598]]></idPath>
      <namePath><![CDATA[Web Katalog DK/Kommunikation/Rack, tilbehør, kabel management/Vægrack/Solar Plus Vægrack]]></namePath>
      <ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_kommunikation.jpg}{\pics\_catalogmanager\sz2\kommunikation_rack-skabe_.jpg}{\pics\lk dataconnect\sz2\5703302138918p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K193/G389888/G395066/G585958/G586999/G600567]]></idPath>
      <namePath><![CDATA[PRODUCTS NOT VISIBLE IN WEB KATALOG DK/Grp7 - Kabel § Føringsveje § Data/157R - Rune Agersnap/Kampagnemails/Afsluttede kampagner/Nye Solar plus vægrack - Gældende til op med d. 05.05.19]]></namePath>
      <ImagePath><![CDATA[K193-{}{}{}{}{\pics\mass creation\sz2\0000101760-10he2050020med20plade20fri.jpg}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K212/G425639/G426577/G426699/G426927/G426940/G600572]]></idPath>
      <namePath><![CDATA[Web Katalog DK/EL/(10.00 - 29.99) Stærkstrømsmateriel/12.00 Kapslings- og tavlemateriel/12.30 Rack-skabe inkl. tilbehør/Vægrack/Solar plus vægracks]]></namePath>
      <ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_el.jpg}{\pics\_catalogmanager\sz2\10.00_29.99.jpg}{\pics\_catalogmanager\sz2\12.00.jpg}{\pics\cubic cabinet\sz2\5709832045535p.jpg}{\pics\cubic cabinet\sz2\5709832045399p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
    </catalogEntry>

I only need to keep the <catalogEntry> elements whose content contains <![CDATA[K212; all the other <catalogEntry> elements need to be deleted.

I tried some variations of this expression in Find and Replace:

<catalogEntry>(?:(?!</catalogEntry>.)+[^K212](?:(?!<catalogEntry>).)+</catalogEntry>\R

but I get an "invalid expression" error.
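The error can be reproduced outside Notepad++. The sketch below uses Python's re module as a stand-in (an assumption: Notepad++ uses the Boost regex engine, and Python has no \R escape, so that trailing token is left off). The most likely cause is that the first group opened by (?: is never closed. Separately, [^K212] is a character class that matches one character other than K, 2 or 1; it cannot express "the text K212 is not here".

```python
import re

# The attempted pattern from the question, minus the trailing \R
# (Python's re does not support \R).
attempt = r"<catalogEntry>(?:(?!</catalogEntry>.)+[^K212](?:(?!<catalogEntry>).)+</catalogEntry>"

try:
    re.compile(attempt)
    error = None
except re.error as exc:
    error = exc.msg  # the compiler complains about the unclosed "(?:" group

print(error)

# [^K212] is a single-character class: it rejects the characters
# K, 2 and 1 one at a time, not the four-character string "K212".
char_class = re.compile(r"[^K212]")
print(bool(char_class.fullmatch("K")))  # False
print(bool(char_class.fullmatch("3")))  # True
```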

Answers

Toto Aug 19 2020 at 15:07
  • Ctrl+H
  • Find what: <(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>\R?
  • Replace with: LEAVE EMPTY
  • CHECK Match case
  • CHECK Wrap around
  • CHECK regular expression
  • CHECK . matches newline
  • Replace all
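To check what "Replace all" does with these settings, here is a rough Python port run on a trimmed copy of the question's data (only the idPath lines are kept, and a closing </product> is added so the sample is complete). This is a sketch, not Notepad++ itself: Python's re lacks \R, so it is replaced by the explicit optional line break (?:\r\n|\r|\n)?, and re.DOTALL stands in for ". matches newline".

```python
import re

# Python port of the answer's "Find what" pattern.
# \R is not available in Python's re, so an explicit optional
# line break (?:\r\n|\r|\n)? is used instead.
PATTERN = re.compile(
    r"<(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>(?:\r\n|\r|\n)?",
    re.DOTALL,  # same effect as ticking ". matches newline"
)               # no re.IGNORECASE, matching "Match case"

sample = """<product>
    <catalogEntry>
      <idPath><![CDATA[K212/G425638/G425649/G426239/G426265/G601769]]></idPath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K352/G600248/G600247]]></idPath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K212/G425642/G444580/G444590/G444598]]></idPath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K193/G389888/G395066/G585958/G586999/G600567]]></idPath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K212/G425639/G426577/G426699/G426927/G426940/G600572]]></idPath>
    </catalogEntry>
</product>
"""

# "Replace with" left empty: substitute the empty string everywhere.
result = PATTERN.sub("", sample)
print(result)
```

As in Notepad++, the indentation in front of each removed <catalogEntry> is left behind, because the match starts at the tag itself rather than at the beginning of the line.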

Explanation:

<(catalogEntry)>        # opening tag; capture the tag name in group 1
                        # Tempered Greedy Token:
(?:                     # non-capturing group
  (?!\1)                #   negative lookahead: "catalogEntry" must not come next
  (?!\[K212)            #   negative lookahead: "[K212" must not come next
  .                     #   any character (newlines too, with ". matches newline")
)+                      # end group; repeat 1 or more times
</\1>                   # closing tag
\R?                     # optional line break
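The tempered greedy token is the core of the pattern: it consumes characters one at a time but refuses to step onto "[K212" or onto another "catalogEntry", so a match can only end at the closing tag of an entry that never contained the marker. The Python sketch below shows the token stopping in front of "[K212" and wraps the pattern in a helper for other tags and markers. The helper name tempered_delete_pattern is hypothetical, and Python's re again stands in for Notepad++ (with \R replaced by an explicit optional line break).

```python
import re

def tempered_delete_pattern(tag: str, marker: str) -> re.Pattern:
    """Hypothetical helper: build the answer's pattern for any tag/marker.

    Matches <tag>...</tag> only if `marker` never occurs inside it and the
    match does not run into another occurrence of `tag`. re.escape makes
    characters such as '[' literal.
    """
    token = rf"(?:(?!\1)(?!{re.escape(marker)}).)+"  # tempered greedy token
    return re.compile(
        rf"<({re.escape(tag)})>{token}</\1>(?:\r\n|\r|\n)?",
        re.DOTALL,
    )

# The token on its own stops right before "[K212".
token_only = re.compile(r"(?:(?!\[K212).)+", re.DOTALL)
print(token_only.match("<idPath><![CDATA[K212/G425638").group())
# prints <idPath><![CDATA

pattern = tempered_delete_pattern("catalogEntry", "[K212")
doc = ("<catalogEntry><idPath><![CDATA[K352/G600248]]></idPath></catalogEntry>\n"
       "<catalogEntry><idPath><![CDATA[K212/G425642]]></idPath></catalogEntry>\n")
print(pattern.sub("", doc))  # prints the K212 entry only
```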
