Notepad++ XML - Delete tags based on a child tag whose content does not contain a certain value

Aug 19 2020

I have a large XML file in which I need to remove some child elements when the content of a nested child element does not start with the correct value.

My XML file looks like this:

<product>
    <catalogEntry>
      <idPath><![CDATA[K212/G425638/G425649/G426239/G426265/G601769]]></idPath>
      <namePath><![CDATA[Web Katalog DK/Solar Plus/Solar Plus EL/Afsnit 12 - Kommunikations- & sikringsmateriel/Racks/Vægracks]]></namePath>
      <ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_solarplus.jpg}{\pics\_catmandk_kampagner\sz2\ikon solar plus_el.jpg}{\pics\_catmandk_solar plus\sz2\solarplusel_afs.13.jpg}{\pics\cubic cabinet\sz2\5709832021591p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K352/G600248/G600247]]></idPath>
      <namePath><![CDATA[Solar plus mini guide/Rack og tilbehør/Vægrack]]></namePath>
      <ImagePath><![CDATA[K352-{}{}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K212/G425642/G444580/G444590/G444598]]></idPath>
      <namePath><![CDATA[Web Katalog DK/Kommunikation/Rack, tilbehør, kabel management/Vægrack/Solar Plus Vægrack]]></namePath>
      <ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_kommunikation.jpg}{\pics\_catalogmanager\sz2\kommunikation_rack-skabe_.jpg}{\pics\lk dataconnect\sz2\5703302138918p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K193/G389888/G395066/G585958/G586999/G600567]]></idPath>
      <namePath><![CDATA[PRODUCTS NOT VISIBLE IN WEB KATALOG DK/Grp7 - Kabel § Føringsveje § Data/157R - Rune Agersnap/Kampagnemails/Afsluttede kampagner/Nye Solar plus vægrack - Gældende til op med d. 05.05.19]]></namePath>
      <ImagePath><![CDATA[K193-{}{}{}{}{\pics\mass creation\sz2\0000101760-10he2050020med20plade20fri.jpg}]]></ImagePath>
    </catalogEntry>
    <catalogEntry>
      <idPath><![CDATA[K212/G425639/G426577/G426699/G426927/G426940/G600572]]></idPath>
      <namePath><![CDATA[Web Katalog DK/EL/(10.00 - 29.99) Stærkstrømsmateriel/12.00 Kapslings- og tavlemateriel/12.30 Rack-skabe inkl. tilbehør/Vægrack/Solar plus vægracks]]></namePath>
      <ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_el.jpg}{\pics\_catalogmanager\sz2\10.00_29.99.jpg}{\pics\_catalogmanager\sz2\12.00.jpg}{\pics\cubic cabinet\sz2\5709832045535p.jpg}{\pics\cubic cabinet\sz2\5709832045399p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
    </catalogEntry>

I need to keep only the elements that contain <![CDATA[K212 and delete all other <catalogEntry> elements.

I tried a few variations of this expression in Find and Replace:
<catalogEntry>(?:(?!</catalogEntry>.)+[^K212](?:(?!<catalogEntry>).)+</catalogEntry>\R

but I get an "invalid expression" error.

Answers

Toto Aug 19 2020 at 15:07
  • Ctrl+H
  • Find what: <(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>\R?
  • Replace with: LEAVE EMPTY
  • CHECK Match case
  • CHECK Wrap around
  • CHECK Regular expression
  • CHECK . matches newline
  • Replace all

Explanation:

<(catalogEntry)>        # opening tag; capture the tag name in group 1
                        # Tempered Greedy Token
(?:                     # non-capturing group
  (?!\1)                  # negative lookahead: the tag name (group 1) must not start here
  (?!\[K212)              # negative lookahead: [K212 must not start here
  .                       # any character
)+                      # end of group, repeated 1 or more times
</\1>                   # closing tag
\R?                     # optional line break
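
For anyone who wants to try the pattern outside Notepad++, here is a minimal sketch in Python. It is an adaptation, not the exact expression above: Python's `re` module does not support `\R`, so `(?:\r?\n)?` is used as a stand-in, and `re.DOTALL` takes the place of the ". matches newline" checkbox. The function name and the shortened sample data are hypothetical and only modelled on the question.

```python
import re

# Pattern from the answer, adapted for Python's `re` module (assumption:
# `(?:\r?\n)?` replaces `\R?`, which `re` does not support).
# re.DOTALL lets `.` match line breaks, like ". matches newline" in Notepad++.
PATTERN = re.compile(
    r"<(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>(?:\r?\n)?",
    re.DOTALL,
)


def drop_entries_without_k212(xml_text: str) -> str:
    """Remove every <catalogEntry> block that does not contain '[K212'."""
    return PATTERN.sub("", xml_text)


# Hypothetical, shortened sample modelled on the data in the question.
SAMPLE = """<product>
<catalogEntry>
<idPath><![CDATA[K212/G425638]]></idPath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K352/G600248]]></idPath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K193/G389888]]></idPath>
</catalogEntry>
</product>
"""

if __name__ == "__main__":
    # Only the K212 entry should remain between the <product> tags.
    print(drop_entries_without_k212(SAMPLE))
```

Note that the lookahead rejects `[K212` anywhere inside an entry, not only inside `<idPath>`. That appears to be fine for the sample data, where `<ImagePath>` starts with the same code as `<idPath>`, but it is worth checking against the real file.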

Screenshot (before):

Screenshot (after):