[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
parenthesised regular expressions and non-greedy operator ? - non standa
From: |
dirk |
Subject: |
parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour |
Date: |
Fri, 1 Dec 2017 18:40:35 +0100 (CET) |
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu'
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL
-DHAVE_CONFIG_H -I. -I../. -I.././include -I.././lib -Wdate-time
-D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat
-Werror=format-security -Wall -no-pie -Wno-parentheses -Wno-format-security
uname output: Linux dilbert 4.10.0-41-generic #45~16.04.1-Ubuntu SMP Fri Nov 24
15:06:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu
Bash Version: 4.4
Patch Level: 12
Release Status: release
Description:
I'm sanitising urls from advertisement crap. As described below I'm getting a
wrong resolution of parenthesised expression defined with non-greedy operator
'?'.
The test url is:
http://toolbox.contentspread.net/container/medimops/track/xxxxxxxxxx.dyn?csRdu=https://www.medimops.de/?anid=M9999999999&cl=details&wdm=M9999999999&utm_source=CRM&utm_medium=email&utm_campaign=OS
The regular expression is:
https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*
As I understand the specification and verified with 'visual regexp' and
https://regex101.com/ the result should be:
1 â container/medimops/track/xxxxxxxxxx.dyn?csRdu
2 â https://www.medimops.de/?anid=M9999999999
Running the script below I got instead:
1 â
container/medimops/track/xxxxxxxxxx.dyn?csRdu=https://www.medimops.de/?anid=M9999999999&cl=details&wdm=M9999999999&utm_source=CRM&utm_medium
2 â email
Repeat-By:
Test script:
#!/bin/bash
url='http://toolbox.contentspread.net/container/medimops/track/xxxxxxxxxx.dyn?csRdu=https://www.medimops.de/?anid=M9999999999&cl=details&wdm=M9999999999&utm_source=CRM&utm_medium=email&utm_campaign=OS'
re='https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*'
if [[ ${url} =~ ${re} ]]
then
echo "0 â ${BASH_REMATCH[0]}"
echo "1 â ${BASH_REMATCH[1]}"
echo "2 â ${BASH_REMATCH[2]}"
fi
- parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour,
dirk <=