Chen Jiehua

Regex Pitfalls

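Recently I've been rewriting some PHP code in Go. The PHP code relied heavily on regular expressions, and after the port to Go, performance dropped dramatically. So I set out to compare the two properly. Suppose the match we want to implement in PHP looks like this: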

PHP implementation

<?php
$pattern = '/samsung|htc|motorola|moto|gt-|sh-|sgh-|sch-|sph-|lg|sonyericsson|sony-ericsson|sony ericsson|huawei|zte|lenovo|semc|xiaomi|dopod|blackberry|tianyu|coolpad|yulong|sharp|t-mobile|tmobile|acer|verizon|k-touch|ktouch|blur|sprint|telechips|rockchip|series60|dell|meizu|garmin-asus|nec|oppo|rim|bambook|alcatel|apanda|sgh|docomo|google|tcl|tct|ty-|asus|aux|motus|sky|viewsonic|maui|toshiba|bbk|vivo|aigo|philips|amoi|hisense|sony|gionee|doov|koobee|hyundai|changhong|pantech|mot|haier|thl|cube|yusu|hesens|hosin|hp|ccpo|cher|onda|newpad|newman|eben|ebest|eton|voto|malata|delta|ainol|vollo|cooco|saihon|postcom|kttech|chuwi|kpt|ployer|infotmic|hot|roadrover|leepoo|ephone|afti|birda|bird|bifer|hedy|softwiner|hlmobile|ramos|huaqin|jls|zhane|rayhov|wishway|sxz|innos|longcheer|lovme|kingsun|qobo|sprd|bestsonny|hoow|my|moral|broncho|hkcsl_cht|ifive|amazon|konka|teclast|t-smart|alps|baiducloud|wst|bl|100jia|100jia v6|lumia|10moons|moons|spreadtrum|3gnet|kyocera|uniscope|hct|kliton|klt|xmi|xmsd|zopo|oneplus|OnePlus|micromax|mparty|m.party|paid|paidui|ximi|mito|cayon|cross|snda|yunduomi|exynos|nicai|ouki|dewav|compal|holatek|iobey|xolo|bimi|vmi|allview|epade|fmee|fomi|foxconn|foxconn international holdings limited|sanei|xdl|gfive|zoobii|zb|saga|qmobile|sanmeng|jeasung|advan|ares|intel|akai|allsmartest|hxtong|infocus|wondermedia|iball|laime|istone|meetuu|yoord|mstar semiconductor|aoledior|aoson|aqua|intex|archos|chinaleap|ouge|basicom|baiyu|benwee|best sonny|jxd|blu|boway|BOWAY|bror|phicomm|miu|casio|callbar|cikaa|colorfly|kmo|coww|conor|vertu|coolux|coship|crave|cenxin|cymi|xiami|great|simdo|daq|DAQ|datang|desay|detel|lingwin|doeasy|domi|dty|dowee|pioneer|huamei|intki|suning|mipai|miki|guangxin|eoom|ctyon|fadar|feimiao|mgo|enspert|getek|sanmei|godonie|goly|gigabyte|huomi|hongda|hasee|hike|hongmi|iguo|infinix|lava|iusai|jiayu|jiimee|karbonn|kopo|koridy|kevoo|kumai|laaboo|zomi|sict|ytone|lanmi|leagoo|loobee|lordor|cmdc|CMDC|mastone|meeg|meitu|minte|mlled|mofut|mogu|moii|moloo|ireadygo|iReadyGo|geniatech|nokia|vido|yuandao|itouch|noain|nubia|obee|qualcomm|orange|OUKI|pole|gocan|forme|symphony|voda|pulid|marvell|owwo|humi|qedirs|qiangzhe|rlt|heeyu|hualing|sunvan|smartisan|soaiy|sugar|SUGAR|tensent|ubtel|unitone|uoogou|utime|veion|vimoo|vsun|nvidia|letv|x-apple|cosun|yepen|yuyi|zhuomi|zuoku|opsson|dostyle|aole|nomi|ioco|upoo|gxq|qmi|ampe|coosee|guomi|sancup|shy|sbyh|oplus|dakele|viettel|iriver|balong|zdreal|ordro|hkc|aoc|doogee/';
$start = microtime(true);
for($i=0; $i<500000; $i++){
    $raw = array("motorola", "epade", "balong");
    foreach ($raw as $r){
        preg_match($pattern, $r, $result);
    }
}
echo microtime(true) - $start;
?>

Running time: 15.811277866364s.

regexp implementation

Next, the same match implemented with Go's official regex package, regexp:

package main

import (
    "fmt"
    "regexp"
    "time"
)

func main() {
    // Backticks make this a raw string literal, so the pattern needs no escaping.
    mbstring := `samsung|htc|motorola|moto|gt-|sh-|sgh-|sch-|sph-|lg|sonyericsson|sony-ericsson|sony ericsson|huawei|zte|lenovo|semc|xiaomi|dopod|blackberry|tianyu|coolpad|yulong|sharp|t-mobile|tmobile|acer|verizon|k-touch|ktouch|blur|sprint|telechips|rockchip|series60|dell|meizu|garmin-asus|nec|oppo|rim|bambook|alcatel|apanda|sgh|docomo|google|tcl|tct|ty-|asus|aux|motus|sky|viewsonic|maui|toshiba|bbk|vivo|aigo|philips|amoi|hisense|sony|gionee|doov|koobee|hyundai|changhong|pantech|mot|haier|thl|cube|yusu|hesens|hosin|hp|ccpo|cher|onda|newpad|newman|eben|ebest|eton|voto|malata|delta|ainol|vollo|cooco|saihon|postcom|kttech|chuwi|kpt|ployer|infotmic|hot|roadrover|leepoo|ephone|afti|birda|bird|bifer|hedy|softwiner|hlmobile|ramos|huaqin|jls|zhane|rayhov|wishway|sxz|innos|longcheer|lovme|kingsun|qobo|sprd|bestsonny|hoow|my|moral|broncho|hkcsl_cht|ifive|amazon|konka|teclast|t-smart|alps|baiducloud|wst|bl|100jia|100jia v6|lumia|10moons|moons|spreadtrum|3gnet|kyocera|uniscope|hct|kliton|klt|xmi|xmsd|zopo|oneplus|OnePlus|micromax|mparty|m.party|paid|paidui|ximi|mito|cayon|cross|snda|yunduomi|exynos|nicai|ouki|dewav|compal|holatek|iobey|xolo|bimi|vmi|allview|epade|fmee|fomi|foxconn|foxconn international holdings limited|sanei|xdl|gfive|zoobii|zb|saga|qmobile|sanmeng|jeasung|advan|ares|intel|akai|allsmartest|hxtong|infocus|wondermedia|iball|laime|istone|meetuu|yoord|mstar semiconductor|aoledior|aoson|aqua|intex|archos|chinaleap|ouge|basicom|baiyu|benwee|best sonny|jxd|blu|boway|BOWAY|bror|phicomm|miu|casio|callbar|cikaa|colorfly|kmo|coww|conor|vertu|coolux|coship|crave|cenxin|cymi|xiami|great|simdo|daq|DAQ|datang|desay|detel|lingwin|doeasy|domi|dty|dowee|pioneer|huamei|intki|suning|mipai|miki|guangxin|eoom|ctyon|fadar|feimiao|mgo|enspert|getek|sanmei|godonie|goly|gigabyte|huomi|hongda|hasee|hike|hongmi|iguo|infinix|lava|iusai|jiayu|jiimee|karbonn|kopo|koridy|kevoo|kumai|laaboo|zomi|sict|ytone|lanmi|leagoo|loobee|lordor|cmdc|CMDC|mastone|meeg|meitu|minte|mlled|mofut|mogu|moii|moloo|ireadygo|iReadyGo|geniatech|nokia|vido|yuandao|itouch|noain|nubia|obee|qualcomm|orange|OUKI|pole|gocan|forme|symphony|voda|pulid|marvell|owwo|humi|qedirs|qiangzhe|rlt|heeyu|hualing|sunvan|smartisan|soaiy|sugar|SUGAR|tensent|ubtel|unitone|uoogou|utime|veion|vimoo|vsun|nvidia|letv|x-apple|cosun|yepen|yuyi|zhuomi|zuoku|opsson|dostyle|aole|nomi|ioco|upoo|gxq|qmi|ampe|coosee|guomi|sancup|shy|sbyh|oplus|dakele|viettel|iriver|balong|zdreal|ordro|hkc|aoc|doogee`
    raw := []string{"motorola", "epade", "balong"}
    reg := regexp.MustCompile(mbstring)
    start := time.Now()
    for i := 0; i < 500000; i++ {
        test(raw, reg)
    }
    fmt.Println(time.Since(start).Seconds())
}

func test(raw []string, reg *regexp.Regexp) (result []string) {
    result = make([]string, 0, len(raw))
    for _, r := range raw {
        brand := reg.FindStringSubmatch(r)
        if brand == nil {
            continue // no brand matched; indexing brand[0] would panic
        }
        result = append(result, brand[0])
    }
    return
}

Running time: 252.739116391s.

That is roughly 16 times slower than PHP.
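One likely factor, though this benchmark does not isolate it: Go's regexp package accepts RE2 syntax and guarantees matching in time linear in the input, so it does not work like the backtracking PCRE engine behind PHP's preg_match. Separately from speed, the loop above uses FindStringSubmatch, which returns nil when nothing matches, so the original brand[0] would panic on an unmatched string. Because the pattern has no capture groups, FindString returns the same text without building a []string. The sketch below compares the two on a shortened six-keyword pattern; the pattern and the helper names matchSubmatch and matchString are illustrative assumptions, and the sketch shows equivalent, nil-safe results rather than claiming a speedup.

```go
package main

import (
	"fmt"
	"regexp"
)

// Shortened stand-in for the article's brand alternation (assumption).
var brandRe = regexp.MustCompile(`samsung|htc|motorola|moto|epade|balong`)

// matchSubmatch mirrors the article's test(): FindStringSubmatch returns a
// []string (whole match plus groups), or nil when there is no match.
func matchSubmatch(s string) (string, bool) {
	m := brandRe.FindStringSubmatch(s)
	if m == nil { // no match: indexing m[0] would panic
		return "", false
	}
	return m[0], true
}

// matchString uses FindString, which returns only the matched text and ""
// on no match. None of the keywords can match the empty string, so "" is
// safe to treat as "no match" here.
func matchString(s string) (string, bool) {
	m := brandRe.FindString(s)
	return m, m != ""
}

func main() {
	for _, s := range []string{"motorola", "epade", "balong", "nokia"} {
		a, okA := matchSubmatch(s)
		b, okB := matchString(s)
		fmt.Printf("%-8s submatch=%q(%v) string=%q(%v)\n", s, a, okA, b, okB)
	}
}
```

For "motorola", both calls return "motorola" rather than "moto": Go's regexp picks the leftmost match and, among alternatives matching at that position, the one listed first in the pattern.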

rubex implementation

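Later, however, I found another Go regex package, https://github.com/moovweb/rubex (a cgo binding to the Oniguruma regex library).

In a quick test it took only 6.177984456s, about 40 times faster than regexp and faster than PHP as well.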

Slice implementation

Since all we need is simple keyword matching, let's approach the problem from a different angle and see whether that is any faster:

package main

import (
    "fmt"
    "strings"
    "time"
)

func main() {
    var mbslice = []string{"samsung", "htc", "motorola", "moto", "gt-", "sh-", "sgh-", "sch-", "sph-", "lg", "sonyericsson", "sony-ericsson", "sony ericsson", "huawei", "zte", "lenovo", "semc", "xiaomi", "dopod", "blackberry", "tianyu", "coolpad", "yulong", "sharp", "t-mobile", "tmobile", "acer", "verizon", "k-touch", "ktouch", "blur", "sprint", "telechips", "rockchip", "series60", "dell", "meizu", "garmin-asus", "nec", "oppo", "rim", "bambook", "alcatel", "apanda", "sgh", "docomo", "google", "tcl", "tct", "ty-", "asus", "aux", "motus", "sky", "viewsonic", "maui", "toshiba", "bbk", "vivo", "aigo", "philips", "amoi", "hisense", "sony", "gionee", "doov", "koobee", "hyundai", "changhong", "pantech", "mot", "haier", "thl", "cube", "yusu", "hesens", "hosin", "hp", "ccpo", "cher", "onda", "newpad", "newman", "eben", "ebest", "eton", "voto", "malata", "delta", "ainol", "vollo", "cooco", "saihon", "postcom", "kttech", "chuwi", "kpt", "ployer", "infotmic", "hot", "roadrover", "leepoo", "ephone", "afti", "birda", "bird", "bifer", "hedy", "softwiner", "hlmobile", "ramos", "huaqin", "jls", "zhane", "rayhov", "wishway", "sxz", "innos", "longcheer", "lovme", "kingsun", "qobo", "sprd", "bestsonny", "hoow", "my", "moral", "broncho", "hkcsl_cht", "ifive", "amazon", "konka", "teclast", "t-smart", "alps", "baiducloud", "wst", "bl", "100jia", "100jia v6", "lumia", "10moons", "moons", "spreadtrum", "3gnet", "kyocera", "uniscope", "hct", "kliton", "klt", "xmi", "xmsd", "zopo", "oneplus", "OnePlus", "micromax", "mparty", "m.party", "paid", "paidui", "ximi", "mito", "cayon", "cross", "snda", "yunduomi", "exynos", "nicai", "ouki", "dewav", "compal", "holatek", "iobey", "xolo", "bimi", "vmi", "allview", "epade", "fmee", "fomi", "foxconn", "foxconn international holdings limited", "sanei", "xdl", "gfive", "zoobii", "zb", "saga", "qmobile", "sanmeng", "jeasung", "advan", "ares", "intel", "akai", "allsmartest", "hxtong", "infocus", "wondermedia", "iball", "laime", "istone", "meetuu", "yoord", 
"mstar semiconductor", "aoledior", "aoson", "aqua", "intex", "archos", "chinaleap", "ouge", "basicom", "baiyu", "benwee", "best sonny", "jxd", "blu", "boway", "BOWAY", "bror", "phicomm", "miu", "casio", "callbar", "cikaa", "colorfly", "kmo", "coww", "conor", "vertu", "coolux", "coship", "crave", "cenxin", "cymi", "xiami", "great", "simdo", "daq", "DAQ", "datang", "desay", "detel", "lingwin", "doeasy", "domi", "dty", "dowee", "pioneer", "huamei", "intki", "suning", "mipai", "miki", "guangxin", "eoom", "ctyon", "fadar", "feimiao", "mgo", "enspert", "getek", "sanmei", "godonie", "goly", "gigabyte", "huomi", "hongda", "hasee", "hike", "hongmi", "iguo", "infinix", "lava", "iusai", "jiayu", "jiimee", "karbonn", "kopo", "koridy", "kevoo", "kumai", "laaboo", "zomi", "sict", "ytone", "lanmi", "leagoo", "loobee", "lordor", "cmdc", "CMDC", "mastone", "meeg", "meitu", "minte", "mlled", "mofut", "mogu", "moii", "moloo", "ireadygo", "iReadyGo", "geniatech", "nokia", "vido", "yuandao", "itouch", "noain", "nubia", "obee", "qualcomm", "orange", "OUKI", "pole", "gocan", "forme", "symphony", "voda", "pulid", "marvell", "owwo", "humi", "qedirs", "qiangzhe", "rlt", "heeyu", "hualing", "sunvan", "smartisan", "soaiy", "sugar", "SUGAR", "tensent", "ubtel", "unitone", "uoogou", "utime", "veion", "vimoo", "vsun", "nvidia", "letv", "x-apple", "cosun", "yepen", "yuyi", "zhuomi", "zuoku", "opsson", "dostyle", "aole", "nomi", "ioco", "upoo", "gxq", "qmi", "ampe", "coosee", "guomi", "sancup", "shy", "sbyh", "oplus", "dakele", "viettel", "iriver", "balong", "zdreal", "ordro", "hkc", "aoc", "doogee"}
    raw := []string{"motorola", "epade", "balong"}

    start := time.Now()
    for i := 0; i < 500000; i++ {
        test1(raw, mbslice)
    }
    fmt.Println(time.Since(start).Seconds())
}

func test1(raw []string, mbslice []string) (result []string) {
    result = make([]string, 0, len(raw))
    for _, r := range raw {
        for _, value := range mbslice {
            if strings.Contains(r, value) {
                result = append(result, value)
                break
            }
        }
    }
    return
}

Running time: 8.41121398s, with no regex involved at all.
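The slice version is fast, but it is not a drop-in replacement for the regex. test1 returns the first keyword in list order that appears anywhere in the string, whereas the regex returns the match at the leftmost position in the string. They also treat m.party differently: in the regex the "." is a wildcard, while strings.Contains looks for a literal dot. The sketch below shows the ordering difference with a hypothetical two-keyword list; firstInList and leftmostMatch are illustrative names, not from the article, and leftmostMatch escapes each keyword with regexp.QuoteMeta so every keyword is matched literally.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// firstInList mirrors the article's test1: it returns the first keyword,
// in list order, that occurs anywhere in s.
func firstInList(s string, keywords []string) string {
	for _, k := range keywords {
		if strings.Contains(s, k) {
			return k
		}
	}
	return ""
}

// leftmostMatch builds the equivalent alternation and returns the regexp
// match: the leftmost position in s wins. The pattern is compiled on every
// call only to keep the sketch short; real code should compile it once.
func leftmostMatch(s string, keywords []string) string {
	quoted := make([]string, len(keywords))
	for i, k := range keywords {
		quoted[i] = regexp.QuoteMeta(k) // "m.party" then matches a literal dot
	}
	re := regexp.MustCompile(strings.Join(quoted, "|"))
	return re.FindString(s)
}

func main() {
	keywords := []string{"htc", "moto"}
	s := "moto g, not an htc"
	fmt.Println(firstInList(s, keywords))   // htc
	fmt.Println(leftmostMatch(s, keywords)) // moto
}
```

For inputs that contain exactly one brand, as in the benchmark, both approaches agree; the difference only shows up when a string contains several keywords.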

Compile time

A regex has to be compiled once before you can call Find on it:

reg := regexp.MustCompile(pattern)
reg.FindStringSubmatch(str)

Running this compile-then-find sequence 10,000 times took 26.098313702s in total, and 16.886362915s of that (about 65%) was spent in compile.

Compilation alone accounts for most of the time!
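The practical takeaway is to compile a pattern once and reuse it, rather than compiling it on every call. Below is a minimal sketch, assuming a shortened six-keyword pattern and hypothetical function names. brandCompileEachTime recompiles on each call, the costly pattern measured above. brandPrecompiled reuses a package-level *regexp.Regexp, which the regexp documentation describes as safe for concurrent use by multiple goroutines. For comparison, PHP's preg_* functions keep an internal cache of compiled patterns, so the PHP loop at the top does not recompile its pattern on every iteration.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// Shortened stand-in for the article's brand alternation (assumption).
const pattern = `samsung|htc|motorola|moto|epade|balong`

// Compiled once, at package initialization, and reused on every call.
var brandRe = regexp.MustCompile(pattern)

// brandCompileEachTime recompiles the pattern on every call.
func brandCompileEachTime(s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// brandPrecompiled reuses the package-level Regexp.
func brandPrecompiled(s string) string {
	return brandRe.FindString(s)
}

// timeIt calls f(s) n times and returns the elapsed wall-clock time.
func timeIt(n int, f func(string) string, s string) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		f(s)
	}
	return time.Since(start)
}

func main() {
	const n = 10000
	fmt.Println("compile each time:", timeIt(n, brandCompileEachTime, "balong"))
	fmt.Println("precompiled:      ", timeIt(n, brandPrecompiled, "balong"))
}
```

Timings depend on the machine, so the program only prints them. Both functions return the same match; the difference is purely how often the pattern is compiled.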

Writing this took real effort; if you repost it, please credit ChenJiehua, "Regex Pitfalls".
