bug-gawk

[bug-gawk] awk big file dead loop


From: dragan legic
Subject: [bug-gawk] awk big file dead loop
Date: Fri, 31 Oct 2014 17:21:58 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.8.1

The utility works properly until the output file reaches a certain
size, at which point it fills up the memory. Checking with the top
command shows that the CPU is almost idle, but memory usage is at 80%.
I have 8 GB of RAM and the input file is 670 MB, which is certainly much
smaller than 8 GB.

The program performs a trivial operation: it eliminates duplicate
records (lines). Operations like this were part of IBM tests a long time
ago. The program does the following: assuming the file is sorted, it
reads one record, reads the next one, and compares them. If they are
the same, it reads the next record and compares it with the first one,
which it keeps saved. If the newly read record is different, the saved
record is written to the output file, and the new record is put in a
buffer and used for the next comparison.

This is where the utility seems to go wrong: it takes up all the RAM
and keeps swapping, which is not its strong point. The essential
problem is that it takes up all the RAM for a file that is much smaller
than the available RAM. Most likely awk fills the memory with the
buffers it uses for comparing, because it does not release that memory.
awk had been running for 2 hours on a PC with 8 GB RAM, a six-core AMD
FX-6100 CPU, and a 2 TB SATA-3 HDD. A job like this could have been
completed with 512 KB of memory on an old Facom. I had to kill awk
because it was stuck in what looked like an endless loop. With sort -u,
the same operation finishes in 3 minutes.
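A minimal sketch of the compare-with-previous algorithm described above, assuming the input is already sorted so that duplicate lines are adjacent. It holds only the previous line in memory, which is the same approach uniq(1) takes. The function name dedup_sorted is hypothetical. The comparison is forced to be a string comparison, because awk would otherwise compare two numeric-looking input lines such as 1 and 1.0 as numbers and treat them as equal.

```shell
#!/bin/sh
# dedup_sorted (hypothetical name): print each line of sorted input once.
# Only the previous line is kept in memory, so memory use does not grow
# with file size. Assumes duplicates are adjacent (i.e. sorted input).
dedup_sorted() {
    # NR == 1: always print the first line.
    # $0 != (prev ""): concatenating "" turns prev into a string, which
    # forces a string comparison, so "1" and "1.0" stay distinct lines.
    awk 'NR == 1 || $0 != (prev "") { print } { prev = $0 }' "$@"
}
```

Usage: `dedup_sorted izlaz.txt > izlaz1.txt`. On sorted input this should behave like `uniq izlaz.txt > izlaz1.txt`; on unsorted input it only removes consecutive repeats.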

address@hidden
cat izlaz.txt | awk '!seen[$0]++' >> izlaz1.txt
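Note that the command above does not use the compare-with-previous approach: `!seen[$0]++` stores every distinct line as a subscript of the array `seen`, so its memory use grows with the number of distinct lines (plus per-element overhead), whether or not the input is sorted. When the first occurrence of each line must stay in its original position and the data may not fit in memory, one option is a decorate-sort-undecorate pipeline in which sort(1) does the grouping; GNU sort spills to temporary files for large inputs. This is a sketch assuming POSIX sh, awk, sort and cut; the function name dedup_keep_order is hypothetical. When order does not matter, plain `sort -u` is the simpler choice.

```shell
#!/bin/sh
# dedup_keep_order (hypothetical name): drop duplicate lines, keeping the
# first occurrence of each line in its original position, without holding
# all distinct lines in an awk array.
#
# Pipeline stages:
#   1. awk prefixes each line with its line number and a tab.
#   2. sort groups identical lines (key: field 2 to end of line); within a
#      group the lowest line number comes first (-k1,1n).
#   3. awk keeps the first line of each group, comparing the text without
#      the number prefix as strings.
#   4. sort restores the original order by line number; cut drops the prefix.
dedup_keep_order() {
    tab=$(printf '\t')
    awk '{ print NR "\t" $0 }' "$@" |
    LC_ALL=C sort -t "$tab" -k2 -k1,1n |
    awk '{ line = $0; sub(/^[0-9]+\t/, "", line) }
         NR == 1 || line != (prev "") { print }
         { prev = line }' |
    LC_ALL=C sort -t "$tab" -k1,1n |
    cut -f2-
}
```

Usage: `dedup_keep_order izlaz.txt > izlaz1.txt`. Lines that themselves contain tabs are handled, because the sort key runs from field 2 to the end of the line and cut keeps every field after the first.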

top
Tasks: 214 total,   2 running, 212 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,8 us,  0,6 sy,  0,0 ni, 83,7 id, 14,9 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem:   8059252 total,  7921212 used,   138040 free,     1800 buffers
KiB Swap: 10251260 total,  4649756 used,  5601504 free.   201920 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2831 dragan    20   0 3110044  48940  33448 S   4,3  0,6  12:00.57 kwin
 2854 dragan    20   0 4385988  80660  19788 S   3,3  1,0  15:14.00 plasma-des+
 1580 root      20   0  378232 129836  95588 S   2,3  1,6  16:18.19 Xorg
11157 dragan    20   0  523628  17104   6552 S   0,7  0,2   0:05.19 konsole
 1954 dirmngr   20   0   21980    120     48 S   0,3  0,0   0:01.88 dirmngr
 2881 dragan    20   0 1840204  11128      0 S   0,3  0,1   0:14.25 mysqld
10936 dragan    20   0  825844   7812   3108 S   0,3  0,1   0:05.40 dolphin
10969 dragan    20   0 8294648 6,772g    164 D   0,3 88,1   1:31.92 awk

lsb_release -a
LSB Version:    core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: LinuxMint
Description:    Linux Mint 17 Qiana
Release:        17
Codename:       qiana

inxi -F
System:    Host: dragan-MS-7693 Kernel: 3.14.21-031421-generic x86_64 (64 bit)
           Desktop: KDE 4.13.3 Distro: Linux Mint 17 Qiana
Machine:   Mobo: MSI model: 970A-G46 (MS-7693) version: 2.0 Bios: American Megatrends version: V2.6 date: 10/08/2013
CPU:       Hexa core AMD FX-6100 Six-Core (-MCP-) cache: 12288 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm)
           Clock Speeds: 1: 3300.00 MHz 2: 3300.00 MHz 3: 1400.00 MHz 4: 3300.00 MHz 5: 1400.00 MHz 6: 3300.00 MHz
Graphics:  Card: Advanced Micro Devices [AMD/ATI] Bonaire XTX [Radeon R7 260X]
           X.Org: 1.15.1 driver: fglrx Resolution: address@hidden
           GLX Renderer: AMD Radeon R7 200 Series GLX Version: 4.4.13084 - CPC 14.301.1001
Audio:     Card-1: Advanced Micro Devices [AMD/ATI] Device aac0 driver: snd_hda_intel
           Card-2: Advanced Micro Devices [AMD/ATI] SBx00 Azalia (Intel HDA) driver: snd_hda_intel
           Card-3: Logitech Portable Webcam C905 driver: USB Audio
           Sound: Advanced Linux Sound Architecture ver: k3.14.21-031421-generic
Network:   Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller driver: r8169
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 8c:89:a5:9c:d9:b3
           Card-2: Ovislink AirLive WL-1600USB 802.11g Adapter [Realtek RTL8187L] driver: rtl8187
           IF: wlan0 state: down mac: 00:4f:78:01:5b:4a
Drives:    HDD Total Size: 4060.8GB (35.0% used) 1: id: /dev/sda model: Patriot_Pyro size: 60.0GB
           2: id: /dev/sdb model: ST1000DL002 size: 1000.2GB 3: id: /dev/sdc model: WDC_WD10EZEX size: 1000.2GB
           4: id: /dev/sdd model: ST2000DM001 size: 2000.4GB
Partition: ID: / size: 69G used: 17G (25%) fs: ext4 ID: /home size: 70G used: 22G (34%) fs: ext4
           ID: swap-1 size: 10.50GB used: 0.00GB (0%) fs: swap
RAID:      No RAID devices detected - /proc/mdstat and md_mod kernel raid module present
Sensors:   System Temperatures: cpu: 54.0C mobo: 37.1C gpu: 54.00C
           Fan Speeds (in rpm): cpu: 1118 fan-1: 3685 fan-3: 1639
Info:      Processes: 222 Uptime: 3:55 Memory: 2021.9/7870.4MB Client: Shell inxi: 1.8.4



