smsd, 5110, duplicate SMS, phone hangs (was: Re: duplicate sms with concurrent arrivals)

From: Valerio Granato
Subject: smsd, 5110, duplicate SMS, phone hangs (was: Re: duplicate sms with concurrent arrivals)
Date: Sat, 11 Feb 2006 15:35:56 +0100

---- Original Message ----
From: "Jan Derfinak" <address@hidden>
To: "Discussion forum for gnokii users." <address@hidden>
Sent: Thursday, December 15, 2005 2:38 PM
Subject: Re: duplicate sms with concurrent arrivals

Not good. Please use gdb and show me the stack trace.


First of all, my apologies for the very late reply.
Let me give a little background: I have some phones connected
to a Cyclades multiserial. Sometimes, on heavily loaded phones,
smsd duplicates SMSs, inserting more than one copy into
the (PostgreSQL) database.
Other times smsd dies.
Jan sent me a patch that reconnects instead of going into dumb
mode, but on the first test smsd died with a segmentation
fault. At that point I had to stop testing because of some urgent
work. <the end>

As a temporary workaround I wrote a small PHP script that
searches for duplicates and removes them 'after'
the DB insertion.
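
A minimal sketch of that kind of post-insert duplicate check, written in
Python rather than PHP. The row layout (id, sender, timestamp, text) and the
sample values are assumptions for illustration only; they are not the real
smsd inbox schema or real messages. The first copy of each message is kept
and every later copy is reported for deletion.

```python
def find_duplicate_ids(rows):
    """Return the ids of rows that repeat an earlier message.

    rows: iterable of (row_id, sender, sent_at, text) tuples.
    These field names are hypothetical, not the actual smsd table columns.
    """
    seen = set()       # (sender, sent_at, text) triples already kept
    duplicates = []    # ids of the extra copies, in ascending id order
    for row_id, sender, sent_at, text in sorted(rows):
        key = (sender, sent_at, text)
        if key in seen:
            duplicates.append(row_id)
        else:
            seen.add(key)
    return duplicates


# Made-up sample data: ids 2 and 4 repeat the message stored as id 1.
rows = [
    (1, "+391234", "2006-02-10 19:20:01", "hello"),
    (2, "+391234", "2006-02-10 19:20:01", "hello"),
    (3, "+395678", "2006-02-10 19:21:15", "status?"),
    (4, "+391234", "2006-02-10 19:20:01", "hello"),
]
print(find_duplicate_ids(rows))  # [2, 4]
```

The returned ids could then be passed to a DELETE statement. Matching on
sender, timestamp and text together is a design choice here: it avoids
removing two genuinely different messages from the same sender.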
smsd has two patches applied: the 'realconnect' one
by Jan and another one of mine that makes smsd die on
DB failure. I run smsd as a service under daemontools, so if
one of the DB servers dies or a segfault happens, a shell
script waits 10 seconds and then tries to restart smsd.
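
The wait-then-restart behaviour can be sketched as below, in Python rather
than shell. This assumes the delay lives in the daemontools run script, which
supervise runs again each time the process exits; the actual script is not
shown in this post. The `sleep` and `execvp` parameters exist only so the
logic can be exercised without really starting a daemon.

```python
import os
import time

RESTART_DELAY = 10  # seconds to wait before each (re)start


def run(argv, delay=RESTART_DELAY, sleep=time.sleep, execvp=os.execvp):
    """Wait `delay` seconds, then replace this process with the daemon.

    Because the supervisor re-runs this script after every exit, the
    delay applies before every restart as well as the first start.
    """
    sleep(delay)
    execvp(argv[0], argv)
```

A run script built on this would end by calling `run([...])` with the smsd
command line; the real smsd arguments are not given here, so none are
suggested.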

I've seen that with a 6210 smsd *never* goes into dumb
mode and duplicates only a few messages (1 or 2 dups
per 100 received); with the 5110 I see as many as 33% dups.
These are the stats; the SMS gateway server was rebooted about 4
days ago:

/service/5110a: up (pid 12073) 11127 seconds
/service/5110b: up (pid 20287) 82094 seconds
/service/5110c: up (pid 21090) 3955 seconds
/service/5100d: up (pid 23451) 2492 seconds
/service/6210a: up (pid 20427) 290065 seconds
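
To make the status lines above easier to compare, here is a small Python
sketch that parses them and prints each service's uptime in hours, shortest
first. The regular expression only covers the "up (pid N) S seconds" form
shown here; other states are ignored, and the function name is just
illustrative.

```python
import re

# Matches lines such as "/service/5110a: up (pid 12073) 11127 seconds".
LINE_RE = re.compile(
    r"^(?P<service>\S+): up \(pid (?P<pid>\d+)\) (?P<seconds>\d+) seconds$"
)


def parse_uptimes(text):
    """Map each service path to its uptime in seconds."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("service")] = int(match.group("seconds"))
    return result


STATS = """\
/service/5110a: up (pid 12073) 11127 seconds
/service/5110b: up (pid 20287) 82094 seconds
/service/5110c: up (pid 21090) 3955 seconds
/service/5100d: up (pid 23451) 2492 seconds
/service/6210a: up (pid 20427) 290065 seconds
"""

uptimes = parse_uptimes(STATS)
for service, seconds in sorted(uptimes.items(), key=lambda kv: kv[1]):
    print(f"{service}: {seconds / 3600:.1f} h")
```

Read this way, the 6210 instance has been up for roughly 80 hours (about 3.4
days), close to the time since the reboot, while every 5110 instance has
restarted at least once since then.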

I've run all smsd instances with debug enabled; the last line
before every restart is either <null> or

Feb 10 19:23:31 alix 5110a: 1139595811.131506 SM_Block: exiting the retry loop
Feb 10 19:23:35 alix 5110a: 1139595815.132893 GSM/FBUS init failed! (Unknown model ?). Quitting.

Usually the restart works, but sometimes the phone hangs completely
and I need to remove and reinsert the battery to switch it on again.

A new thing: on another server I have a 3100 used only
to send out messages and receive delivery reports.
It happens that after sending out _one message_
smsd goes into dumb mode and stops sending/receiving.

Now I'm heading to the server farm because of the attached
debug output. I'll have to remove the 3100's battery, wait a
few seconds and then reattach the phone.

Thank you for your patience :-)


