[Kea-users] Weird stats from a shared database

Munroe Sollog
Let me know if this should be a bug, but I have noticed some weird stats when running two kea-1.3 DHCP servers from the same mysql database (using Galera).

I have included a screen grab of the stats.

Between noon and 2pm yesterday was when I migrated rogi from the memfile to the mysql database.  I migrated all of the existing leases from the CSV to mysql and started rogi.

From around 2:30pm until about 7:45am the next day, igor steadily declines all the way to -436 leases.  How can it possibly have *negative* leases?

This subnet is configured with 1hr leases.  With that in mind, the fact that rogi steadily climbed overnight is very suspicious.  It seems as if rogi wasn't reclaiming leases correctly.

At just after 8am (towards the end of the graph), I made a configuration change to rogi and restarted it.  This seems to have triggered the reclamation process and also brought igor back from the negative.

All in all, a very weird graph.  With two servers handing out IPs for the same subnet from the same database, how does the daemon track which leases it handed out vs which ones the other server handed out?



--
Munroe Sollog
Senior Network Engineer

_______________________________________________
Kea-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/kea-users

Attachment: kea-dhcp-stats.png (31K)

Re: [Kea-users] Weird stats from a shared database

Rasmus Edgar

Hi Munroe,

As far as I understand (Kea Users please correct me if I am wrong) using two active Kea servers (load balanced) with a shared database is not supported before Kea version 1.4.

See: http://kea.isc.org/wiki/HADesign

Perhaps you are not using the same configuration on the two Kea servers, but your mail seems to suggest that the Kea servers are handing out IPs from the same range.

Br,

Rasmus


Munroe Sollog wrote on 2017-12-12 15:48:

> [...]

Re: [Kea-users] Weird stats from a shared database

Tomek Mrugalski
In reply to this post by Munroe Sollog
On 12.12.2017 at 15:48, Munroe Sollog wrote:

> Let me know if this should be a bug, but I have noticed some weird stats
> when running two kea-1.3 DHCP servers from the same mysql database
> (using Galera).
>
> I have included a screen grab of the stats.
>
> Between noon and 2pm yesterday was when I migrated rogi from the memfile
> to the mysql database.  I migrated all of the existing leases from the
> CSV to mysql and started rogi.
>
> From around 2:30pm until about 7:45am the next day, igor steadily
> declines all the way to -436 leases.  How can it possibly have
> *negative* leases?

Rasmus is right. Running more than one Kea server using the same
database is not officially supported.

Here's what is likely to happen: each Kea instance allocates leases to
clients. For each allocation, the statistic is increased. The statistic
is observed on each instance. It is likely to be incorrect as there is
another instance that also allocates leases.

Now, unless you took extra steps to disable lease expiration on one
instance and keep it running on the other, both instances periodically
look for expired leases. Depending on how many leases have expired at
the exact moment the expiration timer triggers, one server may get more
expired leases to process than the other. Only the server that
processes an expired lease will decrease its statistic.
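
To make the mechanism above concrete, here is a minimal sketch in Python. It is not Kea code: the `Server` and `SharedLeaseDb` classes, the addresses and the timings are all made up for illustration. It assumes each server keeps a purely local counter and that a reclamation run removes any expired lease it finds in the shared database, whoever allocated it.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLeaseDb:
    """Hypothetical stand-in for the shared lease database: address -> expiry time."""
    leases: dict = field(default_factory=dict)


@dataclass
class Server:
    """Hypothetical server with a purely local 'assigned addresses' counter."""
    name: str
    db: SharedLeaseDb
    assigned: int = 0

    def allocate(self, address: str, now: int, lifetime: int) -> None:
        # Only the allocating server increments its own counter.
        self.db.leases[address] = now + lifetime
        self.assigned += 1

    def reclaim_expired(self, now: int) -> int:
        # The server running this removes every expired lease it finds,
        # including leases the other server handed out, and only this
        # server decrements its counter.
        expired = [a for a, exp in self.db.leases.items() if exp <= now]
        for address in expired:
            del self.db.leases[address]
        self.assigned -= len(expired)
        return len(expired)


db = SharedLeaseDb()
rogi = Server("rogi", db)
igor = Server("igor", db)

# rogi hands out five one-hour leases at t=0; igor hands out none.
for i in range(5):
    rogi.allocate(f"192.0.2.{i + 10}", now=0, lifetime=3600)

# igor's reclamation timer happens to fire first after the leases expire.
igor.reclaim_expired(now=3601)

print(rogi.assigned, igor.assigned, len(db.leases))  # 5 -5 0
```

In this toy model igor ends up negative and rogi never comes back down, yet the two counters together still match the number of leases left in the database.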

Finally, I don't know how you set this up, but I presume that the server
that allocated a lease will send its own server-id and thus the release
messages will be processed only by that server. So this shouldn't
contribute to the confusion, unless you did some clever things with
server-id.

You may perceive it as a bug. It's a valid point of view. But I see it
as Kea being run in a configuration that is not officially supported.
There's nothing wrong with it. We're happy it provides service and
generally works. It's just there are quirks like this.

We do have recountLeaseStats4 and recountLeaseStats6 methods, but they
are only used internally. I suppose we could expose them as a command
that you could call. The Kea instance would then consult the database
and recalculate the values.
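
As a rough illustration of what "consult the database and recalculate" could look like, here is a small Python sketch using an in-memory SQLite table. The table layout (`lease4` with `address`, `subnet_id` and `state` columns) and the `ASSIGNED` state code are simplified assumptions for the example, not the actual Kea schema, and `recount_assigned` is a hypothetical helper rather than the internal method mentioned above.

```python
import sqlite3

ASSIGNED = 0  # assumed state code meaning "lease is currently assigned"


def recount_assigned(conn: sqlite3.Connection) -> dict:
    """Recompute the per-subnet assigned-lease count straight from the table."""
    rows = conn.execute(
        "SELECT subnet_id, COUNT(*) FROM lease4 "
        "WHERE state = ? GROUP BY subnet_id ORDER BY subnet_id",
        (ASSIGNED,),
    )
    return dict(rows.fetchall())


# Build a tiny, made-up lease table to run the query against.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lease4 (address INTEGER PRIMARY KEY, subnet_id INTEGER, state INTEGER)"
)
conn.executemany(
    "INSERT INTO lease4 (address, subnet_id, state) VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 0), (3, 1, 2), (4, 2, 0)],
)

counts = recount_assigned(conn)
print(counts)  # {1: 2, 2: 1}
```

Because the count is derived from the shared table rather than from local increments and decrements, every server running the same query would arrive at the same number.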

As Rasmus mentioned, we do plan to improve the situation significantly
in 1.4. We want to provide a high availability solution, but also
improve many aspects of running multiple Kea servers at the same time.

I don't have any specific solution for you right now, just some things
to consider. Kea doesn't have any notion (at least not yet) of a server
instance owning a lease. You could try generating the statistic by
polling both servers and adding the values together. Consider it an
experiment. It may or may not work. I'd love to hear about the results.
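
A minimal sketch of the "add the values together" experiment, in Python. It assumes each server's reply has already been fetched and parsed into a dict shaped like the answer to a statistic-get command (a list of [value, timestamp] samples, newest first); that shape, the statistic name for subnet 1 and the helper names are assumptions to check against your own servers. The -436 is the figure from the original report, while the 512 for rogi and both timestamps are invented purely for the example.

```python
NAME = "subnet[1].assigned-addresses"  # assumed statistic name for subnet id 1


def latest_value(reply: dict, name: str) -> int:
    """Return the newest sample of one statistic from a parsed reply."""
    samples = reply["arguments"][name]
    return samples[0][0]  # each sample is [value, timestamp]


def combined_statistic(replies: list, name: str) -> int:
    """Add up the same statistic as reported by every server."""
    return sum(latest_value(reply, name) for reply in replies)


# Hand-written example replies; only -436 comes from the thread.
rogi_reply = {"result": 0, "arguments": {NAME: [[512, "2017-12-13 07:45:00.000000"]]}}
igor_reply = {"result": 0, "arguments": {NAME: [[-436, "2017-12-13 07:45:00.000000"]]}}

total = combined_statistic([rogi_reply, igor_reply], NAME)
print(total)  # 76
```

The two samples should be taken as close together in time as possible, otherwise allocations or reclamations between the two polls will skew the sum.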

I'd like to ask you a favour. Can you describe how you set up Galera
for MySQL on the Kea wiki? There are installation instructions here:
http://kea.isc.org/wiki/Install I was thinking about something similar,
but with detailed instructions on how to set up a Galera cluster. This
would be useful for two reasons. First, other users could set it up in
a similar fashion. Second, one of the ISC engineers will get to look at
this problem one day. It will be very helpful to have instructions to
replicate your environment.

Finally, can you submit a bug for this? It would be great if this bug
report had a link to the installation instructions.

Hope that helps,
Tomek

Re: [Kea-users] Weird stats from a shared database

Jason Guy
Hi Tomek,

I too was entertaining the idea of deploying 3 Kea servers for HA/Load Balancing. I had not realized this would not work in 1.3, but your explanation certainly helps. I suppose there would need to be some type of sync done between instances (message queue, multicast, etc.). 

I wanted to share my design idea as it relates to this discussion, but I have not done any testing. I was thinking the easiest and most extensible way to create HA is to use IP "anycast". This requires that all servers be configured with the same loopback IP address and advertise it into the network using FRR (routing on the host). This allows the network to balance the traffic to the servers using ECMP Layer-3 routing. I would bind Kea to this "anycast" loopback address. Since we use DHCP-Relay on the ToR/Leaf switches, I would configure them to relay to the Kea servers' loopback address. All the Kea servers would access the same backend SQL database cluster. I have not figured out the database side of things yet either, but I assume the backend database is clustered properly and everything is in sync.

Cheers,
Jason
 

On Wed, Dec 13, 2017 at 7:12 AM, Tomek Mrugalski <[hidden email]> wrote:
> [...]

Re: [Kea-users] Weird stats from a shared database

Sten Carlsen

Just a thought, could the sync be done as part of the database records? That would give a couple of advantages:

- granularity can be very fine if this is part of each record.

- all information is available even when one server is down.

- practically no delay for syncing as it is written at the same time as the new lease info.
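
A very rough sketch of that idea in Python, assuming each lease record carried an extra field naming the server that allocated it. The `allocated_by` field is hypothetical (as far as I know from this thread, Kea has no notion of a server owning a lease), and `LeaseRecord` and `leases_per_server` are names invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseRecord:
    """Hypothetical lease row with an extra per-record ownership field."""
    address: str
    allocated_by: str  # assumed field: name of the server that handed out the lease


def leases_per_server(records: list) -> Counter:
    """Derive per-server counts from the records themselves."""
    return Counter(record.allocated_by for record in records)


records = [
    LeaseRecord("192.0.2.10", "rogi"),
    LeaseRecord("192.0.2.11", "rogi"),
    LeaseRecord("192.0.2.12", "igor"),
]

counts = leases_per_server(records)
print(counts["rogi"], counts["igor"])  # 2 1
```

With the ownership stored in the record, any server (or an external script) can work out the numbers even while the other server is down.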


On 13/12/2017 15:27, Jason Guy wrote:
> [...]

-- 
Best regards

Sten Carlsen

No improvements come from shouting:

       "MALE BOVINE MANURE!!!" 


Re: [Kea-users] Weird stats from a shared database

Munroe Sollog
In reply to this post by Tomek Mrugalski
All great info to know, and other than this stats weirdness, everything is working fine. The only detail I didn’t share was that the servers are handing out non-overlapping IPs.  I'd be happy to provide my configuration experience for Galera.  I'll work on writing it up and share it when it's done.  I will also open a bug for this issue.

Thanks for the help. 

On Wed, Dec 13, 2017 at 7:13 AM Tomek Mrugalski <[hidden email]> wrote:
> [...]