
Commit 82f148e

Merge branch 'bonding'
Veaceslav Falico says:

====================
bonding: add an option to rely on unvalidated arp packets

v4 -> v5:
Again per Nik's advice, correct the bond_opts restrictions for arp_validate - set it the same as arp_interval.

v3 -> v4:
Per Nikolay's advice, remove the new bond_opts restriction on modes setting for arp_validate.

v2 -> v3:
Per Jay's advice, use the 'filter' keyword instead of the 'arp' one, and use his text for documentation. Also, rebase on the latest net-next. Sorry for the delay, I didn't manage to send it before net-next was closed.

v1 -> v2:
Don't remove the 'all traffic' functionality - rather, add new arp_validate options to specify that we want *only* unvalidated arps.

Currently, if arp_validate is off (0), slave_last_rx() returns slave->dev->last_rx, which is updated on *any* packet received by the slave, not only on arps. This means that, if arp validation is off, we treat *any* incoming packet as proof that the slave is up, not only arps.

This might seem logical at first glance, however it can cause a lot of trouble and false positives. One example would be:

The arp_ip_target is NOT accessible, however someone in the broadcast domain spams it with any kind of broadcast traffic. This way bonding will be tricked into thinking that the slave is still up (as in - can access arp_ip_target), while it's not.

The net_device->last_rx is already used in a lot of drivers (even though the comment states to NOT do it :)), and it's also ugly to modify it from bonding.

However, some loadbalance setups might rely on the fact that even non-arp traffic is a sign of the slave being up - and we definitely can't break anyone's config - so an extension to arp_validate is needed.

So, to fix this, add an option for the user to specify if he wants to filter out non-arp traffic on unvalidated slaves, remove the last_rx update from bonding, *always* call bond_arp_rcv() in the slave's rx_handler (which is bond_handle_frame), and if we spot an arp there with this option on - update slave->last_arp_rx - and use it instead of net_device->last_rx. Finally, rename last_arp_rx to last_rx to reflect the changes.

Also rename slave->jiffies to ->last_link_up to better reflect its meaning, add the new option's documentation, and update the arp_validate one to be a bit more descriptive.
====================

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 parents 814ce14 + 49f17de commit 82f148e


5 files changed: 119 additions & 86 deletions


Documentation/networking/bonding.txt

Lines changed: 66 additions & 30 deletions
@@ -270,16 +270,15 @@ arp_ip_target
 arp_validate
 
 	Specifies whether or not ARP probes and replies should be
-	validated in the active-backup mode. This causes the ARP
-	monitor to examine the incoming ARP requests and replies, and
-	only consider a slave to be up if it is receiving the
-	appropriate ARP traffic.
+	validated in any mode that supports arp monitoring, or whether
+	non-ARP traffic should be filtered (disregarded) for link
+	monitoring purposes.
 
 	Possible values are:
 
 	none or 0
 
-		No validation is performed. This is the default.
+		No validation or filtering is performed.
 
 	active or 1
 
@@ -293,31 +292,68 @@ arp_validate
 
 		Validation is performed for all slaves.
 
-		For the active slave, the validation checks ARP replies to
-		confirm that they were generated by an arp_ip_target. Since
-		backup slaves do not typically receive these replies, the
-		validation performed for backup slaves is on the ARP request
-		sent out via the active slave. It is possible that some
-		switch or network configurations may result in situations
-		wherein the backup slaves do not receive the ARP requests; in
-		such a situation, validation of backup slaves must be
-		disabled.
-
-		The validation of ARP requests on backup slaves is mainly
-		helping bonding to decide which slaves are more likely to
-		work in case of the active slave failure, it doesn't really
-		guarantee that the backup slave will work if it's selected
-		as the next active slave.
-
-		This option is useful in network configurations in which
-		multiple bonding hosts are concurrently issuing ARPs to one or
-		more targets beyond a common switch. Should the link between
-		the switch and target fail (but not the switch itself), the
-		probe traffic generated by the multiple bonding instances will
-		fool the standard ARP monitor into considering the links as
-		still up. Use of the arp_validate option can resolve this, as
-		the ARP monitor will only consider ARP requests and replies
-		associated with its own instance of bonding.
+	filter or 4
+
+		Filtering is applied to all slaves. No validation is
+		performed.
+
+	filter_active or 5
+
+		Filtering is applied to all slaves, validation is performed
+		only for the active slave.
+
+	filter_backup or 6
+
+		Filtering is applied to all slaves, validation is performed
+		only for backup slaves.
+
+	Validation:
+
+	Enabling validation causes the ARP monitor to examine the incoming
+	ARP requests and replies, and only consider a slave to be up if it
+	is receiving the appropriate ARP traffic.
+
+	For an active slave, the validation checks ARP replies to confirm
+	that they were generated by an arp_ip_target. Since backup slaves
+	do not typically receive these replies, the validation performed
+	for backup slaves is on the broadcast ARP request sent out via the
+	active slave. It is possible that some switch or network
+	configurations may result in situations wherein the backup slaves
+	do not receive the ARP requests; in such a situation, validation
+	of backup slaves must be disabled.
+
+	The validation of ARP requests on backup slaves is mainly helping
+	bonding to decide which slaves are more likely to work in case of
+	the active slave failure, it doesn't really guarantee that the
+	backup slave will work if it's selected as the next active slave.
+
+	Validation is useful in network configurations in which multiple
+	bonding hosts are concurrently issuing ARPs to one or more targets
+	beyond a common switch. Should the link between the switch and
+	target fail (but not the switch itself), the probe traffic
+	generated by the multiple bonding instances will fool the standard
+	ARP monitor into considering the links as still up. Use of
+	validation can resolve this, as the ARP monitor will only consider
+	ARP requests and replies associated with its own instance of
+	bonding.
+
+	Filtering:
+
+	Enabling filtering causes the ARP monitor to only use incoming ARP
+	packets for link availability purposes. Arriving packets that are
+	not ARPs are delivered normally, but do not count when determining
+	if a slave is available.
+
+	Filtering operates by only considering the reception of ARP
+	packets (any ARP packet, regardless of source or destination) when
+	determining if a slave has received traffic for link availability
+	purposes.
+
+	Filtering is useful in network configurations in which significant
+	levels of third party broadcast traffic would fool the standard
+	ARP monitor into considering the links as still up. Use of
+	filtering can resolve this, as only ARP traffic is considered for
+	link availability purposes.
 
 	This option was added in bonding version 3.1.0.
 
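The Filtering text says that any ARP packet counts, regardless of source or destination, so the test is purely on the packet's protocol. A minimal sketch of that test at the frame level follows, assuming an untagged Ethernet II frame; frame_is_arp() and ETH_P_ARP_VAL are hypothetical names, and the patched bond_arp_rcv() does not parse bytes like this, it compares skb->protocol (already set by the driver) with ETH_P_ARP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the ARP EtherType, same value as the kernel's ETH_P_ARP. */
#define ETH_P_ARP_VAL 0x0806

/*
 * Sketch only: classify an untagged Ethernet II frame as ARP by its
 * EtherType (bytes 12-13, big-endian). MAC addresses and ARP payload are
 * never inspected, which is what "any ARP packet, regardless of source or
 * destination" means. VLAN-tagged frames are not handled here.
 */
static bool frame_is_arp(const uint8_t *frame, size_t len)
{
	if (len < 14) /* shorter than an Ethernet header */
		return false;

	uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	return ethertype == ETH_P_ARP_VAL;
}
```

A frame whose bytes 12-13 are 0x08 0x06 is treated as ARP even if it was, for example, an ARP request between two unrelated hosts; an IPv4 broadcast (0x08 0x00) is not.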
drivers/net/bonding/bond_main.c

Lines changed: 24 additions & 32 deletions
@@ -798,7 +798,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		return;
 
 	if (new_active) {
-		new_active->jiffies = jiffies;
+		new_active->last_link_up = jiffies;
 
 		if (new_active->link == BOND_LINK_BACK) {
 			if (USES_PRIMARY(bond->params.mode)) {
@@ -1115,9 +1115,6 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	slave = bond_slave_get_rcu(skb->dev);
 	bond = slave->bond;
 
-	if (bond->params.arp_interval)
-		slave->dev->last_rx = jiffies;
-
 	recv_probe = ACCESS_ONCE(bond->recv_probe);
 	if (recv_probe) {
 		ret = recv_probe(skb, bond, slave);
@@ -1400,10 +1397,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 	bond_update_speed_duplex(new_slave);
 
-	new_slave->last_arp_rx = jiffies -
+	new_slave->last_rx = jiffies -
 		(msecs_to_jiffies(bond->params.arp_interval) + 1);
 	for (i = 0; i < BOND_MAX_ARP_TARGETS; i++)
-		new_slave->target_last_arp_rx[i] = new_slave->last_arp_rx;
+		new_slave->target_last_arp_rx[i] = new_slave->last_rx;
 
 	if (bond->params.miimon && !bond->params.use_carrier) {
 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
@@ -1447,7 +1444,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	if (new_slave->link != BOND_LINK_DOWN)
-		new_slave->jiffies = jiffies;
+		new_slave->last_link_up = jiffies;
 	pr_debug("Initial state of slave_dev is BOND_LINK_%s\n",
 		 new_slave->link == BOND_LINK_DOWN ? "DOWN" :
 		 (new_slave->link == BOND_LINK_UP ? "UP" : "BACK"));
@@ -1894,7 +1891,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 			 * recovered before downdelay expired
 			 */
 			slave->link = BOND_LINK_UP;
-			slave->jiffies = jiffies;
+			slave->last_link_up = jiffies;
 			pr_info("%s: link status up again after %d ms for interface %s\n",
 				bond->dev->name,
 				(bond->params.downdelay - slave->delay) *
@@ -1969,7 +1966,7 @@ static void bond_miimon_commit(struct bonding *bond)
 
 		case BOND_LINK_UP:
 			slave->link = BOND_LINK_UP;
-			slave->jiffies = jiffies;
+			slave->last_link_up = jiffies;
 
 			if (bond->params.mode == BOND_MODE_8023AD) {
 				/* prevent it from being the active one */
@@ -2245,7 +2242,7 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 		pr_debug("bva: sip %pI4 not found in targets\n", &sip);
 		return;
 	}
-	slave->last_arp_rx = jiffies;
+	slave->last_rx = jiffies;
 	slave->target_last_arp_rx[i] = jiffies;
 }
 
@@ -2255,15 +2252,16 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 	struct arphdr *arp = (struct arphdr *)skb->data;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
-	int alen;
+	int alen, is_arp = skb->protocol == __cpu_to_be16(ETH_P_ARP);
 
-	if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
+	if (!slave_do_arp_validate(bond, slave)) {
+		if ((slave_do_arp_validate_only(bond, slave) && is_arp) ||
+		    !slave_do_arp_validate_only(bond, slave))
+			slave->last_rx = jiffies;
 		return RX_HANDLER_ANOTHER;
-
-	read_lock(&bond->lock);
-
-	if (!slave_do_arp_validate(bond, slave))
-		goto out_unlock;
+	} else if (!is_arp) {
+		return RX_HANDLER_ANOTHER;
+	}
 
 	alen = arp_hdr_len(bond->dev);
 
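The rewritten start of bond_arp_rcv() decides, for each received frame, whether to stamp slave->last_rx immediately, ignore the frame for monitoring purposes, or continue into ARP parsing and validation. The sketch below restates that decision as a pure function so the three inputs are explicit: validate stands for the result of slave_do_arp_validate(), filter for slave_do_arp_validate_only(), and is_arp for the skb->protocol comparison. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical labels for what the patched bond_arp_rcv() does next. */
enum arp_rcv_action {
	RX_TOUCH_LAST_RX, /* slave->last_rx = jiffies, then return */
	RX_IGNORE,        /* return without touching last_rx */
	RX_VALIDATE,      /* parse the ARP; bond_validate_arp() may update last_rx */
};

static enum arp_rcv_action classify_arp_rcv(bool validate, bool filter,
					    bool is_arp)
{
	if (!validate) {
		/* No validation for this slave's state: any frame counts,
		 * unless filtering is on and the frame is not an ARP. */
		if ((filter && is_arp) || !filter)
			return RX_TOUCH_LAST_RX;
		return RX_IGNORE;
	} else if (!is_arp) {
		/* Validation is on: non-ARP frames never count. */
		return RX_IGNORE;
	}
	return RX_VALIDATE;
}
```

The condition (filter && is_arp) || !filter reduces to !filter || is_arp: with filtering off every frame counts, exactly as dev->last_rx did before the patch, and with it on only ARPs do. In all three cases the real handler returns RX_HANDLER_ANOTHER, so the frame itself is still delivered normally.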
@@ -2314,11 +2312,10 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		bond_validate_arp(bond, slave, sip, tip);
 	else if (bond->curr_active_slave &&
 		 time_after(slave_last_rx(bond, bond->curr_active_slave),
-			    bond->curr_active_slave->jiffies))
+			    bond->curr_active_slave->last_link_up))
 		bond_validate_arp(bond, slave, tip, sip);
 
 out_unlock:
-	read_unlock(&bond->lock);
 	if (arp != (struct arphdr *)skb->data)
 		kfree(arp);
 	return RX_HANDLER_ANOTHER;
@@ -2361,9 +2358,9 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 	oldcurrent = ACCESS_ONCE(bond->curr_active_slave);
 	/* see if any of the previous devices are up now (i.e. they have
 	 * xmt and rcv traffic). the curr_active_slave does not come into
-	 * the picture unless it is null. also, slave->jiffies is not needed
-	 * here because we send an arp on each slave and give a slave as
-	 * long as it needs to get the tx/rx within the delta.
+	 * the picture unless it is null. also, slave->last_link_up is not
+	 * needed here because we send an arp on each slave and give a slave
+	 * as long as it needs to get the tx/rx within the delta.
 	 * TODO: what about up/down delay in arp mode? it wasn't here before
 	 * so it can wait
 	 */
@@ -2372,7 +2369,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 
 		if (slave->link != BOND_LINK_UP) {
 			if (bond_time_in_interval(bond, trans_start, 1) &&
-			    bond_time_in_interval(bond, slave->dev->last_rx, 1)) {
+			    bond_time_in_interval(bond, slave->last_rx, 1)) {
 
 				slave->link = BOND_LINK_UP;
 				slave_state_changed = 1;
@@ -2401,7 +2398,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 			 * if we don't know our ip yet
 			 */
 			if (!bond_time_in_interval(bond, trans_start, 2) ||
-			    !bond_time_in_interval(bond, slave->dev->last_rx, 2)) {
+			    !bond_time_in_interval(bond, slave->last_rx, 2)) {
 
 				slave->link = BOND_LINK_DOWN;
 				slave_state_changed = 1;
@@ -2489,7 +2486,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
 		 * active. This avoids bouncing, as the last receive
 		 * times need a full ARP monitor cycle to be updated.
 		 */
-		if (bond_time_in_interval(bond, slave->jiffies, 2))
+		if (bond_time_in_interval(bond, slave->last_link_up, 2))
 			continue;
 
 		/*
@@ -2690,7 +2687,7 @@ static bool bond_ab_arp_probe(struct bonding *bond)
 	new_slave->link = BOND_LINK_BACK;
 	bond_set_slave_active_flags(new_slave);
 	bond_arp_send_all(bond, new_slave);
-	new_slave->jiffies = jiffies;
+	new_slave->last_link_up = jiffies;
 	rcu_assign_pointer(bond->current_arp_slave, new_slave);
 	rtnl_unlock();
 
@@ -3060,8 +3057,7 @@ static int bond_open(struct net_device *bond_dev)
 
 	if (bond->params.arp_interval) { /* arp interval, in milliseconds. */
 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
-		if (bond->params.arp_validate)
-			bond->recv_probe = bond_arp_rcv;
+		bond->recv_probe = bond_arp_rcv;
 	}
 
 	if (bond->params.mode == BOND_MODE_8023AD) {
@@ -4186,10 +4182,6 @@ static int bond_check_params(struct bond_params *params)
 	}
 
 	if (arp_validate) {
-		if (bond_mode != BOND_MODE_ACTIVEBACKUP) {
-			pr_err("arp_validate only supported in active-backup mode\n");
-			return -EINVAL;
-		}
 		if (!arp_interval) {
 			pr_err("arp_validate requires arp_interval\n");
 			return -EINVAL;

drivers/net/bonding/bond_options.c

Lines changed: 11 additions & 8 deletions
@@ -47,11 +47,14 @@ static struct bond_opt_value bond_xmit_hashtype_tbl[] = {
 };
 
 static struct bond_opt_value bond_arp_validate_tbl[] = {
-	{ "none",   BOND_ARP_VALIDATE_NONE,   BOND_VALFLAG_DEFAULT},
-	{ "active", BOND_ARP_VALIDATE_ACTIVE, 0},
-	{ "backup", BOND_ARP_VALIDATE_BACKUP, 0},
-	{ "all",    BOND_ARP_VALIDATE_ALL,    0},
-	{ NULL,     -1,                       0},
+	{ "none",          BOND_ARP_VALIDATE_NONE,   BOND_VALFLAG_DEFAULT},
+	{ "active",        BOND_ARP_VALIDATE_ACTIVE, 0},
+	{ "backup",        BOND_ARP_VALIDATE_BACKUP, 0},
+	{ "all",           BOND_ARP_VALIDATE_ALL,    0},
+	{ "filter",        BOND_ARP_FILTER,          0},
+	{ "filter_active", BOND_ARP_FILTER_ACTIVE,   0},
+	{ "filter_backup", BOND_ARP_FILTER_BACKUP,   0},
+	{ NULL,            -1,                       0},
 };
 
 static struct bond_opt_value bond_arp_all_targets_tbl[] = {
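The extended table lets arp_validate be given either by name or by number, as the documentation's "filter or 4" wording suggests. The following sketch of such a lookup is hypothetical: struct opt_value and parse_arp_validate() are illustrative stand-ins, not the kernel's own bond_opt_value parsing code, and the numeric values assume the bit layout documented above (filter = 4, filter_active = 5, filter_backup = 6).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct bond_opt_value: a name and its value. */
struct opt_value {
	const char *name;
	int value;
};

static const struct opt_value arp_validate_tbl[] = {
	{ "none",          0 },
	{ "active",        1 },
	{ "backup",        2 },
	{ "all",           3 },
	{ "filter",        4 },
	{ "filter_active", 5 },
	{ "filter_backup", 6 },
	{ NULL,           -1 }, /* sentinel, like the kernel table */
};

/* Accept either a name or its decimal value; return -1 if unknown. */
static int parse_arp_validate(const char *s)
{
	char *end;
	long n = strtol(s, &end, 10);
	bool numeric = end != s && *end == '\0';

	for (const struct opt_value *v = arp_validate_tbl; v->name; v++) {
		if (numeric ? v->value == n : strcmp(v->name, s) == 0)
			return v->value;
	}
	return -1;
}
```

Keeping names and values in one sentinel-terminated table means adding the three filter modes is a pure data change; the lookup loop does not need to know about them.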
@@ -151,7 +154,8 @@ static struct bond_option bond_opts[] = {
 		.id = BOND_OPT_ARP_VALIDATE,
 		.name = "arp_validate",
 		.desc = "validate src/dst of ARP probes",
-		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_ACTIVEBACKUP)),
+		.unsuppmodes = BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) |
+			       BIT(BOND_MODE_ALB),
 		.values = bond_arp_validate_tbl,
 		.set = bond_option_arp_validate_set
 	},
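With .unsuppmodes narrowed, arp_validate is refused only in 802.3ad, balance-tlb and balance-alb, which matches the modes where the ARP monitor itself is not used. A small user-space sketch of that mask check is below, assuming the usual bonding mode numbering (balance-rr 0 through balance-alb 6) and a local BIT() macro standing in for the kernel's; arp_validate_supported() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Bonding mode numbers as used by the "mode" option (assumed here). */
enum bond_mode {
	BOND_MODE_ROUNDROBIN   = 0,
	BOND_MODE_ACTIVEBACKUP = 1,
	BOND_MODE_XOR          = 2,
	BOND_MODE_BROADCAST    = 3,
	BOND_MODE_8023AD       = 4,
	BOND_MODE_TLB          = 5,
	BOND_MODE_ALB          = 6,
};

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Modes in which arp_validate is rejected after this change. */
static const unsigned long arp_validate_unsuppmodes =
	BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) | BIT(BOND_MODE_ALB);

/* Hypothetical helper: true if the mode's bit is not in the mask. */
static bool arp_validate_supported(enum bond_mode mode)
{
	return !(arp_validate_unsuppmodes & BIT(mode));
}
```

Before this change the mask was "every mode except active-backup", so only mode 1 passed; now balance-rr, active-backup, balance-xor and broadcast all do.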
@@ -809,8 +813,7 @@ int bond_option_arp_interval_set(struct bonding *bond,
 		cancel_delayed_work_sync(&bond->arp_work);
 	} else {
 		/* arp_validate can be set only in active-backup mode */
-		if (bond->params.arp_validate)
-			bond->recv_probe = bond_arp_rcv;
+		bond->recv_probe = bond_arp_rcv;
 		cancel_delayed_work_sync(&bond->mii_work);
 		queue_delayed_work(bond->wq, &bond->arp_work, 0);
 	}

drivers/net/bonding/bonding.h

Lines changed: 17 additions & 9 deletions
@@ -188,8 +188,9 @@ struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
 	struct bonding *bond; /* our master */
 	int delay;
-	unsigned long jiffies;
-	unsigned long last_arp_rx;
+	/* all three in jiffies */
+	unsigned long last_link_up;
+	unsigned long last_rx;
 	unsigned long target_last_arp_rx[BOND_MAX_ARP_TARGETS];
 	s8 link; /* one of BOND_LINK_XXXX */
 	s8 new_link;
@@ -342,13 +343,24 @@ static inline bool bond_is_active_slave(struct slave *slave)
 #define BOND_ARP_VALIDATE_BACKUP (1 << BOND_STATE_BACKUP)
 #define BOND_ARP_VALIDATE_ALL (BOND_ARP_VALIDATE_ACTIVE | \
 			       BOND_ARP_VALIDATE_BACKUP)
+#define BOND_ARP_FILTER (BOND_ARP_VALIDATE_ALL + 1)
+#define BOND_ARP_FILTER_ACTIVE (BOND_ARP_VALIDATE_ACTIVE | \
+				BOND_ARP_FILTER)
+#define BOND_ARP_FILTER_BACKUP (BOND_ARP_VALIDATE_BACKUP | \
+				BOND_ARP_FILTER)
 
 static inline int slave_do_arp_validate(struct bonding *bond,
 					struct slave *slave)
 {
 	return bond->params.arp_validate & (1 << bond_slave_state(slave));
 }
 
+static inline int slave_do_arp_validate_only(struct bonding *bond,
+					     struct slave *slave)
+{
+	return bond->params.arp_validate & BOND_ARP_FILTER;
+}
+
 /* Get the oldest arp which we've received on this slave for bond's
  * arp_targets.
  */
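The new macros extend the existing bit layout instead of adding a separate field: bits 0 and 1 select which slave states are validated, and the value 4 acts as the filter bit. A user-space sketch of the two helpers is shown here, assuming BOND_STATE_ACTIVE is 0 and BOND_STATE_BACKUP is 1 (which is what yields the documented values 4, 5 and 6); do_validate() and do_filter() are hypothetical names that mirror slave_do_arp_validate() and slave_do_arp_validate_only() without the struct arguments.

```c
#include <stdbool.h>

/* Assumed slave state numbering (active = 0, backup = 1). */
#define BOND_STATE_ACTIVE 0
#define BOND_STATE_BACKUP 1

#define BOND_ARP_VALIDATE_NONE   0
#define BOND_ARP_VALIDATE_ACTIVE (1 << BOND_STATE_ACTIVE)
#define BOND_ARP_VALIDATE_BACKUP (1 << BOND_STATE_BACKUP)
#define BOND_ARP_VALIDATE_ALL    (BOND_ARP_VALIDATE_ACTIVE | \
				  BOND_ARP_VALIDATE_BACKUP)
#define BOND_ARP_FILTER          (BOND_ARP_VALIDATE_ALL + 1)
#define BOND_ARP_FILTER_ACTIVE   (BOND_ARP_VALIDATE_ACTIVE | BOND_ARP_FILTER)
#define BOND_ARP_FILTER_BACKUP   (BOND_ARP_VALIDATE_BACKUP | BOND_ARP_FILTER)

/* Mirrors slave_do_arp_validate(): is the bit for this state set? */
static bool do_validate(int arp_validate, int slave_state)
{
	return arp_validate & (1 << slave_state);
}

/* Mirrors slave_do_arp_validate_only(): is the filter bit set? */
static bool do_filter(int arp_validate)
{
	return arp_validate & BOND_ARP_FILTER;
}
```

For filter_active (5), an active slave is validated (5 & 1 != 0), while a backup slave is not validated (5 & 2 == 0) but still filtered (5 & 4 != 0), matching the documentation of that value.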
@@ -368,14 +380,10 @@ static inline unsigned long slave_oldest_target_arp_rx(struct bonding *bond,
 static inline unsigned long slave_last_rx(struct bonding *bond,
 					struct slave *slave)
 {
-	if (slave_do_arp_validate(bond, slave)) {
-		if (bond->params.arp_all_targets == BOND_ARP_TARGETS_ALL)
-			return slave_oldest_target_arp_rx(bond, slave);
-		else
-			return slave->last_arp_rx;
-	}
+	if (bond->params.arp_all_targets == BOND_ARP_TARGETS_ALL)
+		return slave_oldest_target_arp_rx(bond, slave);
 
-	return slave->dev->last_rx;
+	return slave->last_rx;
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
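After this hunk, slave_last_rx() never consults dev->last_rx: it returns either the oldest per-target ARP receive time (when arp_all_targets is "all") or the slave's own last_rx. The sketch below models that in user space, including the wraparound-safe comparison that the kernel's time_before() performs on jiffies values. struct slave_model, MAX_TARGETS and the helper names are hypothetical, and the sketch assumes the configured targets are packed from index 0 with ntargets counting them (the kernel instead stops at the first empty arp_targets slot).

```c
#include <stdbool.h>

#define MAX_TARGETS 16 /* hypothetical stand-in for BOND_MAX_ARP_TARGETS */

/* Wraparound-safe "a is earlier than b", same idea as time_before(). */
static bool time_before_ul(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical reduced slave: only the receive timestamps. */
struct slave_model {
	unsigned long last_rx;
	unsigned long target_last_arp_rx[MAX_TARGETS];
};

/* Oldest per-target ARP time; assumes ntargets >= 1. */
static unsigned long oldest_target_rx(const struct slave_model *s, int ntargets)
{
	unsigned long ret = s->target_last_arp_rx[0];

	for (int i = 1; i < ntargets && i < MAX_TARGETS; i++)
		if (time_before_ul(s->target_last_arp_rx[i], ret))
			ret = s->target_last_arp_rx[i];
	return ret;
}

/* Post-patch selection: no per-state branch and no dev->last_rx. */
static unsigned long model_last_rx(const struct slave_model *s,
				   bool all_targets, int ntargets)
{
	if (all_targets)
		return oldest_target_rx(s, ntargets);
	return s->last_rx;
}
```

Because the comparison is done on the signed difference, a timestamp taken just before the jiffies counter wrapped is still recognised as older than one taken just after it.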

include/linux/netdevice.h

Lines changed: 1 addition & 7 deletions
@@ -1312,13 +1312,7 @@ struct net_device {
 	/*
 	 * Cache lines mostly used on receive path (including eth_type_trans())
 	 */
-	unsigned long last_rx; /* Time of last Rx
-				* This should not be set in
-				* drivers, unless really needed,
-				* because network stack (bonding)
-				* use it if/when necessary, to
-				* avoid dirtying this cache line.
-				*/
+	unsigned long last_rx; /* Time of last Rx */
 
 	/* Interface address info used in eth_type_trans() */
 	unsigned char *dev_addr; /* hw address, (before bcast
