DIV Research Notes
Issues
-
How does the session manager assign the master ? can we tell it to make the starting
host the master ?
currently it takes the first host with the lowest load
-
how not to distribute an application ?
not possible, but we will include this in the Zajic applicationkit
-
window positions etc. are not distributed ?!
they are somehow, at least if dragged with a pen
-
crash with user without pen (maybe pip as well)
-
crash in exit event code by Zsolt
-
assert on useResource message, problems with PipKit
-
are contexts ever deleted systemwide ?
STB_DELETE_SLAVE_CONTEXT is sent nowhere,
STB_SLAVE_CONTEXT_DELETED, STB_MASTER_CONTEXT_DELETED are only
sent by hosts and consumed by sman
only for the whole app, not if only the context is removed ! this leaves slaves
in slave mode !
this works because all hosts delete their instances !! however it might
no longer be true for an application that is within several locales...
- crash when opening a second context of the same app
in div destructor (delete crashes). FIX THAT !
current fix : don't delete :)
questions
Are pip sheets shared somehow ? while windows are referenced from a part in
the SoContextKit , nothing else is !
pip sheets could be shared now in the Zajic implementation, otherwise they
are excluded from distribution.
Remodeling DIV
The following additions should make it possible to put a lot of sman functionality
closer to the actual div groups that are involved in it. The ACE DIV version
simplifies the implementation as it already provides reliable communication
where all hosts can be senders (masters). the rest can be implemented on top
of that by extending the DIV protocol.
distribution of nodes to connecting slaves
DIV is too general to have the idea of a node being in different states. only
deltas are communicated, so the scene graph has to be the same on all hosts before this works.
There are some reasons for that :
- only the master copy has an idea of the nodes that are shared, because it
sets sensors on them.
- any references to nodes are communicated using unique node names.
- therefore the clients only fetch the nodes by name and change them.
- it is also possible that a master shares more than one node.
- statelessness at the slaves is very interesting because it simplifies things
a lot.
I'm not sure whether we should change that simple design, because of its generality.
however to achieve the distribution of the nodes to connecting slaves, we need
some more information :
- the nodes to be changed must either be replaceable (all their parents are
known)
- or be of some type that can be copied (for example SoGroup)
- for a slave it must be clear when initialization is finished and no new
nodes need to be shared
The same mechanism could then be used to share new nodes on the fly.
I propose the following design to achieve distribution of new nodes and nodes
to connecting slaves :
- any additional functionality necessary is implemented in a class derived
from CDivMain
- an additional pair of messages is implemented: DIV_REQUEST_NODETRANSFER,
DIV_TRANSFERNODE
- DIV_REQUEST_NODETRANSFER is used by a slave to request a transfer of
all nodes set to be shared using shareNode. these can be identified by
walking the list of node sensors. a parameter of this message is a randomly
generated token that the slave holds.
- DIV_TRANSFERNODE is then used to transfer the set of shared nodes together
with the token. slaves ignore this message if they already are synchronized.
the requesting slave identifies the message by the token and updates the
nodes to get a synchronized scene graph.
- to be able to update the nodes, only SoGroup nodes (or possibly derived
ones) are allowed as shared root nodes. also no field values are copied.
- DIV_TRANSFERNODE can also be used to copy a new scene graph. this is
achieved by using the token 0 which all slaves interpret and resync the
identified nodes.
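The token handling proposed above can be sketched as follows. This is only an illustration under stated assumptions: TransferNodeMsg and SlaveSync are hypothetical names that do not exist in DIV, and shared subgraphs are stood in for by plain strings keyed by the unique node name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical payload of DIV_TRANSFERNODE; the notes only name the message.
struct TransferNodeMsg {
    std::uint32_t token;                       // 0 means "all slaves resync"
    std::map<std::string, std::string> nodes;  // unique node name -> serialized subgraph
};

// Hypothetical slave-side bookkeeping for the proposed handshake.
class SlaveSync {
public:
    explicit SlaveSync(std::uint32_t token) : token_(token), synchronized_(false) {
        assert(token != 0);  // token 0 is reserved for transfers addressed to everyone
    }

    // Token to be carried in DIV_REQUEST_NODETRANSFER.
    std::uint32_t requestToken() const { return token_; }

    // Handles DIV_TRANSFERNODE. Returns true if the nodes were applied.
    bool onTransferNode(const TransferNodeMsg& msg) {
        const bool forEveryone = (msg.token == 0);
        const bool forMe = (!synchronized_ && msg.token == token_);
        if (!forEveryone && !forMe) {
            return false;  // answer for another slave, or we are already synchronized
        }
        for (const auto& entry : msg.nodes) {
            scene_[entry.first] = entry.second;  // replace the named shared root
        }
        if (forMe) {
            synchronized_ = true;
        }
        return true;
    }

    bool synchronized() const { return synchronized_; }
    const std::map<std::string, std::string>& scene() const { return scene_; }

private:
    std::uint32_t token_;
    bool synchronized_;
    std::map<std::string, std::string> scene_;
};
```

A slave built this way ignores transfers carrying a foreign token, applies the one answering its own request exactly once, and still accepts later token 0 transfers for nodes shared on the fly.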
switching master - slave
again this could be implemented on top of the current div. an additional pair
of messages is implemented: DIV_REQUEST_MASTER, DIV_TRANSFER_MASTER.
- DIV_REQUEST_MASTER is sent by the slave to the master, again using a unique
token.
- Then the master answers using DIV_TRANSFER_MASTER and the token is used
by the slave to identify itself as the receiver.
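A minimal sketch of this handshake, assuming that every peer sees both messages and that the old master drops to slave mode as soon as it answers (the notes do not fix that point). DivPeer, RequestMaster and TransferMaster are hypothetical names, not part of DIV.

```cpp
#include <cassert>
#include <cstdint>

enum class Mode { Master, Slave };

// Hypothetical payloads for DIV_REQUEST_MASTER and DIV_TRANSFER_MASTER.
struct RequestMaster { std::uint32_t token; };
struct TransferMaster { std::uint32_t token; };

class DivPeer {
public:
    explicit DivPeer(Mode mode) : mode_(mode), pending_(0), hasPending_(false) {}

    // Slave side: remember the token and build DIV_REQUEST_MASTER.
    RequestMaster requestMaster(std::uint32_t token) {
        assert(mode_ == Mode::Slave);  // only a slave asks for master mode
        pending_ = token;
        hasPending_ = true;
        return RequestMaster{token};
    }

    // Master side: step down and echo the token in DIV_TRANSFER_MASTER.
    // Peers that are not master ignore the request and return false.
    bool onRequestMaster(const RequestMaster& req, TransferMaster& reply) {
        if (mode_ != Mode::Master) {
            return false;
        }
        mode_ = Mode::Slave;
        reply.token = req.token;
        return true;
    }

    // Every peer may see the answer; only the holder of the token takes over.
    bool onTransferMaster(const TransferMaster& msg) {
        if (!hasPending_ || msg.token != pending_) {
            return false;
        }
        hasPending_ = false;
        mode_ = Mode::Master;
        return true;
    }

    Mode mode() const { return mode_; }

private:
    Mode mode_;
    std::uint32_t pending_;
    bool hasPending_;
};
```

Because the receiver is identified by the token alone, the master needs no knowledge of the group membership to hand over its role.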
sman2
Every operation is more or less idempotent. most act both ways, that is they
are sent to the server as well as to hosts.
no checks are made on the validity or coherence of the data. only the data necessary
to perform a certain operation is checked and error messages are produced if
something is wrong. in this case the operation is not performed and nothing
is changed or sent to the hosts. it is assumed that the hosts follow the correct
protocol.
The following messages exist :
START_APPLICATION
- tells a host to initialize an empty application object under the given
name in the given locale. There is a flag to define whether an already running
application is added to the locale or a new application is started. In the
second case, only the issuing host actually loads the app from a file.
- the sman only does the bookkeeping and forwards the message to all hosts in the given
locale.
- the starting host is automatically the master, because it may be the only
one having the necessary file to load the application. the state is then distributed
via DIV to any slaves. later on the master can be reassigned.
SET_DIV_PARAM
- tells all hosts the given div parameters for an application. This is necessary
to start or stop sharing and to inform hosts of the necessary parameters. This
is done independently of the START_APPLICATION message. So first hosts create
the necessary application structure in response to a START_APPLICATION message,
then they start using DIV in response to a SET_DIV_PARAM message.
- is never sent by the hosts to the manager !
STOP_APPLICATION
- sent from a host to the manager, it removes the application from
a certain locale. if the app is no longer present in any locale, it will be removed
from the sman.
- sent from the manager to hosts, it tells them to stop an application. this will remove
the application from the given locale it is part of at the host.
SET_APPLICATION_MODE
- because the actual master mode changes are now communicated via DIV, this
is used by hosts to inform the manager of the current status.
- the manager uses this message to tell a certain host to become the master
for an application. the host then informs the manager about the outcome of
the operation.
JOIN_LOCALE
- tells the manager that a certain host joins a given locale. if the locale
doesn't exist an appropriate entry is created at the manager. (note that hosts
may have private locales that are not communicated to the manager !).
- if the locale already contains applications and users, these will be communicated
to the host via START_APPLICATION, SET_DIV_PARAM and ADD_USER messages.
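The JOIN_LOCALE bookkeeping could look roughly like the sketch below. SessionManager, Locale and Message are hypothetical stand-ins, and the message payloads are reduced to a type name plus the id they concern; the actual sman2 data structures are not described in these notes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-locale record kept by the manager.
struct Locale {
    std::set<std::string> hosts;
    std::map<std::string, std::string> apps;   // app id -> div parameters (opaque)
    std::map<std::string, std::string> users;  // user id -> userkit description
};

// Hypothetical outgoing message: type name plus the id it concerns.
struct Message {
    std::string type;
    std::string subject;
};

class SessionManager {
public:
    // Handles JOIN_LOCALE and returns the messages to send to the joining host.
    std::vector<Message> joinLocale(const std::string& host, const std::string& locale) {
        assert(!host.empty() && !locale.empty());  // only data needed for the operation is checked
        Locale& loc = locales_[locale];            // creates the entry if the locale is new
        std::vector<Message> replay;
        for (const auto& app : loc.apps) {
            replay.push_back(Message{"START_APPLICATION", app.first});
            replay.push_back(Message{"SET_DIV_PARAM", app.first});
        }
        for (const auto& user : loc.users) {
            replay.push_back(Message{"ADD_USER", user.first});
        }
        loc.hosts.insert(host);
        return replay;
    }

    void startApplication(const std::string& locale, const std::string& appId,
                          const std::string& divParams) {
        locales_[locale].apps[appId] = divParams;
    }

    void addUser(const std::string& locale, const std::string& userId,
                 const std::string& userkit) {
        locales_[locale].users[userId] = userkit;
    }

private:
    std::map<std::string, Locale> locales_;
};
```

The first host to join a locale gets an empty list; later hosts get the current applications and users replayed in the order named above.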
LEAVE_LOCALE
- tells the manager that a certain host leaves a given locale. no clean up
is performed by the manager, it is assumed that the application did that
beforehand.
- TODO the manager should check for applications that lose their master with
the host and therefore need reassignment.
ADD_USER
- a host has configured users that it is responsible for. typically these
are users whose view is rendered by the host. it only needs to add
those users to the system and remove them again.
- simply adds the necessary userkit information to a given locale. this contains
the userid and a description of the userkit. no checks are done to make sure
the user doesn't exist in other locales.
- all hosts (except the sender) in a given locale will be notified of the
new user.
REMOVE_USER
- simply tells the manager to remove a user from a locale. this contains
only the userid.
- all hosts (except the sender) in the locale will be notified of the removed
user.
USERKIT_PARAM
- a simple wrapper message around various userkit related messages distributed.
it contains the userid, the underlying message type and the message data as
a data block.
- all hosts in the locale of the userkit will receive this message.
- this is basically a quick adaptation of the necessary functionality. it would
be interesting to solve this with shared userkits ??
The following events are not commands per se, but since something meaningful should
happen in response, they are treated like events happening in the system.
Host connecting
- not much, necessary data structures are constructed at the manager.
Host disconnecting
- The protocol is that any host disconnecting should probably clean up. however
what to do if that is not happening ? The following checks will make sure that
the manager and the system stay in a meaningful state.
- check for any master applications the host ran. these will be stopped (will
at least draw attention to the users that something went wrong).
- check for any slave applications the host ran. these will be reevaluated
for sharing.
- check for any users the host added. a host only adds users that it renders
(or manages anyway), therefore these will be removed as well.
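The three checks could be sketched as one pass over the manager's tables. ManagerState, AppEntry and CleanupReport are hypothetical names; the sketch only records which applications and users are affected and leaves the resulting messages to the caller.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of who runs an application.
struct AppEntry {
    std::string master;
    std::set<std::string> slaves;
};

// What the manager decided for a disconnecting host.
struct CleanupReport {
    std::vector<std::string> stoppedApps;      // host was master
    std::vector<std::string> reevaluatedApps;  // host was a slave
    std::vector<std::string> removedUsers;     // users the host had added
};

class ManagerState {
public:
    std::map<std::string, AppEntry> apps;      // app id -> participating hosts
    std::map<std::string, std::string> users;  // user id -> host that added it

    CleanupReport onHostDisconnected(const std::string& host) {
        assert(!host.empty());
        CleanupReport report;
        for (auto it = apps.begin(); it != apps.end();) {
            if (it->second.master == host) {
                report.stoppedApps.push_back(it->first);  // master is gone: stop the app
                it = apps.erase(it);
            } else {
                if (it->second.slaves.erase(host) > 0) {
                    report.reevaluatedApps.push_back(it->first);  // sharing must be reconsidered
                }
                ++it;
            }
        }
        for (auto it = users.begin(); it != users.end();) {
            if (it->second == host) {
                report.removedUsers.push_back(it->first);
                it = users.erase(it);
            } else {
                ++it;
            }
        }
        return report;
    }
};
```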
application id name space
In the DIV 2.0 version all hosts were closely coupled and started all applications
at the same time. In this setup it was enough to let one host establish a unique
context id and share it with the others via streaming etc.
Locales break this assumption because hosts may start applications unknown
to each other and integrate them at a later point into their view of the world.
Again unique identifiers need to be established, but now in a situation where
concurrency issues can arise.
The solution is to establish a name space for each host so that each host can
construct unique application ids without colliding with other hosts. This could
be done at two points :
- in the component managing the local application instances. This requires
that a local component concerns itself with a problem that is not really in
its domain.
- the distribution component at each host maps the local app ids to a name
space structure that is shared by all hosts. the mapping works transparently
for both the local component and the distribution system.
I implemented the second solution because I think it adequately assigns the different
responsibilities to the appropriate components. This is implemented by prefixing
the local id with the host ip address and the local port number used to connect
to the session manager. These form a unique name space for each process, therefore
no conflicts can occur.
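The prefixing scheme can be illustrated with a small mapping helper. HostNamespace is a hypothetical name, and the textual layout "ip:port/localId" is an assumption made for this sketch; the notes only say that address and port are used as a prefix.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical per-process name space: ip address and local port of the
// connection to the session manager.
struct HostNamespace {
    std::string ip;
    std::uint16_t port;

    std::string prefix() const {
        return ip + ":" + std::to_string(port) + "/";
    }

    // Maps a local application id into the name space shared by all hosts.
    std::string toGlobal(const std::string& localId) const {
        assert(!localId.empty());
        return prefix() + localId;
    }

    // Maps back. Returns false if the id belongs to another process.
    bool toLocal(const std::string& globalId, std::string& localId) const {
        const std::string p = prefix();
        if (globalId.compare(0, p.size(), p) != 0) {
            return false;
        }
        localId = globalId.substr(p.size());
        return true;
    }
};
```

Two processes can then hand out the same local id without colliding, and the local application component never sees the prefixed form.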
common actions
loading shared apps at startup
triggered at startup, involves the following messages :
STB_QUERY_APPS_FROM_SMAN
STB_STREAM_CONTEXT
STB_CONTEXT_STREAMED
STB_CONTEXT_STREAMED_WITH_DIV
STB_START_DIV_STREAMED
load a new application
load a new application, involves the following messages :
STB_LOAD_APPLICATION
STB_CREATE_SLAVE_CONTEXT
STB_SLAVE_CONTEXT_CREATED
STB_START_DIV
request master mode on a certain host
get master to host, involves the following messages :
STB_GET_MASTER_CONTEXT
STB_SET_SLAVE_CONTEXT
STB_SLAVE_CONTEXT_SET
STB_SET_MASTER_CONTEXT
STB_MASTER_CONTEXT_SET
STB_MIGRATE_SLAVE_CONTEXT
STB_SLAVE_CONTEXT_MIGRATED
STB_START_DIV_MIGRATED