==========================
Bulk Register Access (BRA)
==========================

Conventions
-----------

Capitalized words used in this documentation are intentional and refer
to concepts of the SoundWire 1.x specification.

Introduction
------------

The SoundWire 1.x specification provides a mechanism to speed up
command/control transfers by reclaiming parts of the audio
bandwidth. The Bulk Register Access (BRA) protocol is a standard
solution based on the Bulk Payload Transport (BPT) definitions.

The regular control channel uses Column 0 and can only send/retrieve
one byte per frame with write/read commands. With a typical 48 kHz
frame rate, only 48 kB/s can be transferred.
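
As a rough illustration of the one-byte-per-frame limit, the sketch
below estimates how long a transfer takes over the regular control
channel. The helper name is hypothetical and the model ignores retries
and command overhead; it is not part of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in milliseconds needed to move 'bytes' over the Column 0
 * control channel, assuming exactly one byte per frame and no retries.
 * Hypothetical helper, for illustration only.
 */
static uint64_t ctrl_channel_transfer_ms(uint64_t bytes, uint32_t frame_rate_hz)
{
    assert(frame_rate_hz > 0);
    /* Round up so that a partial millisecond is not lost. */
    return (bytes * 1000 + frame_rate_hz - 1) / frame_rate_hz;
}
```

With a 48 kHz frame rate, a hypothetical 1 MiB firmware image
(1048576 bytes) would need about 21.8 seconds under these assumptions.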

The optional Bulk Register Access capability can transmit up to 12
Mbits/s and reduce transfer times by more than an order of magnitude,
but has multiple design constraints:

  (1) Each frame can only support a read or a write transfer, with a
      10-byte overhead per frame (header and footer response).

  (2) The read/writes SHALL be from/to contiguous register addresses
      in the same frame. A fragmented register space decreases the
      efficiency of the protocol by requiring multiple BRA transfers
      scheduled in different frames.

  (3) The targeted Peripheral device SHALL support the optional Data
      Port 0, and likewise the Manager SHALL expose audio-like Ports
      to insert BRA packets in the audio payload using the concepts of
      Sample Interval, HSTART, HSTOP, etc.

  (4) The BRA transport efficiency depends on the available
      bandwidth. If there are no on-going audio transfers, the entire
      frame minus Column 0 can be reclaimed for BRA. The frame shape
      also impacts efficiency: since Column 0 cannot be used for
      BPT/BRA, the frame should rely on a large number of columns and
      minimize the number of rows. The bus clock should be as high as
      possible.

  (5) The number of bits transferred per frame SHALL be a multiple of
      8 bits. Padding bits SHALL be inserted if necessary at the end
      of the data.

  (6) The regular read/write commands can be issued in parallel with
      BRA transfers. This is convenient to e.g. deal with alerts, jack
      detection or change the volume during firmware download, but
      accessing the same address with two independent protocols must
      be avoided to prevent undefined behavior.

  (7) Some implementations may not be capable of handling the
      bandwidth of the BRA protocol, e.g. in the case of a slow I2C
      bus behind the SoundWire IP. In this case, the transfers may
      need to be spaced in time or flow-controlled.

  (8) Each BRA packet SHALL be marked as 'Active' when valid data is
      to be transmitted. This allows software to allocate a BRA
      stream but not transmit/discard data while processing the
      results or preparing the next batch of data, or while the
      Peripheral deals with the previous transfer. In addition, a BRA
      transfer can be started early, before the data is ready.

  (9) Up to 470 bytes may be transmitted per frame.

  (10) The address is represented with 32 bits and does not rely on
       the paging registers used for the regular command/control
       protocol in Column 0.
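
The interaction between constraints (1), (4), (5) and (9) can be
sketched with a small frame-budget calculation. This is a simplified
model and not the kernel implementation: it assumes no audio stream is
active, that leftover bits which do not form a whole byte stay unused,
and that the 470-byte limit applies to the data block. All names are
hypothetical.

```c
#include <assert.h>

#define BRA_OVERHEAD_BYTES 10u   /* header and footer overhead, item (1) */
#define BRA_MAX_DATA_BYTES 470u  /* per-frame limit, item (9) */

/* Bits left for BPT/BRA in an idle frame: everything except Column 0. */
static unsigned int bpt_bits_per_frame(unsigned int rows, unsigned int cols)
{
    assert(rows > 0 && cols >= 2);
    return rows * (cols - 1);
}

/*
 * Data bytes that fit in one frame. Assumption: only whole bytes are
 * usable, and the 10-byte overhead is paid once per frame.
 */
static unsigned int bra_data_bytes_per_frame(unsigned int rows, unsigned int cols)
{
    unsigned int bytes = bpt_bits_per_frame(rows, cols) / 8;

    if (bytes <= BRA_OVERHEAD_BYTES)
        return 0;
    bytes -= BRA_OVERHEAD_BYTES;
    return bytes > BRA_MAX_DATA_BYTES ? BRA_MAX_DATA_BYTES : bytes;
}
```

Under this model a 50-row, 16-column frame carries 83 data bytes, a
50-row, 2-column frame carries none, and a 256-row, 16-column frame
carries 480 - 10 = 470 bytes, which is consistent with the limit
stated in (9).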


Error checking
--------------

Firmware download is one of the key usages of the Bulk Register Access
protocol. To make sure the binary data integrity is not compromised by
transmission or programming errors, each BRA packet provides:

  (1) A CRC on the 7-byte header. This CRC helps the Peripheral Device
      check if it is addressed and set the start address and number of
      bytes. The Peripheral Device provides a response in Byte 7.

  (2) A CRC on the data block (header excluded). This CRC is
      transmitted as the last-but-one byte in the packet, prior to the
      footer response.

The header response can be one of:

  (a) Ack
  (b) Nak
  (c) Not Ready

The footer response can be one of:

  (1) Ack
  (2) Nak  (CRC failure)
  (3) Good (operation completed)
  (4) Bad  (operation failed)
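
The byte positions implied by this description can be sketched as
follows. This is an illustrative model only: it assumes that the
header, including its CRC, occupies bytes 0 to 6, which is what the
'Byte 7' response position and the 10-byte overhead suggest. The
structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets inside one BRA packet carrying 'data_bytes' of data. */
struct bra_packet_layout {
    size_t header_resp;  /* Byte 7: header response from the Peripheral */
    size_t data;         /* first data byte */
    size_t data_crc;     /* last-but-one byte */
    size_t footer_resp;  /* last byte */
    size_t total;        /* packet size in bytes */
};

static struct bra_packet_layout bra_layout(size_t data_bytes)
{
    struct bra_packet_layout l;

    assert(data_bytes > 0);
    l.header_resp = 7;                  /* bytes 0..6 hold header and CRC */
    l.data = l.header_resp + 1;
    l.data_crc = l.data + data_bytes;
    l.footer_resp = l.data_crc + 1;
    l.total = l.footer_resp + 1;        /* data_bytes + 10 bytes overhead */
    return l;
}
```

For a 16-byte data block this gives a 26-byte packet, with the data
CRC at offset 24 and the footer response at offset 25.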

Example frame
-------------

The example below is not to scale and makes simplifying assumptions
for clarity. The different chunks in the BRA packets are not required
to start on a new SoundWire Row, and the scale of data may vary.

      ::

        +---+--------------------------------------------+
        +   |                                            |
        +   |             BRA HEADER                     |
        +   |                                            |
        +   +--------------------------------------------+
        + C |             HEADER CRC                     |
        + O +--------------------------------------------+
        + M |             HEADER RESPONSE                |
        + M +--------------------------------------------+
        + A |                                            |
        + N |                                            |
        + D |                 DATA                       |
        +   |                                            |
        +   |                                            |
        +   |                                            |
        +   +--------------------------------------------+
        +   |             DATA CRC                       |
        +   +--------------------------------------------+
        +   |             FOOTER RESPONSE                |
        +---+--------------------------------------------+


Assuming the frame uses N columns, the configuration shown above can
be programmed by setting the DP0 registers as:

    - HSTART = 1
    - HSTOP = N - 1
    - Sample Interval = N
    - WordLength = N - 1
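
The settings listed above can be captured in a small helper. The
structure and function names are hypothetical, and the values are the
logical ones from the list; any register-level encoding is left out of
this sketch.

```c
#include <assert.h>

/* Logical DP0 transport parameters for a BRA stream (hypothetical type). */
struct bra_dp0_params {
    unsigned int hstart;
    unsigned int hstop;
    unsigned int sample_interval;
    unsigned int word_length;
};

/* Derive the DP0 parameters for a frame with 'n_cols' columns. */
static struct bra_dp0_params bra_dp0_params_for_columns(unsigned int n_cols)
{
    struct bra_dp0_params p;

    assert(n_cols >= 2);          /* Column 0 plus at least one payload column */
    p.hstart = 1;                 /* payload starts right after Column 0 */
    p.hstop = n_cols - 1;         /* payload ends on the last column */
    p.sample_interval = n_cols;
    p.word_length = n_cols - 1;
    return p;
}
```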

Addressing restrictions
-----------------------

The Device Number specified in the Header follows the SoundWire
definitions, and broadcast and group addressing are permitted. For now
the Linux implementation only allows for a single BPT transfer to a
single device at a time. This might be revisited at a later point as
an optimization to send the same firmware to multiple devices, but
this would only be beneficial for single-link solutions.

In the case of multiple Peripheral devices attached to different
Managers, broadcast and group addressing are not supported by the
SoundWire specification. Each device must be handled with separate BRA
streams, possibly in parallel, since the links are independent.

Unsupported features
--------------------

The Bulk Register Access specification provides a number of
capabilities that are not supported in known implementations, such as:

  (1) Transfers initiated by a Peripheral Device. The BRA Initiator is
      always the Manager Device.

  (2) Flow-control capabilities and retransmission based on the
      'NotReady' header response require extra buffering in the
      SoundWire IP and are not implemented.

Bi-directional handling
-----------------------

The BRA protocol can handle writes as well as reads, and in each
packet the header and footer response are provided by the Peripheral
Target device. On the Peripheral device, the BRA protocol is handled
by a single DP0 data port, and at the low-level the bus ownership
will change for header/footer response as well as the data transmitted
during a read.

On the host side, most implementations rely on a Port-like concept,
with two FIFOs consuming/generating data transfers in parallel
(Host->Peripheral and Peripheral->Host). The amount of data
consumed/produced by these FIFOs is not symmetrical. As a result,
hardware typically inserts markers to help software and hardware
interpret raw data.

Each packet will typically have:

  (1) a 'Start of Packet' indicator.

  (2) an 'End of Packet' indicator.

  (3) a packet identifier to correlate the data requested and
      transmitted, and the error status for each frame.

Hardware implementations can check errors at the frame level, and
retry a transfer in case of errors. However, as for the flow-control
case, this requires extra buffering and intelligence in the
hardware. The Linux support assumes that the entire transfer is
cancelled if a single error is detected in one of the responses.
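
The 'cancel on first error' policy can be sketched as a scan over the
per-frame responses. The enumerator values below are placeholders and
not the on-wire encodings, and the sketch assumes that the footer
responses Ack and Good both count as success and that a Not Ready
header is an error since flow control is not supported. It is not the
kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder values, not the encodings used on the bus. */
enum bra_header_resp { BRA_HDR_ACK, BRA_HDR_NAK, BRA_HDR_NOT_READY };
enum bra_footer_resp { BRA_FTR_ACK, BRA_FTR_NAK, BRA_FTR_GOOD, BRA_FTR_BAD };

struct bra_frame_status {
    enum bra_header_resp header;
    enum bra_footer_resp footer;
};

/* Return 0 if every frame succeeded, -EIO as soon as one response is bad. */
static int bra_check_transfer(const struct bra_frame_status *frames, size_t count)
{
    size_t i;

    assert(frames != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (frames[i].header != BRA_HDR_ACK)
            return -EIO;   /* Nak or Not Ready: no retry, give up */
        if (frames[i].footer != BRA_FTR_ACK &&
            frames[i].footer != BRA_FTR_GOOD)
            return -EIO;   /* CRC failure or failed operation */
    }
    return 0;
}
```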

Abstraction required
~~~~~~~~~~~~~~~~~~~~

There are no standard registers or mandatory implementation at the
Manager level, so the low-level BPT/BRA details must be hidden in
Manager-specific code. For example, the Cadence IP format is not
known to the codec drivers.

Likewise, codec drivers should not have to know the frame size. The
computation of CRC and handling of responses are handled in helpers
and Manager-specific code.

The host BRA driver may also have restrictions on pages allocated for
DMA, or other host-DSP communication protocols. The codec driver
should not be aware of any of these restrictions, since it might be
reused in combination with different implementations of Manager IPs.

Concurrency between BRA and regular read/write
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The existing 'nread/nwrite' API already relies on a notion of start
address and number of bytes, so it would be possible to extend this
API with a 'hint' requesting BPT/BRA be used.

However BRA transfers could be quite long, and the use of a single
mutex for regular read/write and BRA is a show-stopper. Independent
operation of the control/command and BRA transfers is a fundamental
requirement, e.g. to change the volume level with the existing regmap
interface while downloading firmware. The integration must however
ensure that there is no concurrent access to the same address with
the command/control protocol and the BRA protocol.
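
One way to picture the 'no concurrent access to the same address'
rule is a range check performed before a regular read/write is issued
while a BRA transfer is in flight. The helper below is a hypothetical
sketch and does not describe how the Linux implementation enforces
the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if 'reg_addr' falls inside the contiguous range
 * [bra_start, bra_start + bra_len) targeted by an in-flight BRA
 * transfer. Hypothetical helper, for illustration only.
 */
static bool bra_range_contains(uint32_t bra_start, uint32_t bra_len,
                               uint32_t reg_addr)
{
    assert(bra_len > 0);
    /* The subtraction form avoids overflowing bra_start + bra_len. */
    return reg_addr >= bra_start && reg_addr - bra_start < bra_len;
}
```

A caller could defer or reject the regular access when the check
returns true, and let accesses to unrelated registers such as a volume
control proceed.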

In addition, the 'sdw_msg' structure hard-codes support for 16-bit
addresses and paging registers, which are irrelevant for BPT/BRA
support based on native 32-bit addresses. A separate API with
'sdw_bpt_msg' makes more sense.

One possible strategy to speed up all initialization tasks would be to
start a BRA transfer for firmware download, then deal with all the
"regular" read/writes in parallel with the command channel, and
finally wait for the BRA transfers to complete. This would allow for a
degree of overlap instead of a purely sequential solution. As such,
the BRA API must support async transfers and expose a separate wait
function.


Peripheral/bus interface
------------------------

The bus interface for BPT/BRA is made of two functions:

    - sdw_bpt_send_async(bpt_message)

      This function sends the data using the Manager
      implementation-defined capabilities (typically DMA or IPC
      protocol).

      Queueing is currently not supported; the caller
      needs to wait for completion of the requested transfer.

    - sdw_bpt_wait()

      This function waits for completion of the entire message
      provided by the codec driver in the 'send_async' stage.
      Intermediate status for smaller chunks will not be provided back
      to the codec driver, only a return code will be provided.
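
The send-then-wait pattern, the lack of queueing and the overlap with
regular command traffic can be illustrated with user-space mocks. The
'mock_' functions and the message structure below are stand-ins
written for this example; they are not the kernel functions and do
not reflect their actual signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a BPT message: native 32-bit address, no paging. */
struct mock_bpt_msg {
    uint32_t addr;
    size_t len;
    const uint8_t *buf;
};

static bool mock_bpt_busy;  /* at most one transfer in flight */

/* Mock of the 'send' stage: a second transfer is refused, not queued. */
static int mock_bpt_send_async(const struct mock_bpt_msg *msg)
{
    assert(msg != NULL);
    if (mock_bpt_busy)
        return -EBUSY;
    if (msg->len == 0 || msg->buf == NULL)
        return -EINVAL;
    mock_bpt_busy = true;
    return 0;
}

/* Mock of the 'wait' stage: one return code for the whole message. */
static int mock_bpt_wait(void)
{
    if (!mock_bpt_busy)
        return -EINVAL;
    mock_bpt_busy = false;  /* a real Manager would block until done */
    return 0;
}

/* Start the download, do regular control work, then wait. */
static int mock_firmware_download(const struct mock_bpt_msg *fw,
                                  void (*regular_io)(void))
{
    int ret = mock_bpt_send_async(fw);

    if (ret < 0)
        return ret;
    if (regular_io)
        regular_io();       /* command channel traffic overlaps with BRA */
    return mock_bpt_wait();
}
```

The 'regular_io' callback stands for the regular read/writes that a
codec driver would issue while the bulk transfer is running.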

Regmap use
~~~~~~~~~~

Existing codec drivers rely on regmap to download firmware to
Peripherals. regmap exposes an async interface similar to the
send/wait API suggested above, so at a high-level it would seem
natural to combine BRA and regmap. The regmap layer could check if BRA
is available, and fall back to the regular read/write command channel
if it is not.

The regmap integration will be handled in a second step.

BRA stream model
----------------

For regular audio transfers, the machine driver exposes a dailink
connecting CPU DAI(s) and Codec DAI(s).

This model is not required for BRA support:

   (1) The SoundWire DAIs are mainly wrappers for SoundWire Data
       Ports, with possibly some analog or audio conversion
       capabilities bolted behind the Data Port. In the context of
       BRA, the DP0 is the destination. DP0 registers are standard and
       can be programmed blindly without knowing what Peripheral is
       connected to each link. In addition, if there are multiple
       Peripherals on a link and some of them do not support DP0, the
       write commands to program DP0 registers will generate harmless
       COMMAND_IGNORED responses that will be wired-ORed with
       responses from Peripherals which support DP0. In other words,
       the DP0 programming can be done with broadcast commands, and
       the information on the Target device can be added only in the
       BRA Header.

   (2) At the CPU level, the DAI concept is not useful for BRA; the
       machine driver will not create a dailink relying on DP0. The
       only concept that is needed is the notion of port.

   (3) The stream concept relies on a set of master_rt and slave_rt
       concepts. All of these entities represent ports and not DAIs.

   (4) With the assumption that a single BRA stream is used per link,
       that stream can connect master ports as well as all peripheral
       DP0 ports.

   (5) BRA transfers only make sense in the context of one
       Manager/Link, so the BRA stream handling does not rely on the
       concept of multi-link aggregation allowed by regular DAI links.

Audio DMA support
-----------------

Some DMAs, such as HDaudio, require an audio format field to be
set. This format is in turn used to define acceptable bursts. BPT/BRA
support is not fully compatible with these definitions in that the
format and bandwidth may vary between read and write commands.

In addition, on Intel HDaudio platforms the DMAs need to be
programmed with a PCM format matching the bandwidth of the BPT/BRA
transfer. The format is based on 192kHz 32-bit samples, and the number
of channels varies to adjust the bandwidth. The notion of channel is
completely notional since the data is not typical audio
PCM. Programming such channels helps reserve enough bandwidth and adjust
FIFO sizes to avoid xruns.

Alignment requirements are currently not enforced at the core level
but at the platform-level, e.g. for Intel the data sizes must be
multiples of 32 bytes.
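
The relationship between bandwidth, the notional channel count and
the 32-byte size rule can be sketched as follows. The names are
hypothetical and the rounding policy is an assumption; the actual
platform driver may derive these values differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BRA_PCM_RATE_HZ      192000u  /* sample rate of the DMA format */
#define BRA_PCM_SAMPLE_BYTES 4u       /* 32-bit samples */
#define BRA_DATA_ALIGN       32u      /* Intel data size granularity */

/* Smallest channel count whose PCM bandwidth covers 'bytes_per_sec'. */
static unsigned int bra_pcm_channels(unsigned int bytes_per_sec)
{
    const unsigned int per_channel = BRA_PCM_RATE_HZ * BRA_PCM_SAMPLE_BYTES;

    assert(bytes_per_sec > 0);
    /* Round up without risking overflow of the addition. */
    return bytes_per_sec / per_channel + (bytes_per_sec % per_channel != 0);
}

/* Check the platform-level rule that sizes are multiples of 32 bytes. */
static bool bra_size_is_aligned(size_t bytes)
{
    return bytes != 0 && bytes % BRA_DATA_ALIGN == 0;
}
```

Each notional channel corresponds to 192000 x 4 = 768000 bytes per
second, so a 12 Mbits/s transfer (1500000 bytes per second) would need
two channels under these assumptions.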