diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/arm/OMAP/omap_pm | 129 | ||||
-rw-r--r-- | Documentation/cpu-freq/user-guide.txt | 9 | ||||
-rw-r--r-- | Documentation/dontdiff | 1 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 10 | ||||
-rw-r--r-- | Documentation/hwmon/pcf8591 | 28 | ||||
-rw-r--r-- | Documentation/hwmon/tmp421 | 36 | ||||
-rw-r--r-- | Documentation/hwmon/wm831x | 37 | ||||
-rw-r--r-- | Documentation/hwmon/wm8350 | 26 | ||||
-rw-r--r-- | Documentation/kernel-doc-nano-HOWTO.txt | 4 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 6 | ||||
-rw-r--r-- | Documentation/kref.txt | 1 | ||||
-rw-r--r-- | Documentation/trace/events.txt | 184 | ||||
-rw-r--r-- | Documentation/trace/ftrace-design.txt | 233 | ||||
-rw-r--r-- | Documentation/trace/ftrace.txt | 6 |
14 files changed, 680 insertions, 30 deletions
diff --git a/Documentation/arm/OMAP/omap_pm b/Documentation/arm/OMAP/omap_pm new file mode 100644 index 00000000000..5389440aade --- /dev/null +++ b/Documentation/arm/OMAP/omap_pm @@ -0,0 +1,129 @@ + +The OMAP PM interface +===================== + +This document describes the temporary OMAP PM interface. Driver +authors use these functions to communicate minimum latency or +throughput constraints to the kernel power management code. +Over time, the intention is to merge features from the OMAP PM +interface into the Linux PM QoS code. + +Drivers need to express PM parameters which: + +- support the range of power management parameters present in the TI SRF; + +- separate the drivers from the underlying PM parameter + implementation, whether it is the TI SRF or Linux PM QoS or Linux + latency framework or something else; + +- specify PM parameters in terms of fundamental units, such as + latency and throughput, rather than units which are specific to OMAP + or to particular OMAP variants; + +- allow drivers which are shared with other architectures (e.g., + DaVinci) to add these constraints in a way which won't affect non-OMAP + systems, + +- can be implemented immediately with minimal disruption of other + architectures. + + +This document proposes the OMAP PM interface, including the following +five power management functions for driver code: + +1. Set the maximum MPU wakeup latency: + (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t) + +2. Set the maximum device wakeup latency: + (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t) + +3. Set the maximum system DMA transfer start latency (CORE pwrdm): + (*pdata->set_max_sdma_lat)(struct device *dev, long t) + +4. Set the minimum bus throughput needed by a device: + (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r) + +5. Return the number of times the device has lost context + (*pdata->get_dev_context_loss_count)(struct device *dev) + + +Further documentation for all OMAP PM interface functions can be +found in arch/arm/plat-omap/include/mach/omap-pm.h. + + +The OMAP PM layer is intended to be temporary +--------------------------------------------- + +The intention is that eventually the Linux PM QoS layer should support +the range of power management features present in OMAP3. As this +happens, existing drivers using the OMAP PM interface can be modified +to use the Linux PM QoS code; and the OMAP PM interface can disappear. + + +Driver usage of the OMAP PM functions +------------------------------------- + +As the 'pdata' in the above examples indicates, these functions are +exposed to drivers through function pointers in driver .platform_data +structures. The function pointers are initialized by the board-*.c +files to point to the corresponding OMAP PM functions: +.set_max_dev_wakeup_lat will point to +omap_pm_set_max_dev_wakeup_lat(), etc. Other architectures which do +not support these functions should leave these function pointers set +to NULL. Drivers should use the following idiom: + + if (pdata->set_max_dev_wakeup_lat) + (*pdata->set_max_dev_wakeup_lat)(dev, t); + +The most common usage of these functions will probably be to specify +the maximum time from when an interrupt occurs, to when the device +becomes accessible. To accomplish this, driver writers should use the +set_max_mpu_wakeup_lat() function to to constrain the MPU wakeup +latency, and the set_max_dev_wakeup_lat() function to constrain the +device wakeup latency (from clk_enable() to accessibility). For +example, + + /* Limit MPU wakeup latency */ + if (pdata->set_max_mpu_wakeup_lat) + (*pdata->set_max_mpu_wakeup_lat)(dev, tc); + + /* Limit device powerdomain wakeup latency */ + if (pdata->set_max_dev_wakeup_lat) + (*pdata->set_max_dev_wakeup_lat)(dev, td); + + /* total wakeup latency in this example: (tc + td) */ + +The PM parameters can be overwritten by calling the function again +with the new value. The settings can be removed by calling the +function with a t argument of -1 (except in the case of +set_max_bus_tput(), which should be called with an r argument of 0). + +The fifth function above, omap_pm_get_dev_context_loss_count(), +is intended as an optimization to allow drivers to determine whether the +device has lost its internal context. If context has been lost, the +driver must restore its internal context before proceeding. + + +Other specialized interface functions +------------------------------------- + +The five functions listed above are intended to be usable by any +device driver. DSPBridge and CPUFreq have a few special requirements. +DSPBridge expresses target DSP performance levels in terms of OPP IDs. +CPUFreq expresses target MPU performance levels in terms of MPU +frequency. The OMAP PM interface contains functions for these +specialized cases to convert that input information (OPPs/MPU +frequency) into the form that the underlying power management +implementation needs: + +6. (*pdata->dsp_get_opp_table)(void) + +7. (*pdata->dsp_set_min_opp)(u8 opp_id) + +8. (*pdata->dsp_get_opp)(void) + +9. (*pdata->cpu_get_freq_table)(void) + +10. (*pdata->cpu_set_freq)(unsigned long f) + +11. (*pdata->cpu_get_freq)(void) diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 5d5f5fadd1c..2a5b850847c 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -176,7 +176,9 @@ scaling_governor, and by "echoing" the name of another work on some specific architectures or processors. -cpuinfo_cur_freq : Current speed of the CPU, in KHz. +cpuinfo_cur_freq : Current frequency of the CPU as obtained from + the hardware, in KHz. This is the frequency + the CPU actually runs at. scaling_available_frequencies : List of available frequencies, in KHz. @@ -196,7 +198,10 @@ related_cpus : List of CPUs that need some sort of frequency scaling_driver : Hardware driver for cpufreq. -scaling_cur_freq : Current frequency of the CPU, in KHz. +scaling_cur_freq : Current frequency of the CPU as determined by + the governor and cpufreq core, in KHz. This is + the frequency the kernel thinks the CPU runs + at. If you have selected the "userspace" governor which allows you to set the CPU operating frequency to a specific value, you can read out diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 88519daab6e..e1efc400bed 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -152,7 +152,6 @@ piggy.gz piggyback pnmtologo ppc_defs.h* -promcon_tbl.c pss_boot.h qconf raid6altivec*.c diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 503d21216d5..fa75220f8d3 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -428,16 +428,6 @@ Who: Johannes Berg <johannes@sipsolutions.net> ---------------------------- -What: CONFIG_X86_OLD_MCE -When: 2.6.32 -Why: Remove the old legacy 32bit machine check code. This has been - superseded by the newer machine check code from the 64bit port, - but the old version has been kept around for easier testing. Note this - doesn't impact the old P5 and WinChip machine check handlers. -Who: Andi Kleen <andi@firstfloor.org> - ----------------------------- - What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be exported interface anymore. When: 2.6.33 diff --git a/Documentation/hwmon/pcf8591 b/Documentation/hwmon/pcf8591 index 5628fcf4207..e76a7892f68 100644 --- a/Documentation/hwmon/pcf8591 +++ b/Documentation/hwmon/pcf8591 @@ -2,11 +2,11 @@ Kernel driver pcf8591 ===================== Supported chips: - * Philips PCF8591 + * Philips/NXP PCF8591 Prefix: 'pcf8591' Addresses scanned: I2C 0x48 - 0x4f - Datasheet: Publicly available at the Philips Semiconductor website - http://www.semiconductors.philips.com/pip/PCF8591P.html + Datasheet: Publicly available at the NXP website + http://www.nxp.com/pip/PCF8591_6.html Authors: Aurelien Jarno <aurelien@aurel32.net> @@ -16,9 +16,10 @@ Authors: Description ----------- + The PCF8591 is an 8-bit A/D and D/A converter (4 analog inputs and one -analog output) for the I2C bus produced by Philips Semiconductors. It -is designed to provide a byte I2C interface to up to 4 separate devices. +analog output) for the I2C bus produced by Philips Semiconductors (now NXP). +It is designed to provide a byte I2C interface to up to 4 separate devices. The PCF8591 has 4 analog inputs programmable as single-ended or differential inputs : @@ -58,8 +59,8 @@ Accessing PCF8591 via /sys interface ------------------------------------- ! Be careful ! -The PCF8591 is plainly impossible to detect ! Stupid chip. -So every chip with address in the interval [48..4f] is +The PCF8591 is plainly impossible to detect! Stupid chip. +So every chip with address in the interval [0x48..0x4f] is detected as PCF8591. If you have other chips in this address range, the workaround is to load this module after the one for your others chips. @@ -67,19 +68,20 @@ for your others chips. On detection (i.e. insmod, modprobe et al.), directories are being created for each detected PCF8591: -/sys/bus/devices/<0>-<1>/ +/sys/bus/i2c/devices/<0>-<1>/ where <0> is the bus the chip was detected on (e. g. i2c-0) and <1> the chip address ([48..4f]) Inside these directories, there are such files: -in0, in1, in2, in3, out0_enable, out0_output, name +in0_input, in1_input, in2_input, in3_input, out0_enable, out0_output, name Name contains chip name. -The in0, in1, in2 and in3 files are RO. Reading gives the value of the -corresponding channel. Depending on the current analog inputs configuration, -files in2 and/or in3 do not exist. Values range are from 0 to 255 for single -ended inputs and -128 to +127 for differential inputs (8-bit ADC). +The in0_input, in1_input, in2_input and in3_input files are RO. Reading gives +the value of the corresponding channel. Depending on the current analog inputs +configuration, files in2_input and in3_input may not exist. Values range +from 0 to 255 for single ended inputs and -128 to +127 for differential inputs +(8-bit ADC). The out0_enable file is RW. Reading gives "1" for analog output enabled and "0" for analog output disabled. Writing accepts "0" and "1" accordingly. diff --git a/Documentation/hwmon/tmp421 b/Documentation/hwmon/tmp421 new file mode 100644 index 00000000000..0cf07f82474 --- /dev/null +++ b/Documentation/hwmon/tmp421 @@ -0,0 +1,36 @@ +Kernel driver tmp421 +==================== + +Supported chips: + * Texas Instruments TMP421 + Prefix: 'tmp421' + Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f + Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html + * Texas Instruments TMP422 + Prefix: 'tmp422' + Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f + Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html + * Texas Instruments TMP423 + Prefix: 'tmp423' + Addresses scanned: I2C 0x2a, 0x4c, 0x4d, 0x4e and 0x4f + Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp421.html + +Authors: + Andre Prendel <andre.prendel@gmx.de> + +Description +----------- + +This driver implements support for Texas Instruments TMP421, TMP422 +and TMP423 temperature sensor chips. These chips implement one local +and up to one (TMP421), up to two (TMP422) or up to three (TMP423) +remote sensors. Temperature is measured in degrees Celsius. The chips +are wired over I2C/SMBus and specified over a temperature range of -40 +to +125 degrees Celsius. Resolution for both the local and remote +channels is 0.0625 degree C. + +The chips support only temperature measurement. The driver exports +the temperature values via the following sysfs files: + +temp[1-4]_input +temp[2-4]_fault diff --git a/Documentation/hwmon/wm831x b/Documentation/hwmon/wm831x new file mode 100644 index 00000000000..24f47d8f6a4 --- /dev/null +++ b/Documentation/hwmon/wm831x @@ -0,0 +1,37 @@ +Kernel driver wm831x-hwmon +========================== + +Supported chips: + * Wolfson Microelectronics WM831x PMICs + Prefix: 'wm831x' + Datasheet: + http://www.wolfsonmicro.com/products/WM8310 + http://www.wolfsonmicro.com/products/WM8311 + http://www.wolfsonmicro.com/products/WM8312 + +Authors: Mark Brown <broonie@opensource.wolfsonmicro.com> + +Description +----------- + +The WM831x series of PMICs include an AUXADC which can be used to +monitor a range of system operating parameters, including the voltages +of the major supplies within the system. Currently the driver provides +reporting of all the input values but does not provide any alarms. + +Voltage Monitoring +------------------ + +Voltages are sampled by a 12 bit ADC. Voltages in milivolts are 1.465 +times the ADC value. + +Temperature Monitoring +---------------------- + +Temperatures are sampled by a 12 bit ADC. Chip and battery temperatures +are available. The chip temperature is calculated as: + + Degrees celsius = (512.18 - data) / 1.0983 + +while the battery temperature calculation will depend on the NTC +thermistor component. diff --git a/Documentation/hwmon/wm8350 b/Documentation/hwmon/wm8350 new file mode 100644 index 00000000000..98f923bd2e9 --- /dev/null +++ b/Documentation/hwmon/wm8350 @@ -0,0 +1,26 @@ +Kernel driver wm8350-hwmon +========================== + +Supported chips: + * Wolfson Microelectronics WM835x PMICs + Prefix: 'wm8350' + Datasheet: + http://www.wolfsonmicro.com/products/WM8350 + http://www.wolfsonmicro.com/products/WM8351 + http://www.wolfsonmicro.com/products/WM8352 + +Authors: Mark Brown <broonie@opensource.wolfsonmicro.com> + +Description +----------- + +The WM835x series of PMICs include an AUXADC which can be used to +monitor a range of system operating parameters, including the voltages +of the major supplies within the system. Currently the driver provides +simple access to these major supplies. + +Voltage Monitoring +------------------ + +Voltages are sampled by a 12 bit ADC. For the internal supplies the ADC +is referenced to the system VRTC. diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt index 4d04572b654..348b9e5e28f 100644 --- a/Documentation/kernel-doc-nano-HOWTO.txt +++ b/Documentation/kernel-doc-nano-HOWTO.txt @@ -66,7 +66,9 @@ Example kernel-doc function comment: * The longer description can have multiple paragraphs. */ -The first line, with the short description, must be on a single line. +The short description following the subject can span multiple lines +and ends with an @argument description, an empty line or the end of +the comment block. The @argument descriptions must begin on the very next line following this opening short function description line, with no intervening diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 4c12a290bee..0f17d16dc10 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1286,6 +1286,10 @@ and is between 256 and 4096 characters. It is defined in the file (machvec) in a generic kernel. Example: machvec=hpzx1_swiotlb + machtype= [Loongson] Share the same kernel image file between different + yeeloong laptop. + Example: machtype=lemote-yeeloong-2f-7inch + max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater than or equal to this physical address is ignored. @@ -1561,7 +1565,7 @@ and is between 256 and 4096 characters. It is defined in the file of returning the full 64-bit number. The default is to return 64-bit inode numbers. - nmi_debug= [KNL,AVR32] Specify one or more actions to take + nmi_debug= [KNL,AVR32,SH] Specify one or more actions to take when a NMI is triggered. Format: [state][,regs][,debounce][,die] diff --git a/Documentation/kref.txt b/Documentation/kref.txt index 130b6e87aa7..ae203f91ee9 100644 --- a/Documentation/kref.txt +++ b/Documentation/kref.txt @@ -84,7 +84,6 @@ int my_data_handler(void) task = kthread_run(more_data_handling, data, "more_data_handling"); if (task == ERR_PTR(-ENOMEM)) { rv = -ENOMEM; - kref_put(&data->refcount, data_release); goto out; } diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index 90e8b3383ba..78c45a87be5 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -1,7 +1,7 @@ Event Tracing Documentation written by Theodore Ts'o - Updated by Li Zefan + Updated by Li Zefan and Tom Zanussi 1. Introduction =============== @@ -97,3 +97,185 @@ The format of this boot option is the same as described in section 2.1. See The example provided in samples/trace_events +4. Event formats +================ + +Each trace event has a 'format' file associated with it that contains +a description of each field in a logged event. This information can +be used to parse the binary trace stream, and is also the place to +find the field names that can be used in event filters (see section 5). + +It also displays the format string that will be used to print the +event in text mode, along with the event name and ID used for +profiling. + +Every event has a set of 'common' fields associated with it; these are +the fields prefixed with 'common_'. The other fields vary between +events and correspond to the fields defined in the TRACE_EVENT +definition for that event. + +Each field in the format has the form: + + field:field-type field-name; offset:N; size:N; + +where offset is the offset of the field in the trace record and size +is the size of the data item, in bytes. + +For example, here's the information displayed for the 'sched_wakeup' +event: + +# cat /debug/tracing/events/sched/sched_wakeup/format + +name: sched_wakeup +ID: 60 +format: + field:unsigned short common_type; offset:0; size:2; + field:unsigned char common_flags; offset:2; size:1; + field:unsigned char common_preempt_count; offset:3; size:1; + field:int common_pid; offset:4; size:4; + field:int common_tgid; offset:8; size:4; + + field:char comm[TASK_COMM_LEN]; offset:12; size:16; + field:pid_t pid; offset:28; size:4; + field:int prio; offset:32; size:4; + field:int success; offset:36; size:4; + field:int cpu; offset:40; size:4; + +print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, + REC->prio, REC->success, REC->cpu + +This event contains 10 fields, the first 5 common and the remaining 5 +event-specific. All the fields for this event are numeric, except for +'comm' which is a string, a distinction important for event filtering. + +5. Event filtering +================== + +Trace events can be filtered in the kernel by associating boolean +'filter expressions' with them. As soon as an event is logged into +the trace buffer, its fields are checked against the filter expression +associated with that event type. An event with field values that +'match' the filter will appear in the trace output, and an event whose +values don't match will be discarded. An event with no filter +associated with it matches everything, and is the default when no +filter has been set for an event. + +5.1 Expression syntax +--------------------- + +A filter expression consists of one or more 'predicates' that can be +combined using the logical operators '&&' and '||'. A predicate is +simply a clause that compares the value of a field contained within a +logged event with a constant value and returns either 0 or 1 depending +on whether the field value matched (1) or didn't match (0): + + field-name relational-operator value + +Parentheses can be used to provide arbitrary logical groupings and +double-quotes can be used to prevent the shell from interpreting +operators as shell metacharacters. + +The field-names available for use in filters can be found in the +'format' files for trace events (see section 4). + +The relational-operators depend on the type of the field being tested: + +The operators available for numeric fields are: + +==, !=, <, <=, >, >= + +And for string fields they are: + +==, != + +Currently, only exact string matches are supported. + +Currently, the maximum number of predicates in a filter is 16. + +5.2 Setting filters +------------------- + +A filter for an individual event is set by writing a filter expression +to the 'filter' file for the given event. + +For example: + +# cd /debug/tracing/events/sched/sched_wakeup +# echo "common_preempt_count > 4" > filter + +A slightly more involved example: + +# cd /debug/tracing/events/sched/sched_signal_send +# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter + +If there is an error in the expression, you'll get an 'Invalid +argument' error when setting it, and the erroneous string along with +an error message can be seen by looking at the filter e.g.: + +# cd /debug/tracing/events/sched/sched_signal_send +# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter +-bash: echo: write error: Invalid argument +# cat filter +((sig >= 10 && sig < 15) || dsig == 17) && comm != bash +^ +parse_error: Field not found + +Currently the caret ('^') for an error always appears at the beginning of +the filter string; the error message should still be useful though +even without more accurate position info. + +5.3 Clearing filters +-------------------- + +To clear the filter for an event, write a '0' to the event's filter +file. + +To clear the filters for all events in a subsystem, write a '0' to the +subsystem's filter file. + +5.3 Subsystem filters +--------------------- + +For convenience, filters for every event in a subsystem can be set or +cleared as a group by writing a filter expression into the filter file +at the root of the subsytem. Note however, that if a filter for any +event within the subsystem lacks a field specified in the subsystem +filter, or if the filter can't be applied for any other reason, the +filter for that event will retain its previous setting. This can +result in an unintended mixture of filters which could lead to +confusing (to the user who might think different filters are in +effect) trace output. Only filters that reference just the common +fields can be guaranteed to propagate successfully to all events. + +Here are a few subsystem filter examples that also illustrate the +above points: + +Clear the filters on all events in the sched subsytem: + +# cd /sys/kernel/debug/tracing/events/sched +# echo 0 > filter +# cat sched_switch/filter +none +# cat sched_wakeup/filter +none + +Set a filter using only common fields for all events in the sched +subsytem (all events end up with the same filter): + +# cd /sys/kernel/debug/tracing/events/sched +# echo common_pid == 0 > filter +# cat sched_switch/filter +common_pid == 0 +# cat sched_wakeup/filter +common_pid == 0 + +Attempt to set a filter using a non-common field for all events in the +sched subsytem (all events but those that have a prev_pid field retain +their old filters): + +# cd /sys/kernel/debug/tracing/events/sched +# echo prev_pid == 0 > filter +# cat sched_switch/filter +prev_pid == 0 +# cat sched_wakeup/filter +common_pid == 0 diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt new file mode 100644 index 00000000000..7003e10f10f --- /dev/null +++ b/Documentation/trace/ftrace-design.txt @@ -0,0 +1,233 @@ + function tracer guts + ==================== + +Introduction +------------ + +Here we will cover the architecture pieces that the common function tracing +code relies on for proper functioning. Things are broken down into increasing +complexity so that you can start simple and at least get basic functionality. + +Note that this focuses on architecture implementation details only. If you +want more explanation of a feature in terms of common code, review the common +ftrace.txt file. + + +Prerequisites +------------- + +Ftrace relies on these features being implemented: + STACKTRACE_SUPPORT - implement save_stack_trace() + TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h + + +HAVE_FUNCTION_TRACER +-------------------- + +You will need to implement the mcount and the ftrace_stub functions. + +The exact mcount symbol name will depend on your toolchain. Some call it +"mcount", "_mcount", or even "__mcount". You can probably figure it out by +running something like: + $ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount + call mcount +We'll make the assumption below that the symbol is "mcount" just to keep things +nice and simple in the examples. + +Keep in mind that the ABI that is in effect inside of the mcount function is +*highly* architecture/toolchain specific. We cannot help you in this regard, +sorry. Dig up some old documentation and/or find someone more familiar than +you to bang ideas off of. Typically, register usage (argument/scratch/etc...) +is a major issue at this point, especially in relation to the location of the +mcount call (before/after function prologue). You might also want to look at +how glibc has implemented the mcount function for your architecture. It might +be (semi-)relevant. + +The mcount function should check the function pointer ftrace_trace_function +to see if it is set to ftrace_stub. If it is, there is nothing for you to do, +so return immediately. If it isn't, then call that function in the same way +the mcount function normally calls __mcount_internal -- the first argument is +the "frompc" while the second argument is the "selfpc" (adjusted to remove the +size of the mcount call that is embedded in the function). + +For example, if the function foo() calls bar(), when the bar() function calls +mcount(), the arguments mcount() will pass to the tracer are: + "frompc" - the address bar() will use to return to foo() + "selfpc" - the address bar() (with _mcount() size adjustment) + +Also keep in mind that this mcount function will be called *a lot*, so +optimizing for the default case of no tracer will help the smooth running of +your system when tracing is disabled. So the start of the mcount function is +typically the bare min with checking things before returning. That also means +the code flow should usually kept linear (i.e. no branching in the nop case). +This is of course an optimization and not a hard requirement. + +Here is some pseudo code that should help (these functions should actually be +implemented in assembly): + +void ftrace_stub(void) +{ + return; +} + +void mcount(void) +{ + /* save any bare state needed in order to do initial checking */ + + extern void (*ftrace_trace_function)(unsigned long, unsigned long); + if (ftrace_trace_function != ftrace_stub) + goto do_trace; + + /* restore any bare state */ + + return; + +do_trace: + + /* save all state needed by the ABI (see paragraph above) */ + + unsigned long frompc = ...; + unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; + ftrace_trace_function(frompc, selfpc); + + /* restore all state needed by the ABI */ +} + +Don't forget to export mcount for modules ! +extern void mcount(void); +EXPORT_SYMBOL(mcount); + + +HAVE_FUNCTION_TRACE_MCOUNT_TEST +------------------------------- + +This is an optional optimization for the normal case when tracing is turned off +in the system. If you do not enable this Kconfig option, the common ftrace +code will take care of doing the checking for you. + +To support this feature, you only need to check the function_trace_stop +variable in the mcount function. If it is non-zero, there is no tracing to be +done at all, so you can return. + +This additional pseudo code would simply be: +void mcount(void) +{ + /* save any bare state needed in order to do initial checking */ + ++ if (function_trace_stop) ++ return; + + extern void (*ftrace_trace_function)(unsigned long, unsigned long); + if (ftrace_trace_function != ftrace_stub) +... + + +HAVE_FUNCTION_GRAPH_TRACER +-------------------------- + +Deep breath ... time to do some real work. Here you will need to update the +mcount function to check ftrace graph function pointers, as well as implement +some functions to save (hijack) and restore the return address. + +The mcount function should check the function pointers ftrace_graph_return +(compare to ftrace_stub) and ftrace_graph_entry (compare to +ftrace_graph_entry_stub). If either of those are not set to the relevant stub +function, call the arch-specific function ftrace_graph_caller which in turn +calls the arch-specific function prepare_ftrace_return. Neither of these +function names are strictly required, but you should use them anyways to stay +consistent across the architecture ports -- easier to compare & contrast +things. + +The arguments to prepare_ftrace_return are slightly different than what are +passed to ftrace_trace_function. The second argument "selfpc" is the same, +but the first argument should be a pointer to the "frompc". Typically this is +located on the stack. This allows the function to hijack the return address +temporarily to have it point to the arch-specific function return_to_handler. +That function will simply call the common ftrace_return_to_handler function and +that will return the original return address with which, you can return to the +original call site. + +Here is the updated mcount pseudo code: +void mcount(void) +{ +... + if (ftrace_trace_function != ftrace_stub) + goto do_trace; + ++#ifdef CONFIG_FUNCTION_GRAPH_TRACER ++ extern void (*ftrace_graph_return)(...); ++ extern void (*ftrace_graph_entry)(...); ++ if (ftrace_graph_return != ftrace_stub || ++ ftrace_graph_entry != ftrace_graph_entry_stub) ++ ftrace_graph_caller(); ++#endif + + /* restore any bare state */ +... + +Here is the pseudo code for the new ftrace_graph_caller assembly function: +#ifdef CONFIG_FUNCTION_GRAPH_TRACER +void ftrace_graph_caller(void) +{ + /* save all state needed by the ABI */ + + unsigned long *frompc = &...; + unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; + prepare_ftrace_return(frompc, selfpc); + + /* restore all state needed by the ABI */ +} +#endif + +For information on how to implement prepare_ftrace_return(), simply look at +the x86 version. The only architecture-specific piece in it is the setup of +the fault recovery table (the asm(...) code). The rest should be the same +across architectures. + +Here is the pseudo code for the new return_to_handler assembly function. Note +that the ABI that applies here is different from what applies to the mcount +code. Since you are returning from a function (after the epilogue), you might +be able to skimp on things saved/restored (usually just registers used to pass +return values). + +#ifdef CONFIG_FUNCTION_GRAPH_TRACER +void return_to_handler(void) +{ + /* save all state needed by the ABI (see paragraph above) */ + + void (*original_return_point)(void) = ftrace_return_to_handler(); + + /* restore all state needed by the ABI */ + + /* this is usually either a return or a jump */ + original_return_point(); +} +#endif + + +HAVE_FTRACE_NMI_ENTER +--------------------- + +If you can't trace NMI functions, then skip this option. + +<details to be filled> + + +HAVE_FTRACE_SYSCALLS +--------------------- + +<details to be filled> + + +HAVE_FTRACE_MCOUNT_RECORD +------------------------- + +See scripts/recordmcount.pl for more info. + +<details to be filled> + + +HAVE_DYNAMIC_FTRACE +--------------------- + +<details to be filled> diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 355d0f1f8c5..1b6292bbdd6 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -26,6 +26,12 @@ disabled, and more (ftrace allows for tracer plugins, which means that the list of tracers can always grow). +Implementation Details +---------------------- + +See ftrace-design.txt for details for arch porters and such. + + The File System --------------- |