Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, January 07, 2016

Scripting: Raw I/O Macros, versus Application Level, versus System Level

The techwars Visual Basic versus AppleScript link is not very good, but it is an example.



My own blog entry http://blog.andy.glew.ca/2016/01/dynamic-keypad-user-interface-elements.html is a much larger example.

Related Blog Pages

The original page, http://blog.andy.glew.ca/2016/01/dynamic-keypad-user-interface-elements.html, is more of a shopping project - a page where I take notes, a sort of informal review.  Posted in the hope that others may find it useful.

It soon became obvious that I was very hopeful about Quadro, a particular iPhone remote control facility, with multiple buttons, whose main claim to fame seems to be using AppleScript/OSA commands. I was hopeful this might make it more reliable than the many, many macro facilities I have used in the past.  Perhaps; but Quadro still has reliability problems. Split into a separate page:
Quadro - I love it when it works, but it is often unreliable.

http://blog.andy.glew.ca/2016/01/scripting-raw-io-macros-versus.html is a somewhat generic discussion of the pros and cons of the various levels of abstraction found in such tools.  One of my big frustrations with Quadro, second to reliability, is that I cannot drive all user interface actions because Quadro does not seem to provide all raw keyboard and mouse events.

http://blog.andy.glew.ca/2016/01/notes-and-thoughts-about-dynamic-keypad.html for still more generic thought.

For the umpteenth time, I realized that I should have written this in a wiki rather than a blog.   Google's Blogspot/Blogger really does not provide much support for pages that need to evolve by being split and otherwise refactored.  Let alone transclusion.


Macros, scripting, etc.



It is good to be able to automate repetitive or hard to follow tasks.  Workflows. This is often called scripting.

There are at least three different types of scripting:
  • Raw I/O scripting - often called macros
    • Sometimes limited to keyboard, keystroke, macros
    • Sometimes able to generate all user input events, such as mouse clicks, mouse motion, etc.
    • Examples
      • AutoHotkey on Windows
      • Karabiner on Mac - although mainly a keyboard remapping utility, it can generate sequences of actions.
      • EMACS keyboard macros (which can generate mouse clicks)
    • Essentially, maps user input events to raw-ish user input events passed on.
    • Often unidirectional - often only limited ability to see what the application(s) or system have done.
  • Application Level Scripting
    • Examples
      • EMACS elisp
      • Microsoft Visual Basic for Applications - in many MS apps, such as Word/Excel/Office/...
      • Adobe ExtendScript - a JavaScript dialect - in many Adobe apps, like FrameMaker, Illustrator
    • Applications often have scripting languages embedded within them - of varying strengths
      • Sometimes simplistic and of limited ability
        • But sometimes a "real programming language" (elisp, VBA, ExtendScript).
      • Sometimes only a subset of app operations
        • Sometimes able to do ANYTHING the app can do
          • e.g. elisp, where pretty much everything in the app is written in elisp, apart from low level C functionality
  • System Level Scripting
    • Example:
      • AppleScript
        • Comprehensive framework - assuming applications comply
        • Extendable - applications can export a command vocabulary.  Which is typically an interactive menu hierarchy
        • Requires a messaging / IPC event framework
      • UNIX shells (bash, etc.)
        • More limited abilities than AppleScript
        • Extensible - but only if apps actually provide functionality on command line.
        • Mainly oriented to UNIX like "command = process spawn/execute/exit"
          • But some apps go to the effort of creating special commands that can interact with a running app.  (E.g. emacs-server)
Which is better?   The usual religious wars.  Why do programmers like such useless religious wars, instead of fair and balanced inquiry and evaluation?
     IMHO no single "better" - they each have different abilities, pros and cons.  I wish that scripting like this could work better.

Application scripting and system scripting tend more to generate "command" actions.   They can map user created scripts to commands already provided - encapsulation.  And they often can connect user input events - keystrokes, mouse clicks, etc. - to these user created scripts.

Raw I/O scripting more often maps user entered input events, like mouse clicks and keystrokes, to the same type of user input events that are actually presented to the application.

MacOS keyboard shortcuts are in-between and limited: they map user input events (a very constrained set of keystrokes) to commands.  Commands apparently as provided by AppleScript and OSA.
     Similarly, Cocoa keyboard defaults map a larger input set to Cocoa commands - which are neither full system commands nor application commands, but commands common to Cocoa input boxes.




User input events can be
  • keystrokes, w/wo modifiers
  • mouse clicks, w/wo modifiers
  • mouse/pointer movement
    • gestures
    • handwriting recognition
  • speech utterances


Mapping of actual user input events can be
  • simplistic - event X gets mapped to event or command C, wherever
  • context based - event X gets mapped to event Y or command C, but only in a certain window or application
  • sequential - a sequence of events X1, X2, X3 can be recognized, and mapped to event Y or command C
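
The three kinds of mapping above can be sketched in a few lines of Python. This is only an illustration, not how any particular macro tool works: the `Rule` type, the event names ("F5", "ctrl+e"), the command names, and the "Mail" app are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    pattern: tuple              # one or more input events, e.g. ("F5",) or ("g", "g")
    command: str                # hypothetical command name to emit
    app: Optional[str] = None   # None means "anywhere"


RULES = [
    Rule(("F5",), "refresh"),                   # simplistic: applies wherever
    Rule(("ctrl+e",), "archive", app="Mail"),   # context based: only in one app
    Rule(("g", "g"), "goto-top", app="Mail"),   # sequential: two events in a row
]


def match(events, app, rules=RULES):
    """Return the command of the longest rule whose pattern ends the event history."""
    best = None
    for rule in rules:
        if rule.app not in (None, app):
            continue  # rule is restricted to a different application
        n = len(rule.pattern)
        if tuple(events[-n:]) == rule.pattern:
            if best is None or n > len(best.pattern):
                best = rule
    return best.command if best else None
```

Here `match(["x", "g", "g"], "Mail")` picks the sequential rule, while the same keystrokes in another app match nothing.
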
Ordering is a classic problem with mappings that involve sequences of user input events:
  • e.g. I used to regularly say something by voice to Dragon, then type a key
    • and have the key action occur before the voice command, that I assumed had been performed first.
  • it used to be common to see such ordering problems between mouse events and keyboard events, or between different keyboards (e.g. a regular keyboard and a separate numeric pad (I have a mouse that is also a numeric pad))
  • similarly, if different keystrokes get routed to different event queues.
Timestamping all user inputs would be a global solution to such ordering problems - assuming there is a way of providing consistent timestamps to multiple devices.
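
A minimal sketch of the timestamping idea, assuming every device stamps its events from a common clock at the moment of input (not at delivery), and that each device's own queue is already in time order. The tuple layout `(timestamp, device, payload)` is my assumption for illustration.

```python
import heapq


def merge_by_timestamp(*device_queues):
    """Merge per-device event lists (each sorted by time) into one ordered stream."""
    return list(heapq.merge(*device_queues, key=lambda event: event[0]))


# The voice command was spoken first, but its queue is delivered second.
voice = [(10.00, "voice", "select paragraph")]
keyboard = [(10.25, "keyboard", "Delete")]
ordered = merge_by_timestamp(keyboard, voice)
```

The hard part, of course, is not the merge but getting consistent timestamps out of devices that sit on different machines.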

  • such problems are less common nowadays, although I think it is mainly due to faster CPUs, and a reduced repertoire of devices
  • I expect to see such problems crop up again if I use Quadro or keypad - i.e. if I use an I/O device that is across a network on a separate computer (smartphone), at the same time as I am actually using the PC keyboard and mouse.
Similar ordering issues happen with commands:
  • Send a command to application #1, whether by raw IO macro or by a command interface
  • Then send another command, that assumes the first command has completed.
Oftentimes "send a command" is asynchronous - the application command is sent, but there is no standard way to wait for completion.
     Scripting often inserts arbitrary delays - "delay 1 second" to give the first command a chance to complete.
     Obvious problems: (1) can produce really slow scripting solutions because of unnecessary delays, and (2) can break, if the delays are insufficient.
     E.g. I have the Parallels virtual machine installed on my MacBook, and can script using Windows apps.  If the VM is up, the commands run fairly quickly; but if the VM is not up, then a delay of minutes (yes! :-() may be required.

Needed: scripting systems with the ability to see if a command has been completed.
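
Short of a real completion signal, the usual workaround is to poll for some observable condition instead of sleeping for a fixed time. A sketch, assuming the script author can supply a predicate `is_done` (hypothetical; e.g. "the VM answers", "the window exists"):

```python
import time


def wait_until(is_done, timeout=60.0, poll_interval=0.1):
    """Poll is_done() until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("command did not complete in time")
        time.sleep(poll_interval)
```

This returns as soon as the condition holds, so the fast case stays fast, and the slow case fails loudly rather than silently running the next command too early. It still depends on the application exposing something that can be checked, which is exactly what is often missing.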

Expect was a great step forward in automating such scripts for programs that use a text/terminal interface.   But AFAIK there is no common standard for GUI programs.  

AppleScript's approach is to assume that scripting is not GUI based.  Often true, but loses much of the advantage of scripting.  (E.g. I want to use Quadro to create an alternative UI, more friendly to physical (dis)abilities.)
    TBD: link to the AppleScript doc that says "Don't script interactive".

Opportunity: Standardization of "I'm done now".  Rather like a SYNC or MFENCE instruction, except for scripting - whether for Raw I/O, or command driven. Problem: requires a standard channel back from the app to which commands are being sent, to the controlling apps.

Scripting is often blind, unidirectional: send a command, whether raw I/O or command.  Or a sequence of commands.  Assume that they all work.
        Problem: if one command in a sequence does not work, then the remaining commands may do something completely unexpected.

Opportunity: Transactional Command Sequences: give scripting user the ability to say "either do all of these commands, or none of them".  Or, at least, to stop at the first error, and inform the user.
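
A rough sketch of what "all or none" could look like, assuming every command comes paired with an undo action. That is a big assumption: many GUI actions have no clean inverse, in which case the best available is the weaker "stop at the first error and tell the user". The names here are hypothetical.

```python
class CommandFailed(Exception):
    """Raised by a step that could not be carried out."""


def run_transaction(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse and re-raise."""
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
    except Exception:
        # Roll back whatever already happened, newest first, then report the error.
        for undo in reversed(completed):
            undo()
        raise
```

Note that later steps are never attempted once one step fails, so a failed "select" can never be followed by a blind "delete".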

I have said many times to the Transactional Memory people, like Ravi Rajwar and Jim Larus, that the true value of transactions may not so much be in improving performance by parallelism, but in making better error handling possible.
    I have often suggested that filesystem level transactions be exposed to scripting languages like Perl.

Application level scripting often has powerful error handling.
     AppleScript system level scripting has error handling - "try".   But experience shows that many script programmers do not properly use it, and/or the applications that interface to them export commands, but not error handling.
      Raw I/O macro scripting often has almost no error handling ability.

Exceptions (events) are the best error handling system.  Not return codes / I/O that must be grepped for.

Command level scripting systems are often associated with ADA (Americans with Disabilities Act) mandated support for disabled users - blind, etc.
     With the consequence that such "Accessibility" features are often used by speech recognition or handwriting recognition software.


Security:   the security model for scripting is poor. Opportunity for improvement.
      Demand for third party keyboards on cellphones drives some innovations in security:   e.g. OS enforcement that some third party keyboards can be used for most things, but not for passwords.
     Similarly, MacOS allows accessibility controls on an app by app basis.
     Any scripting system that involves interaction is really a huge security vulnerability.  I want such improved input systems, but need a security model.

But really need finer grain control.

At least on an app by app basis - e.g if I am using a Quadro style keypad, I want to be able to restrict the commands and I/O events it generates to only certain apps (in my case, email), not others.
     And within such apps, to only certain commands (Archive, Refile) but not others (NOT scan all email looking for contacts or password reset email).
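
The per-app, per-command restriction described above amounts to an allow-list check in front of whatever actually delivers the command. A sketch only: the policy table, the "keypad" source name, and the `send` callback are assumptions for illustration, not any real OS facility.

```python
POLICY = {
    # input source -> app -> commands that source may send (all names hypothetical)
    "keypad": {"Mail": {"Archive", "Refile"}},
}


class NotPermitted(Exception):
    """The input source has not been granted this app/command pair."""


def dispatch(source, app, command, send, policy=POLICY):
    """Forward `command` to `app` via send() only if the policy grants it to `source`."""
    allowed = policy.get(source, {}).get(app, set())
    if command not in allowed:
        raise NotPermitted(f"{source} may not send {command!r} to {app}")
    return send(app, command)
```

Anything not listed is denied by default, so a compromised keypad could archive my mail but not scan it.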

I may have a one track mind with security, but I think this amounts to fine grain capability based security.  And, as I have said elsewhere, I think that capabilities need to almost be at the level of individual methods and objects in the target app. By default, so that programmer doesn't need to work too hard to create.  But then need ways of bundling such capabilities together.

But the requirements of scripting I think have a further corollary: such capabilities need to be attachable to the commands and raw I/O input events generated by the scripting facility. Rather like taints/poison/DIFT.

TBD: more thoughts.


Above, I started by talking about (1) Raw I/O, aka macros, versus (2) Application, versus (3) System.

An intersecting but orthogonal distinction is: what the output looks like:

  • Raw User Input Events
    • once again, keyboard events, mouse events, etc.
  • Menu Commands
  • Fully Capable Commands
Emacs elisp is my category example of fully capable commands: it's a full language.

AppleScript / OSA is capable of being a full command language.  It has most of the features of a real programming language, albeit with strange syntax.

But, de-facto, many applications that provide an OSA interface to AppleScript and friends really are just providing a traversal of the menus that they provide on the user interface.   This is facilitated because some programming systems support doing so automatically. I.e. it is easy to code.

Moreover, you don't have to think that much about security and consistency, because you know that, since these commands are already made available to the user, they will maintain whatever datastructure consistency constraints you wish to maintain.  Whereas exporting low level commands, as is done with Emacs Elisp, and to a lesser extent with VBA, might allow a script (or an alternate user interface) to violate such constraints.

The problem with macro scripting emitting Raw User Input events, keyboard shortcuts and mouse clicks, especially those that navigate a WIMPY interface such as a menu tree, is that MENUS CHANGE.   So, if you have to emit three down arrow keypresses to get to the menu item that you want to select, you will be out of luck if the next version of the software changes the menu.   Or if the current version of the software allows menu items to be greyed out and skipped over.

Menu interfaces that allow menu items to be selected by typing the first few letters of their name are better, slightly more robust - but, again, menus change.

Menu interfaces that allow menu items to be selected by typing the full menu item name are more robust - but, again, menus change.  A common change is a "split", two menu items corresponding to an original, or a submenu where there was originally only one menu item.

The problem with scripting by navigating any menu hierarchy, whether by raw events or by full menu item names, is that MENUS CHANGE.  The hierarchy can change.   Commands may be moved completely elsewhere.  A menu of a small number of settings may be changed into a dialog box, and vice versa.

It seems that some scripting systems have "global menu item names" or "menu commands".  For example, Mac OS-X keyboard shortcuts associate a "Menu Title" (Enter the exact name of the menu command you want to add) with a keyboard shortcut (keystroke plus modifier events).   But, again, the problem with this is that MENUS CHANGE.   It is otherwise reasonable to have two menu items, in different parts of the menu tree, that display the same text in the menu, but which map to different commands - but this Mac OS-X keyboard shortcut facility prevents that.

Well, if making the menu title that is displayed to the user unique is a bad way of identifying the macro-scriptable command, and if making the path through the menu hierarchy is similarly a bad idea (because menu paths change), what then?
     This is what I mean by Full Commands.
     Like in Emacs, where there is a distinction between command executed, and menu item displayed that invokes the command.  Ditto keyboard and mouse bindings. Speech, etc.
     If desired, one can imagine flyovers or queries on a menu system that would display what the current binding is, providing discoverability.
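
A toy version of that separation, loosely modeled on the Emacs distinction between commands and bindings but not taken from Emacs itself: scripts call a stable command name, while menus, keys and so on are just bindings that point at it. All names are hypothetical.

```python
COMMANDS = {}   # stable command name -> function
BINDINGS = {}   # (input medium, trigger) -> command name


def define_command(name, func):
    COMMANDS[name] = func


def bind(medium, trigger, name):
    BINDINGS[(medium, trigger)] = name


def invoke(name, *args):
    """What a script calls: the command itself, independent of any menu or key."""
    return COMMANDS[name](*args)


def handle_input(medium, trigger):
    """What the UI calls: look up the binding, then invoke the command."""
    return invoke(BINDINGS[(medium, trigger)])


def where_is(name):
    """List every binding that currently invokes the named command (discoverability)."""
    return sorted(key for key, bound in BINDINGS.items() if bound == name)
```

If the next release moves the menu item, only the `bind` call changes; a script that calls `invoke("archive-message")` keeps working.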

Oftentimes, there is a further interface layer:   the actual commands bound to menu items, keystrokes, mouse events, speech and gesture commands, etc., may have input medium specific interface code, that does stuff like extracting a keyboard prefix argument, or which mouse key.  Or, prompting for further input.  And which then invokes an actual, scriptable, command.

Finally, we must of course distinguish between the Full Commands provided to a scripting interface, and internal commands and functions.  For the purposes of maintaining consistency in internal datastructures.

But, as usual: although it is important to be able to create such distinctions, it is also important to be able to provide nice defaults. 
  • e.g. automatically create "Global Menu Commands" from menu item titles
    • warning when not (or no longer) globally unique
  • e.g. to permit full menu path navigation using menu item names
    • again, warning if menu structure changes so that a path is no longer accurate
  • e.g. to provide raw input event navigation to scripting
    • warning...
    • and otherwise provide facilities to handle errors
  • In addition to a Fully Capable scripting interface
    • derived from the internal function names, if you are okay with that.
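
The first default in the list above, deriving "Global Menu Commands" from menu item titles and warning when a title is not unique, might look roughly like this. The representation of a menu tree as tuples of path components is my assumption.

```python
import warnings


def global_menu_commands(menu_paths):
    """Map each menu item title to its full path; warn about titles that are not unique."""
    by_title = {}
    for path in menu_paths:
        by_title.setdefault(path[-1], []).append(path)

    commands = {}
    for title, paths in by_title.items():
        if len(paths) == 1:
            commands[title] = paths[0]
        else:
            # Ambiguous title: do not guess, tell the script author instead.
            warnings.warn(f"menu title {title!r} is not globally unique: {paths}")
    return commands
```

Rerunning this against each new version of the app would also catch titles that stop being unique after a menu reorganization.
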
IDEA:  just like Emacs has an (interactive) declaration in defun functions, perhaps there should also be a (scriptable) declaration.
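
To make the idea concrete, here is what a (scriptable) declaration might look like if transplanted into Python as a decorator. This is an analogy only, not Emacs code; the registry and function names are invented.

```python
SCRIPTABLE = {}   # names exported to the scripting interface


def scriptable(func):
    """Mark a function as callable from scripts, exporting it under its own name."""
    SCRIPTABLE[func.__name__] = func
    return func


@scriptable
def archive_message(message_id):
    return f"archived {message_id}"


def _rebuild_index():
    # Internal helper: deliberately not exported, to protect internal consistency.
    return "index rebuilt"


def run_script_command(name, *args):
    """Entry point for scripts: only functions declared scriptable can be reached."""
    if name not in SCRIPTABLE:
        raise LookupError(f"{name!r} is not a scriptable command")
    return SCRIPTABLE[name](*args)
```

The scripting interface is then derived from the internal function names by default, but only for functions the programmer has explicitly marked.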

THINK ABOUT: Elisp (interactive) really is just a shorthand, an easy way to create an interface function.















